9. File and database lookups

XWARNING: This documentation is for an old version of Exim (latest)

Chapter 9 - File and database lookups

Exim can be configured to look up data in files or databases as it processes messages. Two different kinds of syntax are used:

A string that is to be expanded may contain explicit lookup requests. These cause parts of the string to be replaced by data that is obtained from the lookup. Lookups of this type are conditional expansion items. Different results can be defined for the cases of lookup success and failure. See chapter 11, where string expansions are described in detail.
Lists of domains, hosts, and email addresses can contain lookup requests as a way of avoiding excessively long linear lists. In this case, the data that is returned by the lookup is often (but not always) discarded; whether the lookup succeeds or fails is what really counts. These kinds of list are described in chapter 10.

String expansions, lists, and lookups interact with each other in such a way that there is no order in which to describe any one of them that does not involve references to the others. Each of these three chapters makes more sense if you have read the other two first. If you are reading this for the first time, be aware that some of it will make a lot more sense after you have read chapters 10 and 11.

1. Examples of different lookup syntax

It is easy to confuse the two different kinds of lookup, especially as the lists that may contain the second kind are always expanded before being processed as lists. Therefore, they may also contain lookups of the first kind. Be careful to distinguish between the following two examples:

domains = ${lookup{$sender_host_address}lsearch{/some/file}}
domains = lsearch;/some/file

The first uses a string expansion, the result of which must be a domain list. No strings have been specified for a successful or a failing lookup; the defaults in this case are the looked-up data and an empty string, respectively. The expansion takes place before the string is processed as a list, and the file that is searched could contain lines like this:

192.168.3.4: domain1:domain2:...
192.168.1.9: domain3:domain4:...

When the lookup succeeds, the result of the expansion is a list of domains (and possibly other types of item that are allowed in domain lists).

In the second example, the lookup is a single item in a domain list. It causes Exim to use a lookup to see if the domain that is being processed can be found in the file. The file could contains lines like this:

domain1:
domain2:

Any data that follows the keys is not relevant when checking that the domain matches the list item.

It is possible, though no doubt confusing, to use both kinds of lookup at once. Consider a file containing lines like this:

192.168.5.6: lsearch;/another/file

If the value of $sender_host_address is 192.168.5.6, expansion of the first domains setting above generates the second setting, which therefore causes a second lookup to occur.

The rest of this chapter describes the different lookup types that are available. Any of them can be used in any part of the configuration where a lookup is permitted.

2. Lookup types

Two different types of data lookup are implemented:

The single-key type requires the specification of a file in which to look, and a single key to search for. The key must be a non-empty string for the lookup to succeed. The lookup type determines how the file is searched.
The query-style type accepts a generalized database query. No particular key value is assumed by Exim for query-style lookups. You can use whichever Exim variables you need to construct the database query.

The code for each lookup type is in a separate source file that is included in the binary of Exim only if the corresponding compile-time option is set. The default settings in src/EDITME are:

LOOKUP_DBM=yes
LOOKUP_LSEARCH=yes

which means that only linear searching and DBM lookups are included by default. For some types of lookup (e.g. SQL databases), you need to install appropriate libraries and header files before building Exim.

3. Single-key lookup types

The following single-key lookup types are implemented:

cdb: The given file is searched as a Constant DataBase file, using the key string without a terminating binary zero. The cdb format is designed for indexed files that are read frequently and never updated, except by total re-creation. As such, it is particularly suitable for large files containing aliases or other indexed data referenced by an MTA. Information about cdb can be found in several places:
```
http://www.pobox.com/~djb/cdb.html
ftp://ftp.corpit.ru/pub/tinycdb/
http://packages.debian.org/stable/utils/freecdb.html
```
A cdb distribution is not needed in order to build Exim with cdb support, because the code for reading cdb files is included directly in Exim itself. However, no means of building or testing cdb files is provided with Exim, so you need to obtain a cdb distribution in order to do this.
dbm: Calls to DBM library functions are used to extract data from the given DBM file by looking up the record with the given key. A terminating binary zero is included in the key that is passed to the DBM library. See section 4.4 for a discussion of DBM libraries.

For all versions of Berkeley DB, Exim uses the DB_HASH style of database when building DBM files using the exim_dbmbuild utility. However, when using Berkeley DB versions 3 or 4, it opens existing databases for reading with the DB_UNKNOWN option. This enables it to handle any of the types of database that the library supports, and can be useful for accessing DBM files created by other applications. (For earlier DB versions, DB_HASH is always used.)
dbmnz: This is the same as dbm, except that a terminating binary zero is not included in the key that is passed to the DBM library. You may need this if you want to look up data in files that are created by or shared with some other application that does not use terminating zeros. For example, you need to use dbmnz rather than dbm if you want to authenticate incoming SMTP calls using the passwords from Courier’s /etc/userdbshadow.dat file. Exim’s utility program for creating DBM files (exim_dbmbuild) includes the zeros by default, but has an option to omit them (see section 50.9).
dsearch: The given file must be a directory; this is searched for an entry whose name is the key by calling the lstat() function. The key may not contain any forward slash characters. If lstat() succeeds, the result of the lookup is the name of the entry, which may be a file, directory, symbolic link, or any other kind of directory entry. An example of how this lookup can be used to support virtual domains is given in section 47.7.
iplsearch: The given file is a text file containing keys and data. A key is terminated by a colon or white space or the end of the line. The keys in the file must be IP addresses, or IP addresses with CIDR masks. Keys that involve IPv6 addresses must be enclosed in quotes to prevent the first internal colon being interpreted as a key terminator. For example:
```
1.2.3.4:           data for 1.2.3.4
192.168.0.0/16:    data for 192.168.0.0/16
"abcd::cdab":      data for abcd::cdab
"abcd:abcd::/32"   data for abcd:abcd::/32
```
The key for an iplsearch lookup must be an IP address (without a mask). The file is searched linearly, using the CIDR masks where present, until a matching key is found. The first key that matches is used; there is no attempt to find a “best” match. Apart from the way the keys are matched, the processing for iplsearch is the same as for lsearch.

Warning 1: Unlike most other single-key lookup types, a file of data for iplsearch can not be turned into a DBM or cdb file, because those lookup types support only literal keys.

Warning 2: In a host list, you must always use net-iplsearch so that the implicit key is the host’s IP address rather than its name (see section 10.12).
lsearch: The given file is a text file that is searched linearly for a line beginning with the search key, terminated by a colon or white space or the end of the line. The search is case-insensitive; that is, upper and lower case letters are treated as the same. The first occurrence of the key that is found in the file is used.

White space between the key and the colon is permitted. The remainder of the line, with leading and trailing white space removed, is the data. This can be continued onto subsequent lines by starting them with any amount of white space, but only a single space character is included in the data at such a junction. If the data begins with a colon, the key must be terminated by a colon, for example:
```
baduser:  :fail:
```
Empty lines and lines beginning with # are ignored, even if they occur in the middle of an item. This is the traditional textual format of alias files. Note that the keys in an lsearch file are literal strings. There is no wildcarding of any kind.

In most lsearch files, keys are not required to contain colons or # characters, or white space. However, if you need this feature, it is available. If a key begins with a doublequote character, it is terminated only by a matching quote (or end of line), and the normal escaping rules apply to its contents (see section 6.16). An optional colon is permitted after quoted keys (exactly as for unquoted keys). There is no special handling of quotes for the data part of an lsearch line.
nis: The given file is the name of a NIS map, and a NIS lookup is done with the given key, without a terminating binary zero. There is a variant called nis0 which does include the terminating binary zero in the key. This is reportedly needed for Sun-style alias files. Exim does not recognize NIS aliases; the full map names must be used.
wildlsearch or nwildlsearch: These search a file linearly, like lsearch, but instead of being interpreted as a literal string, each key in the file may be wildcarded. The difference between these two lookup types is that for wildlsearch, each key in the file is string-expanded before being used, whereas for nwildlsearch, no expansion takes place.

Like lsearch, the testing is done case-insensitively. However, keys in the file that are regular expressions can be made case-sensitive by the use of (-i) within the pattern. The following forms of wildcard are recognized:
1. The string may begin with an asterisk to mean “ends with”. For example:
```
    *.a.b.c       data for anything.a.b.c
    *fish         data for anythingfish
```
2. The string may begin with a circumflex to indicate a regular expression. For example, for wildlsearch:
```
    ^\N\d+\.a\.b\N    data for <digits>.a.b
```
  Note the use of \N to disable expansion of the contents of the regular expression. If you are using nwildlsearch, where the keys are not string-expanded, the equivalent entry is:
```
    ^\d+\.a\.b        data for <digits>.a.b
```
  The case-insensitive flag is set at the start of compiling the regular expression, but it can be turned off by using (-i) at an appropriate point. For example, to make the entire pattern case-sensitive:
```
    ^(?-i)\d+\.a\.b        data for <digits>.a.b
```
  If the regular expression contains white space or colon characters, you must either quote it (see lsearch above), or represent these characters in other ways. For example, \s can be used for white space and \x3A for a colon. This may be easier than quoting, because if you quote, you have to escape all the backslashes inside the quotes.
  
  Note: It is not possible to capture substrings in a regular expression match for later use, because the results of all lookups are cached. If a lookup is repeated, the result is taken from the cache, and no actual pattern matching takes place. The values of all the numeric variables are unset after a (n)wildlsearch match.
3. Although I cannot see it being of much use, the general matching function that is used to implement (n)wildlsearch means that the string may begin with a lookup name terminated by a semicolon, and followed by lookup data. For example:
```
    cdb;/some/file  data for keys that match the file
```
  The data that is obtained from the nested lookup is discarded.
Keys that do not match any of these patterns are interpreted literally. The continuation rules for the data are the same as for lsearch, and keys may be followed by optional colons.

Warning: Unlike most other single-key lookup types, a file of data for (n)wildlsearch can not be turned into a DBM or cdb file, because those lookup types support only literal keys.

4. Query-style lookup types

The supported query-style lookup types are listed below. Further details about many of them are given in later sections.

dnsdb: This does a DNS search for one or more records whose domain names are given in the supplied query. The resulting data is the contents of the records. See section 9.10.
ibase: This does a lookup in an InterBase database.
ldap: This does an LDAP lookup using a query in the form of a URL, and returns attributes from a single entry. There is a variant called ldapm that permits values from multiple entries to be returned. A third variant called ldapdn returns the Distinguished Name of a single entry instead of any attribute values. See section 9.13.
mysql: The format of the query is an SQL statement that is passed to a MySQL database. See section 9.20.
nisplus: This does a NIS+ lookup using a query that can specify the name of the field to be returned. See section 9.19.
oracle: The format of the query is an SQL statement that is passed to an Oracle database. See section 9.20.
passwd is a query-style lookup with queries that are just user names. The lookup calls getpwnam() to interrogate the system password data, and on success, the result string is the same as you would get from an lsearch lookup on a traditional /etc/passwd file, though with * for the password value. For example:
```
*:42:42:King Rat:/home/kr:/bin/bash
```
pgsql: The format of the query is an SQL statement that is passed to a PostgreSQL database. See section 9.20.
sqlite: The format of the query is a file name followed by an SQL statement that is passed to an SQLite database. See section 9.25.
testdb: This is a lookup type that is used for testing Exim. It is not likely to be useful in normal operation.
whoson: Whoson (http://whoson.sourceforge.net) is a protocol that allows a server to check whether a particular (dynamically allocated) IP address is currently allocated to a known (trusted) user and, optionally, to obtain the identity of the said user. For SMTP servers, Whoson was popular at one time for “POP before SMTP” authentication, but that approach has been superseded by SMTP authentication. In Exim, Whoson can be used to implement “POP before SMTP” checking using ACL statements such as
```
require condition = \
  ${lookup whoson {$sender_host_address}{yes}{no}}
```
The query consists of a single IP address. The value returned is the name of the authenticated user, which is stored in the variable $value. However, in this example, the data in $value is not used; the result of the lookup is one of the fixed strings “yes” or “no”.

5. Temporary errors in lookups

Lookup functions can return temporary error codes if the lookup cannot be completed. For example, an SQL or LDAP database might be unavailable. For this reason, it is not advisable to use a lookup that might do this for critical options such as a list of local domains.

When a lookup cannot be completed in a router or transport, delivery of the message (to the relevant address) is deferred, as for any other temporary error. In other circumstances Exim may assume the lookup has failed, or may give up altogether.

6. Default values in single-key lookups

In this context, a “default value” is a value specified by the administrator that is to be used if a lookup fails.

Note: This section applies only to single-key lookups. For query-style lookups, the facilities of the query language must be used. An attempt to specify a default for a query-style lookup provokes an error.

If “*” is added to a single-key lookup type (for example, lsearch*) and the initial lookup fails, the key “*” is looked up in the file to provide a default value. See also the section on partial matching below.

Alternatively, if “*@” is added to a single-key lookup type (for example dbm*@) then, if the initial lookup fails and the key contains an @ character, a second lookup is done with everything before the last @ replaced by *. This makes it possible to provide per-domain defaults in alias files that include the domains in the keys. If the second lookup fails (or doesn’t take place because there is no @ in the key), “*” is looked up. For example, a redirect router might contain:

data = ${lookup{$local_part@$domain}lsearch*@{/etc/mix-aliases}}

Suppose the address that is being processed is jane@eyre.example. Exim looks up these keys, in this order:

jane@eyre.example
*@eyre.example
*

The data is taken from whichever key it finds first. Note: In an lsearch file, this does not mean the first of these keys in the file. A complete scan is done for each key, and only if it is not found at all does Exim move on to try the next key.

7. Partial matching in single-key lookups

The normal operation of a single-key lookup is to search the file for an exact match with the given key. However, in a number of situations where domains are being looked up, it is useful to be able to do partial matching. In this case, information in the file that has a key starting with “*.” is matched by any domain that ends with the components that follow the full stop. For example, if a key in a DBM file is

*.dates.fict.example

then when partial matching is enabled this is matched by (amongst others) 2001.dates.fict.example and 1984.dates.fict.example. It is also matched by dates.fict.example, if that does not appear as a separate key in the file.

Note: Partial matching is not available for query-style lookups. It is also not available for any lookup items in address lists (see section 10.19).

Partial matching is implemented by doing a series of separate lookups using keys constructed by modifying the original subject key. This means that it can be used with any of the single-key lookup types, provided that partial matching keys beginning with a special prefix (default “*.”) are included in the data file. Keys in the file that do not begin with the prefix are matched only by unmodified subject keys when partial matching is in use.

Partial matching is requested by adding the string “partial-” to the front of the name of a single-key lookup type, for example, partial-dbm. When this is done, the subject key is first looked up unmodified; if that fails, “*.” is added at the start of the subject key, and it is looked up again. If that fails, further lookups are tried with dot-separated components removed from the start of the subject key, one-by-one, and “*.” added on the front of what remains.

A minimum number of two non-* components are required. This can be adjusted by including a number before the hyphen in the search type. For example, partial3-lsearch specifies a minimum of three non-* components in the modified keys. Omitting the number is equivalent to “partial2-”. If the subject key is 2250.dates.fict.example then the following keys are looked up when the minimum number of non-* components is two:

2250.dates.fict.example
*.2250.dates.fict.example
*.dates.fict.example
*.fict.example

As soon as one key in the sequence is successfully looked up, the lookup finishes.

The use of “*.” as the partial matching prefix is a default that can be changed. The motivation for this feature is to allow Exim to operate with file formats that are used by other MTAs. A different prefix can be supplied in parentheses instead of the hyphen after “partial”. For example:

domains = partial(.)lsearch;/some/file

In this example, if the domain is a.b.c, the sequence of lookups is a.b.c, .a.b.c, and .b.c (the default minimum of 2 non-wild components is unchanged). The prefix may consist of any punctuation characters other than a closing parenthesis. It may be empty, for example:

domains = partial1()cdb;/some/file

For this example, if the domain is a.b.c, the sequence of lookups is a.b.c, b.c, and c.

If “partial0” is specified, what happens at the end (when the lookup with just one non-wild component has failed, and the original key is shortened right down to the null string) depends on the prefix:

If the prefix has zero length, the whole lookup fails.
If the prefix has length 1, a lookup for just the prefix is done. For example, the final lookup for “partial0(.)” is for . alone.
Otherwise, if the prefix ends in a dot, the dot is removed, and the remainder is looked up. With the default prefix, therefore, the final lookup is for “*” on its own.
Otherwise, the whole prefix is looked up.

If the search type ends in “*” or “*@” (see section 9.6 above), the search for an ultimate default that this implies happens after all partial lookups have failed. If “partial0” is specified, adding “*” to the search type has no effect with the default prefix, because the “*” key is already included in the sequence of partial lookups. However, there might be a use for lookup types such as “partial0(.)lsearch*”.

The use of “*” in lookup partial matching differs from its use as a wildcard in domain lists and the like. Partial matching works only in terms of dot-separated components; a key such as *fict.example in a database file is useless, because the asterisk in a partial matching subject key is always followed by a dot.

8. Lookup caching

Exim caches all lookup results in order to avoid needless repetition of lookups. However, because (apart from the daemon) Exim operates as a collection of independent, short-lived processes, this caching applies only within a single Exim process. There is no inter-process lookup caching facility.

For single-key lookups, Exim keeps the relevant files open in case there is another lookup that needs them. In some types of configuration this can lead to many files being kept open for messages with many recipients. To avoid hitting the operating system limit on the number of simultaneously open files, Exim closes the least recently used file when it needs to open more files than its own internal limit, which can be changed via the lookup_open_max option.

The single-key lookup files are closed and the lookup caches are flushed at strategic points during delivery – for example, after all routing is complete.

9. Quoting lookup data

When data from an incoming message is included in a query-style lookup, there is the possibility of special characters in the data messing up the syntax of the query. For example, a NIS+ query that contains

[name=$local_part]

will be broken if the local part happens to contain a closing square bracket. For NIS+, data can be enclosed in double quotes like this:

[name="$local_part"]

but this still leaves the problem of a double quote in the data. The rule for NIS+ is that double quotes must be doubled. Other lookup types have different rules, and to cope with the differing requirements, an expansion operator of the following form is provided:

${quote_<lookup-type>:<string>}

For example, the safest way to write the NIS+ query is

[name="${quote_nisplus:$local_part}"]

See chapter 11 for full coverage of string expansions. The quote operator can be used for all lookup types, but has no effect for single-key lookups, since no quoting is ever needed in their key strings.

10. More about dnsdb

The dnsdb lookup type uses the DNS as its database. A simple query consists of a record type and a domain name, separated by an equals sign. For example, an expansion string could contain:

${lookup dnsdb{mx=a.b.example}{$value}fail}

If the lookup succeeds, the result is placed in $value, which in this case is used on its own as the result. If the lookup does not succeed, the fail keyword causes a forced expansion failure – see section 11.4 for an explanation of what this means.

The supported DNS record types are A, CNAME, MX, NS, PTR, SRV, and TXT, and, when Exim is compiled with IPv6 support, AAAA (and A6 if that is also configured). If no type is given, TXT is assumed. When the type is PTR, the data can be an IP address, written as normal; inversion and the addition of in-addr.arpa or ip6.arpa happens automatically. For example:

${lookup dnsdb{ptr=192.168.4.5}{$value}fail}

If the data for a PTR record is not a syntactically valid IP address, it is not altered and nothing is added.

For an MX lookup, both the preference value and the host name are returned for each record, separated by a space. For an SRV lookup, the priority, weight, port, and host name are returned for each record, separated by spaces.

For any record type, if multiple records are found (or, for A6 lookups, if a single record leads to multiple addresses), the data is returned as a concatenation, with newline as the default separator. The order, of course, depends on the DNS resolver. You can specify a different separator character between multiple records by putting a right angle-bracket followed immediately by the new separator at the start of the query. For example:

${lookup dnsdb{>: a=host1.example}}

It is permitted to specify a space as the separator character. Further white space is ignored.

For TXT records with multiple items of data, only the first item is returned, unless a separator for them is specified using a comma after the separator character followed immediately by the TXT record item separator. To concatenate items without a separator, use a semicolon instead.

${lookup dnsdb{>\n,: txt=a.b.example}}
${lookup dnsdb{>\n; txt=a.b.example}}

It is permitted to specify a space as the separator character. Further white space is ignored.

11. Pseudo dnsdb record types

By default, both the preference value and the host name are returned for each MX record, separated by a space. If you want only host names, you can use the pseudo-type MXH:

${lookup dnsdb{mxh=a.b.example}}

In this case, the preference values are omitted, and just the host names are returned.

Another pseudo-type is ZNS (for “zone NS”). It performs a lookup for NS records on the given domain, but if none are found, it removes the first component of the domain name, and tries again. This process continues until NS records are found or there are no more components left (or there is a DNS error). In other words, it may return the name servers for a top-level domain, but it never returns the root name servers. If there are no NS records for the top-level domain, the lookup fails. Consider these examples:

${lookup dnsdb{zns=xxx.quercite.com}}
${lookup dnsdb{zns=xxx.edu}}

Assuming that in each case there are no NS records for the full domain name, the first returns the name servers for quercite.com, and the second returns the name servers for edu.

You should be careful about how you use this lookup because, unless the top-level domain does not exist, the lookup always returns some host names. The sort of use to which this might be put is for seeing if the name servers for a given domain are on a blacklist. You can probably assume that the name servers for the high-level domains such as com or co.uk are not going to be on such a list.

A third pseudo-type is CSA (Client SMTP Authorization). This looks up SRV records according to the CSA rules, which are described in section 40.46. Although dnsdb supports SRV lookups directly, this is not sufficient because of the extra parent domain search behaviour of CSA. The result of a successful lookup such as:

${lookup dnsdb {csa=$sender_helo_name}}

has two space-separated fields: an authorization code and a target host name. The authorization code can be “Y” for yes, “N” for no, “X” for explicit authorization required but absent, or “?” for unknown.

12. Multiple dnsdb lookups

In the previous sections, dnsdb lookups for a single domain are described. However, you can specify a list of domains or IP addresses in a single dnsdb lookup. The list is specified in the normal Exim way, with colon as the default separator, but with the ability to change this. For example:

${lookup dnsdb{one.domain.com:two.domain.com}}
${lookup dnsdb{a=one.host.com:two.host.com}}
${lookup dnsdb{ptr = <; 1.2.3.4 ; 4.5.6.8}}

In order to retain backwards compatibility, there is one special case: if the lookup type is PTR and no change of separator is specified, Exim looks to see if the rest of the string is precisely one IPv6 address. In this case, it does not treat it as a list.

The data from each lookup is concatenated, with newline separators by default, in the same way that multiple DNS records for a single item are handled. A different separator can be specified, as described above.

The dnsdb lookup fails only if all the DNS lookups fail. If there is a temporary DNS error for any of them, the behaviour is controlled by an optional keyword followed by a comma that may appear before the record type. The possible keywords are “defer_strict”, “defer_never”, and “defer_lax”. With “strict” behaviour, any temporary DNS error causes the whole lookup to defer. With “never” behaviour, a temporary DNS error is ignored, and the behaviour is as if the DNS lookup failed to find anything. With “lax” behaviour, all the queries are attempted, but a temporary DNS error causes the whole lookup to defer only if none of the other lookups succeed. The default is “lax”, so the following lookups are equivalent:

${lookup dnsdb{defer_lax,a=one.host.com:two.host.com}}
${lookup dnsdb{a=one.host.com:two.host.com}}

Thus, in the default case, as long as at least one of the DNS lookups yields some data, the lookup succeeds.

13. More about LDAP

The original LDAP implementation came from the University of Michigan; this has become “Open LDAP”, and there are now two different releases. Another implementation comes from Netscape, and Solaris 7 and subsequent releases contain inbuilt LDAP support. Unfortunately, though these are all compatible at the lookup function level, their error handling is different. For this reason it is necessary to set a compile-time variable when building Exim with LDAP, to indicate which LDAP library is in use. One of the following should appear in your Local/Makefile:

LDAP_LIB_TYPE=UMICHIGAN
LDAP_LIB_TYPE=OPENLDAP1
LDAP_LIB_TYPE=OPENLDAP2
LDAP_LIB_TYPE=NETSCAPE
LDAP_LIB_TYPE=SOLARIS

If LDAP_LIB_TYPE is not set, Exim assumes OPENLDAP1, which has the same interface as the University of Michigan version.

There are three LDAP lookup types in Exim. These behave slightly differently in the way they handle the results of a query:

ldap requires the result to contain just one entry; if there are more, it gives an error.
ldapdn also requires the result to contain just one entry, but it is the Distinguished Name that is returned rather than any attribute values.
ldapm permits the result to contain more than one entry; the attributes from all of them are returned.

For ldap and ldapm, if a query finds only entries with no attributes, Exim behaves as if the entry did not exist, and the lookup fails. The format of the data returned by a successful lookup is described in the next section. First we explain how LDAP queries are coded.

14. Format of LDAP queries

An LDAP query takes the form of a URL as defined in RFC 2255. For example, in the configuration of a redirect router one might have this setting:

data = ${lookup ldap \
  {ldap:///cn=$local_part,o=University%20of%20Cambridge,\
  c=UK?mailbox?base?}}

The URL may begin with ldap or ldaps if your LDAP library supports secure (encrypted) LDAP connections. The second of these ensures that an encrypted TLS connection is used.

15. LDAP quoting

Two levels of quoting are required in LDAP queries, the first for LDAP itself and the second because the LDAP query is represented as a URL. Furthermore, within an LDAP query, two different kinds of quoting are required. For this reason, there are two different LDAP-specific quoting operators.

The quote_ldap operator is designed for use on strings that are part of filter specifications. Conceptually, it first does the following conversions on the string:

*   =>   \2A
(   =>   \28
)   =>   \29
\   =>   \5C

in accordance with RFC 2254. The resulting string is then quoted according to the rules for URLs, that is, all non-alphanumeric characters except

! $ ' - . _ ( ) * +

are converted to their hex values, preceded by a percent sign. For example:

${quote_ldap: a(bc)*, a<yz>; }

yields

%20a%5C28bc%5C29%5C2A%2C%20a%3Cyz%3E%3B%20

Removing the URL quoting, this is (with a leading and a trailing space):

a\28bc\29\2A, a<yz>;

The quote_ldap_dn operator is designed for use on strings that are part of base DN specifications in queries. Conceptually, it first converts the string by inserting a backslash in front of any of the following characters:

, + " \ < > ;

It also inserts a backslash before any leading spaces or # characters, and before any trailing spaces. (These rules are in RFC 2253.) The resulting string is then quoted according to the rules for URLs. For example:

${quote_ldap_dn: a(bc)*, a<yz>; }

yields

%5C%20a(bc)*%5C%2C%20a%5C%3Cyz%5C%3E%5C%3B%5C%20

Removing the URL quoting, this is (with a trailing space):

\ a(bc)*\, a\<yz\>\;\

There are some further comments about quoting in the section on LDAP authentication below.

16. LDAP connections

The connection to an LDAP server may either be over TCP/IP, or, when OpenLDAP is in use, via a Unix domain socket. The example given above does not specify an LDAP server. A server that is reached by TCP/IP can be specified in a query by starting it with

ldap://<hostname>:<port>/...

If the port (and preceding colon) are omitted, the standard LDAP port (389) is used. When no server is specified in a query, a list of default servers is taken from the ldap_default_servers configuration option. This supplies a colon-separated list of servers which are tried in turn until one successfully handles a query, or there is a serious error. Successful handling either returns the requested data, or indicates that it does not exist. Serious errors are syntactical, or multiple values when only a single value is expected. Errors which cause the next server to be tried are connection failures, bind failures, and timeouts.

For each server name in the list, a port number can be given. The standard way of specifying a host and port is to use a colon separator (RFC 1738). Because ldap_default_servers is a colon-separated list, such colons have to be doubled. For example

ldap_default_servers = ldap1.example.com::145:ldap2.example.com

If ldap_default_servers is unset, a URL with no server name is passed to the LDAP library with no server name, and the library’s default (normally the local host) is used.

If you are using the OpenLDAP library, you can connect to an LDAP server using a Unix domain socket instead of a TCP/IP connection. This is specified by using ldapi instead of ldap in LDAP queries. What follows here applies only to OpenLDAP. If Exim is compiled with a different LDAP library, this feature is not available.

For this type of connection, instead of a host name for the server, a pathname for the socket is required, and the port number is not relevant. The pathname can be specified either as an item in ldap_default_servers, or inline in the query. In the former case, you can have settings such as

ldap_default_servers = /tmp/ldap.sock : backup.ldap.your.domain

When the pathname is given in the query, you have to escape the slashes as %2F to fit in with the LDAP URL syntax. For example:

${lookup ldap {ldapi://%2Ftmp%2Fldap.sock/o=...

When Exim processes an LDAP lookup and finds that the “hostname” is really a pathname, it uses the Unix domain socket code, even if the query actually specifies ldap or ldaps. In particular, no encryption is used for a socket connection. This behaviour means that you can use a setting of ldap_default_servers such as in the example above with traditional ldap or ldaps queries, and it will work. First, Exim tries a connection via the Unix domain socket; if that fails, it tries a TCP/IP connection to the backup host.

If an explicit ldapi type is given in a query when a host name is specified, an error is diagnosed. However, if there are more items in ldap_default_servers, they are tried. In other words:

Using a pathname with ldap or ldaps forces the use of the Unix domain interface.
Using ldapi with a host name causes an error.

Using ldapi with no host or path in the query, and no setting of ldap_default_servers, does whatever the library does by default.

17. LDAP authentication and control information

The LDAP URL syntax provides no way of passing authentication and other control information to the server. To make this possible, the URL in an LDAP query may be preceded by any number of <name>=<value> settings, separated by spaces. If a value contains spaces it must be enclosed in double quotes, and when double quotes are used, backslash is interpreted in the usual way inside them. The following names are recognized:

DEREFERENCE  set the dereferencing parameter
NETTIME      set a timeout for a network operation
USER         set the DN, for authenticating the LDAP bind
PASS         set the password, likewise
REFERRALS    set the referrals parameter
SIZE         set the limit for the number of entries returned
TIME         set the maximum waiting time for a query

The value of the DEREFERENCE parameter must be one of the words “never”, “searching”, “finding”, or “always”. The value of the REFERRALS parameter must be “follow” (the default) or “nofollow”. The latter stops the LDAP library from trying to follow referrals issued by the LDAP server.

The name CONNECT is an obsolete name for NETTIME, retained for backwards compatibility. This timeout (specified as a number of seconds) is enforced from the client end for operations that can be carried out over a network. Specifically, it applies to network connections and calls to the ldap_result() function. If the value is greater than zero, it is used if LDAP_OPT_NETWORK_TIMEOUT is defined in the LDAP headers (OpenLDAP), or if LDAP_X_OPT_CONNECT_TIMEOUT is defined in the LDAP headers (Netscape SDK 4.1). A value of zero forces an explicit setting of “no timeout” for Netscape SDK; for OpenLDAP no action is taken.

The TIME parameter (also a number of seconds) is passed to the server to set a server-side limit on the time taken to complete a search.

Here is an example of an LDAP query in an Exim lookup that uses some of these values. This is a single line, folded to fit on the page:

${lookup ldap
  {user="cn=manager,o=University of Cambridge,c=UK" pass=secret
  ldap:///o=University%20of%20Cambridge,c=UK?sn?sub?(cn=foo)}
  {$value}fail}

The encoding of spaces as %20 is a URL thing which should not be done for any of the auxiliary data. Exim configuration settings that include lookups which contain password information should be preceded by “hide” to prevent non-admin users from using the -bP option to see their values.

The auxiliary data items may be given in any order. The default is no connection timeout (the system timeout is used), no user or password, no limit on the number of entries returned, and no time limit on queries.

When a DN is quoted in the USER= setting for LDAP authentication, Exim removes any URL quoting that it may contain before passing it LDAP. Apparently some libraries do this for themselves, but some do not. Removing the URL quoting has two advantages:

It makes it possible to use the same quote_ldap_dn expansion for USER= DNs as with DNs inside actual queries.
It permits spaces inside USER= DNs.

For example, a setting such as

USER=cn=${quote_ldap_dn:$1}

should work even if $1 contains spaces.

Expanded data for the PASS= value should be quoted using the quote expansion operator, rather than the LDAP quote operators. The only reason this field needs quoting is to ensure that it conforms to the Exim syntax, which does not allow unquoted spaces. For example:

PASS=${quote:$3}

The LDAP authentication mechanism can be used to check passwords as part of SMTP authentication. See the ldapauth expansion string condition in chapter 11.

18. Format of data returned by LDAP

The ldapdn lookup type returns the Distinguished Name from a single entry as a sequence of values, for example

cn=manager, o=University of Cambridge, c=UK

The ldap lookup type generates an error if more than one entry matches the search filter, whereas ldapm permits this case, and inserts a newline in the result between the data from different entries. It is possible for multiple values to be returned for both ldap and ldapm, but in the former case you know that whatever values are returned all came from a single entry in the directory.

In the common case where you specify a single attribute in your LDAP query, the result is not quoted, and does not contain the attribute name. If the attribute has multiple values, they are separated by commas.

If you specify multiple attributes, the result contains space-separated, quoted strings, each preceded by the attribute name and an equals sign. Within the quotes, the quote character, backslash, and newline are escaped with backslashes, and commas are used to separate multiple values for the attribute. Apart from the escaping, the string within quotes takes the same form as the output when a single attribute is requested. Specifying no attributes is the same as specifying all of an entry’s attributes.

Here are some examples of the output format. The first line of each pair is an LDAP query, and the second is the data that is returned. The attribute called attr1 has two values, whereas attr2 has only one value:

ldap:///o=base?attr1?sub?(uid=fred)
value1.1, value1.2

ldap:///o=base?attr2?sub?(uid=fred)
value two

ldap:///o=base?attr1,attr2?sub?(uid=fred)
attr1="value1.1, value1.2" attr2="value two"

ldap:///o=base??sub?(uid=fred)
objectClass="top" attr1="value1.1, value1.2" attr2="value two"

The extract operator in string expansions can be used to pick out individual fields from data that consists of key=value pairs. You can make use of Exim’s -be option to run expansion tests and thereby check the results of LDAP lookups.

19. More about NIS+

NIS+ queries consist of a NIS+ indexed name followed by an optional colon and field name. If this is given, the result of a successful query is the contents of the named field; otherwise the result consists of a concatenation of field-name=field-value pairs, separated by spaces. Empty values and values containing spaces are quoted. For example, the query

[name=mg1456],passwd.org_dir

might return the string

name=mg1456 passwd="" uid=999 gid=999 gcos="Martin Guerre"
home=/home/mg1456 shell=/bin/bash shadow=""

(split over two lines here to fit on the page), whereas

[name=mg1456],passwd.org_dir:gcos

would just return

Martin Guerre

with no quotes. A NIS+ lookup fails if NIS+ returns more than one table entry for the given indexed key. The effect of the quote_nisplus expansion operator is to double any quote characters within the text.

20. SQL lookups

Exim can support lookups in InterBase, MySQL, Oracle, PostgreSQL, and SQLite databases. Queries for these databases contain SQL statements, so an example might be

${lookup mysql{select mailbox from users where id='userx'}\
  {$value}fail}

If the result of the query contains more than one field, the data for each field in the row is returned, preceded by its name, so the result of

${lookup pgsql{select home,name from users where id='userx'}\
  {$value}}

might be

home=/home/userx name="Mister X"

Empty values and values containing spaces are double quoted, with embedded quotes escaped by a backslash. If the result of the query contains just one field, the value is passed back verbatim, without a field name, for example:

Mister X

If the result of the query yields more than one row, it is all concatenated, with a newline between the data for each row.

21. More about MySQL, PostgreSQL, Oracle, and InterBase

If any MySQL, PostgreSQL, Oracle, or InterBase lookups are used, the mysql_servers, pgsql_servers, oracle_servers, or ibase_servers option (as appropriate) must be set to a colon-separated list of server information. (For MySQL and PostgreSQL only, the global option need not be set if all queries contain their own server information – see section 9.22.) Each item in the list is a slash-separated list of four items: host name, database name, user name, and password. In the case of Oracle, the host name field is used for the “service name”, and the database name field is not used and should be empty. For example:

hide oracle_servers = oracle.plc.example//userx/abcdwxyz

Because password data is sensitive, you should always precede the setting with “hide”, to prevent non-admin users from obtaining the setting via the -bP option. Here is an example where two MySQL servers are listed:

hide mysql_servers = localhost/users/root/secret:\
                     otherhost/users/root/othersecret

For MySQL and PostgreSQL, a host may be specified as <name>:<port> but because this is a colon-separated list, the colon has to be doubled. For each query, these parameter groups are tried in order until a connection is made and a query is successfully processed. The result of a query may be that no data is found, but that is still a successful query. In other words, the list of servers provides a backup facility, not a list of different places to look.

The quote_mysql, quote_pgsql, and quote_oracle expansion operators convert newline, tab, carriage return, and backspace to \n, \t, \r, and \b respectively, and the characters single-quote, double-quote, and backslash itself are escaped with backslashes. The quote_pgsql expansion operator, in addition, escapes the percent and underscore characters. This cannot be done for MySQL because these escapes are not recognized in contexts where these characters are not special.

22. Specifying the server in the query

For MySQL and PostgreSQL lookups (but not currently for Oracle and InterBase), it is possible to specify a list of servers with an individual query. This is done by starting the query with

servers=server1:server2:server3:...;

Each item in the list may take one of two forms:

If it contains no slashes it is assumed to be just a host name. The appropriate global option (mysql_servers or pgsql_servers) is searched for a host of the same name, and the remaining parameters (database, user, password) are taken from there.
If it contains any slashes, it is taken as a complete parameter set.

The list of servers is used in exactly the same way as the global list. Once a connection to a server has happened and a query has been successfully executed, processing of the lookup ceases.

This feature is intended for use in master/slave situations where updates are occurring and you want to update the master rather than a slave. If the master is in the list as a backup for reading, you might have a global setting like this:

mysql_servers = slave1/db/name/pw:\
                slave2/db/name/pw:\
                master/db/name/pw

In an updating lookup, you could then write:

${lookup mysql{servers=master; UPDATE ...} }

That query would then be sent only to the master server. If, on the other hand, the master is not to be used for reading, and so is not present in the global option, you can still update it by a query of this form:

${lookup pgsql{servers=master/db/name/pw; UPDATE ...} }

23. Special MySQL features

For MySQL, an empty host name or the use of “localhost” in mysql_servers causes a connection to the server on the local host by means of a Unix domain socket. An alternate socket can be specified in parentheses. The full syntax of each item in mysql_servers is:

<hostname>::<port>(<socket name>)/<database>/<user>/<password>

Any of the three sub-parts of the first field can be omitted. For normal use on the local host it can be left blank or set to just “localhost”.

No database need be supplied – but if it is absent here, it must be given in the queries.

If a MySQL query is issued that does not request any data (an insert, update, or delete command), the result of the lookup is the number of rows affected.

Warning: This can be misleading. If an update does not actually change anything (for example, setting a field to the value it already has), the result is zero because no rows are affected.

24. Special PostgreSQL features

PostgreSQL lookups can also use Unix domain socket connections to the database. This is usually faster and costs less CPU time than a TCP/IP connection. However it can be used only if the mail server runs on the same machine as the database server. A configuration line for PostgreSQL via Unix domain sockets looks like this:

hide pgsql_servers = (/tmp/.s.PGSQL.5432)/db/user/password : ...

In other words, instead of supplying a host name, a path to the socket is given. The path name is enclosed in parentheses so that its slashes aren’t visually confused with the delimiters for the other server parameters.

If a PostgreSQL query is issued that does not request any data (an insert, update, or delete command), the result of the lookup is the number of rows affected.

25. More about SQLite

SQLite is different to the other SQL lookups because a file name is required in addition to the SQL query. An SQLite database is a single file, and there is no daemon as in the other SQL databases. The interface to Exim requires the name of the file, as an absolute path, to be given at the start of the query. It is separated from the query by white space. This means that the path name cannot contain white space. Here is a lookup expansion example:

${lookup sqlite {/some/thing/sqlitedb \
  select name from aliases where id='userx';}}

In a list, the syntax is similar. For example:

domainlist relay_domains = sqlite;/some/thing/sqlitedb \
   select * from relays where ip='$sender_host_address';

The only character affected by the quote_sqlite operator is a single quote, which it doubles.

The SQLite library handles multiple simultaneous accesses to the database internally. Multiple readers are permitted, but only one process can update at once. Attempts to access the database while it is being updated are rejected after a timeout period, during which the SQLite library waits for the lock to be released. In Exim, the default timeout is set to 5 seconds, but it can be changed by means of the sqlite_lock_timeout option.

<-previous Table of Contents next->

Exim Internet Mailer