The fifth part of the configuration file contains a list of retry rules which control how often Exim tries to deliver messages that cannot be delivered at the first attempt. If there are no retry rules, Exim gives up after the first failure. The `-brt' command line option can be used to test which retry rule will be used for a given address or domain.
Retry processing applies to directing and routing as well as to delivering, except as covered in the next paragraph. The retry rules do not distinguish between these three actions, so it is not possible, for example, to specify different behaviour for failures to route the domain `snark.fict.book' and failures to deliver to the host `snark.fict.book'. I didn't think anyone would ever need this added complication, so I did not implement it. Internally, however, the actual retry times for routing, directing, and transporting are maintained independently.
When a delivery is not part of a queue run (typically an immediate delivery on receipt of a message), the directors are always run for local addresses, and local deliveries are always attempted, even if retry times are set for them. This makes for better behaviour if one particular message is causing problems (for example, causing quota overflow, or provoking an error in a filter file). If such a delivery suffers a temporary failure, the retry data gets updated as normal, and subsequent delivery attempts from queue runs occur only when the retry time for the local address is reached.
Each retry rule occupies one line and consists of three parts, separated by white space: a pattern, an error name, and a list of retry parameters. The rules are searched in order until one is found whose pattern matches the failing host or address.
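As a rough illustration of the three-field layout and the first-match search, the following Python sketch may help. It is not Exim's code: the helper names `split_rule' and `first_matching_rule' are hypothetical, and the `matches' callback merely stands in for the real pattern matching. The error-name field is carried along but not checked in this sketch.

```python
def split_rule(line):
    """Split one retry rule line into (pattern, error, parameters).

    The first two fields end at white space; the remainder of the line
    is kept whole, because the parameter list may itself contain spaces.
    """
    fields = line.split(None, 2)
    while len(fields) < 3:
        fields.append("")  # a rule may have no retry parameters at all
    return tuple(fields)


def first_matching_rule(lines, matches):
    """Return the first rule whose pattern satisfies matches(), else None."""
    for line in lines:
        pattern, error, parameters = split_rule(line)
        if matches(pattern):
            return pattern, error, parameters
    return None


rules = [
    "lookingglass.fict.book * F,24h,30m;",
    "* * F,2h,15m; G,16h,1h,1.5; F,5d,8h",
]
# Hypothetical matcher: accepts an exact domain or the catch-all asterisk.
found = first_matching_rule(rules, lambda p: p in ("*", "wonderland.fict.book"))
```

Because the search stops at the first match, the order of the rules matters: a catch-all placed early would hide everything after it.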
The pattern may be a complete address (`local_part@domain'), a plain domain, a wildcarded domain (that is, starting with an asterisk), a domain lookup (as in a domain list), or a regular expression. The first form must be used with local domains only; in this case the local part may begin with an asterisk.
After a directing or local delivery failure, regular expressions and patterns containing local parts are normally matched against the complete address (`local_part@domain'). However, if there is no local part in a pattern that is not a regular expression, then the local part of the address isn't used in the matching. Thus an entry such as
lookingglass.fict.book * F,24h,30m;
matches any address whose domain is `lookingglass.fict.book', whether this is a local or a remote domain, whereas
alice@lookingglass.fict.book * F,24h,30m;
can be used only if `lookingglass.fict.book' is a local domain. It applies to temporary failures involving the local part `alice', but not to any other local parts.
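The difference between the two kinds of entry can be sketched in Python as follows. This is only an illustration under stated assumptions: `pattern_matches' is a hypothetical helper, a leading asterisk is assumed to match any leading string, comparisons are exact, and regular expressions and domain lookups are left out entirely.

```python
def pattern_matches(pattern, local_part, domain):
    """Decide whether a simple retry pattern matches local_part@domain.

    Assumption: a leading '*' stands for any leading string. Regular
    expressions and lookups are deliberately not handled here.
    """
    if "@" in pattern:
        # The pattern has a local part, so the complete address is compared.
        pat_local, pat_domain = pattern.rsplit("@", 1)
        if pat_domain != domain:
            return False
        if pat_local.startswith("*"):
            return local_part.endswith(pat_local[1:])
        return pat_local == local_part
    # No local part in the pattern: only the domain is considered.
    if pattern.startswith("*"):
        return domain.endswith(pattern[1:])
    return pattern == domain


domain_only = pattern_matches("lookingglass.fict.book", "alice", "lookingglass.fict.book")
with_local = pattern_matches("alice@lookingglass.fict.book", "alice", "lookingglass.fict.book")
other_local = pattern_matches("alice@lookingglass.fict.book", "rabbit", "lookingglass.fict.book")
```

Here `domain_only' and `with_local' are true, while `other_local' is false because the local part differs.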
If a local delivery is being used to collect messages for onward transmission by some other means (for example, as batched SMTP), a temporary failure may not be dependent on the local part at all. Both the `appendfile' and `pipe' transports have an option called `retry_use_local_part' which can be set false in order to suppress the inclusion of local parts when matching retry patterns for those transport instances. When this option is set false, patterns containing local parts are skipped, and regular expressions are matched against the domain only.
For remote domains, when looking for a retry rule after a routing attempt has failed (for example, after a DNS timeout), each line in the retry configuration is tested only against the domain in the address. However, when looking for a retry rule after a remote delivery attempt has failed (for example, a connection timeout), each line in the retry configuration is first tested against the remote host name, and then against the domain name in the address. For example, if the MX records for `a.b.c.d' are
a.b.c.d  MX  5  x.y.z
         MX  6  p.q.r
         MX  7  m.n.o
and the retry rules are
p.q.r    *   F,24h,30m;
a.b.c.d  *   F,4d,45m;
then failures to deliver to host `p.q.r' use the first rule to determine retry times, but for all the other hosts for the domain `a.b.c.d', the second rule is used, and that rule would also be used if routing to `a.b.c.d' suffers a temporary failure.
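The lookup order in this example can be sketched in Python. The function names are hypothetical, patterns are compared by exact equality only, and the error field is ignored; the sketch takes the description literally and tests each line against the host name first and then against the domain.

```python
RULES = [
    ("p.q.r", "*", "F,24h,30m;"),
    ("a.b.c.d", "*", "F,4d,45m;"),
]


def rule_after_routing_failure(rules, domain):
    """Routing failures: each line is tested against the domain only."""
    for rule in rules:
        if rule[0] == domain:
            return rule
    return None


def rule_after_delivery_failure(rules, host, domain):
    """Remote delivery failures: each line is tested against the host
    name first, and then against the domain in the address."""
    for rule in rules:
        if rule[0] == host or rule[0] == domain:
            return rule
    return None


for_pqr = rule_after_delivery_failure(RULES, "p.q.r", "a.b.c.d")
for_xyz = rule_after_delivery_failure(RULES, "x.y.z", "a.b.c.d")
for_routing = rule_after_routing_failure(RULES, "a.b.c.d")
```

With these rules, `for_pqr' is the first rule, while `for_xyz' and `for_routing' are both the second.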
The second field in a retry rule is the name of a particular error, or an asterisk, which matches any error. The errors that can be tested for are:
The quota errors apply both to system-enforced quotas and to Exim's own quota mechanism in the `appendfile' transport.
The third field in a retry rule is a sequence of retry parameter sets, separated by semicolons. Each set consists of
<letter>,<cutoff time>,<arguments>
The letter identifies the algorithm for computing a new retry time; the cutoff time is the time beyond which this algorithm no longer applies, and the arguments vary the algorithm's action. The cutoff time is measured from the time that the first failure for the domain (combined with the local part if relevant) was detected, not from the time the message was received. The available algorithms are:
When computing the next retry time, the algorithm definitions are scanned in order until one whose cutoff time has not yet passed is reached. This is then used to compute a new retry time that is later than the current time. In the case of fixed interval retries, this simply means adding the interval to the current time. For geometrically increasing intervals, retry intervals are computed from the rule's parameters until one that is greater than the previous interval is found. The main configuration variable `retry_interval_max' limits the maximum interval between retries.
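The computation just described might be sketched in Python along the following lines. This is an illustration, not Exim's implementation. It assumes that `F' takes a single interval argument and `G' takes an initial interval and a multiplying factor (as the examples in this chapter suggest), that times are written with a single unit letter, and that the final parameter set is used once every cutoff has passed. All function names are hypothetical, and `interval_max' stands in for the value of `retry_interval_max'.

```python
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def seconds(text):
    """Convert a single-unit time such as '15m' or '2d' to seconds."""
    return int(text[:-1]) * UNITS[text[-1]]


def parse_sets(parameters):
    """Turn 'F,2h,15m; G,16h,1h,1.5' into (letter, cutoff, interval, factor)."""
    sets = []
    for chunk in parameters.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing semicolon
        parts = [p.strip() for p in chunk.split(",")]
        letter, cutoff, interval = parts[0], seconds(parts[1]), seconds(parts[2])
        if letter == "F":
            factor = None
        elif letter == "G":
            factor = float(parts[3])
            if factor <= 1 or interval <= 0:
                raise ValueError("G needs a positive interval and a factor above 1")
        else:
            raise ValueError("unknown algorithm letter: " + letter)
        sets.append((letter, cutoff, interval, factor))
    if not sets:
        raise ValueError("no retry parameter sets given")
    return sets


def next_retry(parameters, first_failure, now, previous_interval, interval_max):
    """Return (next retry time, interval used), all values in seconds."""
    sets = parse_sets(parameters)
    chosen = sets[-1]  # assumption: fall back to the final set after all cutoffs
    for candidate in sets:
        if now - first_failure < candidate[1]:  # cutoff has not yet passed
            chosen = candidate
            break
    letter, _cutoff, interval, factor = chosen
    if letter == "G":
        # Grow the interval until it exceeds the previous one.
        while interval <= previous_interval:
            interval *= factor
    interval = min(interval, interval_max)
    return now + interval, interval


# Ten minutes after the first failure, the fixed 15-minute set applies.
early = next_retry("F,2h,15m; G,16h,1h,1.5; F,5d,8h", 0, 600, 0, 86400)
```

Note that the cutoff is compared with the time since the first failure, not with the age of the message.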
A single remote domain may have a number of hosts associated with it, and each host may have more than one IP address. Retry algorithms are selected on the basis of the domain name, but are applied to each IP address independently. If, for example, a host has two IP addresses and one is broken, Exim will generate retry times for it and will not try to use it until its next retry time comes. Thus the good IP address is likely to be tried first most of the time.
Retry times are hints rather than promises. Exim does not make any attempt to run deliveries exactly at the computed times. Instead, a queue-running process starts delivery processes for delayed messages periodically, and these attempt new deliveries only for those addresses that have passed their next retry time. Therefore, whatever you set in the retry rules, the minimum time between retries is the interval between queue-running processes. There is not much point in setting retry times of five minutes if your queue-runners happen only once an hour.
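The effect of the queue-run interval can be shown with a little arithmetic. The Python sketch below uses a hypothetical helper, `first_attempt_at', and assumes queue runs start at regular intervals from a known first run; all times are in seconds.

```python
def first_attempt_at(retry_time, first_run, run_interval):
    """Return the start time of the first queue run at or after retry_time."""
    if retry_time <= first_run:
        return first_run
    # Ceiling division in integer arithmetic: the number of whole run
    # intervals needed to reach or pass the retry time.
    runs = -(-(retry_time - first_run) // run_interval)
    return first_run + runs * run_interval


# A retry time five minutes ahead, with queue runs once an hour:
attempt = first_attempt_at(300, 0, 3600)
```

In this case `attempt' is 3600: the delivery is not tried until the next hourly queue run, however short the configured retry interval.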
Here are some example retry rules suitable for use when `wonderland.fict.book' is a local domain:
alice@wonderland.fict.book  quota_5d   F,7d,3h
wonderland.fict.book        quota_5d
wonderland.fict.book        *          F,1h,15m; G,2d,1h,2;
lookingglass.fict.book      *          F,24h,30m;
*                           refused_A  F,2h,20m;
*                           *          F,2h,15m; G,16h,1h,1.5; F,5d,8h
The first rule sets up special handling for mail to `alice@wonderland.fict.book' when there is an over-quota error and the mailbox hasn't been read for at least 5 days. Retries continue every three hours for 7 days. The second rule handles over-quota errors for all other local parts at `wonderland.fict.book'; the absence of a local part has the same effect as supplying `*@'. As no retry algorithms are supplied, messages that fail are bounced immediately if the mailbox hasn't been read for at least 5 days.
The third rule handles all other errors at `wonderland.fict.book'; retries happen every 15 minutes for an hour, then with geometrically increasing intervals until two days have passed since a delivery first failed. The fourth rule controls retries for the domain `lookingglass.fict.book', whether it is local or remote, and the remaining two rules handle all other domains, with special action for connection refusal from hosts that were not obtained from an MX record.
The final rule in a retry configuration should always have asterisks in the first two fields so as to provide a general catch-all for any addresses that do not have their own special handling. This example tries every 15 minutes for 2 hours, then with intervals starting at one hour and increasing by a factor of 1.5 up to 16 hours, then every 8 hours up to 5 days.
Special processing happens when an address has been failing for so long that the cutoff time for the last algorithm has been reached. If this is the case for a local delivery, or for all IP addresses associated with a remote delivery, a subsequent delivery failure causes Exim to give up on the address, and a delivery error message is generated. In order to cater for new messages that may use the failing address, however, a next retry time is still computed from the final algorithm, and is used as described below.
If the delivery is a local one, one delivery attempt is always made for any subsequent messages. If it fails, the address fails immediately. The post-cutoff retry time is not used.
If the delivery is remote, there are two possibilities, controlled by the `delay_after_cutoff' option of the `smtp' transport. The option is true by default and in that case:
In other words, Exim delays retrying an IP address after the final cutoff time until a new retry time is reached, and can therefore bounce an email address without ever trying a delivery when machines have been down for a long time. This ensures that few resources are wasted in repeatedly trying to deliver to a broken destination, but if it does recover, Exim will eventually notice.
If `delay_after_cutoff' is set false, Exim behaves differently. If all IP addresses are past their final cutoff time, Exim tries to deliver to those IP addresses that have not been tried since the message arrived. If there are none, or if they all fail, the address is bounced. In other words, it does not delay when a new message arrives, but tries the expired addresses immediately, unless they have been tried since the message arrived. If there is a continuous stream of messages for the failing domains, unsetting `delay_after_cutoff' means that there will be many more attempts to deliver to failing IP addresses than when `delay_after_cutoff' is true.
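The selection made when `delay_after_cutoff' is false can be sketched in Python. The helper name and the data layout are hypothetical: `last_tried' maps each IP address that is past its final cutoff to the time it was last tried, or to None if it has never been tried.

```python
def addresses_to_try_now(last_tried, message_arrived):
    """Pick the expired IP addresses not tried since the message arrived.

    An empty result means there is nothing left to try, so the address
    would be bounced without a delivery attempt.
    """
    return [
        ip
        for ip, tried_at in last_tried.items()
        if tried_at is None or tried_at < message_arrived
    ]


# Example addresses are taken from a documentation-only range.
expired = {"192.0.2.1": 1000, "192.0.2.2": 5000, "192.0.2.3": None}
candidates = addresses_to_try_now(expired, 4000)
```

Here `candidates' holds the first and third addresses; the second was already tried after the message arrived and is therefore skipped.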
An additional rule is needed to cope with cases where a host is intermittently available, or when a message has some attribute that prevents its delivery when others to the same address get through. Because some messages are successfully delivered, the `retry clock' keeps getting restarted, and so a message could remain on the queue for ever. To prevent this, if a message has been on the queue for longer than the cutoff time of any applicable retry rule, the associated email address is failed after its next temporary delivery error. A new retry time is not computed in this case, so that other messages for the same address are considered immediately.
Even with this rule a large queue of messages can take a long time to clear if some occasionally get delivered, because the intermittent failures delay delivery attempts on the others (and the above rule acts only after a delivery attempt). There is therefore an ultimate clean-up rule which causes all the remaining addresses in a message to be failed, whether or not there has just been a delivery attempt, if the message has been on the queue for longer than the longest cutoff time for any retry rule in the configuration file.
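The two timeout rules above can be summarised in a short Python sketch. The function names are hypothetical, times are in seconds, and the "cutoff time" of a rule is taken to be the cutoff of its final parameter set, which is an assumption of this sketch.

```python
DAY = 86400


def address_times_out(message_age, applicable_cutoff, temporary_error_now):
    """First rule: acts only after a delivery attempt has just suffered
    a temporary error, and only if the message has been on the queue
    for longer than the cutoff of the applicable retry rule."""
    return temporary_error_now and message_age > applicable_cutoff


def message_times_out(message_age, all_cutoffs):
    """Ultimate clean-up rule: acts whether or not there has just been a
    delivery attempt, once the message is older than the longest cutoff
    of any retry rule in the configuration."""
    return message_age > max(all_cutoffs)


# Final cutoffs of rules that have parameter sets (illustrative values).
cutoffs = [7 * DAY, 2 * DAY, 1 * DAY, 2 * 3600, 5 * DAY]

six_days = address_times_out(6 * DAY, 5 * DAY, True)
cleanup_at_six = message_times_out(6 * DAY, cutoffs)
cleanup_at_eight = message_times_out(8 * DAY, cutoffs)
```

With these values a six-day-old message fails its address after the next temporary error under the first rule, but the clean-up rule does not act until the message is more than seven days old.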
The data in the retry hints database can be inspected by using the `exim_dumpdb' or `exim_fixdb' utility programs (see chapter "Exim utilities"). The latter utility can also be used to change the data. The `exinext' utility script can be used to find out what the next retry times are for the hosts associated with a particular mail domain, and also for local deliveries that have been deferred.