You may experience issues resulting in mail failing to be delivered to internal users. This may be difficult to detect using the common tactics for Exchange management.
In the situation experienced, an internal user had sent an email over 1.5GB in size. Normally this wouldn’t be a problem but due to a mis-configuration the internal receive connectors were set to accept messages up to 2047MB, which is also the maximum message size limit. This resulted in the exchange service attempting to receive the message rather than generating an NDR response, which would have stopped the message delivery from being retried.
This resulted in the Transport service being put under pressure and the incoming queue being throttled by back pressure, with no messages being accepted or delivered. If left alone the problem would not auto-resolve unlike a back pressure issue caused by legitimate email load.
Exchange Mail Flow
To explain how this happened we need to understand how the transport service works for delivering all email. Microsoft provide the following mail flow diagram to show how this works
No matter where an email originates it needs to traverse the transport service to be delivered. External messages are typically what an admin deals with. These come into the front-end service as an SMTP message. They are checked to see whether they are authorised before being transferred to the transport service and then forwarded into the mailbox delivery service for delivery.
Messages sent by internal users will also transit the transport service but these take a slightly different route. The client does not send the message via SMTP but rather puts the message into the outbox folder for the mailbox. From here the message is sent by the store driver submit process which submits the message to the transport service. This then processes the message and sends it back to the store driver to deliver the message
An added complication to this process is back pressure detection.
Exchange Server detects pressure on the transport service and starts to reduce the speed that messages are accepted to ensure that the server remains operational. Even though emails may be delivered at a slower rate the server remains operational in this situation.
This back pressure is detected based on several metrics. The easiest way to see the current state of these metrics is to run the powershell command:
[xml]$bp=Get-ExchangeDiagnosticInfo [-Server <enter-Exchange-server-name-here> ] -Process EdgeTransport -Component ResourceThrottling; $bp.Diagnostics.Components.ResourceThrottling.ResourceTracker.ResourceMeter
Most of these counters are fairly self explanatory and relate to free disk and memory on the server but one that may be new to you is UsedVersionBuckets. This is the number of uncommitted message queue database transactions in memory. So what does this mean?
When a message is being received the exchange server will be receiving it into memory. Once the complete message is received then it can save it into the mail queue database. While the message is still in memory though the UsedVersionBuckets will increase. This can happen either when the server is receiving many small messages or a small number of very big messages.
How large messages impact the transport service
In this case a single very large message was causing this pressure. Every time the message was submitted by the Store Driver it would result in the UsedVersionBuckets soaring to over 3000. At this point the transport service stopped processing any messages into the mailbox submission queue and stalled. While the service could be restarted once the message was resubmitted the same behaviour was repeated.
Typically advise includes looking at the message queues to see how many messages are being received and how big they are, but in this case the queues could even be clear. The message hasn’t made it to the queue yet as it’s still being transferred by the store driver.
In order to find this message you will need to perform a system wide search of the outbox folder of all mailboxes using the powershell command
Get-mailbox -Resultsize Unlimited | get-mailboxfolderstatistics -folderscope outbox | fl |ft identity,ItemsinFolder,FolderSize
Once the offending message is found it needs to be removed from the client side to make sure that it doesn’t get resubmitted to the store driver. There is no way to do this from the server side.