Message Queue Overload from High Processing Latency

Sponsor: Using RabbitMQ or Azure Service Bus in your .NET systems? Well, you could just use their SDKs and roll your own serialization, routing, outbox, retries, and telemetry. I mean, seriously, how hard could it be?

Message Queue Overload can occur when consumers cannot keep up with the work being created by producers. This can happen unexpectedly when processing latency increases dramatically. Here’s one spot to look out for when using network calls such as HTTP when processing messages. Without handling them correctly with timeouts, you can increase processing latency which will overload a message queue.

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything that is in this post.

https://www.youtube.com/watch?v=gQc3WcqqcgM

Message Queue Overload! Avoiding Processing Latency (https://www.youtube.com/watch?v=gQc3WcqqcgM)

Message Queue Overload

What exactly do I mean by overloading a message queue? Simply that consumers cannot process messages fast enough to keep the # of messages in the queue relatively low or none. In other words, when a message is produced the consumers are so busy that the message must sit and wait before a consumer finally is able to consume it.

As an example, let’s say you have 10 consumers that are each able to process 1 message at a time. Each message takes 200ms to process. This means that you can process 50 messages per second.

If you start producing more than 50 messages per second, you’ve overloaded your queue. You’re basically like a boat taking on water faster than you can take the water out.

This goes without saying that most systems are linear. Meaning they aren’t having to consume the same # of messages at the same rate constantly. There are often peaks and valleys that allow the system to catch up. A lot of this also depends on what kind of overall latency is acceptable.

Solutions

There are two primary solutions for handling this. The first and most obvious is to increase the number of consumers. The more consumers you add, the more messages you can process.

In my example above, if you’re producing 55 messages per second then you must add one more consumer to a total of 11.

Processing Latency

The second solution is to reduce processing latency. Instead of having each message take 200ms, we optimize the message processing to take 100ms per message. With our 10 consumers, we can now process 100 messages per second.

The opposite can also occur where the processing latency actually increases. Instead of messages taking 200ms, they all of a sudden take longer. If for example a message starts taking 500ms to process, you’re throughput will now be a total of 20 messages per second. This will overload your queue.

Network I/O

The most common reason I’ve noticed processing latency increase is because of network I/O.

When processing a message, let’s say you must make an HTTP call to an external service. If this normally takes on average 100ms, but suddenly takes longer, then the overall processing latency will increase.

This can happen with any network call, but I’ve found this most often to occur with services that are out of your control/ownership.

In .NET, the HttpClient default timeout is an absurd 100 seconds.

If a service you were calling when processing a message, all of a sudden took more than 100 seconds to respond, it would take that long for HttpClient to throw an exception because of the Timeout. This would overload your queue very very quickly.

Adding more consumers would not likely solve your problem at a 100-second timeout.

Timeouts

The moral of the story is to be very aware of network calls you’re making when processing messages. Understand what time of latency is acceptable and what cannot be exceeded.

Add timeouts to any HttpClient calls or if you’re using a library that sits on top of a message broker, use built-in timeouts around the entire processing a message.

Follow @CodeOpinion