Some people call them cron jobs, scheduled tasks, or batch jobs. Whatever you call them, they're processes that run periodically, looking at the state of a database to determine what action to take for each record they find. If you've dealt with them, you probably know they can be a nightmare, especially when they fail. And of course, they usually run in the middle of the night, so get ready for a page!
A while back, I placed an online order at a big box store for pick-up at the store rather than delivery. Here's the order confirmation:
Note that the email says I'll get another email once my order is ready for pick-up. This is because an employee has to physically get the item off the shelf somewhere in the store and then bring it to a holding area for pick-up orders.
Once the employee takes the item off the shelf, they mark the order as available for pick-up, which triggered this email I received:
The interesting part of this email is that it says I have 7 days to pick up my order; otherwise, I'll be refunded and the item will be put back on the shelf.
Basically, this is a reservation. For more on the Reservation Pattern, check out my post on Avoiding Distributed Transactions with the Reservation Pattern.
There are a couple of different scenarios. The first is that I actually drive to the store and pick up the item that I ordered.
The timeline would be that I placed my order, sometime later the order was reserved (taken from the shelf by an employee), and then later I arrived at the store to complete my reservation/order.
The second scenario is that I place my online order, it’s reserved for me, but I just never show up to pick the item up at the store.
Seven days after the order was reserved, the reservation expires, which means an employee takes the item and puts it back on the shelf so someone else can purchase it.
If you're using batch jobs, how would that work? Typically, you'd have a process that runs every day, looks at the state of the database, and determines what to do. In this example, it would select all the orders that are reserved, have not been completed, and were reserved more than 7 days ago.
At this point, the batch job has to issue all of the credit card refunds and update the database to mark those orders as canceled.
The issue with batch jobs is that they are done in a batch. If 100 orders have expired, the batch job iterates over them one by one, performing these actions. What happens if the process fails at order 42 of 100, because of a bug, a failure to connect to the payment gateway, or something else?
Batch jobs, as the name suggests, are not done in isolation. A failure on one record typically stops the entire batch, leaving the remaining orders unprocessed. Batch jobs usually run during off-peak hours, which often means the middle of the night. If the job fails then, you're likely to get a page/alert about it.
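To make that failure mode concrete, here's a minimal Python sketch of the nightly job. It assumes a hypothetical `Order` record and a hypothetical `refund` function that simulates a payment gateway outage on order 42; none of these names come from a real system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RESERVATION_PERIOD = timedelta(days=7)

@dataclass
class Order:
    order_id: int
    status: str  # "reserved", "completed", or "canceled"
    reserved_at: datetime

class PaymentGatewayError(Exception):
    pass

def refund(order: Order) -> None:
    # Hypothetical payment gateway call; simulates an outage on order 42.
    if order.order_id == 42:
        raise PaymentGatewayError(f"gateway unavailable for order {order.order_id}")

def nightly_expire_batch(orders: list[Order], now: datetime) -> None:
    # Select every order that is still reserved and whose 7-day window has passed.
    expired = [
        o for o in orders
        if o.status == "reserved" and now - o.reserved_at >= RESERVATION_PERIOD
    ]
    # Process them one by one; an unhandled exception aborts the rest of the run.
    for order in expired:
        refund(order)
        order.status = "canceled"

now = datetime(2024, 1, 10, 2, 0)  # the job runs at 2 a.m.
orders = [Order(i, "reserved", now - timedelta(days=8)) for i in range(1, 101)]
try:
    nightly_expire_batch(orders, now)
except PaymentGatewayError as err:
    print("batch failed:", err)  # batch failed: gateway unavailable for order 42
print(sum(o.status == "canceled" for o in orders))  # 41
```

Orders 1 through 41 are canceled, and orders 42 through 100 are left untouched until someone fixes the problem and reruns the job.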
Instead, we want to handle each expired order/reservation in isolation. When an order is reserved, we tell the system to remind itself, 7 days in the future, to cancel that reservation/order.
If the customer doesn’t come and pick up the item at the store, then the “expire reservation” reminder will kick in and execute at the appropriate time, which is 7 days from when the order was reserved.
However, if the customer does come to the store and picks up the item, completing the order, the “expire reservation” reminder will still be triggered. It won't need to do anything, though, since the order is already complete, so it can exit early without performing any action.
Each Order has its own “expire reservation” reminder. Each one executes 7 days after that order was reserved, and each runs in isolation: if one fails, it doesn't affect the others.
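Here's a minimal Python sketch of that per-order handling. It assumes a hypothetical `Order` record and a hypothetical `refund` function that simulates a payment gateway outage on order 42. The `deliver_each` loop is only a stand-in for a broker delivering one message per order; in a real system those messages would arrive independently, at different times.

```python
from dataclasses import dataclass

@dataclass
class Order:
    order_id: int
    status: str  # "reserved", "completed", or "canceled"

class PaymentGatewayError(Exception):
    pass

def refund(order: Order) -> None:
    # Hypothetical payment gateway call; simulates an outage on order 42 only.
    if order.order_id == 42:
        raise PaymentGatewayError(f"gateway unavailable for order {order.order_id}")

def expire_reservation(order: Order) -> None:
    # One unit of work: handles the reminder for exactly one order.
    if order.status != "reserved":
        return  # picked up (or already canceled): exit early
    refund(order)
    order.status = "canceled"

def deliver_each(orders: list[Order]) -> list[int]:
    # Simulates one message per order; a failure is confined to that message.
    to_retry: list[int] = []
    for order in orders:
        try:
            expire_reservation(order)
        except PaymentGatewayError:
            to_retry.append(order.order_id)  # only this order needs another attempt
    return to_retry

orders = [Order(i, "reserved") for i in range(1, 101)]
orders[9].status = "completed"  # order 10 was picked up at the store
to_retry = deliver_each(orders)
print(sum(o.status == "canceled" for o in orders), to_retry)  # 98 [42]
```

Order 42 still fails, but only order 42 is affected: the picked-up order exits early, the other 98 are canceled, and a single message is left to retry.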
So how can we technically implement this type of future reminder? You can use a queue that supports delayed delivery.
When sending a message to a queue, you also provide a delay: a period of time the queue should wait before delivering the message to a consumer.
This means the queue won't make the message visible to consumers until the delay period has elapsed. Consumers can't pull the message from the queue during that time; they won't even see it.
Once the delay period has elapsed, the message becomes visible.
The consumer will then pull the message and process it.
What this allows us to do is send a message to be processed in the future.
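To make those visibility rules concrete, here's a toy in-memory queue with delayed delivery in Python. It's a sketch, not a real broker: `DelayedQueue`, `send`, and `receive` are hypothetical names, the clock is passed in explicitly so the example is deterministic, and nothing is persisted.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Optional

@dataclass(order=True)
class _Entry:
    visible_at: datetime
    seq: int  # tie-breaker keeps FIFO order for messages with the same visible_at
    body: Any = field(compare=False)

class DelayedQueue:
    """Toy in-memory queue with delayed delivery (illustration only, not durable)."""

    def __init__(self) -> None:
        self._heap: list[_Entry] = []
        self._seq = itertools.count()

    def send(self, body: Any, now: datetime, delay: timedelta = timedelta(0)) -> None:
        # The message is stored immediately but stays hidden until now + delay.
        heapq.heappush(self._heap, _Entry(now + delay, next(self._seq), body))

    def receive(self, now: datetime) -> Optional[Any]:
        # Consumers only ever get a message whose delay has elapsed.
        if self._heap and self._heap[0].visible_at <= now:
            return heapq.heappop(self._heap).body
        return None

q = DelayedQueue()
t0 = datetime(2024, 1, 1, 9, 0)
q.send({"type": "ExpireReservation", "order_id": 7}, now=t0, delay=timedelta(days=7))
print(q.receive(now=t0 + timedelta(days=6)))  # None
print(q.receive(now=t0 + timedelta(days=7)))  # {'type': 'ExpireReservation', 'order_id': 7}
```

Real brokers expose delayed delivery through their own APIs, with their own limits. Amazon SQS, for example, caps a message's delay at 15 minutes, so a 7-day reminder needs a different mechanism there. Check what your broker supports before relying on long delays.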
In the example of our online order, this means that once an Order is reserved and the item is taken from the shelf, we enqueue an “expire reservation” message with a delivery delay of 7 days. After the 7 days, we process that “expire reservation” message. If the order is already completed, the process just exits early. If it hasn't been completed, it issues the credit card refund and updates the database to mark the Order as canceled.
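Putting the pieces together, here's a minimal Python sketch of the online order flow. It assumes a hypothetical in-memory `DelayedQueue`, a hypothetical `ExpireReservation` message, and a `refunded` list standing in for the credit card refunds. In a real system, the queue would be your broker and `run_consumer` would be a long-running consumer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RESERVATION_PERIOD = timedelta(days=7)

@dataclass
class Order:
    order_id: int
    status: str = "placed"  # placed -> reserved -> completed | canceled

@dataclass
class ExpireReservation:
    order_id: int

class DelayedQueue:
    """Hypothetical stand-in for a broker that supports delayed delivery."""

    def __init__(self) -> None:
        self._messages: list[tuple[datetime, ExpireReservation]] = []

    def send(self, message: ExpireReservation, now: datetime, delay: timedelta) -> None:
        self._messages.append((now + delay, message))

    def receive_due(self, now: datetime) -> list[ExpireReservation]:
        # Hand out (and remove) only the messages whose delay has elapsed.
        due = [m for visible_at, m in self._messages if visible_at <= now]
        self._messages = [(v, m) for v, m in self._messages if v > now]
        return due

orders: dict[int, Order] = {}
queue = DelayedQueue()
refunded: list[int] = []  # records the hypothetical credit card refunds

def reserve_order(order_id: int, now: datetime) -> None:
    # The employee took the item off the shelf: reserve it and schedule the reminder.
    orders[order_id].status = "reserved"
    queue.send(ExpireReservation(order_id), now, delay=RESERVATION_PERIOD)

def complete_order(order_id: int) -> None:
    # The customer picked the item up at the store.
    orders[order_id].status = "completed"

def handle_expire_reservation(message: ExpireReservation) -> None:
    order = orders[message.order_id]
    if order.status != "reserved":
        return  # already completed (or canceled): exit early
    refunded.append(order.order_id)  # stand-in for the credit card refund
    order.status = "canceled"

def run_consumer(now: datetime) -> None:
    # One polling pass: process each due message on its own.
    for message in queue.receive_due(now):
        handle_expire_reservation(message)

# Usage: order 1 is picked up, order 2 is never collected.
t0 = datetime(2024, 1, 1, 9, 0)
orders[1], orders[2] = Order(1), Order(2)
reserve_order(1, t0)
reserve_order(2, t0)
complete_order(1)
run_consumer(t0 + timedelta(days=3))   # nothing is visible yet
run_consumer(t0 + RESERVATION_PERIOD)  # both reminders are now due
print(orders[1].status, orders[2].status, refunded)  # completed canceled [2]
```

The early-exit check is `status != "reserved"` rather than only `status == "completed"`, so if the broker redelivers a message that was already processed, the handler doesn't refund the customer twice.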
Avoiding Batch Jobs
You can avoid batch jobs and have smaller units of work in isolation by telling your system to do something in the future. Leveraging a queue that supports delayed delivery is one example of how you can accomplish this.
This smooths the workload out over time and isolates each unit of work, so you can handle failures one order at a time.