It’s been a while since I published anything. More than three years! A lot has happened since then. The most relevant thing to mention at the beginning of this post is that I have been super busy building a lot of cool tech with a very talented team here at EPAM Anywhere. We are doing full-stack TypeScript with Next.js and native AWS serverless services and can’t get enough of it. This experience challenges me to learn new things every day, and I have a lot to share!
Today I want to show you one particular technique that I have found super useful for safely using aws-sdk batch APIs and ensuring delivery.
When you work with AWS, you will certainly use aws-sdk and the APIs of different services. Need to send a message to an SQS queue? That’s an HTTP API call, and you will use the SDK. Need to update a document in DynamoDB? The same. Need to push a record to Firehose? The same. Many of these APIs have batch equivalents:
- SQS’s sendMessage() has sendMessageBatch()
- Dynamo’s putItem() and deleteItem() have batchWriteItem()
- Firehose’s putRecord() has putRecordBatch()
These batch APIs will throw if something fundamental is wrong: your credentials aren’t valid, you don’t have enough permissions, or you have no connectivity to the service. If the SDK connected to the service successfully but failed to perform some or all of the operations in your batch, the call won’t throw. It will return an object that tells you which operations succeeded and which ones failed. The most likely reason for partial failures is throttling. All of these APIs have soft and hard limits, and sooner or later you will attempt to do more than AWS feels comfortable letting you get away with.
We learned it the hard way. It’s all documented, obviously, but things like this one are only obvious in hindsight. Let me show you a neat technique to batch safely but first, some background.
I have always liked recursion. When you need to scroll through something that paginates, you can employ while loops, or you can recursively repeat and accumulate results as you go. Recursion always felt much cleaner to me, but it comes with a gotcha: no stack is infinite. Consider the following simple example:
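A minimal sketch of such a snippet (the exact counter value you get depends on your environment):

```javascript
let iteration = 0;

function op() {
  if (++iteration === 100000) {
    console.log(iteration);
    return;
  }
  op(); // plain synchronous recursion: every call adds a stack frame
}

try {
  op();
} catch (e) {
  // RangeError: Maximum call stack size exceeded
  console.log(iteration);
}
```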
This snippet won’t print 100000. When I run it with node sample.js, I get 15707 printed in the console. Your mileage may vary, but sooner or later you hit the deep end and can go no further. The error that I am swallowing and not reporting is Maximum call stack size exceeded.
What if op() was performing a network operation? Let’s simulate it and convert op() to an async op():
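Here is one way the converted version could look. The networkOp() stand-in is my own assumption for the simulated network call: a promise that resolves on a future turn of the event loop via setImmediate().

```javascript
let iteration = 0;

// simulate a network operation: a promise that resolves
// on a future turn of the event loop
const networkOp = () => new Promise((resolve) => setImmediate(resolve));

async function op() {
  if (++iteration === 100000) {
    console.log(iteration);
    return iteration;
  }
  await networkOp(); // the stack unwinds here; recursion resumes on a fresh stack
  return op();
}

const done = op();
```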
It prints 100000 and we do not exhaust the stack. Let’s understand why and we will be well on our way to leveraging this technique in real-world scenarios.
The trick is in how promises (and async functions that return them) use the event loop to schedule continuations. I highly recommend this article to get a deeper insight into how it all works. And here’s one specifically about promises.
Basically, promises schedule their callbacks on the microtask queue, much like process.nextTick() does, and since the callback runs via the event loop, the stack frame is short-lived and every recursive invocation gets its own.
Let me do the same but this time I will be more explicit:
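A sketch of that more explicit version, using a plain Promise and setImmediate() to schedule the recursive call on the event loop:

```javascript
let iteration = 0;

function op() {
  return new Promise((resolve) => {
    if (++iteration === 100000) {
      console.log(iteration);
      return resolve(iteration);
    }
    // delay the resolution: schedule the recursive call on the event
    // loop and resolve the outer promise with the inner promise
    setImmediate(() => resolve(op()));
  });
}

const done = op();
```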
It also prints 100000 but here you can see how I “delay” promise resolution via the callback scheduled on the event loop. It adds one more ingredient that I need to explain.
I am using a promise nesting trick when I do resolve(op()). When a promise A resolves with a promise B, the result of A is the resolved value of B. Since my op() keeps recursing onto itself, the promise returned by the first call to op() will ultimately resolve with the value produced by the last, deepest call.
Async recursion with backoff
The last thing that I want to illustrate before I show you how I use this technique with aws-sdk APIs is a recursion with a backoff strategy. Take a look:
It prints a value somewhere around 50. The code goes through 10 executions of op() and delays each next run by iteration milliseconds: +1, then +2, then +3, up to +9 for the last run. We stop when ++iteration equals 10, so only 9 runs go through setTimeout(). The sum of the arithmetic progression from 1 to 9 with a step of 1 is 45, but op() doesn’t run at exactly the interval we ask for, and the first performance.now() reading isn’t taken at exactly 0ms, so let’s call the difference an overhead.
AWS batch APIs with retries
We are now ready to put it all together and employ the async recursion with backoff technique with the batch APIs to ensure delivery.
First, the backoff strategies:
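The exact strategies don’t matter much; as a sketch, here are two simple attempt-to-delay functions (the names and base delays are my own assumptions):

```javascript
// map a retry attempt (1, 2, 3, ...) to a delay in milliseconds
const backoff = {
  linear: (attempt, base = 100) => attempt * base,
  exponential: (attempt, base = 100) => 2 ** (attempt - 1) * base,
};
```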
And then somewhere else in the code:
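A self-contained sketch of the idea follows. The batchWithRetries() helper and the send() contract (resolve with the list of entries that failed) are my own assumptions, not aws-sdk APIs; the resolve() nesting and the setTimeout() backoff are the technique described above.

```javascript
// Retry only the failed entries of a batch call, backing off between
// attempts. `send(entries)` must resolve with the entries that failed.
const batchWithRetries = (send, entries, backoff = (n) => n * 100, retries = 3, iteration = 0) =>
  new Promise((resolve, reject) => {
    send(entries).then((failed) => {
      if (!failed.length) {
        return resolve();
      }
      if (iteration === retries) {
        return reject(new Error(`${failed.length} entries still failing after ${retries} retries`));
      }
      // back off, then resolve with the promise of the recursive retry
      setTimeout(
        () => resolve(batchWithRetries(send, failed, backoff, retries, iteration + 1)),
        backoff(iteration + 1)
      );
    }, reject);
  });
```

With SQS and AWS SDK v3, for example, the send function could translate the `Failed` portion of the SendMessageBatchCommand response back into the original entries by their Id, and hand them to the next recursive attempt. The same shape works for batchWriteItem() (via UnprocessedItems) and putRecordBatch() (via the per-record error responses).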
I have to say that we are not using this technique in our request/response APIs. We are pretty serious about building fast and pleasant experiences, so we target sub-half-second responses for user-facing APIs. We use this technique everywhere else though: async step functions, batch operations, code that responds to external events.
That’s it for today. I hope you found it useful. More to come soon!
Interested in taking your engineering career further and solving exciting challenges like this one above? We have hundreds of open remote jobs available — explore, apply, and become part of the EPAM Anywhere global team.