June 16th 2020
CEO of Dashbird. 13y experience as a software developer & 5y of building Serverless applications.
AWS Lambda has a cool feature that can be both a blessing and a nightmare for a serverless application, depending on whether it’s properly handled by our code: the retry behavior.
A retry occurs when an invocation of a Lambda function results in an error and the AWS Lambda platform automatically invokes the function again, with the same event payload.
Let’s say you’re operating an e-commerce site and AWS Lambda is being used to process customer orders. A person purchases an item and you have a function taking care of the following steps, all in a single run:
1. Making sure the item is available in stock
2. Processing credit card
3. Removing item from stock
4. Sending confirmation email
Now suppose the first three steps completed successfully, but a momentary issue while sending the email caused your application to raise an error. The Lambda platform automatically invokes the function again, with the same parameters, and this time the email is sent successfully. Awesome, isn’t it?
Well, not so fast. Our system just registered a second, unintended purchase for the same customer… and charged their credit card twice!
Houston… we have a problem!
This process would seldom be implemented exactly like this, but it serves as an illustrative example.
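To make the failure concrete, here is a minimal Python simulation of the scenario. It is not real Lambda code, and everything in it is hypothetical: `FlakyMailer`, `naive_handler`, and `invoke_with_retries` are stand-ins made up for illustration, and the retry loop only imitates the platform invoking the function again with the same event.

```python
class FlakyMailer:
    """Hypothetical mailer whose first send fails, simulating a transient error."""

    def __init__(self):
        self.attempts = 0
        self.sent = []

    def send(self, to):
        self.attempts += 1
        if self.attempts == 1:
            raise ConnectionError("temporary email outage")
        self.sent.append(to)


def naive_handler(event, state, mailer):
    """Runs all four steps in a single invocation, with no idempotency guard."""
    if state["stock"] < 1:                          # 1. check stock
        raise RuntimeError("out of stock")
    state["charges"].append(event["amount_cents"])  # 2. charge the card
    state["stock"] -= 1                             # 3. remove item from stock
    mailer.send(event["customer_email"])            # 4. send confirmation email


def invoke_with_retries(handler, event, state, mailer, max_attempts=3):
    """Stand-in for the platform: on error, invoke again with the same event."""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(event, state, mailer)
            return attempt  # number of invocations it took to succeed
        except Exception:
            if attempt == max_attempts:
                raise
```

Running `naive_handler` through this loop with a mailer that fails once leaves two entries in `state["charges"]` and two items removed from stock, even though only one email went out.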
Why on earth would AWS do this to me?
Lambda retry behavior is actually a very cool feature, don’t get me wrong. In a distributed system, many things can go wrong. In fact, when things can go wrong, rest assured they will go wrong at some point. AWS makes sure these errors aren’t left buried and that the operation gets a few more chances to succeed. We surely don’t want to miss the revenue of a sale due to a technical issue.
All right, we see value in the retry behavior, but how can we avoid headaches such as the double charge in our example?
Read operations usually do not produce any side effects, so they’re idempotent by nature: running them twice leaves the system in the same state as running them once. In our example, operation #1 (checking whether an item is available in stock) is one of these. In most cases, you won’t need to worry about them, so implementing them separately from the operations that do change state will make it easier to manage the rest of your stack.
Storing and deleting a value aren’t idempotent operations by nature, but they can be if we have a unique identifier (UID) for that resource. In our e-commerce scenario, if the customer order has a UID, the storing operation can be performed multiple times without creating multiple different order placements.
The order UID could be, for instance, a hash of the customer email or username, the purchase timestamp, and the list of items purchased. These variables would be sent as parameters to our API when the site receives the order request. If the function fails at some point and the invocation is retried, the same order UID would be generated again, meeting the idempotency requirement. Again, this is just for illustration purposes: each circumstance will require proper analysis to find a stable and resilient idempotent implementation.
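As a rough sketch of that idea in Python, the snippet below derives the UID with SHA-256 and uses it as the key for the store operation. The event field names (`customer_email`, `purchased_at`, `items`) and the in-memory `ORDERS` dictionary are assumptions made for illustration; a real system would use a durable store and enforce the uniqueness there.

```python
import hashlib
import json


def order_uid(customer_email, purchased_at, items):
    """Derive a deterministic order UID from data carried in the request."""
    payload = json.dumps(
        {
            "customer": customer_email.strip().lower(),
            "purchased_at": purchased_at,
            "items": sorted(items),  # same basket, same hash, whatever the order
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical in-memory stand-in for an orders table keyed by UID.
ORDERS = {}


def store_order(event):
    """Store the order once; a retry with the same event changes nothing."""
    uid = order_uid(event["customer_email"], event["purchased_at"], event["items"])
    if uid in ORDERS:
        return uid, False  # already stored by an earlier invocation
    ORDERS[uid] = {
        "customer_email": event["customer_email"],
        "purchased_at": event["purchased_at"],
        "items": list(event["items"]),
    }
    return uid, True
```

Note that the timestamp has to come from the event itself. Reading the clock inside the function would produce a different UID on every retry and defeat the purpose.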
Usually, if the operation takes place within your own stack, meeting idempotency requirements is fully in your hands, and the unique identifier principle explained above will usually be enough. But if you’re relying on third-party APIs, ensuring idempotency can be tricky, and you might need help from the other party if this kind of operation isn’t supported out of the box. If you can’t get the third party to work with you, there’s always the option of running all operations on your end first, creating a separate process to check whether everything ran successfully, and only then interacting with the external API. This isn’t the ideal implementation, but it could be as good as one can get in some circumstances.
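Here is one way the pieces might fit together, sketched in Python under a few loud assumptions. We assume the third party accepts a caller-supplied idempotency key (check whether yours does), and `FakePaymentGateway`, `run_once`, and the `COMPLETED_STEPS` ledger are hypothetical names invented for this example. The ledger is an in-memory dictionary only to keep the sketch self-contained; in Lambda it would have to live in a durable store to survive across invocations.

```python
class FakePaymentGateway:
    """Hypothetical third-party client that honours an idempotency key."""

    def __init__(self):
        self.charges = {}

    def charge(self, amount_cents, idempotency_key):
        # A repeated key returns the original charge instead of charging again.
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = {"amount_cents": amount_cents}
        return self.charges[idempotency_key]


# Hypothetical ledger of finished steps: order UID -> set of step names.
COMPLETED_STEPS = {}


def run_once(order_uid, step, action):
    """Run `action` unless this step is already recorded for the order."""
    done = COMPLETED_STEPS.setdefault(order_uid, set())
    if step in done:
        return False  # skipped: an earlier invocation completed it
    action()
    done.add(step)  # recorded only after the action succeeded
    return True


def handler(event, gateway, send_email):
    """Process an order so that a retried invocation skips finished steps."""
    uid = event["order_uid"]
    run_once(uid, "charge",
             lambda: gateway.charge(event["amount_cents"], idempotency_key=uid))
    run_once(uid, "email", lambda: send_email(event["customer_email"]))
```

If the email step fails and the function is invoked again, the charge step is skipped and only the email is retried. The idempotency key still matters: should the function crash after the charge succeeds but before the step is recorded, the retry would call the gateway again, and the key is what keeps that second call from becoming a second charge.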