April 6th 2020
Senior Software Engineer at AMPLYFI: a scale-up developing AI-powered business intelligence tools.
What’s the Point?
The development of Firecracker was undertaken to meet several objectives. These were:
- To run thousands of functions (up to 8000) on a single machine with minimal wasted resources.
- To let thousands of functions share the same hardware while protecting them against a range of security risks, including side-channel attacks such as Spectre.
- To perform similarly to running natively, unaffected by the resources other functions consume, while retaining the ability to overcommit resources and giving each function only the resources it needs.
- To be able to start new and clean up old functions quickly.
So How Does It Work?
The invoke traffic gets delivered via the Invoke REST API, which authenticates requests, checks for authorization and then loads the function metadata.
The requests are then handled by the Worker Manager, which sticky-routes to as few workers as possible to improve cache locality, enable connection re-use and amortize the cost of moving and loading customer code. Once the Worker Manager has identified which worker should run the code, it advises the Invoke service, cutting down on round-trips by having it send the payload directly to the worker.
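To make sticky routing concrete, here is a minimal sketch of the idea. It is not Lambda's actual Worker Manager: the `Worker` and `StickyRouter` names, the `capacity` field and the spill-over rule are all assumptions made for illustration. The router sends a function's invocations to workers that have already run it and only adds another worker when those are full.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare workers by identity, not by field values
class Worker:
    name: str
    capacity: int  # hypothetical: how many invocations it can serve at once
    busy: int = 0  # invocations currently in flight


@dataclass
class StickyRouter:
    """Toy router: keep each function on as few workers as possible."""
    workers: list
    # function id -> workers already used for it, in order of first use
    assigned: dict = field(default_factory=dict)

    def route(self, function_id: str) -> Worker:
        # 1. Prefer a worker that has already loaded this function's code.
        for worker in self.assigned.get(function_id, []):
            if worker.busy < worker.capacity:
                worker.busy += 1
                return worker
        # 2. Otherwise spill over to exactly one additional worker.
        known = self.assigned.setdefault(function_id, [])
        for worker in self.workers:
            if worker not in known and worker.busy < worker.capacity:
                known.append(worker)
                worker.busy += 1
                return worker
        raise RuntimeError("no capacity left in the fleet")

    def release(self, worker: Worker) -> None:
        # Called when an invocation finishes.
        worker.busy -= 1
```

With two workers of capacity 2, the first two invocations of a function land on the first worker, the third spills to the second, and once an invocation is released the next one goes back to the first worker.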
Each worker potentially offers thousands of MicroVMs, each providing a single slot and Firecracker process, with each slot only ever used for a single concurrent invocation of a function, but many serial invocations. Each slot supplies a pre-loaded execution environment for a function, including a minimized Linux kernel, userland and a shim control process.
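The "one concurrent invocation, many serial invocations" rule can be sketched with a non-blocking lock. This is an illustrative model only: the `Slot` class and its method names are hypothetical and do not come from Firecracker or Lambda.

```python
import threading


class Slot:
    """Toy slot: at most one invocation at a time, reused serially."""

    def __init__(self, function_id: str):
        self.function_id = function_id
        self.invocations = 0  # how many times this slot has been reused
        self._lock = threading.Lock()

    def try_reserve(self) -> bool:
        # Non-blocking: a busy slot is never shared with a second caller.
        return self._lock.acquire(blocking=False)

    def invoke(self, handler, payload):
        # The caller must hold a successful try_reserve() for this slot.
        try:
            self.invocations += 1
            return handler(payload)
        finally:
            # Free the slot so the next serial invocation can reuse it.
            self._lock.release()
```

A second `try_reserve()` on a busy slot returns `False` rather than waiting, which in this sketch is the signal to look for another slot or ask for a new one.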
This approach is similar to that of QEMU, Graphene, gVisor and Drawbridge (and, by extension, Bascule), which provide some operating system functionality in userspace to reduce the kernel surface and so improve security. On serial invocations, the MicroVM and the process the function runs in are re-used.
If a slot is available, the Worker Manager performs a lightweight concurrency control protocol and informs the front-end that the slot is available for utilization. The front-end then calls the MicroManager with the details of the slot and payload, which is then passed onto the shim running inside the MicroVM for that slot.
The MicroManager keeps a small pool of pre-booted MicroVMs ready to be used, as the already fast 125ms boot-up time offered by Firecracker is still not fast enough for the scale-up path of Lambda. Upon completion, the MicroManager receives either a response payload or the details of an error, which it then returns to the front-end.
However, if no slots are available, the Worker Manager calls the Placement service to request that a new slot be created for the function. This service then optimizes where the slot is placed (taking less than 20ms on average), ensuring that the use of resources such as CPU is even across the fleet, before requesting that a particular worker create the new slot.
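One simple way to keep utilization even is to place each new slot on the least-loaded worker that still has room. The sketch below assumes exactly that; the real Placement service's algorithm is not described here, and `WorkerLoad`, `place_slot`, `cpu_cost` and `limit` are hypothetical names. It also considers CPU only.

```python
from dataclasses import dataclass


@dataclass
class WorkerLoad:
    name: str
    cpu_used: float  # fraction of CPU already committed on this worker
    slots: int = 0   # slots placed on this worker by place_slot()


def place_slot(fleet, cpu_cost, limit=1.0):
    """Pick the least-loaded worker that can take a slot costing cpu_cost."""
    # `limit` is an assumed per-worker ceiling; raise it to model overcommit.
    candidates = [w for w in fleet if w.cpu_used + cpu_cost <= limit]
    if not candidates:
        raise RuntimeError("no worker can host another slot")
    # min() returns the first worker among ties, so placement is deterministic.
    target = min(candidates, key=lambda w: w.cpu_used)
    target.cpu_used += cpu_cost
    target.slots += 1
    return target
```

Repeated calls fill the emptiest workers first, so the loads converge instead of piling onto one machine.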
The MicroManager's pool of pre-booted MicroVMs also serves these Placement requests: a new slot can be handed a MicroVM that has already booted, which reduces blocking of user requests.
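The pre-booted pool is essentially a warm pool. A minimal sketch, assuming a hypothetical `boot` callable that stands in for starting a MicroVM and a fixed target size (neither comes from Firecracker itself):

```python
import itertools
from collections import deque


class WarmPool:
    """Toy pool of pre-booted VMs that hides boot latency from callers."""

    def __init__(self, boot, target_size=2):
        self._boot = boot            # hypothetical: boots and returns one VM
        self._target = target_size   # how many VMs to keep ready
        self._ready = deque()
        self.refill()

    @property
    def ready_count(self) -> int:
        return len(self._ready)

    def refill(self) -> None:
        # A real system would do this in the background; here it is explicit.
        while len(self._ready) < self._target:
            self._ready.append(self._boot())

    def take(self):
        # Fast path: hand out a VM that has already booted.
        if self._ready:
            return self._ready.popleft()
        # Slow path: the pool is drained, so the caller pays the boot cost.
        return self._boot()


# Usage: a counter stands in for real MicroVM boots.
_ids = itertools.count(1)
pool = WarmPool(boot=lambda: f"vm-{next(_ids)}", target_size=2)
first = pool.take()  # served from the pool without booting
```

The trade-off is the usual one for warm pools: a larger target size absorbs bigger bursts but keeps more idle MicroVMs in memory.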
For each MicroVM, the Firecracker process handles creating and managing the MicroVM, providing device emulation and handling VM exits.
The shim process communicates across the MicroVM boundary over a TCP/IP socket with the MicroManager, a process that manages a single worker’s Firecracker processes. The MicroManager provides slot management and locking APIs to the Placement service and an invoke API to the front-end.
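The wire format used between the shim and the MicroManager is not described here, so the following is only a sketch of the general shape of such an exchange. It assumes newline-delimited JSON over a loopback TCP socket, and every name in it (`serve_one_invocation`, `invoke`, `demo`, the `payload`, `ok`, `result` and `error` fields) is hypothetical.

```python
import json
import socket
import threading


def serve_one_invocation(listener, handler):
    """Hypothetical shim side: accept one connection, run the handler, reply."""
    conn, _ = listener.accept()
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        request = json.loads(stream.readline())
        try:
            reply = {"ok": True, "result": handler(request["payload"])}
        except Exception as exc:
            # Errors travel back as data, mirroring "response or error details".
            reply = {"ok": False, "error": str(exc)}
        stream.write(json.dumps(reply) + "\n")
        stream.flush()


def invoke(address, payload):
    """Hypothetical MicroManager side: send a payload and wait for the reply."""
    with socket.create_connection(address, timeout=5) as conn:
        with conn.makefile("rw", encoding="utf-8") as stream:
            stream.write(json.dumps({"payload": payload}) + "\n")
            stream.flush()
            return json.loads(stream.readline())


def demo(payload):
    # Port 0 asks the OS for any free port on the loopback interface.
    listener = socket.create_server(("127.0.0.1", 0))
    shim = threading.Thread(
        target=serve_one_invocation,
        args=(listener, lambda p: p["x"] + 1),  # stand-in for customer code
        daemon=True,
    )
    shim.start()
    try:
        return invoke(listener.getsockname(), payload)
    finally:
        shim.join(timeout=5)
        listener.close()
```

Calling `demo({"x": 41})` returns a success reply carrying the handler's result, while a payload the handler cannot process comes back as a reply with `ok` set to false.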
As an extra level of security against unwanted behaviour (including code injection), a jailer implements a wrapper around Firecracker which puts it in a restrictive sandbox before booting the guest.
Previously published at https://medium.com/@KerlDev/a-deep-dive-into-aws-firecracker-b21fb41c19d0