August 12, 2019
Cold starts are one of the biggest challenges to serverless adoption. Regardless of the serverless platform, cold starts, or invocation overhead, can add seconds to the response time and create noticeable latency for users. As explained by Erwin van Eyk, Lead Developer on Fission, “a cold start is, in its essence, the worst-case time that a function execution will take.”
And despite the sometimes severe performance problems that they incur, van Eyk maintains that “cold starts are currently a fundamental characteristic of serverless computing.”
The nature of cold starts varies by FaaS provider, but there are several factors that affect all serverless platforms. Addressing these factors can help you improve the performance of your serverless applications.
When a function is invoked, the FaaS platform needs to deploy an environment for it to run in, typically by provisioning a container or sandbox and loading the function code. This setup adds latency and slows down the user experience, which can have a noticeable impact on user-facing applications such as eCommerce sites, ride-hailing services, and gaming platforms.
For example, 90% of online shoppers have abandoned an eCommerce site due to poor performance, and nearly 25% would not return to a slow site. The threshold is much lower for online games, where low latency is essential. In one preview of Google’s cloud-based streaming platform, Stadia, just 200ms of latency noticeably hurt playability.
Now, let’s take a look at the main causes of cold starts.
Functions calling other functions create a chain of requests, with each one dependent on the previous. Invocation overhead accumulates across the chain, and if each link incurs a cold start, the resulting latency can be very high. In addition, while cold start time won’t add to your operating costs, you are still paying for the time your functions spend waiting for responses from other functions.
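The compounding effect is easy to see with back-of-the-envelope arithmetic. The sketch below uses hypothetical figures (500ms of cold start overhead and 100ms of execution time per function — real numbers vary by provider and runtime) to show how sequential function calls accumulate latency:

```javascript
// Hypothetical latencies for illustration only.
const COLD_START_MS = 500; // invocation overhead when an instance is cold
const EXEC_MS = 100;       // the function's own execution time

// Total latency for a chain of `n` sequential function calls,
// `coldCount` of which land on a cold instance.
function chainLatencyMs(n, coldCount) {
  return n * EXEC_MS + coldCount * COLD_START_MS;
}

// A fully warm chain of three functions responds in 300ms...
console.log(chainLatencyMs(3, 0)); // 300
// ...while the same chain, fully cold, takes 1800ms — six times slower.
console.log(chainLatencyMs(3, 3)); // 1800
```

Under these assumptions, a single cold start in the chain already adds more latency than the combined execution time of all three functions.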
Virtual Private Clouds (VPC)
VPCs improve security by creating a network perimeter around resources. However, when running a Lambda function inside a VPC, each invocation creates an Elastic Network Interface (ENI), allocates an IP address for the ENI, then attaches the ENI to the function. This process adds as much as 10 seconds to cold start times.
The big three FaaS providers (AWS Lambda, Google Cloud Functions, and Azure Functions) vary in how they handle cold starts. Each one also has its own benefits and challenges, which we’ll note where relevant. For each provider, we’ll include two measurements: one from Mikhail Shilkov, who tested a variety of runtimes; and one from Serverless Benchmark, which tested the Node.js runtime. These measurements were retrieved on August 8, 2019.
Lambda is one of the faster FaaS platforms, and recently improved its performance, with Shilkov measuring most runtimes between 500–800ms. The exception is C#, which can take between 800ms and 5 seconds. Instances remain warm for between 25 and 60 minutes, and are almost always disposed of after 65 minutes. Serverless Benchmark places Lambda’s cold start times between 220ms and 4.6 seconds, with a median of 304ms.
Using a VPC significantly increases cold start times (by as much as 17 seconds), although the Lambda team is working to improve these speeds.
Google Cloud Functions (GCF)
Serverless Benchmark, however, measures lower cold start times for GCF Node.js functions: between 50ms and 14 seconds, with a median of just 188ms.
Scaling HTTP-based Functions
Responding to HTTP requests is a common use case for functions, but cold starts can greatly affect response times. For user-facing applications like websites, this added latency may drive away users. In addition, each provider handles traffic surges in different ways, with AWS Lambda scaling much more consistently than either Azure Functions or GCF. The problem is made worse when using relatively slower runtimes like .NET and Java, or when running a function behind a VPC.
Connecting to a database adds to the cold start time, since the function must wait for the connection to initialize before it can send a query. Functions can reuse the connection on future invocations, but only until the function instance itself is disposed of. Some databases, such as Amazon Aurora Serverless, support querying over HTTP. This lets you send queries from functions to a database without having to manage a direct connection. In AWS, this also lets you access VPC databases without the overhead of running your function in a VPC.
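A common way to exploit connection reuse is to open the connection lazily and cache it in module scope, so only cold invocations pay the setup cost. The sketch below stubs out `createConnection` as a stand-in for a real database client (real clients such as `mysql2` or `pg` connect asynchronously, but the caching pattern is the same):

```javascript
// Stub standing in for a real database client's connect call.
let connectionCount = 0;
function createConnection() {
  connectionCount += 1; // counts how often we pay the setup cost
  return { query: (sql) => `result of: ${sql}` };
}

// Module scope: survives between invocations while the instance stays warm.
let connection = null;

function handler(event) {
  if (!connection) {
    connection = createConnection(); // cold-start path: runs once per instance
  }
  return connection.query(event.sql); // warm path: reuse the open connection
}

// Two invocations on the same warm instance share a single connection.
handler({ sql: "SELECT 1" });
handler({ sql: "SELECT 2" });
console.log(connectionCount); // 1
```

Note that each concurrent instance still opens its own connection, so under heavy scaling this pattern can exhaust a database's connection limit — one reason HTTP-based query interfaces are attractive for serverless workloads.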
No matter your FaaS platform, here are a few steps you can take to reduce your cold starts: