Deploying a Machine Learning function to AWS Lambda isn’t a simple task and the performance you get is poor. Learn why
November 17, 2019
In this blog post, I will examine the performance of machine learning functions in AWS Lambda compared to Nuweba.
In addition, I will demonstrate how to overcome the difficulties of deploying a simple Inception model to AWS Lambda in order to classify images using TensorFlow.
Imagine you want to create a simple API endpoint that receives an image (which is provided via a URL) and outputs the detected class to the client using a trained AI model. Sounds pretty useful, right?
That's what I wanted to do as well, to compare Lambda's performance to Nuweba.
You would think that deploying a simple function which loads an ML model would be easy and intuitive. This isn't the case with AWS Lambda, mainly due to the following deployment limits: a 50MB cap on the zipped package you upload, a 250MB cap on the unzipped deployment package, and only 512MB of /tmp storage.
So before discussing performance and showing why Lambda is a really bad fit for ML functions, I want to share my journey towards deploying an ML function to AWS Lambda.
Trying to Deploy a PyTorch model to AWS Lambda
I decided to start with PyTorch, despite coming across some initial difficulties.
In my test, the deployment package was almost 1GB, forcing me to focus on shrinking it instead of on the code itself.
To save you time in the future, I present the following guide:
Tricks to Reduce Your Deployed Function Size
Trick #1 - Use TensorFlow
This is not a general recommendation or my opinion, it was simply easier.
PyTorch's dependencies were too big for AWS Lambda limits, so I replaced it with TensorFlow.
The TensorFlow deployment package was still too large, so it required some adjustments.
Trick #2 - Downgrading TensorFlow and Python Version
What I did to combat AWS Lambda limitations was to use an old version of TensorFlow, namely version 1.0.0 with Python 2.7. This older version's wheel size is almost three times smaller than the latest TensorFlow version, greatly reducing the zip size from ~100MB to ~60MB.
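If you manage dependencies through a requirements file, the pin itself is a one-liner (using the standard PyPI package name):

```
tensorflow==1.0.0
```

Note that this wheel targets Python 2.7, so the build environment has to match.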
Trick #3 - Compressing Dependencies
Compressing dependencies makes the unzipped package size similar to the zipped package size, as dependencies are compressed in a zip as well.
On function import, dependencies are unzipped into /tmp, which has 512MB available instead of the 250MB limit imposed by AWS Lambda on function size.
For this step to work, I had to include the following code block before all other imports:
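A minimal sketch of what that bootstrap might look like, assuming the dependencies ship as a `requirements.zip` inside the deployment package (the archive name and target directory here are illustrative, not the exact ones from my function):

```python
import os
import sys
import zipfile


def bootstrap_deps(zip_path, target='/tmp/deps'):
    """Unzip bundled dependencies into /tmp and put them on sys.path.

    /tmp gives us 512MB to work with, versus the 250MB cap on the
    unzipped deployment package itself.
    """
    if not os.path.isdir(target):
        with zipfile.ZipFile(zip_path) as z:
            z.extractall(target)
    if target not in sys.path:
        sys.path.insert(0, target)


# Must run before any heavyweight import, e.g.:
# bootstrap_deps(os.path.join(os.path.dirname(__file__), 'requirements.zip'))
# import tensorflow as tf
```

Because the extraction only happens when /tmp/deps is missing, warm invocations of the same container skip it entirely.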
With this knowledge and a proper Serverless Framework configuration, you should be able to set up most functions with ease.
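For reference, a Serverless Framework configuration along these lines would tie it all together (the service and handler names, memory size, and timeout values are illustrative, not the exact ones I used):

```yaml
service: inception-classifier   # illustrative name

provider:
  name: aws
  runtime: python2.7            # matches the TensorFlow 1.0.0 wheel
  memorySize: 3008
  timeout: 30

functions:
  classify:
    handler: handler.handler
    events:
      - http:
          path: classify
          method: get

package:
  exclude:
    - '**'
  include:
    - handler.py
    - requirements.zip          # compressed dependencies (Trick #3)
    - images/**                 # bundled test images
```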
Now let's get back to the main topic of this blog.
To compare the performance of AWS Lambda against Nuweba's, I wrote a simple function which, given an image, returns the top-1 class and score.
This function is based on this AWS tutorial, but includes a minor change.
Instead of using an S3 trigger, I bundled some images into the function zip. On each invocation, triggered through API Gateway, the function predicts on a randomly chosen image.
This is done to (i) eliminate the network overhead of downloading an image each time the function is invoked, and (ii) focus on invocation overhead and function duration.
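The handler itself reduces to a few lines. Here is a rough sketch, in which `classify` is a stand-in for the actual TensorFlow Inception inference (the directory name, label, and score are illustrative values, not real model output):

```python
import json
import os
import random

IMAGE_DIR = os.environ.get('IMAGE_DIR', 'images')  # images bundled in the zip


def classify(image_path):
    """Placeholder for the real Inception inference call.

    The actual function feeds the decoded JPEG through the bundled
    TensorFlow graph and returns the top-1 (label, score) pair.
    """
    return 'giant panda', 0.89  # stand-in values for illustration


def pick_image(image_dir=IMAGE_DIR):
    """Choose one of the bundled images at random."""
    return os.path.join(image_dir, random.choice(os.listdir(image_dir)))


def handler(event, context):
    label, score = classify(pick_image())
    return {
        'statusCode': 200,
        'body': json.dumps({'class': label, 'score': score}),
    }
```

With an API Gateway trigger, the returned dict becomes the HTTP response, so the client simply receives the JSON class and score.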
The following results were generated using faasbenchmark, our open-source FaaS comparison framework, and are presented by Faastest.com, our FaaS comparison website.
You can check out the full source code in our GitHub repository.
Let me start with cold and warm start times, or "invocation overhead" as we call it (to learn more about why we care about invocation overheads, read our blog):
You can see from these results that AWS Lambda is pretty slow when it comes to starting TensorFlow functions.
With AWS Lambda, it takes about 9 seconds compared to just 93 milliseconds with Nuweba (99th percentile) - a significant difference.
Full invocation overhead graph - level 1 (low intensity) benchmark:
And here is what the level 2 (medium intensity) invocation overhead benchmark looks like:
Now, let us move on to function duration:
We can see that function duration is pretty similar, which is what we want.
Full duration graph:
Reviewing Our Results
Deploying a Machine Learning function to AWS Lambda is not a simple task, contrary to what you might expect. Moreover, the performance you get from it is not satisfying: waiting about 9 seconds for a function to respond to a valuable client is far from ideal.
Nuweba is almost 100x faster than AWS Lambda, as well as other leading platforms. This makes it much easier for anyone to run machine learning functions on Nuweba.