Serverless “Not Hotdog”

Ever since the “Not Hotdog” app featured on HBO’s Silicon Valley, and Tim Anglade’s comprehensive article on how they built this with TensorFlow, I felt challenged to reproduce this with less code and a serverless computing model.

Now, thanks to the fantastic fast.ai course and the release of Lambda Layers at re:Invent last year, I have finally managed to train a model (with an accuracy of 93% on this Kaggle dataset) and deploy it to a Lambda function using a PyTorch layer and the Serverless framework. What surprised me most is that it took me less than 10 minutes to train a decent enough classifier using transfer learning, and about 1–2 seconds to make a prediction.

I am convinced that the mix of AWS Sagemaker for model training with AWS Lambda for model inference is a winning combination (see also here and here on why I generally think serverless is becoming the primary computing paradigm). The beauty of the serverless model for machine learning is that you can do both training and inference and spend just a couple of dollars. Furthermore, running inference on Lambda means that you can put such a model into production without worrying about cost or scaling (AWS claims that 90% of GPU costs are spent on inference).

In this post, I’ll walk you through model training (using fast.ai, PyTorch and a GPU-enabled Jupyter notebook instance on AWS Sagemaker), deploying our model to S3 and Lambda, and finally running this at scale in production.

Model training

Example data from the Kaggle dataset.

Model training performs best on a GPU, and AWS Sagemaker makes it easy to set up a GPU-backed Jupyter notebook. At $1.26/hour, an ml.p2.xlarge instance is a very cost-efficient way to get started with model training without having to spend big bucks.
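As a rough back-of-the-envelope check, here is what a ten-minute training run would cost at the hourly rate quoted above. The function name is mine, and the sketch assumes simple pro-rata billing with no storage or data-transfer charges.

```python
def training_cost(hourly_rate_usd, minutes):
    """Pro-rata cost of running a notebook instance for `minutes` minutes."""
    return hourly_rate_usd * minutes / 60

# $1.26/hour is the rate quoted above; 10 minutes is the training time
# mentioned in the introduction. Billing granularity is an assumption.
cost = training_cost(1.26, 10)
print(f"${cost:.2f}")  # prints $0.21
```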

I began by building a model based on resnet50, a high-performing model pre-trained on ImageNet. I used Kaggle’s hotdog or no hotdog dataset, which contains a set of hot-dog and no-hot-dog (other food types) images. The data is split between a training set and a test set, with 500 images each (250 images of hot dogs, 250 images of other food types).

data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=400).normalize(imagenet_stats)

Transfer learning (i.e. reusing a pre-trained model) is a very successful technique in image recognition, and fast.ai makes it dead easy to implement:

learn = cnn_learner(data, models.resnet50, metrics=accuracy)

At this point, we are ready to start the actual training of the model, and I followed the classic recipe of initially training the last layer in the model, finding optimal learning rates, unfreezing the model, and retraining all previous layers of the model.

# Find the optimal learning rates for training previous layers
learn.lr_find()
# Unfreeze our model and continue training
learn.unfreeze()
learn.fit_one_cycle(20, max_lr=slice(1e-6,1e-4))
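To make the max_lr=slice(1e-6,1e-4) argument a bit more concrete: as far as I understand fastai v1, a slice is spread geometrically across the model's layer groups, so the earliest layers get the smallest learning rate and the head gets the largest. The sketch below reimplements that spacing in plain Python. The helper name and the assumption of three layer groups are mine, not taken from the notebook.

```python
import math

def spread_learning_rates(start, stop, n_groups):
    """Geometrically spaced learning rates from `start` to `stop` (illustrative helper)."""
    if n_groups == 1:
        return [stop]
    step = (stop / start) ** (1 / (n_groups - 1))
    return [start * step ** i for i in range(n_groups)]

# Assumption: the model is split into three layer groups.
rates = spread_learning_rates(1e-6, 1e-4, 3)
print(rates)  # roughly [1e-06, 1e-05, 1e-04]
```

The practical effect is that the pre-trained early layers are only nudged, while the later layers are free to adapt more quickly.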

After training, I reached an accuracy of 93%, as pictured in the confusion matrix below. Only 35 out of 500 test images were incorrectly classified, and it’s harder to classify food as hotdog rather than the opposite: 25 pictures are mistakenly identified as a hotdog while only ten hotdog predictions aren’t hotdogs.
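For reference, this is how the accuracy figure follows from the confusion matrix counts. It is a minimal sketch: the matrix below assumes 250 test images per class, and which class gets the 25 errors and which gets the 10 is assigned arbitrarily, since accuracy is the same either way.

```python
def accuracy(confusion):
    """Fraction of correct predictions: diagonal sum over total."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Rows are actual classes, columns are predicted classes.
# Assumption: 250 images per class; the placement of the 25 and 10
# off-diagonal errors is illustrative only.
confusion = [
    [225, 25],
    [10, 240],
]
print(accuracy(confusion))  # prints 0.93
```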

Confusion matrix for our predictor.

The key to obtaining reasonable accuracy is to try different things. In my experimentation, I changed the network architecture from resnet34 to resnet50, tried different learning rates, and changed the size of the input images (I found that 400×400 pixels worked better than the traditional 224×224 pixels). While state-of-the-art for this particular problem space is probably around 98% accuracy, I am more than happy to reach 93% accuracy, especially considering that my training set is rather small (remember: the training set contains only 500 images, 250 per category). On this set, other people have gotten to 87% and 92%.

Moreover, I didn’t want to spend hours waiting for my model to train. Instead, with fast.ai’s handy default settings and intuitive top-down learning approach, and using an existing image classifier network architecture, I knew I could reach good accuracy in just a couple of minutes. In fact, fast.ai recently managed to set the record of training ImageNet to 93% accuracy in only 18 minutes. These guys are on a serious mission to change the way we do machine learning forever.

Exporting our classifier to S3

To use our trained model for inference — i.e. for actual predictions on new images — we need to export and store our model in the cloud.

PyTorch can easily export your model into TorchScript. What you need to know here is that whenever you have a model in TorchScript, you can use this model without needing a Python environment. Pretty neat, no?

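# Save the class names alongside the model
save_texts(path/'models/classes.txt', data.classes)
# Trace the model with a dummy input to produce a TorchScript module
trace_input = torch.ones(1,3,img_size,img_size).cuda()
jit_model = torch.jit.trace(learn.model.float(), trace_input)
# Write the traced model to disk
output_path = str(path/f'models/{model_file}')
torch.jit.save(jit_model, output_path)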

Our final TorchScript model is 103 MB, and we now need to zip this up together with our classes file (hot dog, no hot dog).

with tarfile.open(tar_file, 'w:gz') as f:
    f.add(path/f'models/{model_file}', arcname=model_file)
    f.add(path/f'models/{classes_file}', arcname=classes_file)
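If you want to check the packaging step without a trained model at hand, the following self-contained sketch packs two stand-in files into a gzip tarball and lists what ended up inside, which is roughly what the Lambda side has to unpack later. The helper names and file names are hypothetical; only the standard-library tarfile calls are real.

```python
import tarfile
import tempfile
from pathlib import Path

def pack(archive, files):
    """Write each file into a gzip-compressed tarball under its bare file name."""
    with tarfile.open(archive, "w:gz") as tar:
        for f in files:
            tar.add(f, arcname=Path(f).name)

def list_members(archive):
    """Return the file names stored in the tarball, sorted."""
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(tar.getnames())

# Stand-in files; the real archive holds the TorchScript model and the classes file.
workdir = Path(tempfile.mkdtemp())
model = workdir / "model.pt"  # hypothetical file name
classes = workdir / "classes.txt"
model.write_bytes(b"not a real model")
classes.write_text("hot_dog\nnot_hot_dog\n")

archive = workdir / "model.tar.gz"
pack(archive, [model, classes])
names = list_members(archive)
print(names)  # prints ['classes.txt', 'model.pt']
```

Using arcname keeps the directory structure of the notebook instance out of the archive, so the files sit at the top level when extracted.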

Finally, we want to upload the model to an S3 bucket (fastai-model) so our Lambda function can use it in the next step.

s3 = boto3.resource('s3')
s3.meta.client.upload_file(str(tar_file), 'fastai-model', 'hotdog/model.tar.gz')

Deploying to AWS Lambda

Once we’ve got our model in S3, the next step is to have the predictor run on Lambda, so we can make predictions without having to pay for infrastructure left idle.

I used the excellent guide on Deploying on AWS Lambda, but instead of using AWS’s SAM I adapted the code to work with the Node.js Serverless framework, which is a long-time favourite of mine when it comes to managing serverless workloads.

The relevant bit of the serverless.yml is the function declaration:

environment:
  MODEL_BUCKET: fastai-model
  MODEL_KEY: hotdog/model.tar.gz
handler: handler.lambda_handler
layers:
  - arn:aws:lambda:us-east-1:934676248949:layer:pytorchv1-py36:1
events:
  - http:
      path: invoke
      method: post

We use the magic of Lambda layers to pull in all libraries required to run our model inference. Before the release of Lambda layers at re:Invent 2018, you’d have to package up all your dependencies into a ZIP file and make sure it’s compatible with Lambda’s Amazon Linux environment, all the while keeping the total package size under 250 MB. My previous attempts at this were a rather frustrating experience indeed.

Now, you can include publicly available layers (thanks to Matt McClean for sorting out a set of PyTorch layers), deploy and run. No need to mess around with Docker locally.

I then copied over the code from the SAM example project, which takes an image URL, downloads our model from S3, runs PyTorch’s model inference and returns a prediction with an associated probability. To deploy this, we run:

serverless deploy

which returns an endpoint

Have a look at the codebase here.
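To give a feel for the shape of such a handler without the PyTorch parts, here is a minimal, hedged sketch. It is not the code from the example project: the model call is replaced by a hypothetical stub returning fixed scores, the class order is assumed, and applying a softmax to get the confidence is my assumption about how the probability is derived. Only the event and response layout (a JSON string under "body", a statusCode in the reply) follows the usual API Gateway proxy convention.

```python
import json
import math

CLASSES = ["hot_dog", "not_hot_dog"]  # assumed order of the class names

def softmax(scores):
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fake_model(url):
    """Hypothetical stand-in for downloading the image and running the TorchScript model."""
    return [4.0, 1.0]

def lambda_handler(event, context, predict=fake_model):
    """Parse the POST body, score the image and return the top class as JSON."""
    body = json.loads(event["body"])
    probabilities = softmax(predict(body["url"]))
    best = max(range(len(CLASSES)), key=lambda i: probabilities[i])
    result = {"class": CLASSES[best], "confidence": probabilities[best]}
    return {"statusCode": 200, "body": json.dumps(result)}

# Simulated API Gateway event; the image URL is a placeholder.
event = {"body": json.dumps({"url": "https://example.invalid/lunch.jpg"})}
response = lambda_handler(event, None)
print(response["body"])
```

Keeping the predictor as a parameter makes the handler easy to exercise locally before the real model is wired in.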

Inference in production

The most exciting bit is to see how our model performs in production, on images it has never seen before.

Let’s start with a simple hotdog image:

curl -d '{"url":""}' -H "Content-Type: application/json" -X POST
{"class": "hot_dog", "confidence": 0.9999924898147583}

As expected, our classifier predicts this to be a hot_dog with 99.99% confidence.

Let’s see if it can distinguish a hotdog from Mexican fajitas:

curl -d '{"url":""}' -H "Content-Type: application/json" -X POST
{"class": "not_hot_dog", "confidence": 0.8462255001068115}

Nice! We’re reasonably confident that that wasn’t a hotdog.
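If you prefer calling the endpoint from Python rather than curl, a sketch along these lines should work with the standard library alone. Both URLs are placeholders I made up, not the real endpoint or a real image, and the actual network call is left commented out.

```python
import json
import urllib.request

def build_request(endpoint, image_url):
    """Create (but do not send) the JSON POST request that the curl commands perform."""
    payload = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder URLs; substitute the endpoint printed by `serverless deploy`.
request = build_request(
    "https://example.invalid/invoke",
    "https://example.invalid/hotdog.jpg",
)
print(request.get_method(), request.full_url)

# To actually call a deployed endpoint:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```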

If you feel you’re ready to start fooling this predictor (admittedly a lot of fun), it’s worth remembering how we trained this classifier. We only had pictures of hot dogs and other, non-hot-dog food types as part of our learning phase. That means if you present it with something that looks like a hotdog but is not, it is undoubtedly going to fail spectacularly. This hot dog phone case is arguably a hot dog though, and our predictor classifies it as such with 99% confidence. To make a better classifier, you need to go back, enrich your dataset with these types of images and re-train your model.

Predicting at scale

One of the coolest features of Lambda is that scaling is something you don’t have to worry about, at least not with our simple predictor, which has no downstream dependencies that could become a bottleneck.

To test this, let’s run our predictor on a set of 1000 images randomly sampled from the ImageNet database. Note that I don’t particularly care about having high accuracy in predicting hotdogs, but only about the speed at which I can make these predictions in what mimics an actual production use case.

Using the Node.js load-testing library Artillery, it’s easy to simulate around 900 requests in roughly a minute (check out the code if you’re interested).

All virtual users finished
Summary report @ 21:48:05(+0100) 2019-04-27
Scenarios launched: 904
Scenarios completed: 904
Requests completed: 904
RPS sent: 14.63
Request latency:
min: 371.2
max: 3139.4
median: 1357.9
p95: 1950.6
p99: 2355.5
Scenario counts:
0: 904 (100%)
200: 904

You can see that for 95% of the requests our predictor returned a result in less than two seconds (the p95, or the 95th percentile), which includes passing the image to our predictor. Not bad!
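In case percentiles are new to you, here is a small sketch of how a figure like the p95 can be computed from raw latencies using the nearest-rank method. The latencies are made up for illustration, and the load-testing tool may well use a different estimation method internally.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Made-up latencies in milliseconds, not the measurements from the report.
latencies = [400, 900, 1200, 1300, 1350, 1400, 1500, 1700, 1900, 3100]
print(percentile(latencies, 50))  # prints 1350
print(percentile(latencies, 95))  # prints 3100
```

With only ten samples the p95 is simply the slowest request, which is why a run of several hundred requests gives a much more meaningful tail figure.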

Thoughts on Serverless machine learning

The original classifier was architected to run inference on a mobile device. Therefore much work went into making this fit that particular computing environment. For real-world scenarios, having a REST endpoint available for making POST requests which return a classification result is more than sufficient. Equally, training your models in a GPU-enabled cloud environment such as Sagemaker makes this pleasant and easy to get started.

My preferred method to deploy any code to the cloud involves AWS Lambda. I am a firm believer that the serverless paradigm will rapidly take over and become the primary programming model for cloud-native applications. Therefore, I was pleased to get this last bit working without much effort, and I believe serverless inference will become the primary mechanism of running predictions at scale. It also suggests that, for a model like this one, you don’t need the power of a GPU for anything other than training. A simple Lambda with 1 GB of RAM does the trick.
