Introduction To Amazon SageMaker | Hacker Noon

Samadrita Ghosh (@samadritaghosh)

Technical Writer | Co-Author of Data Science for Enterprises | Mentor @upGrad

Amazon AI/ML Stack

Image Courtesy: AWS Docs

Amazon Web Services (AWS) offers a set of services tailored for Artificial Intelligence and Machine Learning workloads. These fit into three major tiers, as follows:

  • Application Services — These are domain-based services that let us generate predictions quickly from pre-trained models using simple API calls.
  • Platform Services — Unlike application services, platform services let us build our own customized Machine Learning and AI solutions through optimized and scalable options. The SageMaker service that we will discuss in this article falls in this tier.
  • Frameworks and Hardware — The tiers mentioned above run on top of the frameworks and hardware tier. This layer provides a wide range of optimized deep learning tools like TensorFlow, Keras, PyTorch and Apache MXNet. A choice of compute options (GPU, CPU) is also available.

Amazon SageMaker

Now that we know where Amazon’s SageMaker service falls, let’s delve a bit deeper into it.

A generic Machine Learning Pipeline has the following primary modules:

  • Data Extraction
  • Data Processing
  • Data Analysis
  • Feature Engineering
  • Model Training and Tuning
  • Prediction Generation
  • Deployment to End-User

SageMaker combines these modules and works with three major components:

  • Build — Involves extracting data from S3 or any other storage option used, followed by processing and feature engineering.
  • Train — This component combines model tuning and model training.
  • Deploy — This component allows us to deploy the trained model, generate predictions and save them to the preferred storage location.

These components are independent of each other and can be used separately or in whatever combination is required.

SageMaker components are managed through the Amazon SageMaker Console, whose clean layout makes the options for each component easy to find and configure.


The build phase is where data first interacts with the pipeline. The easiest way to start is to create a SageMaker notebook instance. This not only enables the integration of the required code but also facilitates clear documentation and visualization.

The other option available for code integration is the Spark SDK, which enables integration of a Spark pipeline through AWS’s Elastic MapReduce (EMR) service.


Setting up the training module in SageMaker is straightforward. The primary attractions of the training component in SageMaker are as follows:

  • Minimal Setup requirements — on creating a simple training job, SageMaker takes care of the hardware requirements and backend jobs like fetching storage and instances.
  • Dataset Management — SageMaker takes care of streaming data and also helps manage distributed computing facilities which can help increase the speed of training.
  • Containerization — All models in SageMaker, whether a built-in algorithm like XGBoost or K-Means, or a custom model integrated by the user, are packaged in Docker containers. SageMaker manages and integrates the containers without any extra effort from users.
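
To make the points above concrete, here is a minimal sketch of what "creating a simple training job" can look like when done through the API rather than the console. It only assembles the request as a plain dictionary; the field names follow boto3's SageMaker `create_training_job` call, while the helper name `build_training_job_request`, the job name, role ARN, bucket paths, instance type and the placeholder image URI are all assumptions made for illustration.

```python
import json


def build_training_job_request(job_name, image_uri, role_arn, train_s3_uri,
                               output_s3_uri, instance_type="ml.m5.xlarge",
                               instance_count=1):
    """Assemble keyword arguments for a SageMaker training job (hypothetical helper)."""
    return {
        "TrainingJobName": job_name,
        # Containerization: the algorithm is referenced as a Docker image.
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        # Dataset management: SageMaker fetches the data from this S3 prefix.
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Minimal setup: we only name the hardware, SageMaker provisions it.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


if __name__ == "__main__":
    request = build_training_job_request(
        job_name="demo-training-job",                      # assumed name
        image_uri="<algorithm-image-uri>",                 # placeholder, not a real image
        role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",  # assumed role
        train_s3_uri="s3://example-bucket/train/",         # assumed bucket
        output_s3_uri="s3://example-bucket/output/",
    )
    print(json.dumps(request, indent=2))
```

With AWS credentials configured, a dictionary like this could be passed on as `boto3.client("sagemaker").create_training_job(**request)`; the sketch stops short of that call so it can be run without an AWS account.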


There are several deployment options in SageMaker. With SageMaker’s UI, deployment is a one-step process, and the resulting endpoint is designed for reliability, scalability and high throughput.

Several models can be deployed behind the same endpoint (the point of deployment), with incoming traffic split among them, so that the models can be compared through A/B testing, which SageMaker supports.
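
As a rough illustration of A/B testing behind one endpoint, the sketch below builds an endpoint configuration with two variants and works out the share of traffic each would receive. The `ProductionVariants` fields follow boto3's `create_endpoint_config` call, where each variant's traffic is its weight divided by the sum of all weights; the helper names, variant names, model names and instance type are hypothetical.

```python
def build_ab_endpoint_config(config_name, variants, instance_type="ml.m5.large"):
    """Build an endpoint config from (variant_name, model_name, weight) tuples.

    Hypothetical helper: the returned dict mirrors the arguments of
    boto3's SageMaker create_endpoint_config call.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": variant_name,
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": instance_type,
                "InitialVariantWeight": float(weight),
            }
            for variant_name, model_name, weight in variants
        ],
    }


def traffic_shares(endpoint_config):
    """Return the fraction of requests each variant receives (weight / total weight)."""
    variants = endpoint_config["ProductionVariants"]
    total = sum(v["InitialVariantWeight"] for v in variants)
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


if __name__ == "__main__":
    config = build_ab_endpoint_config(
        "demo-ab-config",                        # assumed config name
        [
            ("current", "model-v1", 9),          # assumed model names
            ("candidate", "model-v2", 1),
        ],
    )
    for name, share in traffic_shares(config).items():
        print(f"{name}: {share:.0%} of traffic")
```

Sending a small share to the candidate model first keeps the risk low while still collecting enough responses to compare the two.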

One major advantage of the deployment facility is that SageMaker allows upgrades, updates and other modifications with zero downtime, owing to blue-green deployment (the updated environment is brought up alongside the live one, and traffic is switched over only once the new one is ready, so the service stays up and running throughout).

Batch predictions, which are often required in production, can also be carried out with SageMaker, using dedicated instances that stream data in from and out to S3 and distribute the tasks among GPU or CPU instances (as per the configuration).
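
A batch prediction job of this kind can be described in a similar way. The sketch below assembles a request whose fields follow boto3's SageMaker `create_transform_job` call; the helper name, job and model names, bucket paths, CSV content type and instance settings are assumptions chosen for the example.

```python
def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri,
                                instance_type="ml.m5.xlarge", instance_count=1):
    """Assemble keyword arguments for a batch transform job (hypothetical helper)."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        # Input records are read from every object under this S3 prefix.
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3_uri,
            }},
            "ContentType": "text/csv",   # assumed input format
            "SplitType": "Line",         # treat each line as one record
        },
        # Predictions are written back to S3.
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        # The work is spread across this many instances of the chosen type.
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


if __name__ == "__main__":
    request = build_transform_job_request(
        job_name="demo-batch-job",                     # assumed name
        model_name="model-v1",                         # assumed model
        input_s3_uri="s3://example-bucket/batch-in/",  # assumed bucket
        output_s3_uri="s3://example-bucket/batch-out/",
        instance_count=2,
    )
    print(request)
```

With credentials in place, the dictionary could be submitted via `boto3.client("sagemaker").create_transform_job(**request)`. Switching between CPU and GPU is then only a matter of changing `instance_type`.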


With this, we have come to the end of the basic concepts of Amazon SageMaker. Watch this space for a demo on how to get started with SageMaker, which will be published soon.

For any questions or suggestions, you can drop a mail at [email protected]

Or DM me on LinkedIn

Looking forward to connecting with you and your ideas!

Previously published at
