Chapter 3: Deploying your first ML model prediction service
In this chapter, we’ll guide you through deploying your first ML model using MAESTRO's AWS SageMaker platform.
AWS SageMaker is a fully managed service that allows data scientists to build, train, and deploy machine learning models efficiently. It simplifies the typical challenges of machine learning by handling tasks like infrastructure management and scaling, so you can focus on model development and deployment.
Setting up the MAESTRO environment is the simplest approach for this purpose. However, you can also create your own environment if needed, and the instructions in this chapter can be adapted for your own SageMaker environment. For additional guidance on setting up your own environment, please refer to our MLOps Starter Kit at https://sgts.gitlab-dedicated.com/wog/gvt/dsaidquantitativ/qs-central/mlops-starter/mlops-infra.
Our case study is predicting the resale prices of Housing & Development Board (HDB) apartments. The goal is to build a regression model using publicly accessible, real-world data from HDB’s resale transactions.
Our assumption here is that you have little or no knowledge of deploying an endpoint. If you have some experience, do move straight to Chapter 4 for a fuller case study on an end-to-end MLOps workflow.
3.1. Prerequisites
Before we begin, ensure you have the following:
- You can access a project domain in MAESTRO SageMaker
- You are granted the ML Engineer role for this project domain
- You can clone this repo with the code for this walkthrough. In the repo you'll find:
  - data/resale_prices_Mar2023_to_Feb2024.csv: a CSV file containing information on HDB resale prices
  - 1_inspect_data.ipynb: a Jupyter notebook performing basic EDA on the data
  - 2_train_and_deploy.ipynb: a Jupyter notebook which uploads the data to S3, starts a SageMaker training job, and deploys the trained model to a SageMaker Endpoint
  - train.py: a Python file defining the inference and training code for the model
For further details on the roles in MAESTRO SageMaker, please refer to the MAESTRO SageMaker User Guide.
3.2. Overview of the Deployment Process
Deploying a machine learning model in SageMaker involves several key steps:
- Prepare the Training Script: This script trains the model and defines necessary functions for SageMaker.
- Upload Data to S3: SageMaker reads training data from Amazon S3.
- Create and Train a SageMaker Estimator: This step involves specifying the training script and training the model.
- Deploy the Model: Once trained, the model is deployed to an endpoint for real-time predictions.
Context
This will be a toy example focusing on a regression model that predicts HDB resale prices. Traditionally, a Data Scientist may perform EDA on the data, or even train and generate predictions locally. This corresponds to Stage 0 MLOps, and the 1_inspect_data.ipynb notebook.
Preparing the Model Script (train.py)
For Stage 1 MLOps, let's start with the training script. There are two parts of this script: (a) the model training code, and (b) the four essential functions used at inference time.
3.3. Model training code
The code below is executed when train.py gets called in the training step. It reads the training data, performs feature engineering, trains the model, and saves it to disk.
For this walkthrough, we will use a simple Random Forest Regressor model. You can replace this with any other model you prefer, or perform more complex feature engineering and hyperparameter tuning.
joblib is used to save the trained model to disk. This is because SageMaker expects the model to be saved to disk, and joblib is a popular library for serializing Python objects. pickle is another option, but joblib is preferred for large NumPy arrays.
import argparse
import os
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

if __name__ == '__main__':
    # Run the training code only when the script is executed directly, not when it is imported as a module
    parser = argparse.ArgumentParser()
    # Parse arguments passed to the script; the defaults come from environment variables set by SageMaker
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    args = parser.parse_args()
    # Load the training data (SageMaker copies it from S3 into the train channel directory)
    df = pd.read_csv(os.path.join(args.train, 'resale_prices_Mar2023_to_Feb2024.csv'))
    # Feature engineering
    # insert feature engineering code here; it must define X (features) and y (target)
    # Split the data into training and validation sets
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
    # Train the model
    model = RandomForestRegressor()
    model.fit(X_train, y_train)
    # Save the trained model
    joblib.dump(model, os.path.join(args.model_dir, 'model.joblib'))
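The feature engineering step is left as a placeholder in the script. As a minimal sketch, assuming the model uses two numeric columns plus the 0/1 town columns that input_fn builds in Section 3.4, it might look something like the following. The TOWNS subset, the NUMERIC_FEATURES choice, the build_features name, and the second sample row are illustrative assumptions, not the contents of the actual train.py.

```python
import pandas as pd

# Hypothetical subset; train.py is assumed to define the full TOWNS list.
TOWNS = ["ANG MO KIO", "BEDOK", "TAMPINES"]
# Assumed numeric inputs, taken from the field names used in the sample request.
NUMERIC_FEATURES = ["floor_area_sqm", "lease_commence_date"]


def build_features(df):
    """Return (X, y): the numeric columns plus one 0/1 column per town."""
    X = df[NUMERIC_FEATURES].copy()
    for town in TOWNS:
        # 1 where the row belongs to this town, 0 otherwise
        X[town] = (df["town"] == town).astype(int)
    y = df["resale_price"]
    return X, y


# Tiny made-up frame, used only to show the resulting shape.
sample = pd.DataFrame({
    "town": ["ANG MO KIO", "BEDOK"],
    "floor_area_sqm": [60, 92],
    "lease_commence_date": [1980, 1995],
    "resale_price": [300000, 520000],
})
X, y = build_features(sample)
print(X.columns.tolist())
```

Whatever features are chosen, the columns that input_fn produces at inference time should match the training columns, including their order.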
3.4. Inference code functions
For inference, four essential functions are involved: model_fn, input_fn, predict_fn, and output_fn. If input_fn, predict_fn, or output_fn is not defined in the model script, SageMaker's default implementation is used instead. model_fn is the exception: the SageMaker scikit-learn container provides no default for it, so the script must define it.
For this example, we override three of these four functions.
We will now explain each of these functions in detail:
- model_fn: This function loads the trained model. When SageMaker hosts the model, it needs to load the model from the saved state, and this function facilitates that.
model_dir is where the model artifacts are stored, and joblib.load loads the model from the specified directory.
- input_fn: This is the first function that SageMaker calls when the endpoint receives a request. Since API payloads are typically JSON, this function converts JSON input to a pandas DataFrame that the model can work with. The output X is passed to predict_fn as input_data.
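def input_fn(request_body, request_content_type):
    # json, pd, and the TOWNS list are assumed to be imported or defined at the top of train.py
    features = json.loads(request_body)
    X = pd.DataFrame(features, index=[0])
    # Replace the town name with one 0/1 column per town
    feature_town = features["town"]
    X = X.drop(['town'], axis=1)
    for town in TOWNS:
        if town == feature_town:
            X.loc[:, town] = 1
        else:
            X.loc[:, town] = 0
    return X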
- predict_fn: This function takes in input_data and generates predictions using the trained model. The return value is passed to output_fn as prediction.
- output_fn: Formats the prediction output. This is not included in this example, but is useful for custom output formats. For example, you may want to return a JSON response with additional metadata. If you wish to override the default implementation, the function signature for output_fn is output_fn(prediction, accept).
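The remaining functions are short. The sketch below shows one plausible shape for model_fn and predict_fn, plus an optional output_fn. It assumes the model was saved as model.joblib, as in the training code. The JSON layout returned by output_fn, including the predicted_resale_price key, is a hypothetical format chosen for illustration and may differ from the actual train.py.

```python
import json
import os

import joblib


def model_fn(model_dir):
    """Load the model that the training step saved as model.joblib."""
    return joblib.load(os.path.join(model_dir, "model.joblib"))


def predict_fn(input_data, model):
    """Run the model on the DataFrame returned by input_fn."""
    return model.predict(input_data)


def output_fn(prediction, accept):
    """Optional: wrap the predictions in a JSON object (illustrative format)."""
    if accept != "application/json":
        raise ValueError(f"Unsupported accept type: {accept}")
    # float() turns NumPy scalars into plain numbers that json can encode
    prices = [float(p) for p in prediction]
    return json.dumps({"predicted_resale_price": prices})
```

If an output_fn like this one is added, callers would need to request application/json as the accept type, since the sketch rejects every other format.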
3.5. Uploading the training data to S3
For our example, before you can train the model, you need to upload the training data to an S3 bucket, because SageMaker reads training data from S3. In practice, the data engineering team at your agency or vendor will likely already have a data storage solution, which may require some integration.
The 2_train_and_deploy.ipynb notebook includes a simple script that uploads the training data to S3.
import sagemaker
sagemaker_session = sagemaker.Session()
bucket = 'your-s3-bucket-name'
prefix = 'demo-housing-price-prediction'
train_file = "data/resale_prices_Mar2023_to_Feb2024.csv"
train_uri = sagemaker_session.upload_data(path=train_file, bucket=bucket, key_prefix=prefix)
3.6. Preparing the Training and Deployment Notebook (2_train_and_deploy.ipynb)
In addition to the data upload code above, the notebook includes code that defines the SageMaker Estimator, trains the model, and deploys it to an endpoint.
Designing the SageMaker Estimator and starting the training job
We now define a SageMaker Estimator, specifying the train.py script as the entry_point along with the other necessary parameters. We use SageMaker's SKLearn estimator rather than calling scikit-learn directly because it runs the script in a managed scikit-learn container on AWS infrastructure, which provides managed training and deployment.
from sagemaker.sklearn.estimator import SKLearn
# Define the SKLearn estimator
sklearn_estimator = SKLearn(
    entry_point='train.py',               # Path to the training script
    role=sagemaker.get_execution_role(),  # AWS IAM role
    instance_type='ml.m5.large',          # Type of instance to use for training
    framework_version='0.23-1',           # Version of the scikit-learn framework
    py_version='py3',                     # Python version to use
    sagemaker_session=sagemaker_session   # SageMaker session
)
We can then start the training job by calling the fit method on the estimator.
Progress logs are displayed in the notebook while the job runs.
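As a minimal sketch of that call, assuming the sklearn_estimator and train_uri variables defined earlier in the notebook: the helper name start_training is hypothetical, and in the notebook the call is usually a single line. The channel is named "train" because train.py reads its data directory from SM_CHANNEL_TRAIN.

```python
def start_training(estimator, train_uri, wait=True):
    """Start a training job, passing the S3 data location as the "train" channel."""
    # The "train" key is the channel name that SageMaker exposes to
    # train.py through the SM_CHANNEL_TRAIN environment variable.
    estimator.fit({"train": train_uri}, wait=wait)
    return estimator


# In the notebook, with the variables defined earlier:
# start_training(sklearn_estimator, train_uri)
```

With wait=True the call blocks until the job finishes, so the following notebook cells only run once a trained model exists.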
Deployment
Once the model has been trained, we deploy it to an endpoint for real-time predictions. An endpoint is a web service that hosts the trained model and makes it available for real-time inference.
predictor = sklearn_estimator.deploy(
    endpoint_name=f"demo-price-prediction-{YOUR_NAME}",
    instance_type='ml.m5.large'
)
For the instance_type, you can choose the instance type that best suits your use case. The ml.m5.large instance type is a good starting point for most use cases. Please refer to the MAESTRO SageMaker User Guide for more information on available instance types.
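The endpoint_name in the deployment call is built from a YOUR_NAME placeholder. SageMaker endpoint names may contain only letters, digits, and hyphens, up to 63 characters, so a name with spaces or underscores is rejected. The helper below is a hypothetical convenience, not part of the walkthrough repo, showing one way to build a valid name.

```python
import re

MAX_ENDPOINT_NAME_LENGTH = 63


def make_endpoint_name(your_name, prefix="demo-price-prediction"):
    """Build an endpoint name that uses only letters, digits, and hyphens."""
    # Replace every run of disallowed characters with a single hyphen
    suffix = re.sub(r"[^a-zA-Z0-9]+", "-", your_name).strip("-")
    if not suffix:
        raise ValueError("your_name must contain at least one letter or digit")
    name = f"{prefix}-{suffix}"
    # Truncate to the length limit and avoid ending on a hyphen
    return name[:MAX_ENDPOINT_NAME_LENGTH].rstrip("-")


print(make_endpoint_name("Jane_Tan"))  # demo-price-prediction-Jane-Tan
```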
Testing the Endpoint
Once the endpoint is deployed, you can test it within the notebook by sending a POST request to the endpoint with the input data. The input data should be in JSON format.
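from sagemaker.serializers import JSONSerializer
# SKLearn predictors serialize requests as NumPy arrays by default;
# switch to JSON so that input_fn can parse the body with json.loads
predictor.serializer = JSONSerializer()
# Sample input data
input_data = {
    "town": "ANG MO KIO",
    "flat_type": "3 ROOM",
    "floor_area_sqm": 60,
    "lease_commence_date": 1980,
    "storey_range": "10 TO 12",
    "remaining_lease": 60,
    "resale_price": 300000
}
# Send a POST request to the endpoint (the serializer converts the dict to JSON)
response = predictor.predict(input_data)
# Print the prediction
print(response)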
You may also test the endpoint within SageMaker's graphical interface. Navigate to the Endpoints section in the SageMaker console, select the endpoint, and click on Test.
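Outside the notebook, the endpoint can also be called through the SageMaker runtime API. The sketch below assumes a boto3 "sagemaker-runtime" client is passed in, and that the endpoint returns JSON when application/json is requested. This holds for the default output format as far as we know, but it depends on how output_fn is implemented. The helper name invoke_endpoint_json is hypothetical.

```python
import json


def invoke_endpoint_json(runtime_client, endpoint_name, payload):
    """Send one JSON request to a SageMaker endpoint and decode the JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # what input_fn receives
        Accept="application/json",       # what output_fn is asked to return
        Body=json.dumps(payload),
    )
    # "Body" is a stream; read it fully before decoding
    return json.loads(response["Body"].read())


# With AWS credentials configured (boto3 is assumed to be installed):
# import boto3
# client = boto3.client("sagemaker-runtime")
# invoke_endpoint_json(client, "demo-price-prediction-yourname", input_data)
```

Passing the client in as an argument keeps the helper easy to test with a stub and leaves region and credential handling to the caller.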
3.7. What's next?
Congratulations on your backend deployment!
In this walkthrough, we focused on a simple regression model. You can extend this example in several ways:
- Hyperparameter tuning: Use SageMaker's hyperparameter tuning feature to find the best hyperparameters for your model.
- Feature engineering: Perform more complex feature engineering to improve model performance.
- Model selection: Experiment with different models to find the best one for your use case.
- API Gateway: Integrate the SageMaker endpoint with API Gateway so that it can be called within the GEN network.
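For hyperparameter tuning, the training script first has to accept hyperparameters. SageMaker passes each hyperparameter set on the estimator to the script as a command-line argument, so the argument parser in train.py can be extended. The sketch below is one way to do that. The two hyperparameters and their defaults are illustrative choices for a Random Forest, not values taken from the repo.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the SageMaker-provided paths plus two tunable hyperparameters."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    # Hypothetical hyperparameters for RandomForestRegressor
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--max-depth", type=int, default=None)
    return parser.parse_args(argv)


# Simulate the arguments SageMaker would pass for
# hyperparameters={"n-estimators": 200, "max-depth": 8}
args = parse_args(["--n-estimators", "200", "--max-depth", "8"])
print(args.n_estimators, args.max_depth)  # 200 8
```

The parsed values would then be passed to the model, for example RandomForestRegressor(n_estimators=args.n_estimators, max_depth=args.max_depth).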
If you are interested in how this prediction service can translate to a frontend, consider using CStack. In one of our add-ons (see Chapter 9B), we provide a simple example of creating a frontend environment using Streamlit and CStack (Container Stack).
Otherwise, let's move on to the next few chapters, which will provide a step-by-step approach towards Stage 2 MLOps.