Chapter 3: Deploying your first ML model prediction service

In this chapter, we’ll guide you through deploying your first ML model using MAESTRO's AWS SageMaker platform.

AWS SageMaker is a fully managed service that lets data scientists build, train, and deploy machine learning models. It takes care of much of the operational work, such as provisioning infrastructure and scaling, so that you can focus on developing and deploying your model.

Using the MAESTRO environment is the simplest way to follow this chapter. However, you can also create your own environment if needed; the instructions in this chapter can be adapted to your own SageMaker environment. For guidance on setting up your own environment, please refer to our MLOps Starter Kit at https://sgts.gitlab-dedicated.com/wog/gvt/dsaidquantitativ/qs-central/mlops-starter/mlops-infra.

Our case study focuses on predicting the resale prices of Housing & Development Board (HDB) flats. The overarching goal is to build a regression model using publicly accessible, real-world data on HDB resale transactions.

We assume that you have little or no experience deploying an endpoint. If you already have some, feel free to skip ahead to Chapter 4, which presents a fuller case study of an end-to-end MLOps workflow.

3.1. Prerequisites

Before we begin, ensure you have the following:

You have access to a project domain in MAESTRO SageMaker
You have been granted the ML Engineer role for this project domain
You can clone the repository containing the code for this walkthrough. In it you will find:
- data/resale_prices_Mar2023_to_Feb2024.csv: a CSV file containing information on HDB resale prices
- 1_inspect_data.ipynb: a Jupyter notebook performing basic EDA on the data
- 2_train_and_deploy.ipynb: a Jupyter notebook which uploads the data to S3, starts a SageMaker training job, and deploys the trained model to a SageMaker Endpoint
- train.py: a Python file defining the inference and training code for the model

For further details on the roles in MAESTRO SageMaker, please refer to the MAESTRO SageMaker User Guide.

3.2. Overview of the Deployment Process

Deploying a machine learning model in SageMaker involves several key steps:

Prepare the Training Script: This script trains the model and defines necessary functions for SageMaker.
Upload Data to S3: SageMaker reads training data from Amazon S3.
Create and Train a SageMaker Estimator: This step involves specifying the training script and training the model.
Deploy the Model: Once trained, the model is deployed to an endpoint for real-time predictions.

Context

This is a toy example: a regression model that predicts HDB resale prices. Traditionally, a data scientist might perform exploratory data analysis (EDA) on the data, and perhaps even train a model and generate predictions, entirely on their local machine. This corresponds to Stage 0 MLOps, which the 1_inspect_data.ipynb notebook covers.

Preparing the Model Script (train.py)

For Stage 1 MLOps, let's start with the training script. The script has two parts: (a) the model training code, and (b) the four functions that SageMaker uses at inference time.

3.3. Model training code

The code below is executed when train.py gets called in the training step. It reads the training data, performs feature engineering, trains the model, and saves it to disk.

For this walkthrough, we use a simple Random Forest Regressor. You can replace it with any other model you prefer, or add more complex feature engineering and hyperparameter tuning.

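We use joblib to save the trained model to disk. SageMaker expects the training script to write its model artifacts to the model directory (given by SM_MODEL_DIR, which is /opt/ml/model inside the training container); when the job finishes, SageMaker packages that directory into model.tar.gz and uploads it to S3. joblib is a popular library for serializing Python objects; pickle is another option, but joblib is more efficient for objects that contain large NumPy arrays, such as fitted scikit-learn models.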

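import argparse
import json  # used by input_fn
import os

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# TOWNS, the list of town names that input_fn also uses, is assumed to be defined here at module level.

if __name__ == '__main__':
    # This block ensures that the script is being run directly and not imported as a module
    parser = argparse.ArgumentParser()

    # SageMaker sets these environment variables inside the training container
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))

    args = parser.parse_args()

    # Load the training data, which SageMaker copies from S3 into the 'train' channel directory
    df = pd.read_csv(os.path.join(args.train, 'resale_prices_Mar2023_to_Feb2024.csv'))

    # Feature engineering: insert your own feature engineering code here.
    # Illustrative placeholder: two numeric columns plus a one-hot encoding of `town`.
    # X must have the same columns, in the same order, as the DataFrame input_fn returns.
    y = df['resale_price']
    X = df[['floor_area_sqm', 'lease_commence_date']].copy()
    for town in TOWNS:
        X[town] = (df['town'] == town).astype(int)

    # Split the data into training and validation sets
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train the model
    model = RandomForestRegressor(random_state=42)
    model.fit(X_train, y_train)

    # Log the validation R^2 so it appears in the training job's logs
    print(f'Validation R^2: {model.score(X_val, y_val):.3f}')

    # Save the trained model; SageMaker uploads the contents of model_dir to S3 as model.tar.gz
    joblib.dump(model, os.path.join(args.model_dir, 'model.joblib'))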
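Because the argument defaults come from SM_MODEL_DIR and SM_CHANNEL_TRAIN, the same script can be dry-run outside SageMaker by setting those variables yourself or by passing explicit flags. The standard-library sketch below mirrors that argument handling; parse_args_with_env is a hypothetical helper name, and the paths are the ones SageMaker uses inside the training container (/opt/ml/model and /opt/ml/input/data/train).

```python
import argparse
import os


def parse_args_with_env(argv=None):
    """Hypothetical helper mirroring train.py: defaults come from SageMaker's env vars."""
    parser = argparse.ArgumentParser()
    # Defaults are read when add_argument runs, i.e. after the env vars below are set
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    return parser.parse_args(argv)


# Inside a SageMaker training container these are set for you;
# here we set them by hand to mimic that environment.
os.environ['SM_MODEL_DIR'] = '/opt/ml/model'
os.environ['SM_CHANNEL_TRAIN'] = '/opt/ml/input/data/train'

args = parse_args_with_env([])
print(args.model_dir, args.train)  # /opt/ml/model /opt/ml/input/data/train

# Explicit command-line flags override the environment defaults, which is handy for local runs
local_args = parse_args_with_env(['--model-dir', './model', '--train', './data'])
print(local_args.model_dir, local_args.train)  # ./model ./data
```

Assuming scikit-learn, pandas, and joblib are installed locally, the real script could be exercised the same way from the repository root, for example with python train.py --model-dir ./model --train ./data after creating ./model.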

3.4. Inference code functions

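For inference, the SageMaker scikit-learn model server uses four functions: model_fn, input_fn, predict_fn, and output_fn. SageMaker provides default implementations of input_fn, predict_fn, and output_fn, which are used if you do not define them in the model script. model_fn, however, must always be defined, because SageMaker cannot know how your model was saved.

In this example, we override three of the four functions (model_fn, input_fn, and predict_fn) and rely on the default output_fn.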
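To make the call order concrete, here is a deliberately simplified, local model of how a model server chains these functions. It is not SageMaker's actual serving code: SimplifiedModelServer, StubModel, and the default_* fallbacks are hypothetical names, the stub stands in for a trained regressor, and only the standard library is used. model_fn runs once when the server starts; every request then goes through input_fn, predict_fn, and output_fn, with the defaults filling in for any function you leave out.

```python
import json


class StubModel:
    """Hypothetical stand-in for a trained regressor: predicts 5000 * floor area."""

    def predict(self, rows):
        return [5000 * row['floor_area_sqm'] for row in rows]


def model_fn(model_dir):
    # A real model_fn would load model.joblib from model_dir
    return StubModel()


def input_fn(request_body, request_content_type):
    # Deserialize a JSON object into a one-row list of records
    return [json.loads(request_body)]


def default_predict_fn(input_data, model):
    return model.predict(input_data)


def default_output_fn(prediction, accept):
    return json.dumps(list(prediction))


class SimplifiedModelServer:
    """Hypothetical, simplified model of the serving call order (not SageMaker's code)."""

    def __init__(self, model_dir, model_fn, input_fn,
                 predict_fn=default_predict_fn, output_fn=default_output_fn):
        # model_fn is called once, when the server starts
        self.model = model_fn(model_dir)
        self.input_fn = input_fn
        self.predict_fn = predict_fn
        self.output_fn = output_fn

    def handle(self, body, content_type, accept):
        # Each request flows through input_fn -> predict_fn -> output_fn
        input_data = self.input_fn(body, content_type)
        prediction = self.predict_fn(input_data, self.model)
        return self.output_fn(prediction, accept)


server = SimplifiedModelServer('/opt/ml/model', model_fn, input_fn)
print(server.handle('{"floor_area_sqm": 60}', 'application/json', 'application/json'))  # [300000]
```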

We will now explain each of these functions in detail:

model_fn: This function loads the trained model. When SageMaker hosts the model, it extracts the model artifacts (the model.tar.gz produced by training) into model_dir and calls model_fn to load the model from its saved state. Here, joblib.load reads model.joblib from that directory.

def model_fn(model_dir):
    model = joblib.load(os.path.join(model_dir, 'model.joblib'))
    return model
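The save-then-load round trip between train.py and model_fn can be checked locally. The sketch below assumes scikit-learn and joblib are installed; it fits a small RandomForestRegressor on synthetic numbers (not HDB data), writes model.joblib into a temporary directory standing in for the model directory, and reloads it with the same logic as the model_fn above.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: 50 rows, 2 numeric features (not real HDB data)
rng = np.random.default_rng(0)
X = rng.uniform(40, 150, size=(50, 2))
y = 3000 * X[:, 0] + rng.normal(0, 1000, size=50)

model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(X, y)


def model_fn(model_dir):
    # Same logic as the model_fn above
    return joblib.load(os.path.join(model_dir, 'model.joblib'))


with tempfile.TemporaryDirectory() as model_dir:
    # Training side: persist the fitted model, as train.py does with args.model_dir
    joblib.dump(model, os.path.join(model_dir, 'model.joblib'))

    # Serving side: reload it, as SageMaker does via model_fn
    restored = model_fn(model_dir)
    same_predictions = np.allclose(model.predict(X), restored.predict(X))

print(same_predictions)  # True
```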

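input_fn: For every request, this is the first function SageMaker calls (model_fn is called when the model is loaded, not per request). Because API payloads are typically JSON, this function parses the JSON request body into a single-row pandas DataFrame and one-hot encodes the town field using TOWNS, the list of town names defined in train.py. The resulting DataFrame X is passed to predict_fn as input_data.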

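def input_fn(request_body, request_content_type):
    # This example only accepts JSON payloads
    if request_content_type != 'application/json':
        raise ValueError(f'Unsupported content type: {request_content_type}')

    # Parse the JSON object into a single-row DataFrame
    features = json.loads(request_body)
    X = pd.DataFrame(features, index=[0])

    # One-hot encode `town`: one 0/1 column per town name in TOWNS
    feature_town = features['town']
    X = X.drop(columns=['town'])
    for town in TOWNS:
        X[town] = 1 if town == feature_town else 0
    return X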
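One thing to watch: this input_fn keeps every other field in the request, in whatever order the client sends them, while the model expects exactly the columns it was trained on, in the same order. A more defensive variant is sketched below. It assumes that train.py also defines NUMERIC_FEATURES and FEATURE_COLUMNS (hypothetical names) that the training code uses to build X; the short TOWNS list is illustrative only, and pandas is the only dependency. Missing inputs raise an error, extra fields are ignored, and the row is always built in the training column order.

```python
import json

import pandas as pd

# Illustrative values only: in train.py these would match the training code exactly
TOWNS = ['ANG MO KIO', 'BEDOK', 'BISHAN']
NUMERIC_FEATURES = ['floor_area_sqm', 'lease_commence_date']
FEATURE_COLUMNS = NUMERIC_FEATURES + TOWNS


def input_fn(request_body, request_content_type):
    if request_content_type != 'application/json':
        raise ValueError(f'Unsupported content type: {request_content_type}')
    features = json.loads(request_body)

    # Fail loudly on missing inputs instead of passing an incomplete row to the model
    missing = [col for col in NUMERIC_FEATURES + ['town'] if col not in features]
    if missing:
        raise ValueError(f'Missing fields: {missing}')

    # Start from an all-zero row with the training columns, in the training order
    X = pd.DataFrame(0, index=[0], columns=FEATURE_COLUMNS)
    for col in NUMERIC_FEATURES:
        X[col] = features[col]
    # Set the one-hot column for the requested town (an unknown town stays all-zero)
    if features['town'] in TOWNS:
        X[features['town']] = 1
    return X


row = input_fn(json.dumps({'town': 'BEDOK', 'floor_area_sqm': 67,
                           'lease_commence_date': 1985, 'flat_type': '3 ROOM'}),
               'application/json')
print(row.columns.tolist())
# ['floor_area_sqm', 'lease_commence_date', 'ANG MO KIO', 'BEDOK', 'BISHAN']
print(row.iloc[0].tolist())  # [67, 1985, 0, 1, 0]
```

Whether an unknown town should be rejected instead of encoded as all zeros is a design choice; rejecting it is stricter but makes the endpoint fail on towns absent from the training data.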

predict_fn: This function takes input_data (the output of input_fn) and model (the object returned by model_fn) and generates predictions using the trained model. Its return value is passed to output_fn as prediction.

def predict_fn(input_data, model):
    return model.predict(input_data)

output_fn: This function serializes the prediction into the response body. We do not define it in this example, so SageMaker's default implementation is used, but overriding it is useful for custom response formats; for example, you may want to return a JSON response with additional metadata. If you wish to override the default implementation, the function signature is output_fn(prediction, accept), where accept is the content type requested by the client.
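A custom output_fn that wraps the predictions in a JSON object with metadata might look like the sketch below. The metadata fields are illustrative assumptions, and the sketch returns a JSON string; check the SageMaker scikit-learn container documentation for your framework_version for the exact return type it expects before relying on this.

```python
import json

import numpy as np


def output_fn(prediction, accept):
    """Hypothetical custom output_fn: wraps predictions in a JSON object with metadata."""
    if accept not in ('application/json', '*/*'):
        raise ValueError(f'Unsupported accept type: {accept}')
    body = {
        # np.asarray(...).ravel() handles arrays and lists; float() makes values JSON-serializable
        'predictions': [float(p) for p in np.asarray(prediction).ravel()],
        'model': 'random-forest-regressor',  # illustrative metadata field
        'currency': 'SGD',
    }
    return json.dumps(body)


print(output_fn(np.array([512000.0]), 'application/json'))
# {"predictions": [512000.0], "model": "random-forest-regressor", "currency": "SGD"}
```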

3.5. Uploading the training data to S3

Before you can train the model, you need to upload the training data to an S3 bucket, because SageMaker training jobs read their input data from S3. In practice, however, your agency's or vendor's data engineering team will likely already have a data storage solution in place, which may require some integration work.

The following snippet from 2_train_and_deploy.ipynb uploads the training data to S3:

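import sagemaker

sagemaker_session = sagemaker.Session()
bucket = 'your-s3-bucket-name'  # Replace with an S3 bucket that your project domain can write to
prefix = 'demo-housing-price-prediction'  # S3 key prefix ("folder") for this demo
train_file = "data/resale_prices_Mar2023_to_Feb2024.csv"

# Upload the CSV; upload_data returns the S3 URI of the uploaded file
train_uri = sagemaker_session.upload_data(path=train_file, bucket=bucket, key_prefix=prefix)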
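It helps to know where the file ends up. For a single file, upload_data returns s3://<bucket>/<key_prefix>/<filename>; when that URI is passed to fit() under the 'train' channel, SageMaker downloads it into /opt/ml/input/data/train/ inside the training container and points SM_CHANNEL_TRAIN there, which is why train.py joins args.train with the CSV file name. The helper below is a hypothetical, standard-library-only sketch that just computes both locations; it does not call AWS.

```python
import posixpath


def expected_locations(bucket, key_prefix, local_file, channel='train'):
    """Hypothetical helper: where a single uploaded file lives in S3 and in the training container."""
    file_name = posixpath.basename(local_file)
    s3_uri = f's3://{bucket}/{key_prefix}/{file_name}'
    container_path = posixpath.join('/opt/ml/input/data', channel, file_name)
    return s3_uri, container_path


s3_uri, container_path = expected_locations(
    'your-s3-bucket-name', 'demo-housing-price-prediction',
    'data/resale_prices_Mar2023_to_Feb2024.csv')
print(s3_uri)
# s3://your-s3-bucket-name/demo-housing-price-prediction/resale_prices_Mar2023_to_Feb2024.csv
print(container_path)
# /opt/ml/input/data/train/resale_prices_Mar2023_to_Feb2024.csv
```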

3.6. Preparing the Training and Deployment Notebook (2_train_and_deploy.ipynb)

In addition to the data upload code above, the notebook includes code that defines the SageMaker Estimator, trains the model, and deploys it to an endpoint.

Defining the SageMaker Estimator and starting the training job

We now define a SageMaker Estimator, specifying the train.py script as the entry_point along with the other required parameters. We use the SageMaker Python SDK's SKLearn estimator rather than calling scikit-learn directly because it runs train.py inside a pre-built scikit-learn container managed by SageMaker, which takes care of provisioning, training, and deployment for you.

from sagemaker.sklearn.estimator import SKLearn

# Define the SKLearn estimator
sklearn_estimator = SKLearn(
    entry_point='train.py',  # Path to the training script
    role=sagemaker.get_execution_role(),  # AWS IAM role
    instance_type='ml.m5.large',  # Type of instance to use for training
    framework_version='0.23-1',  # Version of the scikit-learn framework
    py_version='py3',  # Python version to use
    sagemaker_session=sagemaker_session  # SageMaker session
)

We can then start the training job by calling the fit method on the estimator.

# Train the model
sklearn_estimator.fit({'train': train_uri})

This starts the training job. By default, fit() waits for the job to complete and streams its logs into the notebook, so you can follow its progress there.
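When the job finishes, SageMaker packages whatever train.py wrote to the model directory into a model.tar.gz archive in S3 (the estimator's model_data attribute holds its S3 URI), and at deployment that archive is extracted into the model_dir passed to model_fn. If the endpoint later fails to load the model, a quick check is to download the archive and confirm that model.joblib sits at its top level. The standard-library sketch below shows that check; build_demo_archive is a hypothetical helper that fabricates a tiny archive so the example runs without AWS, and its contents are placeholders, not a real model.

```python
import io
import tarfile


def list_model_archive(archive_bytes):
    """Return the member names of a model.tar.gz archive given its raw bytes."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode='r:gz') as tar:
        return tar.getnames()


def build_demo_archive():
    """Hypothetical helper: build a tiny in-memory archive laid out like model.tar.gz."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode='w:gz') as tar:
        payload = b'placeholder for a joblib-serialized model'
        info = tarfile.TarInfo(name='model.joblib')
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


names = list_model_archive(build_demo_archive())
print(names)  # ['model.joblib']
print('model.joblib' in names)  # True
```

With a real archive, you would open the downloaded model.tar.gz file with tarfile.open instead of building one in memory.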

Deployment

Once the model has been trained, we deploy it to an endpoint for real-time predictions. An endpoint is a web service that hosts the trained model and makes it available for real-time inference.

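predictor = sklearn_estimator.deploy(
    initial_instance_count=1,  # Number of instances hosting the endpoint
    instance_type='ml.m5.large',  # Instance type used to host the model
    endpoint_name=f"demo-price-prediction-{YOUR_NAME}",  # Define YOUR_NAME beforehand so the name is unique
)

initial_instance_count sets how many instances serve the endpoint; one is enough for testing. For the instance_type, choose the instance type that best suits your use case; ml.m5.large is a reasonable starting point for a small model like this one. Please refer to the MAESTRO SageMaker User Guide for more information on available instance types.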

Testing the Endpoint

Once the endpoint is deployed, you can test it from the notebook with predictor.predict(), which sends a POST request containing the input data to the endpoint. Because input_fn parses JSON, the request must be sent as JSON.

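from sagemaker.deserializers import JSONDeserializer
from sagemaker.serializers import JSONSerializer

# Send requests as JSON (the format input_fn expects) and parse JSON responses
predictor.serializer = JSONSerializer()
predictor.deserializer = JSONDeserializer()

# Sample input data: include the fields your feature engineering and input_fn expect.
# resale_price is the target we want to predict, so it is not sent.
input_data = {
    "town": "ANG MO KIO",
    "flat_type": "3 ROOM",
    "floor_area_sqm": 60,
    "lease_commence_date": 1980,
    "storey_range": "10 TO 12",
    "remaining_lease": 60
}

# Send a POST request to the endpoint; JSONSerializer converts the dict to JSON
response = predictor.predict(input_data)

# Print the prediction
print(response)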
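Outside the notebook, for example in a backend service that does not use the SageMaker Python SDK, the same endpoint can be called through the SageMaker Runtime API. The sketch below assumes a client created with boto3.client('sagemaker-runtime') and uses its invoke_endpoint call; predict_resale_price and StubRuntimeClient are hypothetical names, and the client is passed in so that the function can be exercised locally with a stub, without AWS credentials.

```python
import io
import json


def predict_resale_price(runtime_client, endpoint_name, features):
    """Hypothetical helper: call a SageMaker endpoint with a JSON payload and parse the JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/json',  # matches the content type input_fn accepts
        Accept='application/json',
        Body=json.dumps(features),
    )
    # The response body is a stream; read it and decode the JSON prediction
    return json.loads(response['Body'].read())


class StubRuntimeClient:
    """Local stand-in for boto3.client('sagemaker-runtime'); returns a fixed, made-up prediction."""

    def invoke_endpoint(self, **kwargs):
        self.last_request = kwargs
        return {'Body': io.BytesIO(b'[412000.0]')}


# With AWS credentials you would pass boto3.client('sagemaker-runtime') instead of the stub
stub = StubRuntimeClient()
print(predict_resale_price(stub, 'demo-price-prediction-your-name',
                           {'town': 'ANG MO KIO', 'floor_area_sqm': 60, 'lease_commence_date': 1980}))
# [412000.0]
```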

You may also test the endpoint within SageMaker's graphical interface. Navigate to the Endpoints section in the SageMaker console, select the endpoint, and click on Test.

3.7. What's next?

Congratulations on your backend deployment!

In this walkthrough, we focused on a simple regression model. You can extend this example in several ways:

Hyperparameter tuning: Use SageMaker's hyperparameter tuning feature to find the best hyperparameters for your model.
Feature engineering: Perform more complex feature engineering to improve model performance.
Model selection: Experiment with different models to find the best one for your use case.
API Gateway: Integrate the SageMaker endpoint with API Gateway so that it can be called within the GEN network.
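As a starting point for hyperparameter tuning, train.py would first need to accept hyperparameters. SageMaker passes the estimator's hyperparameters dict to the training script as command-line arguments (for example, --n-estimators 200), and a HyperparameterTuner reads the objective metric from the training logs using a regex supplied in metric_definitions. The sketch below shows only the script side; the argument names, the log line format, and its value are hypothetical, and it does not launch a tuning job.

```python
import argparse
import re


def parse_hyperparameters(argv=None):
    """Hypothetical train.py additions: RandomForestRegressor settings as command-line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--n-estimators', type=int, default=100)
    parser.add_argument('--max-depth', type=int, default=None)
    # parse_known_args ignores other flags such as --model-dir or --train
    args, _ = parser.parse_known_args(argv)
    return args


# hyperparameters={'n-estimators': 200, 'max-depth': 12} would arrive as these arguments
args = parse_hyperparameters(['--n-estimators', '200', '--max-depth', '12'])
print(args.n_estimators, args.max_depth)  # 200 12

# A made-up log line the script could print, and a regex a metric definition could use
log_line = 'validation-r2: 0.873'
metric_regex = r'validation-r2: ([0-9\.]+)'
print(float(re.search(metric_regex, log_line).group(1)))  # 0.873
```

The parsed values would then be passed to RandomForestRegressor(n_estimators=args.n_estimators, max_depth=args.max_depth) in the training code.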

If you want to put a frontend on this prediction service, consider using CStack (Container Stack). One of our add-ons (see Chapter 9B) walks through a simple example of building a frontend with Streamlit and CStack.

Otherwise, let's move on to the next few chapters, which will provide a step-by-step approach towards Stage 2 MLOps.