
Chapter 5C: Hyperparameter Tuning

Hyperparameter tuning is the systematic search for the set of model configuration parameters, called hyperparameters, that yields the best model performance on a validation set. These hyperparameters range from learning rates and regularization strengths to the number of layers in a deep neural network. Unlike model weights, which are learned directly from data, hyperparameters are set before or during training and control how the learning algorithm behaves.


5C.1. Why Is Hyperparameter Tuning Important?

  • Improve Model Performance: Properly tuned hyperparameters help you achieve higher predictive accuracy or lower error.
  • Efficiency: Well-chosen hyperparameters reduce training time and computational cost.
  • Robustness: By controlling complexity and regularization, tuning helps prevent overfitting or underfitting.

5C.2. Overview of Hyperparameter Tuning Techniques

Grid Search
- Approach: You define a discrete “grid” of candidate values for each hyperparameter, and the tuning process exhaustively tests every combination.
- Pros: Easy to understand and implement. Guarantees that every combination on the grid is tried.
- Cons: Becomes exponentially more expensive as the number of parameters or range of values increases. Can waste effort on suboptimal regions.
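
As a minimal sketch of grid search, the snippet below enumerates every combination of two hyperparameters and keeps the one with the lowest error. The validation_error function and the grid values are hypothetical stand-ins for a real train-and-validate run, not part of the pipeline built later in this chapter.

```python
import itertools


def validation_error(eta, max_depth):
    """Hypothetical stand-in for training a model and scoring it on a validation set."""
    return (eta - 0.2) ** 2 + 0.01 * (max_depth - 5) ** 2


# Illustrative grid: 4 values of eta times 3 values of max_depth.
grid = {
    "eta": [0.05, 0.1, 0.2, 0.4],
    "max_depth": [3, 5, 7],
}

# Build one dict per combination; the trial count is the product of the list lengths.
names = list(grid)
trials = [dict(zip(names, values)) for values in itertools.product(*grid.values())]

# Evaluate every trial and keep the configuration with the lowest error.
best = min(trials, key=lambda params: validation_error(**params))
print(len(trials), best)
```

Adding a third hyperparameter with five candidate values would multiply the trial count by five, which is the exponential growth noted in the cons above.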

Random Search
- Approach: Hyperparameter values are chosen randomly from predefined distributions.
- Pros: Often more efficient than grid search for large parameter spaces and covers a wide range of possibilities quickly.
- Cons: No feedback loop to refine subsequent trials as it follows a purely random process.
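
Random search can be sketched in a few lines. The example below assumes a hypothetical validation_error function and made-up sampling ranges; it draws eta from a log-uniform distribution and subsample from a uniform one, with a fixed seed so the run is repeatable.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniformly distributed between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def validation_error(eta, subsample):
    """Hypothetical stand-in for a real train-and-validate run."""
    return (math.log10(eta) - math.log10(0.2)) ** 2 + (subsample - 0.7) ** 2


rng = random.Random(0)  # fixed seed so the sketch is repeatable
trials = []
for _ in range(20):
    params = {
        "eta": sample_log_uniform(rng, 0.01, 1.0),
        "subsample": rng.uniform(0.5, 1.0),
    }
    trials.append((validation_error(**params), params))

# Each trial is independent of the others; nothing learned so far guides the next draw.
best_error, best_params = min(trials, key=lambda trial: trial[0])
print(best_error, best_params)
```

Because the trials are independent, all 20 could run in parallel, but none of them benefits from what the others found.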

Bayesian Optimisation
- Approach: Builds a probabilistic model (e.g., Gaussian processes) of the objective function and uses observations from previous trials to determine the most promising hyperparameters for the next trial.
- Pros: Smarter than random or grid search, typically finding near-optimal hyperparameters in fewer attempts.
- Cons: More complex to configure, and because each trial informs the next, the trials cannot all run in parallel. The extra setup usually pays off in lower cost and better performance.
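
To make the feedback loop concrete, here is a small pure-Python sketch of sequential model-based optimisation over a single hyperparameter. It assumes a zero-mean Gaussian process with an RBF kernel and a lower-confidence-bound rule for picking the next trial; the objective function, kernel length scale, and candidate grid are all illustrative choices. This is not how SageMaker implements its Bayesian strategy internally, only an instance of the general idea.

```python
import math


def objective(x):
    """Hypothetical validation error as a function of x = log10(alpha)."""
    return (x - 0.3) ** 2


def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))


def lower_confidence_bound(x, xs, ys):
    """Optimistic estimate of the error at x: low mean or high uncertainty both score well."""
    mean, std = gp_posterior(xs, ys, x)
    return mean - 2.0 * std


candidates = [-2.0 + 0.05 * i for i in range(61)]  # log10(alpha) from -2 to 1
xs = [-2.0, 1.0]                                   # two initial trials at the range ends
ys = [objective(x) for x in xs]

for _ in range(8):
    untried = [x for x in candidates if all(abs(x - t) > 1e-9 for t in xs)]
    # Refit the surrogate on all results so far, then pick the most promising candidate.
    next_x = min(untried, key=lambda x: lower_confidence_bound(x, xs, ys))
    xs.append(next_x)
    ys.append(objective(next_x))

best_y, best_x = min(zip(ys, xs))
print(best_x, best_y)
```

Each iteration uses every earlier result to choose the next point, which is the feedback loop that grid and random search lack.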

Hyperband (and Early-Stopping Methods)
- Approach: Launch multiple configurations with limited resources, then prune poorly performing trials while allocating more resources to promising ones.
- Pros: Highly efficient for large search spaces and expensive models.
- Cons: More complex to set up, typically requires specialized libraries.
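
The pruning idea can be illustrated with successive halving, the building block that Hyperband repeats across several brackets. The sketch below runs a single bracket; the learning-curve function and the eight random configurations are hypothetical.

```python
import math
import random


def validation_error(config, budget):
    """Hypothetical learning curve: error decays toward a config-specific floor as budget grows."""
    return config["floor"] + 1.0 / budget


rng = random.Random(0)
configs = [{"id": i, "floor": rng.uniform(0.1, 1.0)} for i in range(8)]

budget = 1       # e.g. boosting rounds or epochs given to each surviving config
survivors = configs
history = []     # (budget, number of configs evaluated) for each rung

while len(survivors) > 1:
    history.append((budget, len(survivors)))
    ranked = sorted(survivors, key=lambda c: validation_error(c, budget))
    survivors = ranked[: math.ceil(len(ranked) / 2)]  # keep the better half
    budget *= 2                                       # give the survivors twice the budget

winner = survivors[0]
print(history, winner["id"])
```

Only two of the eight configurations ever receive the largest budget, which is where the savings come from when each unit of budget is expensive.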

Recommended Approach: Bayesian Optimisation

From a best-practices standpoint, Bayesian Optimisation stands out as it allows us to balance both cost and performance. It strategically explores hyperparameter space rather than brute-forcing or randomly sampling it, often converging to a near-optimal solution with significantly fewer trials than grid or random search.


5C.3. Incorporating Hyperparameter Tuning in a SageMaker Pipeline

Below is a step-by-step guide on incorporating a hyperparameter tuning step (using Bayesian optimisation) inside an AWS SageMaker pipeline.

  1. Choose where model artifacts will live

    model_path = f"s3://{default_bucket}/{base_job_prefix}/ResalePriceProjectTrain"
    model_prefix = f"{base_job_prefix}/ResalePriceProjectTrain"
    

    • Defines an S3 location in your default bucket where SageMaker uploads the trained model (model.tar.gz) and any additional files after every training job started by the tuner.
  2. Provide the training & validation channels

    from sagemaker.inputs import TrainingInput

    # Set the model training input locations
    training_inputs = {
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "train"
            ].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "validation": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "validation"
            ].S3Output.S3Uri,
            content_type="text/csv",
        ),
    }
    

    • Grabs the output URIs produced by a previous ProcessingStep (step_process) and maps them to SageMaker’s train and validation channels.
    • SageMaker makes these CSV files available to every training container launched by the tuner.
  3. Pick the algorithm container

    import sagemaker

    # Get the container image for XGBoost
    # https://docs.aws.amazon.com/sagemaker/latest/dg-ecr-paths/ecr-ap-southeast-1.html#xgboost-ap-southeast-1.title
    image_uri = sagemaker.image_uris.retrieve(
        framework="xgboost",
        region=region,
        version="1.0-1",
        py_version="py3",
        instance_type=training_instance_type,
    )
    

    • Retrieves the managed XGBoost Docker image for the region and instance type.
    • Ensures portability: no hard-coded ECR path is required.
  4. Instantiate the Estimator

    from sagemaker.estimator import Estimator

    # The Estimator class takes a container image_uri and trains the model in that container.
    # https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html
    xgb_train = Estimator(
        image_uri=image_uri,
        instance_type=training_instance_type,
        instance_count=TRAINING_INSTANCE_COUNT,
        output_path=model_path,
        base_job_name=f"{base_job_prefix}/train",
        sagemaker_session=pipeline_session,
        role=role,
    )
    

    • Wraps the container image and the compute configuration (instance type, count, output path, role) in an Estimator object.
    • The Estimator acts as a template that the tuner will clone for every trial job.
  5. Seed the Estimator with default hyperparameters

    xgb_train.set_hyperparameters(
        eval_metric="rmse",
        objective="reg:squarederror",
        num_round=50,
        max_depth=5,
        eta=0.2,
        gamma=4,
        min_child_weight=6,
        subsample=0.7,
        silent=0,
    )
    

    • Provides sensible starting values for parameters not being tuned (e.g., objective and eval_metric).
    • SageMaker will override only those hyperparameters declared in hyperparameter_ranges.
  6. Define the target metric

    # Define the objective metrics for tuning
    objective_metric_name = "validation:rmse"
    

    • Tells SageMaker which metric (as logged by the training container) it should monitor.
  7. Describe the search space

    from sagemaker.tuner import ContinuousParameter

    # Define the hyperparameter ranges
    hyperparameter_ranges = {
        "alpha": ContinuousParameter(0.01, 10, scaling_type="Logarithmic"),
        "lambda": ContinuousParameter(0.01, 10, scaling_type="Logarithmic"),
    }
    

    • Uses ContinuousParameter with a log‑scale search (good for regularization terms that span orders of magnitude).
    • You can extend this dict with more hyperparameters (e.g., eta, or max_depth as an IntegerParameter) to widen or refine the search.
  8. Create the Bayesian Tuner

    from sagemaker.tuner import HyperparameterTuner

    # Define the tuner parameters
    tuner = HyperparameterTuner(
        estimator=xgb_train,
        objective_metric_name=objective_metric_name,
        hyperparameter_ranges=hyperparameter_ranges,
        max_jobs=10,
        max_parallel_jobs=3,
        strategy="Bayesian",
        objective_type="Minimize",
    )
    

    • strategy="Bayesian" tells SageMaker to use sequential model‑based optimisation (Gaussian‑process–style) to choose the most promising hyperparameter set after each trial.
    • max_jobs limits total trials; max_parallel_jobs caps simultaneous executions to control cost.
    • Lower RMSE is better, so objective_type is set to "Minimize".
  9. Build the fit request for the tuner and drop the tuner into a pipeline step

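    from sagemaker.workflow.steps import TuningStep

    # With a pipeline session, fit() returns step arguments instead of starting a job
    hpo_args = tuner.fit(inputs=training_inputs)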
    
    # Define the training step using the HyperparameterTuner
    step_tuning = TuningStep(
        name="HyperparameterTuning",
        step_args=hpo_args,
        cache_config=cache_config,
    )
    

    • Produces a step argument object (hpo_args) describing how SageMaker should run the tuning job—data channels, metric name, search space, etc.
    • No API call is executed yet; the actual run happens when the pipeline executes.
    • Wraps the tuning job in a TuningStep so it can be orchestrated along with other steps (processing, evaluation, registration).
    • cache_config (if caching is enabled) lets the pipeline reuse the result of an earlier successful run of this step when its arguments are unchanged, saving compute time.
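
To see why step 7 searches alpha and lambda on a logarithmic scale, the sketch below compares how much of the search lands in each decade of the 0.01 to 10 range under linear and logarithmic sampling. It assumes that logarithmic scaling behaves roughly like log-uniform sampling; the helper names are hypothetical and the shares are computed analytically rather than by running a tuner.

```python
import math

LOW, HIGH = 0.01, 10.0  # the alpha/lambda bounds used in step 7


def linear_share(a, b):
    """Fraction of uniformly sampled values that fall in [a, b]."""
    return (b - a) / (HIGH - LOW)


def log_share(a, b):
    """Fraction of log-uniformly sampled values that fall in [a, b]."""
    return (math.log(b) - math.log(a)) / (math.log(HIGH) - math.log(LOW))


# Compare coverage of each decade under the two sampling schemes.
for a, b in [(0.01, 0.1), (0.1, 1.0), (1.0, 10.0)]:
    print(f"[{a}, {b}]: linear {linear_share(a, b):.1%}, log {log_share(a, b):.1%}")
```

Under linear sampling, roughly 90% of the trials would land between 1 and 10 and under 1% between 0.01 and 0.1, whereas the logarithmic scale gives each decade an equal third. With only 10 trials in total, that difference decides whether small regularization values get explored at all.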

5C.4. Summary

Hyperparameter tuning is vital for squeezing the best performance out of your machine learning models. Although approaches like grid search and random search can be sufficient for small or initial experiments, Bayesian Optimisation is often recommended for larger, more complex use cases. SageMaker’s built-in HyperparameterTuner class makes it straightforward to implement Bayesian search, especially when integrated into a SageMaker Pipeline.

In this chapter, we:

  1. Explored the concepts and benefits of hyperparameter tuning.
  2. Compared Grid Search, Random Search, Bayesian Optimisation, and early-stopping methods like Hyperband.
  3. Showed you how to modify your pipeline code to include a Bayesian optimisation hyperparameter tuning step in AWS SageMaker.

By incorporating these steps, you ensure a more efficient search for optimal hyperparameters and a more automated, reproducible workflow that suits production-level MLOps requirements.