Chapter 8A: People & Knowledge Management

You have learnt that successful implementation of MLOps requires the right mix of people, processes & tools. In this chapter, we build on that foundation by proposing effective team structures and essential documentation resources to support your MLOps journey.

8A.1. People

We recommend that agencies begin with at least one ML use case overseen by an intermediate to advanced data science practitioner. This key individual will be responsible for developing the implementation roadmap and will serve as either a Tech Lead or Lead Data Scientist. As your MLOps practice matures, you can strategically expand to a team of 2-4 members, with specific roles determined by your organization's unique MLOps requirements and maturity goals.

Roles & Responsibilities

Below are some typical roles you can consider when expanding an MLOps team.

Role	Responsibilities
Tech Lead	Oversees the data science team and sets the strategy for model development. They ensure that projects are aligned with business goals while providing technical guidance and ensuring best practices are followed. The Tech Lead acts as a bridge between business objectives and technical execution.
Data Scientist	Data scientists handle the core task of data analysis and model building. They prepare and process raw data, perform exploratory data analysis (EDA), and contribute to feature engineering. Their key responsibility is to build, validate, and interpret machine learning models to generate insights that can inform business decisions.
ML Engineer	ML engineers focus on making machine learning models production-ready. This involves transforming models into scalable solutions, managing CI/CD pipelines, automating infrastructure, and ensuring integration with existing systems. They also monitor system performance so that models remain reliable in real-world settings.
Data Engineer	Data engineers are responsible for the data infrastructure. They build and manage feature stores and handle the ETL (Extract, Transform, Load) processes that are essential for model development and deployment. Their work ensures that clean, usable data is always available for the rest of the team.

While we recommend having a dedicated team of 3 or 4 members for MLOps, it's common to see data scientists taking on multiple roles and also serving as ML Engineers or Data Engineers, especially in smaller teams. In many cases, they also collaborate with other team members like business analysts or project managers to ensure tasks are completed. The roles we've outlined above are just a guideline, representing what we consider the minimum team required to successfully develop, deploy, and maintain a machine learning system. Feel free to adapt these roles or combine responsibilities based on your team's needs and resources.

Competencies

Below are some competencies you can consider when expanding an MLOps team.

Science

Statistics & Machine Learning: Expertise in evaluating and developing machine learning models, with a strong ability to select appropriate algorithms and frameworks. Solid knowledge of machine learning libraries such as PyTorch, TensorFlow, and Scikit-learn is essential.
MLOps Knowledge: Understanding of how to design models for continuous processes, including continuous training, monitoring, and evaluation. This also includes knowledge of how to set up triggers for retraining and model updates.
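
To illustrate what a retraining trigger might look like in practice, here is a minimal sketch in Python. The names RetrainPolicy and should_retrain, and the threshold values, are hypothetical and chosen only for illustration; this playbook does not prescribe them. The sketch assumes a scheduled monitoring job that already knows the model's live accuracy and age.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RetrainPolicy:
    """Hypothetical thresholds; tune these for your own use case."""
    min_accuracy: float = 0.85          # retrain if live accuracy drops below this
    max_days_since_training: int = 90   # retrain if the model is older than this


def should_retrain(live_accuracy: float,
                   days_since_training: int,
                   policy: RetrainPolicy = RetrainPolicy()) -> Tuple[bool, str]:
    """Return (decision, reason) for one scheduled monitoring check."""
    if live_accuracy < policy.min_accuracy:
        return True, "performance below threshold"
    if days_since_training > policy.max_days_since_training:
        return True, "model older than allowed age"
    return False, "no trigger fired"
```

A scheduler could call should_retrain(live_accuracy, days_since_training) after each evaluation run and start a training pipeline when the first element of the result is True. Real triggers often also consider data drift, which is left out here to keep the sketch short.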

ML Engineering

Cloud and/or On-Prem DevOps: Expertise in using cloud platforms or on-prem servers to manage & deploy machine learning models. Familiarity with maintaining CI/CD pipelines.
Containerization: Knowledge of containerization tools such as Docker and Kubernetes.
Monitoring & Logging: Expertise in using tools like Grafana & CloudWatch to track system performance and monitor ML models in production.
Infrastructure as Code (IaC): Knowledge of tools like Terraform or CloudFormation.

Data Engineering

Extract, Transform, Load (ETL) Processes: Proficiency in developing ETL processes to prepare data for machine learning models.
Data Pipeline Development: Expertise in building and maintaining robust data pipelines.
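
As a rough illustration of the Extract, Transform, Load pattern, the sketch below uses only the Python standard library. The column names (age, income), the sample data, and the in-memory list standing in for a target table are assumptions made for this example; a production pipeline would typically read from and write to real data stores.

```python
import csv
import io
from typing import Dict, Iterable, List


def extract(raw_csv: str) -> List[Dict[str, str]]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: Iterable[Dict[str, str]]) -> List[Dict[str, float]]:
    """Transform: drop rows with missing values and cast fields to numbers."""
    clean = []
    for row in rows:
        if not row["age"] or not row["income"]:
            continue  # skip incomplete records
        clean.append({"age": float(row["age"]), "income": float(row["income"])})
    return clean


def load(rows: List[Dict[str, float]], store: List[Dict[str, float]]) -> int:
    """Load: append cleaned rows to a target store (a list stands in for a table)."""
    store.extend(rows)
    return len(rows)


# Hypothetical sample input; the second record is missing its age.
RAW = "age,income\n34,5200\n,4100\n51,7300\n"
feature_table: List[Dict[str, float]] = []
loaded = load(transform(extract(RAW)), feature_table)
```

Keeping the three stages as separate functions makes each one easy to test and replace independently, for example swapping the in-memory list for a database or feature store writer.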

8A.2. Ensuring proper governance & system integration through collaboration with Infra & Security Team

Agencies often have several systems and established processes for determining appropriate security and governance. Engaging with multiple stakeholders is crucial to refining your solution. These stakeholders include the CIO, CISO, Solution Architect, data governance team, and any existing vendors whose systems will interface with yours. We recommend using this playbook as a starting point for discussion, then collaborating with the various teams to align on the way forward.

Infra & Security Team

Scalability: Collaborate with the Infra Team to ensure the infrastructure can scale to meet the growing demands of machine learning models and data processing.
Data & Model: Clearly articulate potential issues related to data sensitivity and model integrity.
Security: Strive for a balance between security measures and operational efficiency.

8A.3. ML Science Document & System Tech Spec

To support you in knowledge management, we have created a documentation template for your reference. You can access it using the link provided below.

The linked resources are not openly accessible at this time. For access to the template, please contact Victor Ong (Victor_ONG@tech.gov.sg).

At the moment, you may need to request access as we continue to refine and develop this concept. We welcome your feedback and suggestions on how we can enhance this document to better serve your needs. Your insights are invaluable in helping us create more effective MLOps guidance for government agencies.