Scaling Machine Learning into production with MLOps
ChatGPT makes us feel like anything related to machine learning was basically invented last year, but the term "machine learning" was in fact coined in 1959 by computer scientist Arthur Samuel, and by 1981 researchers were already experimenting with different teaching strategies on neural networks.
I would not be confident enough to put my finger on which project was the very first to use ML in full (public) production, but one of the most typical early examples is credit scoring, which has probably been in use since the 1990s. In this application, machine learning models predict the likelihood that an individual will default on a loan, based on factors such as their credit history, income, or employment status. A similar early example of machine learning in production systems is fraud detection in the financial industry, where models identify patterns in financial data that may indicate fraudulent activity.
Today, all the tech giants use some sort of machine learning in their public production systems. Amazon’s recommendation system, Google’s search engine, and YouTube’s recommendation engine (just to name a few) all use machine learning to improve search results and recommend videos or products to their users. As ML becomes more widespread and even the tiniest projects start to use it, it becomes increasingly important for companies to learn how to deploy and scale machine learning models into production quickly and reliably.
What Is DevOps?
To give a bit of background, let’s start with refreshing our knowledge about DevOps. DevOps is essentially a set of practices and tools used to streamline the software development lifecycle, from code development to deployment and operations. DevOps is based on the idea of collaboration between development and operations (IT) teams, with the goal of achieving faster and more reliable delivery of new software iterations.
This approach emphasizes automation, monitoring, and continuous improvement, with the goal of reducing errors, improving efficiency, and increasing agility. DevOps increases the speed and quality of software delivery by breaking down the traditional silos between development and operations teams and encouraging collaboration and communication between them. It involves a range of tools and practices, including version control, continuous integration and continuous delivery (CI/CD), infrastructure as code (IaC), and monitoring and logging. These help to automate and streamline various aspects of the software development process, from building and testing code to deploying and monitoring applications in production.
How does MLOps connect to DevOps?
Their underlying mindset is quite similar, but while the main goal of DevOps is to streamline the software development and deployment process, the main goal of MLOps is to streamline the machine learning lifecycle. The two overlap, but MLOps places the main emphasis on the unique challenges of machine learning, such as data management, model training, and model monitoring.
DevOps and MLOps also use different tools and processes to achieve their respective goals. While DevOps tools such as Git, Jenkins, and Docker can be used for MLOps, there are also tools like MLflow or Comet ML, and processes (for versioning not just data but models as well), that are specific to MLOps.
Machine learning is a cross-functional discipline that requires expertise in multiple areas, so a different set of teams is involved. DevOps typically involves collaboration between development and operations teams, while MLOps involves collaboration between data science, software engineering, and operations teams. The challenges of managing large data sets, training and monitoring models, ensuring data quality, and preventing bias in machine learning models require tools and processes that are specifically designed for MLOps engineers.
Why is MLOps important?
The main benefit of MLOps is faster and more reliable scaling and deployment of machine learning models. Without MLOps, the machine learning lifecycle can be slow, error-prone, and difficult to manage. For example, without the right practices and tools in place, it can be challenging to reproduce experimental results, deploy models to production, or monitor their performance over time.
MLOps provides a framework for addressing these challenges by bringing together the necessary tools and processes to streamline the machine learning lifecycle. It helps teams collaborate more effectively, automate repetitive tasks, and track changes to the machine learning system over time. This leads to faster development cycles, better model performance, and more reliable deployment.
How does MLOps work?
Version control
Version control enables teams to track changes to the code, data, and models over time, which is essential for reproducibility and collaboration. Version control systems such as Git allow teams to manage changes to the machine learning system, collaborate on code, and share results with others.
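To make the idea of versioning code, data, and models together a bit more concrete, here is a minimal sketch. It assumes we record a small manifest next to each Git commit; the function names, the manifest fields, and the commit id are all hypothetical and not part of any specific tool.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Return a short, stable content hash for a blob of bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]


def build_manifest(code_rev: str, data: bytes, model: bytes, params: dict) -> dict:
    """Tie one code revision to the exact data, model, and parameters used."""
    return {
        "code_rev": code_rev,
        "data_hash": fingerprint(data),
        "model_hash": fingerprint(model),
        "params": params,
    }


manifest = build_manifest(
    code_rev="abc1234",  # hypothetical Git commit id
    data=b"age,income,defaulted\n34,52000,0\n",  # stand-in for a real data file
    model=b"serialized-model-bytes",  # stand-in for a real model artifact
    params={"learning_rate": 0.1, "max_depth": 3},
)
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Because the hashes depend only on content, two runs that used the same data and produced the same model get the same manifest, which is what makes an experiment reproducible later.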
Continuous integration and continuous delivery (CI/CD)
Continuous integration and continuous delivery (CI/CD) is a set of practices for automating the testing and deployment of software. In MLOps, CI/CD is used to automate the testing and deployment of machine learning models. This involves automating tasks such as testing, building, and deploying models to production.
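One way this can look in practice is a quality gate that the CI job runs before a model is allowed to deploy. The sketch below is only an illustration: the 0.8 accuracy threshold, the function names, and the sample labels are assumptions, and a test runner such as pytest would typically pick up the `test_` function.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def quality_gate(y_true, y_pred, min_accuracy=0.8):
    """Return True when the candidate model is good enough to deploy."""
    return accuracy(y_true, y_pred) >= min_accuracy


def test_candidate_model_passes_gate():
    # Hypothetical held-out labels and candidate predictions (9 of 10 correct).
    labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    preds = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]
    assert quality_gate(labels, preds, min_accuracy=0.8)
```

If the gate fails, the pipeline stops and the previous model stays in production.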
Containerization
Containerization is essentially packaging an application and its dependencies into a single container. Containers make it easy to deploy and run applications consistently across different environments, which is also very useful for deploying machine learning models. Tools such as Docker (for building and running containers) and Kubernetes (for orchestrating them at scale) make it easy to package and deploy machine learning models.
Monitoring and logging
Monitoring and logging are needed to guarantee that the machine learning models are performing correctly in production. MLOps involves monitoring the model’s performance over time and logging any errors or issues. This enables teams to identify and resolve problems as quickly as possible to reduce the risk of any possible downtime.
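As a rough sketch of what this means at the code level, the wrapper below logs the latency of every prediction and the traceback of every failure using Python's standard `logging` module. The `toy_model` function and the feature names are hypothetical stand-ins for a real model.

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("model_monitor")


def monitored_predict(model_fn, features):
    """Call model_fn(features); log latency on success and the traceback on failure."""
    start = time.perf_counter()
    try:
        prediction = model_fn(features)
    except Exception:
        logger.exception("prediction failed for features=%r", features)
        raise
    latency_ms = (time.perf_counter() - start) * 1000
    logger.info("prediction=%r latency_ms=%.2f", prediction, latency_ms)
    return prediction


def toy_model(features):
    # Hypothetical stand-in for a real model: a fixed linear score.
    return 0.5 * features["income"] / 1000 - 2 * features["late_payments"]


score = monitored_predict(toy_model, {"income": 40000, "late_payments": 3})
```

In a real system these log lines would be shipped to a central store so that latency and error rates can be charted and alerted on.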
Machine Learning Pipelines
A data pipeline is a sequence of actions that the system applies to data between its source and its destination. Such data pipelines, or MLOps pipelines, are usually defined as a graph in which each node is an action and each edge represents an execution order or dependency. Because ML models always demand data transformation in some form, they can be difficult to run and manage reliably, so proper data pipelines bring benefits in runtime visibility, code reuse, and scalability. ML is itself a form of data transformation, so once the data pipeline includes ML-specific steps it becomes an ML pipeline, which can be versioned in source control and deployed automatically via a regular CI/CD pipeline.
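The graph idea can be sketched with nothing more than Python's standard `graphlib` module (Python 3.9+). The step names and the toy actions are made up for illustration; real pipeline tools add scheduling, retries, and caching on top of the same structure.

```python
from graphlib import TopologicalSorter


def extract(ctx):
    ctx["raw"] = [3.0, None, 5.0, 8.0]  # hypothetical source data


def clean(ctx):
    ctx["clean"] = [x for x in ctx["raw"] if x is not None]


def build_features(ctx):
    mean = sum(ctx["clean"]) / len(ctx["clean"])
    ctx["features"] = [x - mean for x in ctx["clean"]]


def train(ctx):
    # Placeholder "model": remembers the largest absolute feature value.
    ctx["model"] = max(abs(x) for x in ctx["features"])


def evaluate(ctx):
    ctx["report"] = {"n_rows": len(ctx["features"]), "model": ctx["model"]}


# Nodes are actions; each value lists the steps that must run first (the edges).
GRAPH = {
    "extract": set(),
    "clean": {"extract"},
    "build_features": {"clean"},
    "train": {"build_features"},
    "evaluate": {"train"},
}
ACTIONS = {f.__name__: f for f in (extract, clean, build_features, train, evaluate)}


def run_pipeline(graph, actions):
    """Run every action in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        actions[name](ctx)
    return order, ctx


order, ctx = run_pipeline(GRAPH, ACTIONS)
print(order, ctx["report"])
```

Because the graph is plain data, it can live in source control and be reviewed and deployed like any other code.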
Most machine learning models require two versions of the ML pipeline: the training pipeline and the serving pipeline. Both are needed because, although they perform data transformations with similar results, their implementations differ significantly. In both cases it is critical that they remain consistent, and to ensure this, teams should reuse data and code whenever possible.
For example, typically, the training pipeline runs across batch files that include all features. In contrast, the serving pipeline often receives only part of the features and runs online, retrieving the remainder from a database.
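A small sketch of that difference, with the feature logic shared between the two pipelines so that they cannot drift apart. The in-memory `FEATURE_STORE` dictionary stands in for the database, and all field names are hypothetical.

```python
import math


def transform(record):
    """Shared feature logic used by both pipelines."""
    return {
        "log_income": math.log1p(record["income"]),
        "is_employed": 1 if record["employment_status"] == "employed" else 0,
    }


def training_pipeline(batch):
    """Batch mode: every record already carries all raw features."""
    return [transform(record) for record in batch]


# Hypothetical feature store standing in for the production database.
FEATURE_STORE = {"user-1": {"income": 52000}}


def serving_pipeline(request):
    """Online mode: the request carries some features, the rest are looked up."""
    stored = FEATURE_STORE[request["user_id"]]
    record = {**stored, "employment_status": request["employment_status"]}
    return transform(record)


offline = training_pipeline([{"income": 52000, "employment_status": "employed"}])
online = serving_pipeline({"user_id": "user-1", "employment_status": "employed"})
print(offline[0] == online)
```

The two entry points differ in where the raw values come from, but both end in the same `transform` call, so the model sees identical features in training and in production.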
How to Deploy ML Models Into Production
- Foundation: starts with defining the business use case for the data and establishing the success criteria for measuring model performance
- Data extraction: the responsible data scientists pick the data from a range of sources and integrate it for the ML task at hand
- Data analysis: the data analysis process allows the team to understand the characteristics and data schema the model will expect and also identify which feature engineering and data preparation the model will need
- Data preparation: the data scientists divide the data into training, validation, and test sets, producing the data splits in their prepared format as the output
- Model training: to train different ML models, the data scientist implements various algorithms and conducts hyperparameter tuning to achieve the best results
- Model evaluation: the trained model is run on a test set, and the resulting metrics are used to assess its quality
- Model validation: as the validation step, the team compares the performance of the model to a specific baseline
- Model serving: once the model gets deployed it can be used for serving predictions (as an embedded model for a mobile device or edge device; or to serve online predictions as web services or microservices with a REST API)
- Model monitoring: consistent and constant monitoring is used to determine if and when a new iteration needs to be deployed
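The middle steps of the list above (preparation, training, evaluation, validation) can be sketched end to end in a few lines. Everything here is an assumption made for illustration: the synthetic data, the 60/20/20 split, the one-feature threshold "model", and the majority-class baseline. The validation split would normally feed hyperparameter tuning, which this sketch leaves out.

```python
import random

# Synthetic, hypothetical data: (feature, label), label is 1 when feature >= 10.
ROWS = [(x, int(x >= 10)) for x in range(20)]


def split(rows, seed=0):
    """Data preparation: shuffle, then cut into 60/20/20 train/validation/test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    a, b = len(rows) * 6 // 10, len(rows) * 8 // 10
    return shuffled[:a], shuffled[a:b], shuffled[b:]


def train(rows):
    """Model training: threshold halfway between the two class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(predict, rows):
    """Model evaluation: share of rows where the prediction matches the label."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


train_rows, val_rows, test_rows = split(ROWS)
threshold = train(train_rows)

# Baseline for model validation: always predict the majority training label.
majority = int(sum(y for _, y in train_rows) * 2 >= len(train_rows))

model_acc = accuracy(lambda x: int(x >= threshold), test_rows)
baseline_acc = accuracy(lambda x: majority, test_rows)
ready_to_serve = model_acc >= baseline_acc
print(f"threshold={threshold:.2f} model={model_acc:.2f} "
      f"baseline={baseline_acc:.2f} serve={ready_to_serve}")
```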
There is a direct correlation between the maturity of the ML process and the level of automation of the deployment steps. This reflects how quickly you can train new models given new data or implementations.
The three levels of MLOps – according to Google
MLOps level 0 – manual level. You might have the most recent state-of-the-art ML models, but the build and deployment are still completely manual.
MLOps level 1 – automated ML pipeline with continuous training (CT) and continuous delivery (CD) of the model prediction service. To qualify as level 1, you must have metadata management, pipeline triggers, and automated data and model validation steps in place so that retraining can be automated.
MLOps level 2 – this level reflects a robust, fully automated CI/CD pipeline system that can deliver reliable, rapid updates to the pipelines in production. This automated CI/CD system lets teams rapidly explore new ideas around feature engineering, hyperparameters, and model architecture, and automatically build, test, and deploy the new pipeline components to the target environment.
Considerations during Model Deployment
Data sources and experimentation frameworks
Data used in training should be contextually similar to production data, but recalculating all values to make sure that our calibration is right is not practical. An experimentation framework that includes A/B testing, tracking for debugging, and performance measurement is therefore usually a must.
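For the A/B testing part, a common building block is a deterministic assignment of users to model variants, so the same user always sees the same model. The sketch below assumes a hash-based bucketing scheme; the experiment name, variant names, and 50/50 split are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'model_a' (control) or 'model_b' (treatment)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # roughly uniform in [0, 1)
    return "model_b" if bucket < treatment_share else "model_a"


# The same user always lands in the same group for a given experiment.
print(assign_variant("user-42", "ranking-v2"))

# Rough check of the split over many hypothetical users.
counts = {"model_a": 0, "model_b": 0}
for i in range(1000):
    counts[assign_variant(f"user-{i}", "ranking-v2")] += 1
print(counts)
```

Including the experiment name in the hash keeps assignments independent across experiments, so a user in the treatment group of one test is not automatically in the treatment group of the next.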
Model complexity
Model complexity is one of the key factors affecting cost and storage. More complex models, such as an ensemble of decision trees or a neural network, take longer to load into memory on a cold start and need more computing time than simpler models like logistic or linear regression.
Model drift
Model drift essentially refers to the change in the model’s usefulness and accuracy over time. Data can change quickly and significantly, which affects the features used to train the model. For example, for a typical retailer using our model, a network outage, labor shortage, change in pricing, or supply chain failure may have a serious impact, and predictions will no longer be accurate unless we take all the new factors into account.
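One simple way to notice that the incoming data has changed is to compare the distribution of a feature in production against its distribution at training time. The sketch below uses the Population Stability Index (PSI) as an example technique that I am bringing in for illustration; the bin count, the floor for empty bins, and the sample values are assumptions. A commonly cited rule of thumb treats a PSI above roughly 0.25 as a significant shift, but that threshold is a convention rather than a hard rule.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a training sample and a production sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clip out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training_prices = [float(i) for i in range(100)]  # hypothetical feature values
stable_prices = list(training_prices)
shifted_prices = [p + 50 for p in training_prices]  # e.g. after a pricing change

print(psi(training_prices, stable_prices))
print(psi(training_prices, shifted_prices))
```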
There are some frequently used measures like Average Precision (AP) and Area Under the Receiver Operating Characteristic Curve (AUROC) that can test the model’s accuracy by measuring its performance against new data or against an important business performance metric. If our model can’t satisfy the acceptance criteria, our system needs to retrain the model and then deploy a new version. MLOps and ML lifecycle management tools track which configuration parameter set and model file are currently deployed in production, and almost all of them include processes for measuring model performance on new data and retraining when our preset performance criteria call for it.
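To show what such an acceptance check might look like, here is a minimal pure-Python AUROC, computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, followed by a retraining decision. The 0.75 acceptance threshold and the sample scores are assumptions; in practice a library implementation would normally be used.

```python
def auroc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def needs_retraining(labels, scores, min_auroc=0.75):
    """Return True when the model falls below the preset acceptance criterion."""
    return auroc(labels, scores) < min_auroc


# Hypothetical fresh labels and the deployed model's scores for them.
new_labels = [0, 0, 1, 1]
new_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(new_labels, new_scores), needs_retraining(new_labels, new_scores))
```

This pairwise version is quadratic in the number of examples, which is fine for a sketch but too slow for large evaluation sets.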
In my view, MLOps engineer is quickly becoming one of the most sought-after IT positions right now. Tools like TensorFlow are already widely available and people are more and more aware of the wonderful possibilities of ML, but without the right engineers and mindset to maintain the machine learning lifecycle, modern companies won’t be able to fully utilize the potential of the technology.
If you are interested in talking about machine learning or more specifically MLOps, please shoot me an email as I love to discuss everything related to the topic.