Automated MLOps Pipelines: Best Practices for Getting Started

Organizations use artificial intelligence (AI) and machine learning (ML) to gain visibility into customer behavior, improve operational efficiency, and address many other business pain points. This is pushing companies to invest heavily in data science technology and teams to build and train analytical models. Businesses are benefiting from these investments, but greater value can be realized when AI/ML practices are operationalized.

Most enterprises today run ML tasks that are manual and script-driven. The diagram below shows a typical manual workflow:

Figure: Machine Learning Simple Pipeline

  1. Data scientists consume historical data and manually perform data analysis, feature engineering, model training, testing, and validation.

  2. Once the model is trained, it is handed over to engineers as an artifact.

  3. This artifact is then deployed on the target infrastructure.

However, this process is laden with a number of challenges and constraints:

  • Data is manually delivered into and out of the models.

  • Any time a model changes, IT assistance is required.

  • Data scientists consume offline data (data at rest) for modeling. There is no automated data pipeline to quickly deliver data in real time and in the right format.

  • Because modeling lacks a CI/CD pipeline, all tests and validations are run manually in scripts or notebooks. This leads to coding errors and incorrect models, which degrade predictive services and can cause compliance problems, false positives, and lost revenue.

  • Manual processes lack active performance monitoring. If the model isn’t working correctly in production, there is no easy way to understand how and why.

  • Manual processes also make it hard to compare a new model against the current production model on specific metrics (e.g., through A/B testing).
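
To make the point about manual tests and validations concrete, here is a minimal sketch of a validation check that a CI/CD job could run automatically instead of someone eyeballing a notebook. Everything in it is an assumption for illustration: the names `validate_model`, `ValidationReport`, and `min_accuracy` are hypothetical, the model is any callable that maps a feature row to a label, and the 0.8 threshold is a placeholder rather than a recommendation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ValidationReport:
    accuracy: float
    passed: bool


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def validate_model(
    predict: Callable[[Sequence[float]], int],
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    min_accuracy: float = 0.8,  # hypothetical threshold, tune per use case
) -> ValidationReport:
    """Score the model on held-out data and compare against a threshold."""
    predictions = [predict(row) for row in features]
    score = accuracy(predictions, labels)
    return ValidationReport(accuracy=score, passed=score >= min_accuracy)
```

A pipeline step could call `validate_model` on a held-out dataset and fail the build when `passed` is false, so a weak model never reaches the deployment stage unnoticed.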

Building an Automated MLOps Pipeline

Traditionally, developing, deploying, and continuously improving an ML application has been complex, leading many organizations to overlook the benefits of automating data pipelines.

To tackle the challenges above, an ML operations (MLOps) pipeline is needed. The example below outlines what a typical MLOps pipeline looks like:

Figure: Machine Learning Pipeline

  1. Data scientists consume offline data for data analysis, modeling, and validation.

  2. Once the model is pushed through pipeline deployment, it is passed on to the automated pipeline. Any new data is extracted and transformed for model training and validation.

  3. Once a model is trained, and if it performs better than the one already serving in production, the trained model is pushed to a repository.

  4. The trained model is then picked up and deployed in production.

  5. The prediction service emits labels. This service could have multiple consumers.

  6. A performance monitoring service consumes this data and checks for any performance degradation.

  7. If a new stream of data comes in, it is sent through an ETL pipeline and is eventually used by data scientists. The arrival of new data also calls the trigger service.

  8. The trigger service can be scheduled depending on the use case. We’ve seen the following use cases in enterprises:

    • When there is a performance degradation in the model.

    • When there is new incoming data.
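
The decisions in steps 3 and 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation behind the diagram: `PipelineState`, `should_retrain`, `should_promote`, and the thresholds `max_drop`, `min_new_rows`, and `min_gain` are all hypothetical names and values, and the sketch assumes a single "higher is better" metric such as accuracy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PipelineState:
    production_metric: float  # metric of the serving model at deployment time
    live_metric: float        # same metric measured on recent production traffic
    new_rows: int             # rows that arrived since the last training run


def should_retrain(
    state: PipelineState,
    max_drop: float = 0.05,    # hypothetical tolerated metric drop
    min_new_rows: int = 1000,  # hypothetical amount of new data worth retraining on
) -> List[str]:
    """Return the reasons to trigger retraining (empty list means no trigger)."""
    reasons = []
    if state.production_metric - state.live_metric > max_drop:
        reasons.append("performance_degradation")
    if state.new_rows >= min_new_rows:
        reasons.append("new_data")
    return reasons


def should_promote(
    candidate_metric: float,
    production_metric: float,
    min_gain: float = 0.0,  # hypothetical required improvement margin
) -> bool:
    """Promote a trained model only if it beats the one serving in production."""
    return candidate_metric > production_metric + min_gain
```

In this sketch, a monitoring job would build a `PipelineState` on a schedule and start a training run whenever `should_retrain` returns a non-empty list. The retrained model would then go to the repository only if `should_promote` returns true.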

Check out Google's MLOps guidance for more information.

This post is licensed under CC BY 4.0 by the author.