Strategies for Productionizing our Machine Learning Models

So you have written your best machine learning code, and have now tuned the model for the best accuracy.

Now what?  

How do you deploy your model so that the business can actually take advantage of it and make better decisions?

This is a follow-up to my last post, where I discussed productionizing Big Data ETL. In this post, I will discuss the different strategies for productionizing our machine learning models.

Training Workflow

We are all familiar with machine learning and with how to find the best model. This is what our workflow looks like:


Typically, as data scientists, we split our data into training and test sets, train the model on the training data, predict on the test data, and then fine-tune the model based on error metrics or accuracy.
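
As a rough illustration of this loop, here is a minimal, dependency-free Python sketch. The data set, the 1-D k-nearest-neighbours model and the candidate values of `k` are all hypothetical stand-ins (in practice you would more likely use a library such as scikit-learn). It splits the data, trains on the training portion, scores each candidate on the test portion and keeps the best one.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, x, k):
    """Toy 1-D k-nearest-neighbours classifier: majority vote of the k closest rows."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    positives = sum(label for _, label in nearest)
    return 1 if 2 * positives > k else 0


def accuracy(train, test, k):
    """Fraction of test rows whose label the model predicts correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


# Hypothetical data set: one numeric feature; the label is 1 for larger values.
data = [(float(i), 1 if i >= 20 else 0) for i in range(40)]
train, test = train_test_split(data)

# "Fine-tune" the hyperparameter k by scoring each candidate on the test split.
scores = {k: accuracy(train, test, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
print(f"best k={best_k}, test accuracy={scores[best_k]:.2f}")
```

One design note: tuning against the same test set you report accuracy on tends to give an optimistic score, so many teams tune on a separate validation set and use the test set only once at the end.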

At the end of this activity, we have our best model, the one with the best accuracy score.

Deployment Workflow

Typically, once we have found the best model, we want to save it somewhere. Saving the model is important because we arrived at it only after a lot of hard work and time in the training phase. We want to reuse the model by loading it in the prediction phase for new data.
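
A minimal sketch of saving and reloading a model with Python's built-in `pickle` module follows. `ThresholdModel` is a hypothetical toy model and the file path is just a temporary location; for scikit-learn models, `joblib` is a common alternative. Only unpickle files you trust, since loading malicious pickle data can execute arbitrary code.

```python
import os
import pickle
import tempfile


class ThresholdModel:
    """Hypothetical toy model: predicts 1 when the feature is >= threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, xs):
        return [1 if x >= self.threshold else 0 for x in xs]


def save_model(model, path):
    """Serialize the trained model so the training work never has to be repeated."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Deserialize a model written by save_model (only load files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Training phase: persist the tuned model.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model(ThresholdModel(threshold=20.0), model_path)

# Prediction phase (possibly a different process later on): reload and reuse it.
restored = load_model(model_path)
print(restored.predict([5.0, 25.0]))  # prints [0, 1]
```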

This is what a typical deployment workflow looks like:


In this workflow, we need some mechanism to load our saved model and run newly arriving data through it. The output of this phase is the predicted data/labels for the input data.

The business might want to save this data somewhere, or take some action based on the prediction data in real time.


Deployment Strategies

There are various strategies to deploy our machine learning models. The choice of strategy depends entirely on the business requirement: how we plan to consume the output predictions.

Batch Predictions

This strategy typically runs predictions at a particular time of day, or multiple times a day at fixed intervals. It is very useful when we do not need real-time decisions and can batch our input data by time.

The primary steps are:

  • Code runs at a fixed time or interval
  • Code loads the model from the saved location
  • Code reads a batch of input data
  • Input data is new and unlabelled data that we want predictions for
  • Input data might have data for multiple users/entities grouped together
  • Code runs model prediction over the Input batch and produces a Prediction batch 
  • Prediction batch contains the predicted labels for each record in the input data
  • Predicted data is then saved in some new location

As the batch prediction pipeline keeps running, new predicted data keeps appending to that location. This data can be used for analysis and decision making.
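
The batch steps above could be sketched roughly as follows. This is an illustration under assumptions, not a production job: the model is a hypothetical `ThresholdModel` pickled to a temporary directory, and the input batch is a CSV with hypothetical `user_id` and `feature` columns. Each call of `run_batch_prediction` represents one scheduled run; the scheduling itself (cron, Airflow, etc.) is left out.

```python
import csv
import os
import pickle
import tempfile


class ThresholdModel:
    """Hypothetical toy model standing in for the real trained model."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, xs):
        return [1 if x >= self.threshold else 0 for x in xs]


def run_batch_prediction(model_path, input_csv, output_csv):
    """One scheduled run: load the model, score the input batch, append predictions."""
    # Load the model from its saved location.
    with open(model_path, "rb") as f:
        model = pickle.load(f)

    # Read the batch of new, unlabelled records (may span many users/entities).
    with open(input_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    predictions = model.predict([float(r["feature"]) for r in rows])

    # Append the prediction batch; write a header only when the file is new.
    is_new = not os.path.exists(output_csv)
    with open(output_csv, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["user_id", "prediction"])
        if is_new:
            writer.writeheader()
        for row, label in zip(rows, predictions):
            writer.writerow({"user_id": row["user_id"], "prediction": label})
    return len(rows)


# Set up a hypothetical saved model and input batch for the demo.
workdir = tempfile.mkdtemp()
model_path = os.path.join(workdir, "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(ThresholdModel(threshold=20.0), f)
input_csv = os.path.join(workdir, "batch.csv")
with open(input_csv, "w", newline="") as f:
    f.write("user_id,feature\nu1,5\nu2,25\nu3,40\n")

output_csv = os.path.join(workdir, "predictions.csv")
run_batch_prediction(model_path, input_csv, output_csv)  # e.g. the morning run
run_batch_prediction(model_path, input_csv, output_csv)  # e.g. the evening run
```

Appending to a single CSV keeps the sketch short; a real pipeline would more likely write each run to its own dated partition or table so that a failed run can be re-executed without duplicating rows.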

Streaming/Realtime Predictions

This strategy is extremely useful when a decision has to be made immediately. Usually there is an application that needs to make some decision on the fly based on user interaction, behaviour, or attributes.

The primary steps are:

  • We have a web service that wraps our code
  • The web service exposes REST endpoints for getting predictions
  • Consumer application makes a web service call and sends input data in JSON format
  • Input data contains all the features required for prediction. It typically has only one record instead of a batch
  • Code loads the model from the saved location
  • Code gets input data when the web service endpoint is called
  • Code runs model prediction over the Input data and produces Prediction data 
  • Prediction data is returned to the consumer application
  • Consumer application can decide how to use the prediction data for a better user experience
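
Here is a minimal sketch of such a web service using only Python's standard library (`http.server` and `urllib`). The `/predict` path, the `{"feature": ...}` request shape and the threshold rule inside `predict_one` are hypothetical; a real service would load the saved model once at start-up and would typically use a framework such as Flask or FastAPI, since the Python docs do not recommend `http.server` for production.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

THRESHOLD = 20.0  # hypothetical: stands in for a model loaded once at start-up


def predict_one(features):
    """Score a single record; a real service would call the loaded model here."""
    return 1 if float(features["feature"]) >= THRESHOLD else 0


def handle_predict_body(raw):
    """Turn a raw JSON request body into (HTTP status, JSON response bytes)."""
    try:
        prediction = predict_one(json.loads(raw))
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON object with a numeric 'feature' field"}
        return 400, json.dumps(error).encode()
    return 200, json.dumps({"prediction": prediction}).encode()


class PredictionHandler(BaseHTTPRequestHandler):
    """Exposes POST /predict; each request carries one record as JSON."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_predict_body(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Start the prediction service; blocks until interrupted."""
    HTTPServer((host, port), PredictionHandler).serve_forever()


def request_prediction(url, features):
    """Consumer side: POST one record as JSON and return the decoded reply."""
    req = Request(url, data=json.dumps(features).encode(),
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req) as resp:
        return json.loads(resp.read())
```

To try it, call `serve()` in one process and then, from another, call `request_prediction("http://127.0.0.1:8000/predict", {"feature": 25})`; with the toy threshold above the reply is `{"prediction": 1}`. Keeping the JSON handling in `handle_predict_body` makes it testable without starting a server.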

Ad-hoc Predictions via SQL

This is a new trend that has caught on in the industry recently. This approach exposes the ML model as a SQL function. It treats new input data as tables and allows ad-hoc analysis on the data by running our ML model as a function. The output is also viewed as a table and can be saved for future use if required.

The primary steps are:

  • ML model is wrapped in a SQL UDF
  • There is a SQL execution engine (like Spark SQL or Google BigQuery) that understands the UDF
  • The SQL execution engine loads the UDF code
  • The user issues a SQL query to the execution engine, selecting the feature table in the SQL query
  • The execution engine runs the input features through the UDF to compute prediction
  • The prediction data is returned to the user
  • User might save the predicted data as a new table
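
The UDF idea can be sketched with Python's built-in `sqlite3` module, which here stands in for an engine such as Spark SQL or BigQuery. The `predict` function name, the `features` table and the threshold rule are hypothetical; in Spark the analogous step would be registering a Python function with `spark.udf.register`.

```python
import sqlite3


def predict_udf(feature):
    """Hypothetical model wrapped as a scalar SQL function (toy threshold rule)."""
    if feature is None:
        return None
    return 1 if feature >= 20.0 else 0


conn = sqlite3.connect(":memory:")
# Register the model so SQL queries can call it like any built-in function.
conn.create_function("predict", 1, predict_udf)

# A hypothetical feature table holding new, unlabelled input data.
conn.execute("CREATE TABLE features (user_id TEXT, feature REAL)")
conn.executemany("INSERT INTO features VALUES (?, ?)",
                 [("u1", 5.0), ("u2", 25.0), ("u3", 40.0)])

# Ad-hoc analysis: run the model inside a query and save the result as a new table.
conn.execute("""CREATE TABLE predictions AS
                SELECT user_id, predict(feature) AS prediction FROM features""")
for row in conn.execute("SELECT user_id, prediction FROM predictions ORDER BY user_id"):
    print(row)  # prints ('u1', 0), then ('u2', 1), then ('u3', 1)
```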

Implementation Ideas

There are various ways this problem is being solved in the industry.

Airflow Scheduler

The Airflow scheduler is a great fit for the batch prediction strategy. It can run ML training and ML prediction as separate jobs at different intervals. Airflow also provides visibility into all past runs, and each prediction run can log its details for future analysis.

Docker Containers

This is one of the approaches Kaggle takes for its competitions. We can spin up Docker containers for training and prediction jobs, and the containers can be used for both batch and real-time prediction strategies.


Amazon SageMaker

This is one of the most popular deployment options. The best part is that it's managed by Amazon, so we don't need to maintain our own infrastructure for model deployments.

We can provide our ML code in Jupyter notebooks. Each ML model deployed on SageMaker gets a unique endpoint that can be used for predictions.
Here is a great tutorial on getting started with SageMaker



Google BigQuery ML

Google has started offering machine learning capabilities on top of BigQuery. It lets us use ML models as SQL functions on top of existing BigQuery tables.

Here is a great tutorial on getting started with BigQuery ML 


MLflow

MLflow is a new initiative from Databricks. It is open source and provides capabilities for model tracking and serving. MLflow is also great for the training lifecycle because it provides a web interface that shows model performance and accuracy across multiple runs. MLflow also lets us group multiple runs as part of a single experiment.




The great thing about MLflow is that we don't have to write our own code for saving and loading models; MLflow provides a Python library to save and load them.

Another great thing about MLflow is that once a model is saved with MLflow, it can be deployed with any of the three strategies mentioned above.

Here is a great tutorial on getting started with MLflow


That’s all for this post. In the next post, I plan to take a deep dive into MLflow or SageMaker. Stay tuned.


Check out my portfolio here:

I am a greenhorn data science student with an interest in finding patterns in data. My language of choice is Python, and I am starting to get my hands dirty with R.

I blog on [1] and [2]. I share my code on [3].

