Imagine that, as a data scientist, you have designed your perfect machine learning model. You have done all the hard preprocessing and chosen the best-fitting mathematical model. It beats your previous results on the test set without overfitting. The next step is to push your code to production so that decision making can be automated. How would you proceed?
As an example, let's take a demand forecasting problem where you want to forecast demand for your product in big cities, in order to provision supply accordingly. Let's assume the model is written in Python.
Review your performance metric
Classic machine learning metrics (RMSE, MAE, ...) are averages over the whole test set. They are often not good at taking edge cases into account, yet these edge cases are exactly what could make your customers lose faith in your app.
For the demand forecasting problem, suppose your model is overall 10% better at forecasting demand, but one time in a thousand the forecast fails horribly, with an absolute percentage error of 50%. Such an error could cause massive under-provisioning and have big consequences for the customers who trust you.
Along with good global performance, you want to evaluate your model's stability and its sensitivity to extreme edge cases. If it is too sensitive, you may want to detect those edge cases and redirect them to some other model, or simply not deploy the model you developed at all.
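To make this concrete, here is a minimal sketch of an evaluation that reports tail errors alongside the usual average. The function and variable names are mine, and it assumes the true demand is never zero:

```python
import numpy as np

def error_report(y_true, y_pred):
    """Summarise both the average and the tail of the forecast error."""
    # Per-sample absolute percentage error (assumes y_true is never zero).
    ape = np.abs(y_pred - y_true) / np.abs(y_true)
    return {
        "mape": ape.mean(),                  # the classic averaged metric
        "p99_ape": np.quantile(ape, 0.99),   # how bad the worst 1% of forecasts are
        "max_ape": ape.max(),                # the single worst forecast
    }

# A toy example: decent on average, terrible on the last point.
y_true = np.array([100.0, 120.0, 90.0, 110.0])
y_pred = np.array([105.0, 118.0, 95.0, 40.0])
print(error_report(y_true, y_pred))
```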
Architecture of a basic ML app.
As a main principle, you want your application to be modular. This means clearly separating the forecasting logic, the training logic, the preprocessing logic and the app-related logic. This will allow you to change models easily and to parallelise work in a very simple way.
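For instance, the code base could be organised along these lines (the names are purely illustrative):

```
demand_forecasting/
├── preprocessing/   # feature engineering, shared by training and forecasting
├── training/        # model fitting, evaluation and serialisation
├── forecasting/     # loads the latest model and produces forecasts
└── app/             # API layer: routing, request validation, logging
```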
In a simple app you want two main modules running on different machines: the forecasting module and the training module. These modules won't run at the same time. You generally re-train your models at a regular interval (it could be a day, a month or an hour, depending on how much new data comes in), whereas forecasting is called whenever it is needed. This separation also lets you pick the right hardware for each module: at NIPS 2017, Google announced that they even designed two different chips for training and inference of neural networks.
However, this separation is not relevant in the context of reinforcement/online learning, where learning and forecasting can be done by the same app.
Flask is a very useful framework that lets you build a Python API very quickly, which makes it a natural fit for the forecasting app. It has a good community and support, and can also serve some visualisation/front-end in your app. To schedule your training, you can use a variety of tools, from simple CRON jobs to message brokers or queue systems.
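As a rough illustration, a forecasting endpoint in Flask could look like the sketch below. The model file name, the route and the payload format are assumptions, not a prescription:

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the latest model produced by the training module
# (here assumed to be pickled to "model.pkl").
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/forecast", methods=["POST"])
def forecast():
    # Expected payload, for example: {"city": "Paris", "features": [[...], [...]]}
    payload = request.get_json()
    predictions = model.predict(payload["features"])
    return jsonify({"city": payload["city"], "forecast": predictions.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```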
Put your code in containers
Docker containers make it easy to deploy your app on any platform. Where installing your dependencies directly on a machine can be difficult, with Docker you just create the appropriate Dockerfile, build your image and push it anywhere.
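A Dockerfile for such an app could look roughly like this (the Python version, file names and exposed port are assumptions):

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (forecasting app, model artefacts, etc.).
COPY . .

EXPOSE 5000
CMD ["python", "app.py"]
```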
When it comes to deploying Docker images, Kubernetes can be a good helper: it abstracts away some of the hard parts of running Docker in production while also providing a good framework for continuously deploying your images.
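For example, a minimal Kubernetes Deployment for the forecasting image might look like the sketch below, where the image name, labels and port are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: forecasting-app
spec:
  replicas: 2                     # two instances of the forecasting API
  selector:
    matchLabels:
      app: forecasting-app
  template:
    metadata:
      labels:
        app: forecasting-app
    spec:
      containers:
        - name: forecasting-app
          image: registry.example.com/forecasting-app:latest
          ports:
            - containerPort: 5000
```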
Build your monitoring.
Your app will probably never work as intended forever. There will be new edge cases and new patterns that your model will fail to capture. Besides, you want to make sure that the improvement you expected actually materialises in production. You want to be able to visualise, online, as much data as possible that could give you clues for further improvements.
In order to do so, don't forget to log as much as possible. You can then plug Jupyter notebooks into your databases. Another approach is to build and deploy an online dashboard. I found this very easy to do with Dash, a framework on top of Flask that allows you to quickly and simply build interactive online dashboards.
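As an illustration, a tiny Dash dashboard plotting logged forecasts against the actual demand could look like this. It assumes Dash 2.x and a hypothetical forecast_logs.csv with timestamp, y_true and y_pred columns:

```python
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html

# Hypothetical log of forecasts and realised demand.
df = pd.read_csv("forecast_logs.csv")  # columns: timestamp, y_true, y_pred

app = Dash(__name__)
app.layout = html.Div([
    html.H1("Forecast monitoring"),
    dcc.Graph(figure=px.line(df, x="timestamp", y=["y_true", "y_pred"])),
])

if __name__ == "__main__":
    app.run_server(debug=True)
```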
Test your code and your models.
Testing your code is really important for the maintainability of your app. It also serves as a good introduction to your code if you want to invite other developers or data scientists to work on it.
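For example, with pytest, a unit test for a hypothetical build_features function from the preprocessing module could look like this:

```python
import pandas as pd

# Hypothetical helper from the preprocessing module.
from preprocessing import build_features

def test_build_features_has_no_missing_values():
    raw = pd.DataFrame({"city": ["Paris", "Lyon"], "demand": [120.0, None]})
    features = build_features(raw)
    # The model should never receive missing values.
    assert not features.isnull().any().any()
```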
However, testing models is not as easy as testing your code: models evolve, and so do their outputs. This does not mean they are untestable. During your regular training you can hold out a test set and evaluate the freshly trained model on it. This gives you visibility on how your model is performing. Besides, it enables you not to roll out the most recently trained model if its performance is not good enough, and to keep the last best one instead. You could also add some alerting functionality. Finally, if your model is explainable or a physical model, you can test that the learned parameters make sense and update the model only if they do.
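One way to implement such a gate, sketched here with illustrative names and a pickle-based model store, is to compare the freshly trained model with the current production model on the held-out test set and only promote it if it does better:

```python
import pickle

import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error (assumes y_true is never zero)."""
    return np.mean(np.abs(y_pred - y_true) / np.abs(y_true))

def promote_if_better(new_model, current_model, X_test, y_test, tolerance=0.0):
    """Roll out the new model only if it beats the current one on the test set."""
    new_error = mape(y_test, new_model.predict(X_test))
    current_error = mape(y_test, current_model.predict(X_test))
    if new_error <= current_error - tolerance:
        # Overwrite the artefact served by the forecasting app.
        with open("model.pkl", "wb") as f:
            pickle.dump(new_model, f)
        return True
    # Otherwise keep the last best model; this is also a good place to alert.
    return False
```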
All these subjects could be discussed in much more depth, and this blog post does not cover related topics such as scalability, optimisation or alerting, which can also be critical for deploying any machine learning application.