From home intrusion detection to self-driving cars to keeping data center operations healthy, Machine Learning (ML) has become one of the hottest topics in software engineering today. While much of the focus has been on building and training the models themselves, the less talked-about challenge is how to serve these models in production, often on real-time streaming data.
The traditional approach to model serving is to treat the model as code, which means that the machine learning implementation has to be continually adapted for model serving. As the number of machine learning tools and techniques grows, the efficiency of such an approach becomes increasingly questionable. Additionally, machine learning and model serving are driven by very different quality-of-service requirements: machine learning is typically a batch process concerned with scalability and processing power, while model serving is mostly concerned with performance and stability.
In this intermediate-level presentation for architects, data scientists, and developers, O'Reilly author and Lightbend Principal Architect Boris Lublinsky will define an alternative approach to model serving, based on treating the model itself as data. Using popular frameworks like Akka Streams and Apache Flink, Boris will review how to implement this approach and explain how it can help you.
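To make the "model as data" idea concrete, here is a minimal Akka Streams sketch (not code from the webinar or the ebook): model updates and data records arrive on two separate streams, the most recently received model is held in local state, and each record is scored against whatever model is current. The Model and Record case classes and the scoring logic are illustrative assumptions; in practice the model would typically arrive as a serialized artifact (PMML, a TensorFlow SavedModel, etc.) read from something like a Kafka topic.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}

// Hypothetical stand-in types: a "model" here is just a named weight, and a
// record is a single numeric value. Real models would arrive as serialized bytes.
final case class Model(name: String, weight: Double)
final case class Record(value: Double)

object ModelAsDataSketch extends App {
  implicit val system: ActorSystem = ActorSystem("model-serving")

  // Stream of model updates -- the model itself travels through the pipeline as data.
  val modelUpdates: Source[Model, _] =
    Source(List(Model("v1", 2.0), Model("v2", 3.0)))

  // Stream of records to score.
  val records: Source[Record, _] =
    Source(1 to 10).map(i => Record(i.toDouble))

  // Keep the latest model in local state and score each incoming record against it.
  // Records that arrive before any model is available are dropped in this sketch;
  // a real implementation might buffer them or fall back to a default model.
  val scoring = Flow[Either[Model, Record]].statefulMapConcat { () =>
    var current: Option[Model] = None
    (elem: Either[Model, Record]) =>
      elem match {
        case Left(model)   => current = Some(model); Nil // swap in the new model
        case Right(record) => current.toList.map(m => s"${m.name}: ${record.value * m.weight}")
      }
  }

  // Merge the two streams into one stream of Either values and run the scoring flow.
  val merged: Source[Either[Model, Record], _] =
    modelUpdates.map(m => Left(m): Either[Model, Record])
      .merge(records.map(r => Right(r): Either[Model, Record]))

  merged.via(scoring).runWith(Sink.foreach[String](scored => println(scored)))
}
```

A Flink version of the same pattern would typically connect the two streams and keep the current model in managed state inside a CoProcessFunction, so the model can be swapped without restarting the job.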
To learn more from Boris about Machine Learning in production, check out his recent O'Reilly ebook, Serving Machine Learning Models: A Guide to Architecture, Stream Processing Engines, and Frameworks. This practical report demonstrates a more standardized approach to model serving and model scoring, one that enables data science teams to update models without restarting existing applications, and introduces an architecture for serving models in real time as part of input stream processing.