NEW YORK, Sept. 23, 2019 (GLOBE NEWSWIRE) -- This week at O’Reilly Strata Data conference in New York, the foremost experts in the world of data and business will focus on best practice data techniques and technologies.
Senior engineers from Lightbend will be presenting three sessions focused on the intersection of streaming data and machine learning patterns in application development:
Hands-on machine learning with Kafka-based streaming pipelines
When: 1:30pm–5:00pm EDT Tuesday, September 24, 2019
Where: 1E 15/16, Strata Data Summit
Who: Dean Wampler (VP, Fast Data Engineering at Lightbend) and Boris Lublinsky (Architect at Lightbend)
Description: One approach to training and serving (scoring) models is to treat the trained model as code, then run that code for scoring. This works fine if the model never changes for the lifetime of the scoring process, but it isn't ideal for long-running data streams, where you'd like to retrain the model periodically (due to concept drift) and score with the new model. A better way is to treat the model as data and exchange this model data between the training and scoring systems, which allows models to be updated in the running context.
Boris Lublinsky and Dean Wampler explore different approaches to model training and serving that use this technique, where one or both functions are made an integrated part of the data-processing pipeline implementation (i.e., as an additional functional transformation of the data). The advantage of this approach is that model serving is implemented as part of the larger data-transformation pipeline. Such pipelines can be implemented using streaming engines—Spark Streaming, Flink, or Beam—or streaming libraries—Akka Streams or Kafka Streams. Boris and Dean will use Akka Streams, Flink, and Spark Structured Streaming in their demos.
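The "model as data" pattern above can be reduced to a minimal sketch: scoring becomes a stream transformation that holds the current model as state and hot-swaps it whenever a new model arrives on the stream. This is an illustrative Python sketch, not the Akka Streams, Flink, or Spark code from the session, and the `ModelUpdate` and `serve` names are invented for the example:

```python
# Sketch of "model as data": the scoring stage is a functional
# transformation over a stream in which model updates are interleaved
# with data records (a merged control channel, as in the Kafka-based
# pipelines the session describes).

from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Union

@dataclass
class ModelUpdate:
    """A newly trained model, shipped as data (here, just a callable)."""
    score: Callable[[float], float]

def serve(events: Iterable[Union[ModelUpdate, float]]) -> Iterator[float]:
    """Score records with the latest model; swap the model in place
    whenever an update arrives, without restarting the pipeline."""
    model: Callable[[float], float] = lambda x: 0.0  # default until first update
    for event in events:
        if isinstance(event, ModelUpdate):
            model = event.score       # hot-swap: the retrained model takes over
        else:
            yield model(event)        # score with whatever model is current

# Two model versions interleaved with data records on one stream:
stream = [ModelUpdate(lambda x: x * 2), 1.0, 2.0,
          ModelUpdate(lambda x: x + 10), 3.0]
print(list(serve(stream)))  # [2.0, 4.0, 13.0]
```

Because the model travels through the same pipeline as the data, retraining and serving stay decoupled: the trainer publishes updates, and the scoring stage picks them up mid-stream.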
Online machine learning in streaming applications
When: 11:20am–12:00pm EDT Thursday, September 26, 2019
Where: 1A 21/22, Strata Data Summit
Who: Stavros Kontopoulos (Principal Engineer at Lightbend) and Debasish Ghosh (Principal Engineer at Lightbend)
Description: Applications such as smart homes, smart monitoring of industrial environments, augmented reality in retail, and connected cars are driving a new era in online ML, in which ML algorithms move to the edge instead of the cloud. These applications are constrained in resources such as power, CPU, and memory, as well as in responsiveness: data flows through the system, and the application must interact with the surrounding environment within a given time window.
Stavros Kontopoulos and Debasish Ghosh explore the algorithmic foundations of these applications (Hoeffding Adaptive Trees, classic sketch data structures, and drift detection algorithms) and dive into the details of how they can be implemented and deployed efficiently in production. They evaluate production concerns such as performance (latency, memory footprint, etc.), techniques for updating models served in a running pipeline, future trends like feature-space representation and sampling, and tools to use for the actual implementation and deployment of these algorithms.
The concepts they detail also apply in a cloud setting, so many of the practical aspects they cover are universal and will benefit any practitioner of ML. The main focus is cutting-edge applications and technologies, and you won't want to miss this glimpse into the future.
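To make the algorithmic side concrete, here is a small sketch of the Hoeffding bound that underlies Hoeffding (Adaptive) trees: it says how close the observed mean of n samples is to the true mean with high probability, which is what lets the tree decide a split from a finite stream prefix. This is a toy illustration of the underlying statistics, not the production implementations the session covers, and the helper names are invented for the example:

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability at least 1 - delta, the observed mean of n samples
    lies within epsilon of the true mean, for values spanning value_range.
    epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def can_split(best_gain: float, second_gain: float,
              value_range: float, delta: float, n: int) -> bool:
    """A Hoeffding tree splits a node once the gap between the two best
    candidate attributes exceeds the bound: the winner is then the true
    best attribute with probability 1 - delta."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

# The bound shrinks as more stream examples arrive, so splits that are
# ambiguous early on become decidable later:
print(hoeffding_bound(1.0, 1e-7, 200))    # ~0.20
print(hoeffding_bound(1.0, 1e-7, 20000))  # ~0.02
print(can_split(0.30, 0.05, 1.0, 1e-7, 200))  # clear winner: split now
print(can_split(0.30, 0.25, 1.0, 1e-7, 200))  # too close: wait for more data
```

The adaptive variant used in online ML pairs this split test with a drift detector, replacing subtrees whose accuracy degrades as the data distribution shifts.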
Executive Briefing: What it takes to use machine learning in fast data pipelines
When: 4:35pm–5:15pm EDT Thursday, September 26, 2019
Where: E 10/11, Strata Data Summit
Who: Dean Wampler (VP, Fast Data Engineering at Lightbend)
Description: Dean Wampler helps you develop a conceptual understanding of the challenges faced by your teams as they develop and deploy machine learning (ML) and artificial intelligence (AI) services integrated with fast data (streaming) pipelines. While combining these technologies is challenging, the benefits include timely delivery of innovative services to your customers.
You’ll gain a brief overview of the business justification for integrating ML and AI with streaming, as well as the ML and AI scenarios that are best delivered through streaming. Dean then walks you through the main challenges of using these technologies together: bridging the gap between data science and production teams, whose tools, methods, and sometimes conflicting goals (for example, exploring ideas and optimizing scoring results versus ensuring production reliability and efficiency) can pull in different directions; running streaming ML and AI services reliably under variable loads for long periods, which requires leveraging best practices from the microservices world; and updating models in the streaming application before they become stale, without downtime, along with other practical problems.