Companies are moving away from systems designed for data "at rest" in data warehouses and embracing the value of data "in motion" at the edge, whether from user data, sensors, or humans. Extracting information from incoming data as quickly as possible has become a competitive advantage. Stream processing is the best option for this, and it has sparked new tools, frameworks, and architecture patterns for developers creating applications and systems designed specifically for real-time data.
I sat down with Dr. Dean Wampler, Office of the CTO and Big Data Architect at Lightbend, to discuss how streaming data is shaking things up and how the upcoming Lightbend Fast Data Platform is bringing Fast Data to enterprises looking to modernize today.
Dr. Dean Wampler: At the developer level, what we’ve seen is an explosion of frameworks specifically targeting Fast Data and streaming applications. Spark, Kafka, Akka, Akka Streams, Gearpump, Flink - the list goes on and on. So there’s this huge diversity of framework choices, which is an obvious positive.
But there’s also a fair amount of confusion - which framework to use for which use cases, which ones have overlapping functionality with the others, and so on. We’re also seeing some fundamental ways that data streams are forcing rethinking of application and system design, because now your applications are on 24x7, potentially for months or years...
DW: Always-on streaming systems raise the bar for the operations team. When you’re running batch jobs offline, you have fewer concerns about high availability, resilience, etc. If a two-hour batch job fails, you fix the failure and rerun it. (I’m oversimplifying a bit.) However, when you start a streaming job, you expect it to run for days, weeks, even years. Any low probability issue will arise if you wait long enough, whether it’s an unusual hardware failure, power glitch in the datacenter, “100-year” traffic spike, etc.
Hence streaming systems must embody the Reactive principles of excellent responsiveness, scalability, resiliency, etc. That’s changing infrastructure for data-centric systems, on top of the long-standing concerns about having sufficient capacity, security, etc. for using very large datasets.
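The point about low-probability issues eventually showing up can be made concrete with a little arithmetic. The sketch below is only illustrative: the failure rate is a hypothetical number, not a figure from the interview, and it assumes failures on different days are independent, which real incidents often are not. Under those assumptions, the chance of seeing at least one failure over a run is 1 - (1 - p)^days.

```python
def prob_at_least_one_failure(p_per_day: float, days: float) -> float:
    """Chance that an event with independent per-day probability p_per_day
    happens at least once during a run lasting `days` days."""
    return 1.0 - (1.0 - p_per_day) ** days


# Hypothetical rate, chosen only for illustration:
# a glitch that strikes on roughly one day in a thousand.
P_PER_DAY = 0.001

runs = [
    ("2-hour batch job", 2 / 24),
    ("1 month of streaming", 30),
    ("1 year of streaming", 365),
    ("3 years of streaming", 3 * 365),
]

for label, days in runs:
    chance = prob_at_least_one_failure(P_PER_DAY, days)
    print(f"{label:>22}: {chance:.2%}")
```

With these made-up numbers, a short batch job almost never meets the glitch, while a job that runs for years is more likely than not to hit it at least once, which is why resilience has to be designed into a streaming system rather than handled by rerunning the job.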
DW: With our upcoming Fast Data Platform (FDP), we’re creating what we think is the first complete platform for data teams that are moving from the classic Big Data architectures to the newer Fast Data architectures.
We’re simplifying cluster installation and management for these frameworks built to run in a distributed fashion, and we’re making it easier to deploy, monitor, and scale applications in production so that development is more productive and production operations are more reliable.
DW: We’ve identified that a lot of organizations want to adopt streaming technology, but they need help. FDP aims to help teams avoid rookie mistakes and provide rapid answers to questions like:
DW: Lightbend Fast Data Platform bundles Apache Kafka, Apache Spark, Mesosphere DC/OS, OpsClarity, Apache Flink, and Lightbend Reactive Platform, including Akka, Akka Streams, and the Play and Lagom Frameworks. We also include installation, integration, and monitoring tools tuned for various deployment scenarios, plus sample applications to help you sort out which tools to use for which purposes.
Apache Kafka™ - Confluent, founded by the creators of Apache Kafka, has developed Confluent Platform so enterprises can reliably run their business in real time at scale. Lightbend FDP enhances the integration of Kafka with the other streaming technologies in FDP, such as Akka Streams.
Apache Spark™ - Databricks, the company behind Apache Spark, has aggressively evolved Spark and nurtured the Spark ecosystem. Lightbend engineers have worked with Databricks on enhancements to Spark Streaming and the support for Apache Mesos in FDP.
Apache Flink™ - data Artisans, the company behind Apache Flink, has contributed to the integration of Flink with FDP for sophisticated streaming scenarios, such as low-latency, stateful and high-performance applications.
Mesosphere DC/OS - Mesosphere, the company behind Apache Mesos, has built a comprehensive platform called DC/OS that includes databases, HDFS, security, and other tools required by real environments. They have worked with Lightbend to make DC/OS the best possible platform for running applications developed with Spark and the Lightbend Reactive Platform.
OpsClarity - OpsClarity has built an intelligent monitoring solution tailored for streaming applications to deliver integrated monitoring and diagnostics for essential end-to-end visibility into the application and data pipeline. Lightbend has worked with OpsClarity to implement monitoring for Lightbend Reactive Platform-based microservices.
Lightbend Reactive Platform - We are the company behind Reactive Platform, providing a tried and true set of frameworks for the rest of your microservices development and operations.
DW: The Early Access Program is for teams that just can’t wait any longer to get started. Participants in this program will also provide us with valuable feedback as we fine-tune the capabilities provided by FDP. Our Fast Data Quick Start and Launch services provide the additional expertise and guidance from our Fast Data team to support these projects. We have a landing page for both of these programs, so check them out to get moving today.