Lightbend accelerates your journey to the real-time enterprise by simplifying the development, deployment, and operations of complex, multi-component streaming data pipelines. This brief provides the background for why we created Cloudflow and helps you understand how it works with microservices to complement your cloud-native systems.
Digital transformation is driving business and technology strategy
Digital transformation is having a profound impact on the way we do business as professionals and how we live our lives as consumers. From a technological standpoint, a number of mega-trends are impacting our world, such as:
- Mobile business
- Big data
- The move to the cloud
- The Internet of Things
- The rise of ML (machine learning) and AI (artificial intelligence)
If you want to successfully align your business and technology strategy in today’s world, you’ll need to keep these trends top-of-mind.
Your business depends on timely extraction of value from data
The above trends share something in common—they all generate, process, or power massive amounts of data. According to Mike Gualtieri, VP at leading analyst firm Forrester Research, “The next frontier of competitive advantage is the speed with which you can extract value from data.” With so much data—and so much riding on the ability to take advantage of that data—it is critical to be able to make business decisions in real-time or as close to real-time as possible.
Building the real-time enterprise requires new ways to process and make streams of data actionable
Older popular architectures were not designed around the concepts of cloud native and streaming data. Paradigms designed to support online transaction processing cannot address the demands of these more modern, intelligent applications. As Anne Thomas from Gartner puts it, “Traditional application architectures and platforms are obsolete.”
Digital transformation drives new business opportunities
As a key technology provider enabling this architectural change, Lightbend has been at the heart of many transformative business solutions:
- Hewlett Packard Enterprise (HPE) uses Lightbend products to provide smarter data center infrastructure solutions by adding near real-time insights to the InfoSight predictive analytics platform.
- Capital One uses Lightbend products for real-time auto loan decisioning. The new Auto Navigator system reduced response times from 55 hours to under 200 milliseconds.
- Credit Karma uses Lightbend products to provide hyper-personalized data analytics to its users. The Lightbend Platform allowed Credit Karma to scale its ML model processing, enabling the company to give information to users who were looking to improve their credit scores.
Lightbend is also enabling new capabilities and opportunities across many other industries:
- Telecommunications—Processing call-data records, electronic data records, and other data in real-time to enable predictive maintenance and traffic rerouting and provide better customer service and targeted offers
- Banking—Improving fraud detection, risk mitigation, and other regulatory analytics; processing positional, trade, and market data
- Energy—Analyzing sensor data to optimize costs
- eCommerce—Improving customer retention through better recommendation engines and next-best offers
Digital transformation presents new opportunities for every aspect of your business, enabling possibilities that were, until recently, too expensive to implement.
These new opportunities need new tools
So how can you efficiently and effectively capitalize on the opportunities of digital transformation? Working with many Global 2000 enterprises, we’ve found that the path to digital transformation requires investments in building modern, data-centric applications that can take advantage of the mega-trends we described earlier. But as with all technical transformations, new tools are required to support development and operations of these new applications.
This new class of applications requires a very different architecture from what businesses have been running on for the last 15 years. Traditional Java EE, .NET, application server-based, monolithic, 3-Tier, and Hadoop / batch-style applications are just not going to cut it. The solution is a true cloud-native architecture. But what exactly does that look like?
Your data strategy must match your cloud strategy
The cloud has enabled many new types of scalable and always-available cloud-native applications. Not only does the cloud provide you with scalable compute, but it also gives your engineers easy access to specialized compute like GPUs for efficient ML model scoring. Emerging de facto standards like Kubernetes allow your data engineers to leverage the cloud in a portable yet optimized manner.
Data engineers face new challenges
Data engineers, in attempting to realize digital transformation, are being asked to perform tasks they previously were not required to do. As data analytics moves closer to the application, the line between data engineers and application developers blurs. Today’s data engineers need tools that help them act like application developers—without adding untenable complexities to their jobs.
Batch processing is not suitable for new challenges
First-generation big data focused on the problem of bringing data together into a data lake, where it could be analyzed offline, in batch mode, by data scientists. Batch processing schedulers like YARN emerged to make it easier to run Spark batch jobs, alongside specialized tools like Apache Hive for batch queries. These tools do their jobs well, but they are specialized and not suitable for the mix of streaming, batch, and microservice workloads that digital transformation demands.
New standards are emerging for data engineers and data scientists
Mixed workloads demand new cloud-native tooling to run different parts of your architecture, including:
- Stateless and stateful microservices
- Batch jobs
- Longer-running streaming workloads
In customer-facing applications, this tooling must support everything you would expect from an application. These applications need to be:
- Scalable, so they can handle demand as it grows or spikes—without redesign or deployment to new hardware
- Resilient, so that if some aspect of your solution fails, the application continues to run
- Responsive, so that applications always respond to users, even in the face of component failure or unreachability
We refer to these new requirements on data engineering solutions as convergence—that is, the convergence of data engineer requirements with application developer solutions.
Kubernetes has emerged as the de facto cluster manager on which to build such solutions. Kubernetes provides support for each of the above workload types, as well as:
- Efficient resource scheduling to meet specific requirements of various workload types
- A rich ecosystem dedicated to simplifying digital transformation
- Cloud-vendor-provided hosted Kubernetes services to optimize your experience on all popular cloud environments
Choice is vital for application frameworks
Whereas Kubernetes is emerging as the clear winner for cluster management, there is no such distinct solution for application-level frameworks. To fully achieve the benefits of digital transformation, you need to embrace a multi-solution approach. Among stream-processing engines, Apache Spark, Apache Flink, Akka Streams, and Kafka Streams all have their places.
Convergence is about alignment and integration
Convergence is not just about the needs of data engineers aligning with those of application developers. Convergence also means that data pipelines and application microservices now need to be able to integrate. Tools that make this easier, that expose endpoints of streaming data pipelines as scalable services, accelerate the availability of new solutions.
Accelerate your digital transformation with tools that ease the process
Digital transformation is far from simple. Fortunately, tools like Kubernetes have emerged to ease the process, and ecosystem tools like Kubeflow, running atop Kubernetes, simplify management of ML application lifecycles. But additional tools are needed to help integrate and operate the various application frameworks.
The Integration Headache, or “Wasn’t this about business opportunity?”
Let’s say you’re convinced about Kubernetes, have settled on a cloud strategy, and are ready to begin on the path to transformation. Your next job will be to figure out how to integrate a variety of hosted services, application frameworks, and legacy data sources. You’ll soon discover that 90 percent of your team’s time is devoted to managing this integration, rather than focusing on the business opportunity promised by digital transformation.
Lightbend powered you through the first wave of cloud-native applications
Lightbend was founded before the first wave of digital transformation had even begun to crest. We were at the forefront of cloud-native applications and powered the big data revolution.
Enterprises have adopted our Reactive principles for cloud-native, microservices-based applications
In 2014, Lightbend co-founder and CTO Jonas Bonér defined and codified Reactive with the Reactive Manifesto, which now has 24,000 signatories from around the world. The manifesto has seen broad industry adoption and effectively lays out the principles of true cloud-native application design.
Reactive systems are:
- Responsive—Super high-performance, providing instant feedback based on user and API interactions in all circumstances
- Resilient—Recovering and repairing themselves automatically for seamless business continuity
- Elastic—Able to predictably and elastically scale up and down on-demand across cores, nodes, clusters, and pods
Responsive, resilient, elastic. Sound familiar? It should, because these are the qualities associated with your cloud-style infrastructure—only now they apply to your applications as well.
And the fundamental, underlying architectural tenet of being able to support elasticity while maintaining high resilience is that the systems should be:
- Message-Driven—Processing messages in parallel, asynchronously, without blocking, to ensure loose coupling, isolation, and location transparency. (In other words, services can run anywhere and can communicate with other services without having to know where they physically live—an essential aspect of cloud-native design.)
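To make the message-driven idea concrete, here is a minimal sketch in Python using the standard asyncio library. It is an illustration only, not Lightbend or Akka code: the service, queue, and field names are hypothetical. The service knows only its inbox and outbox, never the identity or location of its peers, and waiting for a message suspends the task instead of blocking a thread.

```python
import asyncio


async def pricing_service(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Hypothetical service: consumes order messages and emits priced orders.

    It is coupled only to its two queues, not to whoever produces or consumes.
    """
    while True:
        order = await inbox.get()  # suspends this task; no thread is blocked
        if order is None:          # sentinel: upstream has finished
            await outbox.put(None)
            return
        total = order["qty"] * order["unit_price"]
        await outbox.put({**order, "total": total})


async def run_pipeline(orders: list) -> list:
    """Wire a producer, the service, and a consumer together via queues."""
    orders_q: asyncio.Queue = asyncio.Queue()
    priced_q: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(pricing_service(orders_q, priced_q))

    for order in orders:
        await orders_q.put(order)
    await orders_q.put(None)

    results = []
    while (message := await priced_q.get()) is not None:
        results.append(message)
    await worker
    return results


if __name__ == "__main__":
    demo_orders = [
        {"id": 1, "qty": 2, "unit_price": 5},
        {"id": 2, "qty": 1, "unit_price": 7},
    ]
    print(asyncio.run(run_pipeline(demo_orders)))
```

In a distributed system the in-process queues would be replaced by a message broker or actor mailboxes, but the contract stays the same: services exchange messages and never call each other directly.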
Modern infrastructure is built on our Scala technology
Lightbend is also the driving force behind the Scala programming language. Scala is the language on which big data solutions are built. Kafka, Spark, and Akka are all written in Scala.
With our essential solutions for microservices and big data, Lightbend has been leading digital transformation from the beginning.
And now, introducing Cloudflow: a groundbreaking new framework to accelerate your path to the real-time enterprise
Thus far, this paper has focused on the business and technology landscape, exploring why you need a new set of tools to help you through digital transformation. The end goal of this digital transformation, from a technical standpoint, is to enable the real-time enterprise. That is, to allow your business to improve operations and tap new revenue streams by turning data into actionable intelligence before the data loses its value.
Cloudflow builds on the new set of tools mentioned in this paper—Kubernetes, Kafka, Akka, and Spark—to enable you to quickly act on new business opportunities presented by streams of data.
Cloudflow dramatically simplifies development, deployment, and management of multi-purpose streaming data pipelines in Kubernetes
Today, the development of a reliable streaming data pipeline is constrained by complexity at one end and fragility at the other. Simple, easy-to-use solutions require significant operational investment, while complex solutions that need to be inherently stable often burden the engineer with reliability concerns that rightly belong to the infrastructure. The challenge is meeting both of these constraints with a solution that doesn’t sacrifice simplicity for reliability.
Cloudflow delivers on this promise by providing a simple-to-use API that takes advantage of Kubernetes’ natural resilience, offers automatic checkpointing and a clear contract for interoperability, and pairs them with industrial-strength data processing frameworks. It supports an extensible set of stream runners, such as Apache Spark Structured Streaming, Apache Flink, and Akka Streams.
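The value of checkpointing handled by the framework can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not Cloudflow's implementation or API: `CheckpointStore`, `run_stage`, and the in-memory storage are hypothetical stand-ins. The point is that the business logic (`handler`) contains no recovery code; the surrounding runner records progress and resumes from the last committed position after a fault.

```python
from typing import Callable, Dict, List


class CheckpointStore:
    """Hypothetical in-memory stand-in for durable checkpoint storage."""

    def __init__(self) -> None:
        self._offsets: Dict[str, int] = {}

    def load(self, stage: str) -> int:
        return self._offsets.get(stage, 0)

    def save(self, stage: str, offset: int) -> None:
        self._offsets[stage] = offset


def run_stage(stage: str, records: List[int], handler: Callable[[int], int],
              sink: List[int], store: CheckpointStore) -> None:
    """Process records starting at the last checkpoint, committing after each."""
    for position in range(store.load(stage), len(records)):
        sink.append(handler(records[position]))  # may raise on a fault
        store.save(stage, position + 1)          # commit progress


def demo() -> List[int]:
    failures = {"remaining": 1}

    def flaky_double(value: int) -> int:
        # Business logic only; simulates one transient fault on the value 3.
        if value == 3 and failures["remaining"] > 0:
            failures["remaining"] -= 1
            raise RuntimeError("transient fault")
        return value * 2

    store, sink, records = CheckpointStore(), [], [1, 2, 3, 4]
    try:
        run_stage("double", records, flaky_double, sink, store)
    except RuntimeError:
        pass  # in practice the platform would restart the stage
    run_stage("double", records, flaky_double, sink, store)  # resumes at 3
    return sink


if __name__ == "__main__":
    print(demo())
```

Because the output is written before the checkpoint is saved, this sketch gives at-least-once behavior: a crash between the two steps would cause one record to be reprocessed on restart.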
Cloudflow refines the developer experience
Streaming data pipelines typically comprise several stages, sometimes written by different teams. For example, one team may be responsible for ETL, another for developing ML models, and another for performing real-time market analysis.
Cloudflow allows each team to choose the right streaming tool for the job. The developer does not have to write any boilerplate code—code that manages messaging between components, serialization, and frameworks. This is all managed for you by Cloudflow.
Cloudflow also creates a contract for interoperability between pipeline stages, or “streamlets.” The contract leverages extensible schema types to offer simple compatibility between stages in a pipeline. In turn, this schema ensures that each streamlet is interoperable and provides a consistent place to perform checkpointing.
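As a rough illustration of what a schema-based contract between stages buys you, the Python sketch below checks that one stage's output schema satisfies the next stage's input schema before the two are connected. The `Streamlet` class, the `compatible` and `connect` functions, and the field names are hypothetical and greatly simplified; they are not Cloudflow's API, and the real schema types are richer than a mapping of field names to types.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Schema = Dict[str, type]  # field name -> expected type


@dataclass
class Streamlet:
    """Hypothetical pipeline stage described only by its input/output schemas."""
    name: str
    inlet: Optional[Schema]   # None for a stage that only produces data
    outlet: Optional[Schema]  # None for a stage that only consumes data


def compatible(producer: Streamlet, consumer: Streamlet) -> bool:
    """True when every field the consumer expects is produced with that type."""
    if producer.outlet is None or consumer.inlet is None:
        return False
    return all(producer.outlet.get(field) is expected
               for field, expected in consumer.inlet.items())


def connect(producer: Streamlet, consumer: Streamlet) -> Tuple[str, str]:
    """Return the connection, or fail early if the schemas do not line up."""
    if not compatible(producer, consumer):
        raise TypeError(f"{producer.name} -> {consumer.name}: schema mismatch")
    return (producer.name, consumer.name)


ingest = Streamlet("ingest", None, {"device_id": str, "reading": float})
scorer = Streamlet("scorer", {"device_id": str, "reading": float},
                   {"device_id": str, "score": float})
alerts = Streamlet("alerts", {"device_id": str, "score": float}, None)

if __name__ == "__main__":
    print(connect(ingest, scorer))
    print(connect(scorer, alerts))
    try:
        connect(ingest, alerts)  # alerts needs "score", ingest never emits it
    except TypeError as error:
        print(error)
```

Checking compatibility when the pipeline is assembled, rather than when data is already flowing, lets teams that own different stages discover mismatches before deployment.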
“What used to take weeks to configure and deploy can now be done in a single day!”
Cloudflow allows you to easily integrate your microservices applications with streaming data pipelines
In this new world, data engineers must think about how their data pipelines interact with microservice-based applications built by application developers. Cloudflow allows you to automatically expose HTTP endpoints, making application integration simple.
Cloudflow simplifies integration with backend systems
Cloudflow leverages Akka Alpakka technology to turn your legacy data system into a stream. This not only allows you to integrate with your current data sources, but also provides an easy migration path for when legacy systems are modernized. Alpakka makes it easy to ingest data from an old database or other data source and write results back to these same systems.
Cloudflow provides out-of-the-box operational insight
Cloudflow comes with Lightbend Console, which features a Cloudflow-specific UI for visualizing data moving through your data pipelines. Spot performance bottlenecks, lagging processes, and other problems instantaneously. Lightbend Console allows you to set alerts and customize the way you monitor processes to match your unique business needs.
Power your real-time enterprise with Lightbend
With streaming data pipelines requiring an “always on” architecture, building reliable, scalable streaming systems can be difficult, as you are guaranteed to face all manner of data and system faults. Planning for failure, recovering completely and gracefully, and doing so as a part of the underlying framework, instead of in your business logic, is critical.
Lightbend technologies have been at the forefront of cloud-native applications, ML, and streaming technologies for many years. More recently, an ecosystem has matured supporting easier operation of these systems. Cloudflow builds upon this ecosystem to simplify development and integration of streaming applications, reducing the barrier to embracing event-driven, real-time applications that power your business transformation.