New Name - Cloudflow is now Akka Data Pipelines
In this short introductory video by Gerard Maas (@maasg), Principal Engineer at Lightbend, we explain how Cloudflow works with Kubernetes from a developer perspective. Follow along with the video transcript below, and if you're an organization looking to bring Cloudflow into your business, schedule a demo with us!
Hi, I'm Gerard Maas. I'm with the cloud engineering team at Lightbend. In this video, we're going to talk about Cloudflow, and how it reduces the complexity of creating and deploying streaming applications on Kubernetes.
But before we start, let's talk about Kubernetes adoption for a minute. Almost every Kubernetes adoption starts with deploying a single pod from a Docker container. We apply a YAML description to the cluster, and Kubernetes takes care of instantiating the process for us. As our maturity level increases, we upgrade to more advanced Kubernetes resources. For example, a Deployment offers us capabilities such as self-healing and scaling up or down. And as we discover the advantages of cloud-native technologies, we move more and more workloads to Kubernetes. We use Kubernetes' extensibility to add other platform services, such as an event broker like Apache Kafka.
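The kind of YAML description mentioned above looks roughly like this. This is a minimal sketch of a Deployment; the name and image reference are illustrative, not taken from the video:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service                 # illustrative name
spec:
  replicas: 3                      # Kubernetes keeps three pods running: self-healing and scaling
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: registry.example.com/my-service:1.0.0   # illustrative image reference
```

Applying this with `kubectl apply -f` asks Kubernetes to converge the cluster toward three running replicas, restarting pods that fail.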
Kafka's publish-subscribe model allows us to decouple the interaction between services, and makes the architecture more scalable; but now that applications communicate over a common channel, it is easy to lose track of who's talking to whom. And as we move our real-time analytics and machine learning pipelines to Kubernetes, we are faced with increased complexity.
Distributed frameworks like Apache Spark and Apache Flink require special considerations in terms of security, storage, and state recovery. As we see here, our deployment process becomes a mix and match of commands and resources required by these technologies. This architecture introduces two major challenges.
The first one is, how do we preserve consistency? How do we manage a group of resources as a single logical application? And how can we ensure that different services can exchange information with each other in a compatible way as they evolve?
The second challenge is complexity. How do we ensure that the development team stays productive and focuses on creating value for the business instead of chasing after some obscure Kubernetes detail? Cloudflow offers answers to these questions. Let us explore how.
Cloudflow is a development toolkit that enables you to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes. Cloudflow consists of two parts, a developer toolkit that we use to create applications, and a set of Kubernetes extensions that facilitates the deployment and management of these applications.
On the development side, Cloudflow comes with an API and tools to facilitate application creation. We use the Streamlet API to develop the different components of our application, and we write a blueprint to define how those streamlets are connected together. The sandbox is a local runtime that we offer to quickly test our applications end-to-end. You get a blazing-fast feedback loop for the functionality you are developing, removing the need to go through the full package, deploy, and launch process on a remote cluster, which makes the develop-test feedback cycle very fast. When we are confident with our development, the Build Tool extensions help us package the application as a Docker image and publish it to the registry of our choice. Finally, the Cloudflow plugin for kubectl lets us deploy and manage our application on our target Kubernetes cluster.
On Kubernetes, Cloudflow uses the operator-based extensibility model to add the capabilities necessary to run and manage the streaming applications as native Kubernetes applications. For that, we rely on a number of existing open-source operators for Apache Spark, Apache Kafka, Apache Flink, and Akka Streams. The Cloudflow operator acts as an orchestrator to ensure that applications run end-to-end. Let's see an example of application development with Cloudflow.
This is a system we found in the wild at one of our customers. It ingests data records on cell towers and applies some cleaning and enrichment logic before creating different aggregations that are produced to upstream systems. To use Cloudflow in this project, we start by defining the schema of the data ingested.
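Streamlets exchange Avro-encoded data, so defining the ingested schema means writing an Avro schema file. A minimal sketch of what such a schema could look like for the cell-tower records described above; the record name, namespace, and fields are hypothetical, not taken from the actual customer system:

```json
{
  "type": "record",
  "name": "CellTowerRecord",
  "namespace": "example.telemetry",
  "fields": [
    { "name": "towerId", "type": "string" },
    { "name": "timestamp", "type": "long" },
    { "name": "signalStrength", "type": "double" }
  ]
}
```

From schemas like this, Cloudflow generates the data classes that the streamlets consume and produce.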
Each independent transformation step can be represented as a streamlet, while the flow between the streamlets becomes their connections. The Streamlet API defines a component model characterized by its inputs, or inlets, its outputs, or outlets, and the logic it implements. The inlets and outlets of a streamlet are bound to a schema, so we know what kind of data they consume and produce, and we use that information to validate that all connections in an application are consistent. The logic of each streamlet is written in the native API of the backend of our choice. Cloudflow currently supports Akka Streams, Spark Structured Streaming, and Apache Flink. This model is naturally extensible to more runtimes, and we are currently working on adding polyglot support.
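The schema-bound inlet/outlet model can be pictured with a small, self-contained Scala sketch. This is not the real Cloudflow Streamlet API; it is a simplified conceptual model showing how binding schemas to inlets and outlets makes connections checkable:

```scala
// Conceptual model only -- NOT the Cloudflow API. It illustrates how
// schema-bound inlets and outlets allow connection validation.
final case class Inlet(name: String, schema: String)
final case class Outlet(name: String, schema: String)
final case class Streamlet(name: String, inlets: List[Inlet], outlets: List[Outlet])

// A connection wires one streamlet's outlet to another streamlet's inlet.
final case class Connection(from: Outlet, to: Inlet)

object BlueprintCheck {
  // A connection is consistent when producer and consumer agree on the schema.
  def isConsistent(c: Connection): Boolean = c.from.schema == c.to.schema

  // An application is valid when every declared connection is consistent.
  def validate(connections: List[Connection]): Boolean =
    connections.forall(isConsistent)
}

object Demo extends App {
  val ingress  = Streamlet("ingress", Nil, List(Outlet("out", "CellTowerRecord")))
  val enricher = Streamlet("enricher", List(Inlet("in", "CellTowerRecord")),
                                       List(Outlet("out", "EnrichedRecord")))

  val ok  = Connection(ingress.outlets.head, enricher.inlets.head)
  val bad = Connection(enricher.outlets.head, enricher.inlets.head) // schema mismatch

  println(BlueprintCheck.validate(List(ok)))      // true
  println(BlueprintCheck.validate(List(ok, bad))) // false
}
```

In the real toolkit this check happens at build time, so an incompatible wiring fails before anything reaches the cluster.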
Once we have created the code for each one of our streamlets, we use a blueprint to define how they connect. A blueprint is a text file that describes the streamlets included in our application and their connections. Together, the streamlet code and the blueprint form a Cloudflow application that can be deployed and managed as a single logical unit.
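A blueprint for the pipeline above could look roughly like this. The exact format varies between Cloudflow versions; this sketch follows the streamlets-and-connections style, and the streamlet class names are hypothetical:

```hocon
blueprint {
  streamlets {
    ingress  = example.telemetry.CellTowerIngress
    enricher = example.telemetry.RecordEnricher
    egress   = example.telemetry.AggregateEgress
  }
  connections {
    // outlet = [list of inlets it feeds]
    ingress.out  = [enricher.in]
    enricher.out = [egress.in]
  }
}
```

Each entry under `streamlets` names an instance of a streamlet class, and each entry under `connections` wires an outlet to one or more inlets.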
Cloudflow comes with several extensions to the Build Tool. The Verify plugin checks that all connections in our blueprint are valid; that is, that the connected schemas are compatible and that all connections are satisfied. RunLocal lets us test the complete application on our local development machine. We'll review these options in detail in our next video. The kubectl Cloudflow deploy command initiates the deployment of a complete application.
Through a process that we call operator federation, the Cloudflow operator takes care of orchestrating the deployment of the different components, delegating to the corresponding operator of each subsystem. Cloudflow ensures that the application model remains consistent and independent of the specific requirements of each component. Note how this model is extensible, so watch this space for more supported runtimes in the near future.
Once our application is running, we want to be sure that it is behaving properly. The Lightbend commercial offering augments Cloudflow with observability capabilities that ensure a robust and headache-free production deployment. In a nutshell, with Cloudflow, you can develop streamlets using Apache Spark, Apache Flink, or Akka Streams; create a blueprint that describes the flow of the application; test locally on the sandbox for quick feedback; package and publish a Docker image using the Build Tool extensions; deploy to Kubernetes and manage with the kubectl plugin; and ensure operational observability with the console UI.
The Cloudflow API abstractions, combined with the blueprint, ensure the consistency of the application end-to-end. All components of the toolkit work together to tame the complexity of distributed applications on Kubernetes and let you stay focused on creating value for your business. So, get started at cloudflow.io and stay tuned for more Cloudflow videos!