Stateful Cloud Native Applications - Why Reactive Matters
What is a Cloud Native Application?
A Cloud Native application is an application designed to leverage the cloud operating model. Such applications are predictable, decoupled from the infrastructure, and right-sized for capacity. They enable tight collaboration between development and operations, are decomposed into loosely coupled, independently operating microservices, are resilient to failures, are driven by data, and operate intelligently across geographically distributed nodes.
It is not uncommon for an application to face increasingly demanding requirements:
- High throughput in transactions per second (TPS), e.g. more than 1,000 TPS with 100x spikes
- Terabytes of real-time and at-rest data from dozens of sources that must be eventually or strongly consistent across the system
- Multi-device support with varied user experiences
- Very low latency and high responsiveness
- Offline/online behavior with eventually consistent reconciliation
- High availability and full recoverability
- Polyglot services that use a variety of languages, runtimes, and frameworks
- A clean separation of stateless and stateful services
Serverless computing, in particular so-called Function-as-a-Service (FaaS) as provided by AWS Lambda or Cloudflare Workers, is a form of Cloud Native application development that, albeit for a limited set of use cases, gives us a glimpse of what the future of Cloud Native will look like.
Serverless brings a new development and operations experience to the cloud, with a cost-efficient, pay-as-you-go model in which your services are fully managed for you. Systems built in the serverless model offer a simple programming model that constrains the developer’s options, power, and flexibility in exchange for lower development and operations costs, fewer decisions, simpler scalability, and faster time-to-market and turnaround.
Stateless vs Stateful Cloud Native Applications
There are two major classes of Cloud Native applications: stateless and stateful, with each class addressing a different set of use-cases.
Stateless can be defined as: “When an application is stateless, the server does not ‘remember’ any state about the client session; instead, the session data is ‘remembered’ on the client and passed to the server as needed.”
This type of stateless architecture is great for applications whose clients should be able to go offline when connectivity is poor, continue to function, and merge their local session state back into the cloud when a connection becomes available. However, most applications need to “remember” session and domain state on behalf of their users, sometimes long-term, which implies some form of durable storage.
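To make the definition concrete, here is a minimal Python sketch, assuming a hypothetical shopping-cart session: the server-side handler `add_to_cart` keeps nothing between calls, while the client holds the session, keeps working offline, and merges its buffered changes once it reconnects. All names are illustrative, and the network call is simulated by a plain function call.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Session:
    # All session state lives here and is owned by the client (hypothetical shape).
    user_id: str
    cart: Tuple[str, ...] = ()


def add_to_cart(session: Session, item: str) -> Session:
    """Stateless server handler: everything it needs arrives with the request,
    and the updated session is handed back to the client. Nothing is kept
    on the server between calls."""
    return replace(session, cart=session.cart + (item,))


class Client:
    def __init__(self, user_id: str) -> None:
        self.session = Session(user_id)  # the client "remembers"
        self.pending: List[str] = []     # changes made while offline

    def add(self, item: str, online: bool) -> None:
        if online:
            self.sync()  # reconcile offline changes first
            self.session = add_to_cart(self.session, item)  # simulated network call
        else:
            self.pending.append(item)  # keep functioning without the server

    def sync(self) -> None:
        # Merge locally buffered changes back into the cloud once connected.
        for item in self.pending:
            self.session = add_to_cart(self.session, item)
        self.pending.clear()
```

For example, adding `"book"` online, `"pen"` offline, and `"mug"` online again leaves the cart as `("book", "pen", "mug")`. The sketch merges by simply replaying buffered items in order; real offline reconciliation usually needs an explicit conflict-resolution strategy.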
The word “stateless” is often used to describe and market isolated components, as in “stateless service/function”, or isolated layers, as in “stateless web tier”. Examined in isolation, each of these parts can indeed be seen as stateless according to the definition above. But each one is a cog in a bigger machine, the system, and what matters is the user experience and the guarantees that the system as a whole can give its end users.
The important question to ask is: does the system as a whole “forget” or “remember”? If it is the latter, which is where the majority of web, cloud, and enterprise applications reside, then it needs to manage state on behalf of its users, most often storing it long-term in a database, and it must therefore be considered stateful.
The problem is that while most Cloud Native applications need to be stateful, state is often managed poorly and inefficiently, using tools, patterns, habits, and “best practices” that originate from the classic three-tier architecture centered around a single mighty database. In that architecture, the runtime is not aware of the data access patterns and therefore cannot accommodate or optimize them, or reason about their implications for the availability of the system.
Running on the cloud has consequences: it means entering the world of distributed systems, a world where non-determinism rules and where managing state while maintaining its safety, correctness, and consistency, without sacrificing the scale and resilience that the cloud can offer, is very hard. To maximize our chances of success in building stateful Cloud Native applications that take full advantage of the cloud, we need new tools, patterns, and practices, and it is here that Reactive Architecture leads the way.
Embrace Your Data
Nowadays businesses are measured by their data: the quality of their data, the insights they can get from its increasing volume, and the speed at which they can deliver it to their users. Speed has become a competitive advantage: getting intelligence and value from your data faster, ideally in real time, as it flies by.
This requires that you design your application with its data at the center, that you take ownership of your data, and that you do not delegate it and its management to a third-party service. If you keep your data “somewhere else” and always have to go and fetch it whenever you need it, you will most often be too slow. Additionally, you will be at the mercy of the availability of the third-party service: its SLA becomes your SLA.
As we have discussed, managing state in a distributed system, correctly and reliably at scale, is very hard, and contention on mutable state is the single biggest scalability killer.
How do we resolve this perceived conflict between building data-centric, data-driven applications and the need for low latency, scale, and availability?
Stateful Cloud Native Applications Require a Reactive Architecture
Reactive Architecture is a set of design practices and architectural principles that application designers and developers apply to ensure that a distributed system can be responsive (serve data to its users in a timely fashion), resilient (always available, self-healing), and elastic (scale both up and down on demand) by ensuring efficient management of distributed state and communication.
The best way to scale an application is often to minimize the state that is required to be strongly consistent and coordinated across multiple services. But once we minimize this data set to its core essence, how can we effectively (and correctly) manage it in a Cloud Native environment?
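One common way to keep the strongly consistent data set small, sketched here under stated assumptions, is to treat each entity as its own consistency boundary and partition entities across shards by a stable hash of their ID. Updates to one entity then never need to coordinate with any other shard. `NUM_SHARDS`, `Shard`, and `deposit` are hypothetical names; a real system would place shards on different nodes and rebalance them as the cluster changes.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(entity_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Stable hash (Python's built-in hash() is randomized per process for str),
    # so the same entity always maps to the same shard.
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class Shard:
    """Owns a disjoint subset of entities; each entity is its own
    consistency boundary, so no cross-shard locking is needed."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = defaultdict(int)

    def deposit(self, entity_id: str, amount: int) -> int:
        self.balances[entity_id] += amount
        return self.balances[entity_id]


shards: List[Shard] = [Shard() for _ in range(NUM_SHARDS)]


def deposit(entity_id: str, amount: int) -> int:
    # Route the update to the single shard that owns this entity.
    return shards[shard_for(entity_id)].deposit(entity_id, amount)
```

The design choice here is that consistency is only guaranteed per entity; anything spanning several entities would have to be handled asynchronously rather than through a global lock.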
Reactive Architecture employs intelligent forms of data replication, coordination, and persistence, and uses different flavors of Event-Driven Architecture, such as CQRS (Command Query Responsibility Segregation) and Event Sourcing, to treat a distributed system’s state much like a database manages its transaction log, but with state and behavior co-located and each log sharded per service. In this design, you have not just a snapshot of the current state but the full history of all states: every event leading up to the current state. This allows you to replay the events on failure for self-healing, to perform auditing, debugging, and replication, and to let services subscribe to each other’s state changes for asynchronous updates.
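The following minimal Python sketch illustrates the Event Sourcing and CQRS ideas above for a hypothetical bank account: commands are validated against current state and recorded as events in a log, the current state is derived by applying those events, a separate query-side view subscribes to the events, and `recover` rebuilds state by replaying the history. It is in-memory and synchronous for brevity; a real implementation would append to durable storage and deliver events asynchronously.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


Event = Union[Deposited, Withdrawn]


def apply(balance: int, event: Event) -> int:
    # Pure state transition: current state + event -> next state.
    if isinstance(event, Deposited):
        return balance + event.amount
    return balance - event.amount


class Account:
    """Event-sourced entity: the log is the source of truth and the
    balance is derived from it, co-located with the behavior."""

    def __init__(self) -> None:
        self.log: List[Event] = []  # full history, not just a snapshot
        self.balance = 0
        self.subscribers: List[Callable[[Event], None]] = []

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._persist(Deposited(amount))

    def withdraw(self, amount: int) -> None:
        # Commands are validated against current state before an event is recorded.
        if amount <= 0 or amount > self.balance:
            raise ValueError("invalid withdrawal")
        self._persist(Withdrawn(amount))

    def _persist(self, event: Event) -> None:
        self.log.append(event)  # a real system would append durably first
        self.balance = apply(self.balance, event)
        for notify in self.subscribers:  # delivered asynchronously in practice
            notify(event)

    @classmethod
    def recover(cls, log: List[Event]) -> "Account":
        # Self-healing: rebuild state by replaying history (no re-notification).
        account = cls()
        for event in log:
            account.log.append(event)
            account.balance = apply(account.balance, event)
        return account


class DepositTotals:
    """Query-side read model (CQRS), kept current by subscribing to events."""

    def __init__(self) -> None:
        self.total_deposited = 0

    def on_event(self, event: Event) -> None:
        if isinstance(event, Deposited):
            self.total_deposited += event.amount
```

Replaying `[Deposited(100), Withdrawn(30), Deposited(5)]` through `recover` yields a balance of 75, the same state the live entity reached, while the subscribed `DepositTotals` view reports 105 deposited in total.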
Cloud Native applications built using Reactive Architecture have scientific roots in the Actor Model, with its reactive, autonomous, and self-healing components called “Actors”. Actors co-locate state and processing and are a perfect foundation for managing distributed state in the cloud. Many of these applications demonstrate remarkable properties, including tens of thousands of TPS, predictable and eventually consistent behavior while switching between unreliable networks, clean bridging of logic and state, and resilience across different networks.
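A minimal actor can be sketched with Python's `asyncio`: the actor's state is private, other code can only send it messages, and a single loop processes its mailbox one message at a time, so the state needs no locks. `CounterActor` and its message names are hypothetical; supervision (the self-healing part) and distribution across nodes, which production actor runtimes provide, are not shown.

```python
import asyncio
from typing import Any, Tuple


class CounterActor:
    def __init__(self) -> None:
        self._count = 0  # private state, touched only by run()
        self._mailbox: "asyncio.Queue[Tuple[str, Any]]" = asyncio.Queue()

    def tell(self, kind: str, payload: Any = None) -> None:
        # Fire-and-forget send: callers never access _count directly.
        self._mailbox.put_nowait((kind, payload))

    async def ask_count(self) -> int:
        # Request/reply on top of tell(): the reply arrives via a future.
        reply = asyncio.get_running_loop().create_future()
        self.tell("get", reply)
        return await reply

    async def run(self) -> None:
        # Process messages strictly one at a time, in arrival order.
        while True:
            kind, payload = await self._mailbox.get()
            if kind == "increment":
                self._count += payload
            elif kind == "get":
                payload.set_result(self._count)
            elif kind == "stop":
                return


async def main() -> int:
    actor = CounterActor()
    runner = asyncio.create_task(actor.run())
    for _ in range(1000):
        actor.tell("increment", 1)
    count = await actor.ask_count()
    actor.tell("stop")
    await runner
    return count
```

`asyncio.run(main())` returns 1000: the mailbox is first-in, first-out, so the `"get"` request is handled only after all 1,000 increments have been applied.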
Other forms of distributed cloud systems, such as Function-as-a-Service (FaaS) or services wired up with a service mesh like Istio, can address some of the Cloud Native application requirements. However, when building stateful, data-centric, and event-driven applications with those systems, developers can quickly run into limits or, even worse, design themselves into corners. For example, while systems that rely on database scaling or replicated caches scale initially, they eventually hit a ceiling or can experience resilience issues as load and data volumes increase beyond their designed capacity.
Reactive Architecture has a demonstrated ability to address all Cloud Native application requirements, across use cases ranging from stateless to stateful, gracefully and at scale, while offering developers insurance against changing application requirements.
Cloud Native applications are on the rise, but if you enter this new world carrying old tools, habits, and practices not designed for the cloud, you are in for a challenge: it will be very hard for the cloud to live up to its promises of fast turnaround time, cost efficiency, great scalability, and availability. This is particularly true if you embark on building inherently stateful, data-centric, event-driven applications. Here, Reactive Architecture can provide the bedrock we all need to tackle the challenges we face when moving to the cloud.
For further reading on Reactive Architecture, see my mini-book: “Reactive Microsystems: The Evolution of Microservices at Scale”.