In this Lightbend Podcast, we sit down with Renato Cavalcanti, senior engineer on the Akka team, frequent conference speaker (in days past, at least), and organizer of the Belgian Scala User Group, BeScala. The topic is the recent release of Akka Projections 1.0, a new addition to the Akka toolkit.
As it’s described in the documentation, with Akka Projections you process a stream of events or records from a source to a projected model or external system. Each event is associated with an offset representing the position in the stream, and this offset is used for resuming the stream from that position when the projection is restarted.
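The offset-and-resume mechanic described above is the heart of the library. Here is a minimal, self-contained sketch of the idea in plain Scala. This is not the real Akka Projections API: `Envelope`, `OffsetStore`, and `Projection` are hypothetical names for illustration, and the offset store is an in-memory stand-in for what would really be a database table.

```scala
// Toy model of a projection: each element carries an offset, the handler
// projects it, and the last processed offset is stored so a restarted
// projection resumes from that position instead of starting from scratch.
final case class Envelope(offset: Long, event: String)

// Hypothetical in-memory offset store; a real one would be Cassandra or JDBC.
class OffsetStore {
  private var lastOffset: Option[Long] = None
  def readOffset(): Option[Long] = lastOffset
  def saveOffset(offset: Long): Unit = lastOffset = Some(offset)
}

class Projection(store: OffsetStore, handler: Envelope => Unit) {
  // On (re)start, skip everything at or below the stored offset.
  def run(source: Seq[Envelope]): Unit = {
    val resumeFrom = store.readOffset().getOrElse(-1L)
    source.filter(_.offset > resumeFrom).foreach { env =>
      handler(env)                 // project into the read-side model
      store.saveOffset(env.offset) // remember how far we got
    }
  }
}

val store = new OffsetStore
val seen  = scala.collection.mutable.Buffer.empty[String]
val projection = new Projection(store, env => seen += env.event)

val journal = (1L to 5L).map(o => Envelope(o, s"event-$o"))
projection.run(journal.take(3)) // process offsets 1..3, then "stop"
projection.run(journal)         // restart: resumes at offset 4, no replays
```

In the real library the source is an Akka Streams `Source` obtained through a `SourceProvider` and the offset store is a database, but the resume logic is the same in spirit.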
Now, this seems simple enough, but the real power of Akka Projections is not easily grasped from that description alone. So, Renato breaks down how Akka Projections works, where it fits into your project, and why it's ideal for cloud native applications.
What follows below is a loose transcription of our conversation, which Renato generously provided.
RC: Projection is a term that became popular in the context of event sourced applications. In an event sourced system, events are persisted in a journal and used to replay and reconstruct the state of a given model. But we also want to consume those same events to generate other kinds of data. For instance, I may want to produce a query-optimized read-side model or publish the events to a message broker like Kafka. This is basically what a projection is: we project an existing event journal into a new representation.
RC: It's certainly very close to CQRS; we could say it's an important part of it. But I don't think it's limited to CQRS. Akka Projections is about consuming an Akka Streams Source whose elements are associated with an offset. In other words, each element flowing through it is associated with a number or timestamp. As we consume it, we keep track of which elements we have already consumed by saving the offset in an offset store of some kind. If we need to restart for some reason, we can resume from the last processed offset. This is, technically, what Akka Projections provides.
So, back to CQRS: this is the kind of infrastructure we need when generating a read-side model (or query model). But you can also use Akka Projections to consume events from something else, like Kafka, a WebSocket, or a gRPC stream. Those cases are not, strictly speaking, CQRS; they are more about consuming, stopping, and resuming a stream of events.
RC: To consume Akka Persistence journals, we already had Akka Persistence Query, but we had nothing in Akka to help you track the offsets, so you had to build it yourself. You can still do that if you prefer, but if you want to do more interesting things with your time, you can use Akka Projections. And again, this is not limited to the Akka Persistence journal. It can be used with any other Akka Streams Source for which you want to track an offset and resume without having to start from scratch.
RC: I would say that Akka Projections is the next generation of Lagom's Read-Side Processors. The concept is exactly the same. The difference is that in Lagom you can only consume from your own journal; it's bound to the Akka Persistence journal. Also, in Lagom a Read-Side Processor always generates data in a database table; it's designed to write into a Cassandra or JDBC table. We wanted to remove those limitations and give developers a little more flexibility. They should be able to pick components off the Akka shelf and arrange them as they see fit.
With Akka Projections you can implement the same thing as Lagom's Read-Side Processors, but you can also project onto something other than a database table. You can send a message to an Actor or update an index in Elasticsearch. And you can consume events from different sources, not only the event journal.
RC: Version 1.0 provides three offset store implementations: Cassandra, plain JDBC, and Slick. When you choose the type of projection you want, you are actually choosing the underlying offset store implementation. This has implications for the delivery semantics you can have.
For instance, when using JDBC or Slick, we can save the projected model in the same transaction as the offset. This gives us exactly-once semantics. When using Cassandra we don't have this option; instead we need to choose at-least-once or at-most-once, but not exactly-once. That's because of the distributed nature of Cassandra: we can't write to two tables (the user model table and the offset store table) in the same transaction.
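To make the distinction concrete, here is a toy illustration in plain Scala of why committing the model and the offset in one transaction yields exactly-once processing: either both writes land or neither does, so a crash mid-event leaves no partial state behind. This is not the real JdbcProjection API; `FakeDb`, `Env`, and the helper functions are invented stand-ins.

```scala
final case class Env(offset: Long, event: String)

// Hypothetical all-or-nothing "database": the thunk computes both writes
// before anything is committed, so a failure rolls back both.
class FakeDb {
  var model: Vector[String] = Vector.empty
  var offset: Option[Long]  = None
  def transact(write: => (Vector[String], Option[Long])): Unit = {
    val (newModel, newOffset) = write // a throw here commits nothing
    model = newModel
    offset = newOffset
  }
}

// Project one event and save its offset in the same transaction.
def process(db: FakeDb, env: Env, failAt: Long): Unit =
  db.transact {
    if (env.offset == failAt) throw new RuntimeException("handler crashed")
    (db.model :+ env.event, Some(env.offset))
  }

// Run from the stored offset; stop at the first failure, as a real
// projection stream would.
def run(db: FakeDb, source: Seq[Env], failAt: Long): Unit =
  try source.filter(_.offset > db.offset.getOrElse(-1L))
    .foreach(env => process(db, env, failAt))
  catch { case _: RuntimeException => () } // a supervisor would restart us

val db   = new FakeDb
val envs = (1L to 3L).map(o => Env(o, s"e$o"))
run(db, envs, failAt = 2L)  // crashes at offset 2; only offset 1 committed
run(db, envs, failAt = -1L) // "restart": resumes at offset 2, no duplicates
```

With Cassandra the offset lives in a separate table, so it must be saved before or after the handler runs, and a crash between the two writes is exactly what produces the at-most-once or at-least-once behavior described above.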
RC: This is extremely important. Projections are background processes. They start when your application starts and keep consuming events, constantly. Non-stop. And we know that sooner or later errors will pop up. For instance, you may lose connectivity with your database or with the source of the events. Those are intermittent failures.
Projections need to be able to self-heal, so we designed them for those cases. When they crash, they restart after a pre-configured delay; we must guarantee that they come back. The default behavior is to let it crash and restart it, but users can configure it to behave differently. For instance, you can keep retrying an event up to a given threshold and then decide whether the projection should crash and restart later, or simply skip the event.
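The retry-then-decide behavior Renato describes can be sketched in a few lines of plain Scala. The names below are invented for illustration; in the real library this is configured through recovery strategies such as `HandlerRecoveryStrategy.retryAndFail` and `HandlerRecoveryStrategy.retryAndSkip`, together with restart backoff settings.

```scala
sealed trait OnExhaustion
case object Skip extends OnExhaustion // give up on this event, move on
case object Fail extends OnExhaustion // crash so the projection restarts

final case class Exhausted(offset: Long) extends RuntimeException

// Retry the handler up to `retries` extra attempts (the real library also
// waits a configurable delay between attempts), then apply the strategy.
def processWithRecovery(
    offset: Long,
    handler: Long => Unit,
    retries: Int,
    onExhaustion: OnExhaustion): Unit = {
  def attempt(left: Int): Unit =
    try handler(offset)
    catch {
      case _: Exception if left > 0 => attempt(left - 1)
      case _: Exception =>
        onExhaustion match {
          case Skip => ()                      // skip the poison event
          case Fail => throw Exhausted(offset) // crash; restart with backoff
        }
    }
  attempt(retries)
}

// An intermittent failure: succeeds on the third attempt.
var flakyCalls = 0
val flaky: Long => Unit = { _ =>
  flakyCalls += 1
  if (flakyCalls < 3) throw new RuntimeException("db connection lost")
}
processWithRecovery(7L, flaky, 3, Fail) // recovers without crashing

// A permanent failure: after retries are exhausted, Skip moves past it.
var poisonCalls = 0
val poison: Long => Unit = { _ =>
  poisonCalls += 1
  throw new RuntimeException("cannot deserialize")
}
processWithRecovery(8L, poison, 1, Skip) // tried twice, then skipped
```

The important design point is the split Renato mentions: transient failures are handled by crashing and restarting from the stored offset, while per-event failures can be retried a bounded number of times before the projection decides to skip or crash.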
There is a lot to learn about Akka, so here are some good resources to whet your appetite!
We also invite you to visit the Akka website for in-depth documentation, and if you feel like it's time to explore opportunities with Lightbend, you can schedule a private meeting and demo for your team: