About six months ago, I started to see some really great blog posts come from a person named Colin Breck, who I’ve since had the pleasure of getting to know better. Colin is an experienced Akka developer who recently spoke at QCon London about using quality views to combat technical debt, and followed up with a podcast with InfoQ.
Colin has spent a lot of time writing on http://blog.colinbreck.com to help developers understand the different modules in the overall Akka toolkit, with a special focus on how to use Akka Actors and Akka Streams in conjunction for the greatest benefit. So I wanted to invite Colin to this Lightbend Podcast to go into more detail on this particular aspect of building resilient, distributed, back-pressured streaming applications with Akka. Enjoy!
Oliver White (OW): Hello, dear listeners, and welcome to today's Lightbend Podcast with special guest Colin Breck, a longtime Akka user who has been creating some excellent technical content on his personal blog lately. Today we're here to talk about Akka Streams and Akka Actors, two very important modules for building resilient, distributed streaming systems with the overall Akka toolkit.
Though these modules can be used for different things, Colin is here to describe how actors and streams solve complementary problems, and to walk through some of the basic patterns for interfacing streams with actors, and actors with streams.
Colin, it's wonderful to welcome you to our podcast program and thanks for taking the time to sit down with me.
Colin Breck (CB): Thanks for having me.
OW: So, before we get down to deep Akka internals, why don't you tell us a little bit about yourself. What's your background? How did you get started with Akka? What are you doing these days?
CB: I studied engineering at university and then worked professionally as a software developer, mainly on systems for near real-time monitoring and control of industrial processes. So, I've been working on streaming-data systems for close to two decades, I guess, since before it was cool to say you worked on streaming data.
Early in my career, I worked on a lot of technologies like reliable data collection from sensors and control systems, then moved on to some infrastructure software around time-series databases and publish-subscribe messaging for streaming data. And then from there I started working on cloud systems that really mirror those technologies for a lot of IoT applications. And I got interested in the scalability, security, reliability, and performance of those types of systems.
OW: So you've come from the days when it wasn't even cool to say streaming, and it sounds like you made a big leap right into the Cloud with IoT devices and so on. That's something that not a lot of people seem to be able to do.
CB: I guess. It's definitely given me an appreciation for some of the evolution of the technologies and some of the benefits that Akka brings. You know, when I was building a lot of these streaming technologies, you know, five or ten years ago, having to deal with memory management, and concurrency, and back pressure, and all these things on our own, and when you can open up a toolkit that provides a lot of those things for you, it's pretty nice.
OW: So when was it that you actually discovered Akka? And what was it that kind of wowed you with Akka compared to anything else out there that you were experienced with?
CB: It was kind of by accident, actually. At a former employer of mine, two colleagues were keen on using Akka, and so I just kind of went along for the ride. I had never even done any JVM development before that; it was all C++ and .NET, so even the JVM stack was kind of new to me. I was eager to get involved with actor systems for modeling distributed systems, especially for IoT, and had a little bit of exposure to Microsoft's Orleans.
And I was also really interested in the Reactive Streams work and had some very minor exposure to the Reactive Extensions for .NET. And I was also eager to learn a functional programming language. So when my colleagues were keen on Akka and on using Scala, that's, I guess, what piqued my interest.
OW: So, you mentioned Microsoft Orleans. Were there any other tools that you were trying to accomplish this sort of thing with before you settled on Akka?
CB: No, I definitely wanted to embrace the Reactive Streams model, definitely wanted to embrace the actor model, and functional programming. But we definitely didn't do any, like, bake-off of technologies. I would say more than anything it just looked like Akka fit the bill. And we had confidence that Akka was going to continue to grow in the right direction as our needs evolved. And I think the fact that it had commercial support from Lightbend behind it as well probably led to a level of comfort with the technology.
OW: Oh, well, thanks for the plug! :) So, Akka is a toolkit. There are, I believe, nine separate modules right now, and some of them even have submodules. Which Akka modules would you say that you feel the most comfortable with and that you've used with the most success professionally?
CB: It's a broad toolkit, I will say that. I still feel like there are parts of Akka that I haven't explored and want to get more familiar with. I'm definitely familiar with actors for modeling the domain, supervision, and fault tolerance, and with patterns like circuit breaking and backoff supervision. These kinds of things.
I've used a lot of Akka HTTP, both for traditional JSON-over-HTTP servers and for extensive use of WebSockets, on both the client side and the server side.
And then really extensive use of the Akka Streams API. In a lot of the streaming systems that I've worked on through my career, it's just such a natural fit for modeling that domain. I've made really extensive use of that part of the toolkit.
More recently, I'm interested in exploring Event Sourcing models mixed with Akka Persistence, and then distributing workloads in clusters through Cluster Sharding and Cluster Singletons. I don't have any experience running clustered workloads in production environments yet, but they're topics that I've been exploring recently on my blog.
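[Editor's note: as a small illustration of the circuit-breaking pattern Colin mentions, here is a minimal sketch using Akka's akka.pattern.CircuitBreaker. The downstream call, the thresholds, and the names are illustrative assumptions, not code from Colin's systems.]

```scala
import akka.actor.ActorSystem
import akka.pattern.CircuitBreaker

import scala.concurrent.Future
import scala.concurrent.duration._

object CircuitBreakerSketch extends App {
  implicit val system: ActorSystem = ActorSystem("breaker-example")
  import system.dispatcher // ExecutionContext for the breaker and futures

  // Open the circuit after 5 consecutive failures, fail fast for 30 seconds,
  // then allow a single trial call before closing the circuit again.
  val breaker = new CircuitBreaker(
    system.scheduler,
    maxFailures = 5,
    callTimeout = 10.seconds,
    resetTimeout = 30.seconds
  ).onOpen(println("Circuit breaker opened: failing fast"))

  // A hypothetical call to a flaky downstream service.
  def callDownstreamService(): Future[String] = Future("response")

  // Failures and timeouts of the protected call count toward opening the breaker.
  val result: Future[String] = breaker.withCircuitBreaker(callDownstreamService())
  result.foreach(println)
}
```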
OW: Yeah. I actually was really surprised and pleased to see all of the great Akka content on your blog. You've gotten quite prolific with it. By the way, your blog is http://blog.colinbreck.com. Is that correct?
CB: That's correct.
OW: All right. So I think the thing that appealed most to me was that your focus was on Akka Streams and Akka Actors, and how to use them together for the greatest benefit rather than trying to figure out if you should use one or the other and having to make a decision. And that was really appealing because this is a question that we're asked often. So, what can you tell us about Akka Streams and Akka Actors, and kind of the strengths of each of these, and also where you see overlap and benefit?
CB: Yeah, the blog articles that I've written actually come from, I guess, trying to explain my own learning in the area, and even my interaction with, you know, friends and colleagues around trying to understand when to use actors and when to use streams. I've seen a few instances where people embrace the actor model and maybe even came to Akka before the Akka Streams API was quite mature; it's one of the newer modules in terms of the entire toolkit.
So, I've seen people embrace the actor model and even try to build streaming systems with it, and then start to run into some of the typical problems of traditional distributed-systems programming: batching writes, dealing with concurrency, and these kinds of things. Just like Credit Karma, actually, who I think did a [webinar] with you and gave a talk at Data by the Bay. They have a nice overview of the problems you run into if you try to build a streaming system with only actors. That's kind of a good summary of the problems.
And then I've also seen people, and I had this realization myself, who embrace the Akka Streams API, and it fits their streaming-data workloads so well that they start to question: why do I need actors anymore? Maybe this is the only part of the toolkit I need? But then I think as you keep living with those two technologies you start to learn that they're quite complementary.
OW: So you mentioned that Akka Streams is relatively new, and I just want to point out that Akka Streams was really launched when the Reactive Streams Initiative had its 1.0.0 release. Akka Streams, our Slick database library, the Reactive Extensions from Microsoft that you mentioned, and a bunch of other tools all came together as part of the original Reactive Streams Initiative. So it hasn't been that long, but so much has happened since then.
CB: True. And this is one of the really differentiating features of Akka, in my opinion. A lot of people are, say, familiar with the Reactive Streams concept, and even the lower-level interfaces like onNext and onSubscribe, and they go and implement systems at that level. This was actually my first encounter with Reactive Streams. And at that level you're dealing with all the low-level semantics of Reactive Streams. You're still having to deal with concurrency issues.
What I think the Akka Streams API really got right is that it's an end-user API, whereas, if you read some of the comments from Viktor Klang and others, the Reactive Streams protocol itself wasn't meant to be an end-user protocol.
So when you raise it to the level of the Akka Streams API, you're actually solving the patterns that you find in streaming applications, like limiting concurrency, batching messages, and throttling: the kinds of high-level patterns that you encounter over and over again. I think that's the real power of the Akka Streams API.
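[Editor's note: to make those patterns concrete, here is a minimal sketch of a stream that throttles its source, batches elements, and limits the concurrency of downstream writes, with backpressure throughout. The integer elements and the writeBatch function are illustrative assumptions.]

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ThrottleMode}
import akka.stream.scaladsl.{Sink, Source}

import scala.concurrent.Future
import scala.concurrent.duration._

object StreamPatternsSketch extends App {
  implicit val system: ActorSystem = ActorSystem("stream-patterns")
  implicit val materializer: ActorMaterializer = ActorMaterializer()
  import system.dispatcher

  // A hypothetical batched write, for example to a time-series database.
  def writeBatch(batch: Seq[Int]): Future[Unit] =
    Future(println(s"wrote ${batch.size} elements"))

  Source(1 to 10000)
    .throttle(1000, 1.second, 1000, ThrottleMode.Shaping) // rate-limit the source
    .groupedWithin(100, 100.milliseconds)                 // batch by size or time, whichever comes first
    .mapAsync(parallelism = 4)(writeBatch)                // at most 4 in-flight writes, with backpressure
    .runWith(Sink.ignore)
}
```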
OW: Yeah. Thanks for mentioning that. So, as you were saying, it seems that some people think it's about making a choice between using actors or streams. Is this kind of a cliché, where it's one plus one equals three? Or how would you describe the best times and cases to use either actors or streams, and how to use them in the right combination?
CB: So, I think one plus one equals three is a good description.
Actors definitely make sense for encapsulating mutable state, for handling fault tolerance, for distributing workloads in a cluster, and for providing location transparency so that you can just query an actor, regardless of where it's running in a cluster. Even querying an actor to fetch its current state is something that you can't do with a stream; you can't send a query to a stream. So these are the strengths of actors.
In terms of streams, it's really modeling streaming workflows and providing that asynchronous backpressure so that you have bounded resource usage. Streams also have benefits around young-generation garbage collection, and Akka Streams, as I've already said, models the domain so that you don't have to go and implement a lot of these high-level concepts like throttling and batching yourself. The concepts themselves are simple, but once you start implementing them they can get fairly hairy. So the fact that you can just take a flow stage that does all you need is really powerful.
And I think part of the challenge is learning how to mix the two, because as soon as you start interfacing streams and actors, you're not necessarily in the backpressured world of Reactive Streams anymore unless you do it fairly carefully. That's actually the subject of a blog article I wrote a couple of months ago, on patterns for interfacing streams and actors and how to do it safely.
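[Editor's note: a common pattern for interfacing a stream with an actor while preserving backpressure is to combine mapAsync with the ask pattern, so only a bounded number of messages are outstanding to the actor at any time. A minimal sketch follows; the Measurement protocol and the EnrichingActor are illustrative assumptions.]

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import akka.util.Timeout

import scala.concurrent.duration._

// Hypothetical protocol: the actor enriches a measurement and replies.
final case class Measurement(value: Int)
final case class Enriched(value: Int, note: String)

class EnrichingActor extends Actor {
  override def receive: Receive = {
    case Measurement(v) => sender() ! Enriched(v, "processed")
  }
}

object StreamToActorSketch extends App {
  implicit val system: ActorSystem = ActorSystem("stream-to-actor")
  implicit val materializer: ActorMaterializer = ActorMaterializer()
  implicit val timeout: Timeout = Timeout(3.seconds)

  val enricher = system.actorOf(Props[EnrichingActor](), "enricher")

  // mapAsync keeps at most four asks in flight, so the stream backpressures
  // instead of flooding the actor's mailbox with fire-and-forget tells.
  Source(1 to 100)
    .map(Measurement(_))
    .mapAsync(parallelism = 4)(m => (enricher ? m).mapTo[Enriched])
    .runWith(Sink.foreach(println))
}
```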
OW: Now, slightly off subject, Akka Typed is getting a lot of attention and momentum these days. Have you had a chance to look into Akka Typed and see if this is something that can help this intersection between actors and streams go a little bit more smoothly?
CB: Ah, I briefly looked at Typed. I understand the idea. I have not thought about it in the context of interfacing actors and streams.
OW: I think the biggest win for most people is that actors are now typesafe with Akka Typed. So that was something that people have been kind of missing for a while.
CB: I think the biggest challenge with streams and actors, and the interfacing, is really the backpressure element. If you're going to send off asynchronous messages to actors, you're not in the backpressure world anymore. That's one of the biggest challenges. And then even managing the life cycle of streams relative to actors, there are some subtleties in getting that right, especially if you want to distribute workloads in a cluster.
OW: Yeah, absolutely. You recently mentioned using Akka Streams for batch processing and so on. And, you know, our upcoming Fast Data Platform is really focused on allowing people to choose different stream-processing frameworks for the right job. But Akka Streams seems to provide, at least in my opinion, this excellent layer of glue between your monitoring and your machine-learning components and, you know, your microservices architectures reaching out to the outside world, and your data-ingestion pipeline and everything. So, I guess, my question is: do you see Akka Streams finding a nice place in the so-called fast data world?
CB: Yeah, I think so. And I think this is something lots of people are questioning. It's a question I have myself: you look at the Fast Data Platform and you've got Apache Flink, you've got Apache Spark, you've got Kafka Streams, and then there's also Akka Streams. So how do I pick the right tool for the right job? I think that's still a bit of an open question, just with how much evolution there is in streaming-data platforms.
Whenever you're writing custom code and you have a streaming workload, I think the Akka Streams API fits really, really naturally. And then, I think, if you're writing, you know, distributed systems that need to be resilient and you want to persist state, maybe embrace the Event Sourcing model, and you want to just stick with actors and streams and not leave the Akka toolkit, I think that's when Akka Streams is a really great choice.
Of course, even in Akka HTTP, the WebSockets are implemented using the Akka Streams API.
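[Editor's note: as an illustration, a WebSocket handler in Akka HTTP is just an Akka Streams Flow from incoming to outgoing messages. A minimal echo sketch, assuming the Akka HTTP server DSL; the route path and port are arbitrary, and streamed (non-strict) messages are simply ignored here.]

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.ws.{Message, TextMessage}
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Flow

object WebSocketEchoSketch extends App {
  implicit val system: ActorSystem = ActorSystem("ws-example")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // The WebSocket handler is an ordinary Akka Streams Flow, so throttling,
  // batching, and other stream stages compose with it directly.
  val echo: Flow[Message, Message, Any] =
    Flow[Message].collect {
      case TextMessage.Strict(text) => TextMessage(s"echo: $text")
    }

  val route = path("ws") {
    handleWebSocketMessages(echo)
  }

  Http().bindAndHandle(route, "localhost", 8080)
}
```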
OW: We're doing a lot of “drinking our own champagne”, so to speak, at Lightbend. So Akka HTTP is now the default HTTP server for Play Framework, with support for HTTP/2. And as you know, Akka Streams, Akka Persistence, and Akka Cluster are part of our Lagom Framework, so that people who aren't familiar yet with actors, but do want to harness the power of actors and streams, can actually code in their normal IDE and build systems of microservices. Have you had a chance to play with Lagom, by any chance?
CB: No. It's on my very long to-do list. I wish I had more time.
OW: Don't we all! Earlier, I believe, you mentioned working with Akka Persistence, Akka Distributed Data, and kind of getting into the CQRS world. Could you elaborate a little bit more on other Akka modules that you're working with these days?
CB: I'm writing a four-part blog series on the interface between streams and actors, and one of the things that I'm exploring in that is simulating a bunch of IoT devices that connect via WebSocket to some service. Of course, the WebSocket is a streaming connection using the Akka Streams API. And how would I distribute those workloads in a cluster to, say, simulate load for testing? For that I'm using Distributed Data and Cluster-Sharded actors to distribute actors in a cluster, and then each actor hosts a stream representing a single IoT device.
And then I want to extend that to show the relationship with Event Sourcing, which is also kind of a streaming model. Of course, Akka Persistence with actors naturally embraces that model.
And then if you look at Akka Persistence Query, you can actually stream that back through the Akka Streams API. So it's almost a blurring of the lines between actors and streams once you mix Persistence and Persistence Query.
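[Editor's note: that blurring of the lines can be seen in a few lines of code. Akka Persistence Query exposes the event journal as an Akka Streams Source. A minimal sketch, assuming the LevelDB journal plugin is configured; the persistence ID "device-1" is an illustrative assumption.]

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.persistence.query.{EventEnvelope, PersistenceQuery}
import akka.persistence.query.journal.leveldb.scaladsl.LeveldbReadJournal
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

object PersistenceQuerySketch extends App {
  implicit val system: ActorSystem = ActorSystem("persistence-query")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // Assumes the LevelDB journal plugin is configured; swap in the read
  // journal for whichever plugin you actually use (Cassandra, JDBC, ...).
  val readJournal: LeveldbReadJournal =
    PersistenceQuery(system).readJournalFor[LeveldbReadJournal](LeveldbReadJournal.Identifier)

  // A live stream of the events persisted by a hypothetical "device-1" actor.
  val events: Source[EventEnvelope, NotUsed] =
    readJournal.eventsByPersistenceId("device-1", fromSequenceNr = 0L, toSequenceNr = Long.MaxValue)

  // From here, it is just another stream: throttle it, batch it, fan it out.
  events.runWith(Sink.foreach(envelope => println(envelope.event)))
}
```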
OW: Would you mind describing Akka Cluster Sharding to our listeners in a little more detail? Because I think this is something that is often unheard of and probably misunderstood. Do you have a quick version of what Cluster Sharding is and how it works?
CB: Often you run into a situation where you want one actor running in a cluster that, I don't know, represents a tenant, or represents a certain workload, or maybe a certain transaction in an e-commerce scenario. And I think the initial reaction is to go to the Cluster Singleton actor.
But the documentation discourages running millions and millions of Cluster Singletons, and Cluster Sharding is actually often more appropriate in that case. The cluster will take care of running one actor somewhere in the cluster that represents that entity. And, of course, because you have location-transparent addressing, you don't really need to care where it's running; you can still just send a message to it like you would to a normal actor.
And then if you lose a node in the cluster, or say you're rolling through the cluster to do a rolling upgrade, or maybe you're scaling your workload and adding nodes to the cluster, the cluster will take care of rebalancing the cluster shards, and in doing that it may move actors around in the cluster to balance those workloads.
So depending on your workload, you may need to mix Persistence with that if you need to rehydrate state when an actor is moved around. Or, if you don't have any persistent state, which is kind of the example I've been using in my blog posts, the entity is really just an ID, and that's all that's necessary: when the actor is restarted, it can start running again without any need for Persistence.
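[Editor's note: a minimal sketch of what starting such a sharded entity looks like with classic Cluster Sharding, assuming the ActorSystem is configured with the cluster actor provider and has joined a cluster. The Device entity, its message envelope, and the number of shards are illustrative assumptions.]

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}

// Hypothetical entity messages: every message carries the entity's id.
final case class DeviceEnvelope(deviceId: String, payload: Any)

class Device extends Actor {
  override def receive: Receive = {
    case DeviceEnvelope(id, payload) => println(s"device $id received $payload")
  }
}

object ClusterShardingSketch extends App {
  implicit val system: ActorSystem = ActorSystem("sharding-example")

  // Route each message to the entity identified by deviceId.
  val extractEntityId: ShardRegion.ExtractEntityId = {
    case msg @ DeviceEnvelope(id, _) => (id, msg)
  }

  // Hash the id into one of a fixed number of shards.
  val numberOfShards = 100
  val extractShardId: ShardRegion.ExtractShardId = {
    case DeviceEnvelope(id, _) => (math.abs(id.hashCode) % numberOfShards).toString
  }

  // The shard region: send messages here and the cluster runs exactly one
  // Device actor per id somewhere in the cluster, rebalancing entities as
  // nodes join and leave.
  val deviceRegion = ClusterSharding(system).start(
    typeName = "Device",
    entityProps = Props[Device](),
    settings = ClusterShardingSettings(system),
    extractEntityId = extractEntityId,
    extractShardId = extractShardId
  )

  deviceRegion ! DeviceEnvelope("device-1", "hello")
}
```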
OW: You know I think that's maybe one of the best descriptions I've heard. Thank you for sharing that and making it easy to understand. Let me ask you about network partitions or split-brain scenarios, as they're called. Is this something that you have to deal with in the systems that you work with?
CB: I've only dealt with clustering in playing around with it for my blog articles. And in doing that, of course, I'm using the automatic downing, which has the explicit warning, "Never use this in production."
I don't have experience dealing with split-brain scenarios in production. Professionally, it's something that I want to get involved in. And some of the things I'm working on are going to really need clustering support. So it's definitely an area that I'm interested in. And I know Lightbend has a commercial solution in terms of the split-brain resolver to deal with some of the split-brain scenarios.
OW: Yeah, yeah. And I will shamelessly plug the fact that we are releasing a white paper very soon [editor’s note: now available!] on split-brain resolution and what network partitions mean and so on. So I'm excited for that. I'll make sure you get a copy, Colin.
CB: That would be great.
OW: So, last question for you. What would you recommend to people who have never heard of Akka before and are looking to start working with distributed systems, without having to use the kind of old monolithic, thread-per-request model when they're trying to get things up and running in the cloud, distributed, and so on?
CB: Akka is -- well, the breadth of the toolkit is wide. There is a lot there. If you're using it, you're definitely building on the shoulders of giants. There are years and years of pretty tricky distributed-systems problems that have been taken care of for you under the hood.
I think that the toolkit can be a little intimidating to people coming to it. I suggest starting small. Don't necessarily try and run, you know, wild clustered workloads and introduce every different part of the toolkit immediately. I think I would start with modeling more than anything, and understanding how you're going to break your domain down and model it with actors, or model it with streams, or some interface of the two. And from there, as your problem grows, start embracing other pieces of the toolkit.
It's been my experience that as you run into different problems, like, okay, now we need scalability, we need to scale across nodes, or we need backoff supervision, or we need circuit breaking, you can just start pulling in pieces of the Akka toolkit. And you're not going to end up rewriting your code, because the code you've encapsulated in the modeling of the domain through your actors is quite composable with all these new pieces.
I think the documentation for Akka is really fantastic. The community is fantastic. But I'd say start small, don't get too ambitious, and let it grow with you. And there are definitely more and more resources; when I started with Akka, I think it was probably a little harder to get a foothold.
There are starting to be some really excellent resources in the community. Some good books. Konrad Malawski's "Akka and the Zen of Reactive System Design" talk really simplifies some things in terms of the patterns that are encouraged. There are also some talks out there now on anti-patterns. I think that makes getting started a lot easier.
OW: Yeah. So it sounds like you're making a bit of a pitch for learning about domain-driven design from, you know, someone like Vaughn Vernon, who's written some excellent content on that and also did a wonderful Lightbend webinar with us earlier this year.
CB: Yeah. I think that's a good place to start.
OW: So from domain-driven design to building your first actor. And start small and let it get solid and bulletproof at the, you know, individual-actor level before you start building clusters to rule the world, right?
CB: Yeah. I'd say the same thing with streams. There are a lot of really, really powerful aspects of the streaming API. So start with a simple stream, maybe move data from A to B, transform it, throttle it, something like that, before you get into building really complex graphs or trying to build custom flow stages, or these kinds of things.
OW: I think that's good advice. Well, Colin, it's wonderful speaking with you. It's personally very exciting for me to see such momentum with Akka Streams and other Akka modules. I hope that our listeners have learned as much as I have today and feel more confident in exploring Akka's different modules in their own projects. Thanks once again for joining us. I can't wait to connect with you soon and see more amazing content coming on your blog.
CB: Thanks for having me.
OW: To our listeners, feel free to check out Colin's articles on his blog at blog.colinbreck.com. I also recommend that you get connected with, as Colin called it, the vibrant Akka community, which you can get involved with at Akka.io and by following @akkateam on Twitter. If your team is serious about using Akka in production, I invite you to check out Lightbend Reactive Platform, which provides four specialized commercial modules on top of the open-source Akka core, including a Split Brain Resolver for dealing with network partitions in about one minute for a thousand nodes, a thread starvation detector, an Akka configuration file checker, and a clustered diagnostics recorder.
You can read all about that on our new and growing Lightbend tech hub at developer.lightbend.com. Or, simply go to lightbend.com/contact to request a short 20-minute introductory call with someone on our team.