Last week we hosted a webinar, Introducing Reactive Streams, with Roland Kuhn, Akka tech lead. Over 900 people registered for the event, and we had some really great questions submitted for the Q&A. Konrad Malawski, software engineer at Typesafe, responded to folks in real time, but we thought our readers might find some of the conversations interesting, so we’re sharing on our blog!
The questions answered here have in some cases been summarized. See below and feel free to submit your own questions after watching the recording and checking out the slides.
Question: Will a wire protocol be defined to facilitate interoperability between different language implementations?
Response: Let us work out the in-jvm for now :-) In general there is both interest and the possibility to generalise these concepts over the network / languages (outside of the jvm even). Please refer to discussions on the reactive-streams repository for details (there’s a lot happening there at the moment).
Q: Are there any known possible limitations regarding serializing/deserializing the stream contents? (data formats,plugable serializers, other ?)
R: Nope, it's completely oblivious to what you're transferring using them. It's a Flow[T], where T can be anything. Once we describe how to transfer streams outside of the JVM we will have to work on the serialization aspects as well.
Q: The primordial source of publishing is often beyond our control, comes from external sources. So we have not really solved the problem of bounded buffering: the buffer for the externally-arriving elements that have not yet been demanded can't be bounded.
R: Yes, if integrating with non-reactive sources you will have to buffer and drop like you usually would, we aim to help between systems that understand this protocol—and have a few projects on board together with us on this quest :-)
Q: Could you give a brief distinction between Akka stream and Vert.x stream? Is there a typical example where you would favor Akka streams over Vert.x streams (except when you're a passionate Akka developer)
R: AFAIR Vert.x streams do not provide backpressure mechanisms, which is why the vert.x team is part of the Reactive Streams project. This is our "killer feature" if you will. Reactive streams are able to adjust produce rates thanks to the built in backpressure feature. The goal is to not kill consumers when the producer is faster than the consumer.
Q: Is it the case that a single slow consumer is still going to cause a need for a message dropping policy when broadcasting events / zipping demand? For building blocks to evolve on top of this, do you think consumer type semantics (message mergability, etc) need to be propagated alongside demand, possibly remotely?
R: That's a bit implementation dependent, in general there's a few strategies you can do for "I have multiple subscribers, and one of them is slow". We don't have much implemented around this (yet!), but we're thinking a lot about it. When a consumer cannot keep up, but the stream must flow, then you need to either terminate that consumer or describe how the excess data are going to be handled: dropped, merged, aggregated, etc. This will depend on the requirements of the use-case.
Q: Is Akka-stream an alternative to other stream processor solutions like Storm or Samza?
R: The general focus is a bit different I believe - Akka-streams can be used very "low level" - you get a TCP stream / HTTP stream and from there on you move in to your domain objects for example, and end-to-end you can stay async. Storm or Scalding are specifically geared towards "data jobs". On the other hand, Akka Streams could be a building block for systems like these. We'll see how it goes :)
Q: How much overlap exists between Akka-streams and Spark? Is Akka-streams more generalized and lower level than Spark?
R: Akka Streams are at a lower level than Spark, we work on passing data around while Spark is more concerned with which data to process where and which transformations to apply. It should hypothetically be possible to implement Spark in terms of Akka Streams (we are not aware of such plans right now).
Q: Has this model been applied to message based systems (e.g. AMQP/JMS)?
R: Not yet as far as I'm aware of, but it certainly could be (to these that you mention); any endpoint can be made into an reactive-stream producer or consumer (it might have to do buffering inside though).
Q: Can AngularJS act as subscriber for Akka streams?
R: Not yet, but would be awesome - via websockets from serverland into userland... very exciting stuff coming up :-)
Q: Could Reactive Streams fit for doing analytics?. For example, to be a tool for running ETL procedures?
R: Yes, that's one of the use-cases we have in mind - real time analytics etc.
Q: Do you intend to create a formal language model to process streams in real time?
R: Not at this point. We are concerned more with the low-level mechanics, a formal language could be implemented on top of it.
Q: Can we have these streams on one side and another existing solution (e.g. flume) on another side?
R: You can create Publishers or Subscribers for any data source or sink. Once e.g. flume is exposed like this, you can integrate it seamlessly.
Q: Is there a difference between the Reactive Manifesto and Reactive functional programming?
R: Yes and no: "reactive functional" is more like "reactive AND functional". Functional programming is not a requirement for "reactive" when you look at the manifesto (which is mostly about scalability and stability). Functional programming does help quite a bit though - so these definitions are "orthogonal" if you will.
Q: Will Reactive Streams make Scala streams deprecated or will it always be separate?
R: Scala’s Stream type models are something different: a lazily computed collection. There is no overlap between these two.
Q: What if there are no subscribers yet data is still flowing, how can one decide whether to discard or store data streams?
R: That’s use-case dependent. In general, you have two options - backpressure the producer (if you can), or simply drop data until you get subscribers. You can also keep buffering if you can’t backpressure the producer (producer is a legacy system for example) until you get subscribers, but eventually the buffer will run full and you will need to drop things.
Q: What about conveying backpressure for streams that do not fit this model? e.g. calling external HTTP services that when it is running out of resources begins to refuse new connections. How would that work when the backpressure is implicit? That is, there's nothing responding to demand messages.
R: Refusing connections is very explicit, and the backpressure in this scenario would also be: if the call-out to the external HTTP service does not work, then your original stream will receive errors—either directly or via timeouts. If you zip these together with the requests then backpressure will work correctly so that the external service slows down your local processing. You can avoid that by adding a circuit breaker in front of the service to cut down on timeouts during extended outages.
Q: Can/will/should the Reactive Stream DSL described become the main or preferred way of implementing solutions with Akka Actors? Or Actors have other use cases which Reactive Streams DSL/API cannot be applied?
R: Actors and Streams are different abstractions, living on different levels. Streams can convey data in one fixed direction whereas Actors can talk with arbitrary other Actors during their lifetime, so while Streams can be implemented on top of Actors, neither will replace the other.
Q: Is it included in latest akka release ? v2.3.2
R: Yes - you can play around with them using this activator template: http://typesafe.com/activator/template/akka-stream-scala. If you want the plain dependency, the ModuleID is: "com.typesafe.akka" % "akka-stream-experimental_2.10" % "0.2" (as of writing this blog post).
Q: Is there a Scala 2.11 release for Akka Streams?
R: We'll do that ASAP! :-) (we will need to wait for a small Scala update first, though).
Q: URL for documentation?
R: This is all very early stage—for the spec you can read reactive-streams.org and peruse the reactive-streams github repository. For Akka Streams there is just the source code at this point.
Q: What kind of technology does Akka Stream support, does it support Web-Services, or http request?
R: Akka Streams themselves are building blocks from which higher-level services can be built. Akka HTTP will offer stream-based APIs for communicating with web services or via WebSockets.
Q: Are there plans for database drivers?
R: Not from us directly - we’re hoping that the community will bootstrap some projects like this. :-)
Q: On the file example, will there be out of the box support for NIO file (on disk) reading piped into streams?
R: Yes, there will be.
Q: Will this replace the WS client in the Play Framework?
R: Play will eventually (Play 3) be based upon the upcoming Akka HTTP module, at which point it will also offer stream-based APIs in addition to the current ones that are based on Iteratees.
Q: I’m very interested in FlowMaterializer and how we can express distribution
R: This interface has been put in place to plug in different materialization strategies, including spawning parts of the pipeline—part of the actors—on different network nodes. The Flow DSL is completely independent from the implementation used by the materializer, which allows all kinds of transformations and optimizations to be done.
Q: Is the source code for FlowMaterializer available publicly at present?
R: Here it is: https://github.com/akka/akka/blob/release-2.3/akka-stream/src/main/scala/akka/stream/FlowMaterializer.scala Please remember that the akka-stream module is experimental, and it may change without prior notice.
Q: What happens with errors? There's supervision for actors, deathwatch, and so forth... how does all that fit into this model?
R: The Reactive Streams interfaces have the onError callback for tearing down the processing pipeline in case of failure, which means that something unforeseen went wrong. In case of Akka Streams we can leave the decision of whether or not a thrown exception is “fatal” to the supervisor of the failing stream processing actor. Or a processing pipeline is created as child actors of a supervisor which can just restart the processing, reconnecting to the data source if necessary.
Q: Would Spray support Akka-Stream as well? i.e. Can Spray REST EPs be Streamed?
R: Spray is becoming Akka HTTP, so yes, we're currently migrating spray to streams. Though it is a very early impl.
Q: When we can expect release of Akka Http (with Java 8)?
R: "Soon" :-) I don’t think we have exact dates set in stone yet.
Thanks to all who attended the session! We hope to see you on next month’s webinar.