Major Performance Improvements In Alpakka Kafka 2.0

The Alpakka project is a Reactive Streams-based OSS tool for implementing stream-aware and Reactive integration pipelines for Java and Scala. It is built on top of Akka Streams, and has been designed from the ground up to understand streaming natively and provide a DSL for reactive and stream-oriented programming, with built-in support for backpressure. Alpakka Kafka is a powerful connector for Apache Kafka. To mark the release of Alpakka Kafka 2.0, the Alpakka group in the Akka team at Lightbend took a few minutes to answer some questions about this exciting release.


Q: If you had to pick the most significant updates in this release, what would they be?

The best part of this release is the performance improvements, which we were able to complete once our contributions to Apache Kafka 2.4.0 were accepted. Alpakka Kafka 2.0 improves performance for the consumer-flow-producer-committing use case, and the release brings the upgrade to the Apache Kafka 2.4.0 client library, with its data-fetch improvements, along with a new API for the common producer/committer combination. The Kafka 2.4 data-fetch improvements will make high-throughput Alpakka Kafka consumer use cases more efficient with regard to broker-to-consumer traffic, consumer throughput, and consumer memory usage.

When benchmarking this fix, which was contributed to Kafka 2.4 by Lightbend, we observed a 32% saving in network traffic between the broker and the consumer, and a 16% improvement in consumer throughput!

This is such an interesting topic for us that our team member Sean Glover wrote up a much more detailed blog post titled "How Alpakka Uses Flow Control Optimizations In Apache Kafka 2.4". You can also check out the Apache Kafka 2.4 release blog post under the section “KAFKA-7548: KafkaConsumer should not throw away already fetched data for paused partitions”. We’ve also provided more options in the Alpakka Kafka Testkit for integration tests using Docker containers.
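To make the KAFKA-7548 title above a little more concrete, here is a toy model in plain Java. It is only a sketch under stated assumptions, not the Kafka client's implementation: it assumes a single partition, a consumer that prefetches one batch ahead, and a stream that pauses the partition for one poll after every delivered batch. The class `PausedFetchDemo` and the method `brokerFetches` are hypothetical names invented for this illustration.

```java
// Toy model only: no real Kafka client is involved, and all names are hypothetical.
public class PausedFetchDemo {

    /**
     * Counts how many batches are fetched from the simulated broker in order to
     * deliver `batchesToDeliver` batches downstream, when the partition is paused
     * for one poll after every delivered batch.
     */
    static int brokerFetches(int batchesToDeliver, boolean keepBufferedWhilePaused) {
        int fetches = 0;
        int delivered = 0;
        boolean buffered = false; // is a prefetched batch waiting inside the consumer?
        boolean paused = false;

        while (delivered < batchesToDeliver) {
            if (paused) {
                // A poll on a paused partition hands nothing downstream.
                // Assumed old behaviour: the prefetched batch is thrown away.
                if (!keepBufferedWhilePaused) {
                    buffered = false;
                }
            } else {
                if (!buffered) {
                    fetches++; // nothing buffered, so the batch must be fetched (again)
                }
                delivered++;   // hand the batch downstream
                fetches++;     // eagerly prefetch the next batch
                buffered = true;
            }
            paused = !paused;  // alternate between resumed and paused polls
        }
        return fetches;
    }

    public static void main(String[] args) {
        System.out.println("discarding while paused: " + brokerFetches(3, false));
        System.out.println("keeping while paused:    " + brokerFetches(3, true));
    }
}
```

In this toy, discarding prefetched data means most batches cross the network twice, while keeping it means each batch is fetched once. The counts say nothing about the real benchmark figures quoted above; they only illustrate why retaining already fetched data reduces broker-to-consumer traffic.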

Q: What was the biggest challenge in this release?

We intended to offer a new API to allow transactional Kafka use with flows per partition (as was suggested by a contributor quite a while ago). The machinery required to keep track of transactions is quite intricate and includes an overhaul of the internals of the Alpakka Kafka partitioned sources.

The foundation for this work is set, but we want to have high confidence that this feature meets all transactional guarantees before making it generally available to users. We’ll get back to it soon and hope to expose this feature in the next release.

Q: What was the most significant bug fix that it feels good to have finally squashed?

We didn’t have any bugs lingering, so we took the opportunity to add extra filtering of messages when the Kafka broker rebalances, meaning that fewer messages get duplicated for the consumers.
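As a rough illustration of the idea, the sketch below drops buffered records whose partition is no longer assigned after a rebalance, so they are not processed here as well as by the new owner. This is an assumption about the general shape of such a filter, not Alpakka Kafka's actual code; `RevokedPartitionFilterDemo`, `BufferedRecord`, and `dropRevoked` are hypothetical names.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy model only: these types are stand-ins, not Alpakka Kafka or Kafka classes.
public class RevokedPartitionFilterDemo {

    // Hypothetical stand-in for a record that was fetched but not yet emitted.
    static final class BufferedRecord {
        final int partition;
        final long offset;

        BufferedRecord(int partition, long offset) {
            this.partition = partition;
            this.offset = offset;
        }
    }

    /** Keeps only the buffered records whose partition is still assigned to this consumer. */
    static List<BufferedRecord> dropRevoked(List<BufferedRecord> buffered,
                                            Set<Integer> assignedAfterRebalance) {
        return buffered.stream()
                .filter(r -> assignedAfterRebalance.contains(r.partition))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<BufferedRecord> buffered = List.of(
                new BufferedRecord(0, 10L),
                new BufferedRecord(1, 7L),
                new BufferedRecord(0, 11L));
        // Assume partition 1 was revoked during the rebalance and only partition 0 remains.
        List<BufferedRecord> kept = dropRevoked(buffered, Set.of(0));
        System.out.println(kept.size() + " of " + buffered.size() + " buffered records kept");
    }
}
```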

Q: What other Lightbend technologies does this release influence?

Lightbend’s latest open source project, Cloudflow, builds heavily on Alpakka Kafka for use in Akka Streams Streamlets and benefits a lot from the multiple performance improvements we’ve accomplished. Lagom Framework also makes use of Alpakka Kafka as an option for inter-service message passing.

Q: What does the roadmap look like for the next release, and how can folks contribute?

We intend to keep up with Kafka releases a bit more closely, so users can expect a new major version of Alpakka Kafka when Kafka 2.5 becomes available. We’re always interested in improving the usability and getting-started experience with Akka Streams and Alpakka, and that’s a great area for new contributors to provide feedback.


For Serious Enterprises…

You can read more about these performance improvements to Kafka by Lightbend in this blog post. Want to see Alpakka, Akka Streams, and Kafka running in production? Schedule a demo with us to learn more about Akka Enhancements, Lightbend Telemetry, and Lightbend Console.
