Alongside our recent announcement of Akka gRPC 2.1.0, the Akka team was pleased to see very positive results from a community benchmark that measures the throughput and latency of over 30 gRPC libraries. Akka gRPC (named “Scala_Akka” in the benchmark) went from 29th place to 1st in just three months, delivering an impressive 1231% increase in processing throughput (req/s) and a 93% reduction in average latency per request.
So what is Akka gRPC, and how did we manage to make it the fastest solution tested? We sat down with Johannes Rudolph (@virtualvoid), Senior Engineer on the Akka team at Lightbend, to learn more about this story...
Akka gRPC is our OSS gRPC library that builds on top of Akka and Akka HTTP. A few years ago, we started looking into gRPC, rather than other protocols, for microservice communication because of its well-defined interfaces and solid tooling for many programming languages, not just JVM languages.
Akka Serverless uses gRPC throughout the stack: user services and entities are declared in a protobuf definition and can be accessed via gRPC. Likewise, the internal communication between user code and the infrastructure is handled via gRPC.
Streaming is a first-class concept for gRPC service endpoints, which fits our stack well. Check out this documentation to learn more about when gRPC is a good alternative to a traditional HTTP API for communication between microservices.
In May 2021, we were preparing the Akka Serverless Beta launch and working on optimizations in different parts of the system. When testing basic RPC calls to establish a baseline, one thing that turned up in profiling was gRPC communication in our infrastructure. At about the same time, the previous run of grpc_bench was published and caught our attention after some public commentary about the relatively poor performance of Scala_Akka.
The results from grpc_bench are generated using ghz, a Go-based gRPC benchmarking tool, as the client. It runs a simple unary RPC scenario without streaming: 1000 concurrent requests, spread over 50 persistent connections. It measures throughput and latency under various conditions, including servers limited to 1 to 3 CPUs.
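To make this load pattern concrete, here is a toy closed-loop load generator. It is not ghz and makes no network calls: `fake_unary_call`, `TOTAL_REQUESTS`, and the 10 ms simulated service time are hypothetical. It assumes, as we understand ghz's default unthrottled mode to work, that the 1000 concurrent requests are split evenly across the 50 connections and that each in-flight slot sends its next request as soon as the previous one completes.

```python
import asyncio
import time
from collections import Counter

CONCURRENCY = 1000      # requests kept in flight, as in the grpc_bench scenario
CONNECTIONS = 50        # persistent connections that share those requests
TOTAL_REQUESTS = 5000   # hypothetical run length, kept small for illustration


async def fake_unary_call(service_time_s: float) -> None:
    """Hypothetical stand-in for one unary gRPC call (no real network I/O)."""
    await asyncio.sleep(service_time_s)


async def worker(worker_id, issued, latencies, per_connection, service_time_s):
    # Concurrency is split evenly: worker i always uses connection i % CONNECTIONS.
    connection = worker_id % CONNECTIONS
    # Closed loop: the next request is sent only after the previous one completes.
    while issued[0] < TOTAL_REQUESTS:
        issued[0] += 1  # no await between check and increment, so no race in asyncio
        start = time.perf_counter()
        await fake_unary_call(service_time_s)
        latencies.append(time.perf_counter() - start)
        per_connection[connection] += 1


async def run_benchmark(service_time_s: float = 0.01) -> dict:
    issued = [0]
    latencies = []
    per_connection = Counter()
    start = time.perf_counter()
    await asyncio.gather(*(
        worker(i, issued, latencies, per_connection, service_time_s)
        for i in range(CONCURRENCY)
    ))
    elapsed = time.perf_counter() - start
    return {
        "requests": len(latencies),
        "connections_used": len(per_connection),
        "throughput_rps": len(latencies) / elapsed,
        "avg_latency_s": sum(latencies) / len(latencies),
    }


if __name__ == "__main__":
    print(asyncio.run(run_benchmark()))
```

Because the loop is closed, throughput and average latency are not independent: with a fixed number of requests in flight, their product can never exceed that number (Little's law), so lower latency translates directly into higher throughput.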
We had not specifically optimized Akka gRPC before, so we decided to spend several weeks going through our stack to find and eliminate bottlenecks. To realize all the potential gains, we made optimizations in Akka itself, Akka HTTP, and Akka gRPC. The full set of improvements is available starting with Akka 2.6.16, Akka HTTP 10.2.5, and Akka gRPC 2.1.0.
| Metric | May 2021, 2 CPU server | August 2021, 2 CPU server |
|---|---|---|
| Ranking | 29 out of 33 | 1 out of 33 |
| Requests per second | 6117 | 81435 |
| Average latency | 163 ms | 12 ms |
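Compared with May 2021, these metrics correspond to a 1231% increase in throughput (requests per second) and a 93% reduction in average latency per request.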
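The headline percentages can be rechecked from the table. The sketch below recomputes them and adds a sanity check that is our own addition, not part of grpc_bench: assuming the client keeps roughly 1000 requests in flight, Little's law (L = λW) says that throughput multiplied by average latency should come out close to 1000.

```python
# Figures from the grpc_bench results table (2 CPU server).
MAY_2021 = {"rps": 6117, "avg_latency_ms": 163}
AUG_2021 = {"rps": 81435, "avg_latency_ms": 12}


def pct_change(before: float, after: float) -> float:
    """Relative change in percent; a negative value means a reduction."""
    return (after - before) / before * 100


def implied_in_flight(rps: float, avg_latency_ms: float) -> float:
    """Little's law, L = lambda * W: average number of requests in flight."""
    return rps * avg_latency_ms / 1000


throughput_gain = pct_change(MAY_2021["rps"], AUG_2021["rps"])
latency_change = pct_change(MAY_2021["avg_latency_ms"], AUG_2021["avg_latency_ms"])

print(f"throughput: {throughput_gain:+.0f}%")   # throughput: +1231%
print(f"avg latency: {latency_change:+.0f}%")   # avg latency: -93%
for label, run in (("May", MAY_2021), ("August", AUG_2021)):
    # Prints 997 for May and 977 for August.
    print(label, round(implied_in_flight(run["rps"], run["avg_latency_ms"])))
```

Both runs land just under 1000 requests in flight, which is consistent with a closed-loop client at that concurrency: the throughput and latency columns are two views of the same improvement.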
In our opinion, this test is a good measure of baseline performance for the simplest case of gRPC usage. In a modern microservice architecture, and in many serverless architectures as well, most requests are small and do not require streaming. Also, the set of machines talking to each other is relatively stable, so persistent connections are common.
One potential issue is that the benchmarking tool, ghz, can saturate the test machine with its own resource usage. In our tests, we found that at top performance (around 60-80k RPS) the benchmarking tool itself started to become the bottleneck. As always, pay attention to the order of magnitude, but don't read too much into the exact numbers in the benchmark results.
If you want to test this on your own machine, grpc_bench uses a Docker-based workflow orchestrated by a few scripts. It is quite simple to build and run the benchmarks you care about on your own; see its README for more information.
We noticed that our gRPC usage in Akka Serverless (and in general) is more often of the simple request/response kind than the kind that uses gRPC's streaming capabilities. However, the HTTP/2 implementation in Akka HTTP was built around those streaming capabilities. Setting up a stream carries a significant upfront cost, which limits the achievable throughput for one-off requests.
So we built a fast path for simple requests, which required changes throughout the stack.
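The idea behind a fast path can be illustrated with a toy dispatcher. This is a minimal sketch under our own assumptions, not the actual Akka HTTP code: `IncomingRequest`, `Dispatcher`, `complete`, and `echo_length` are hypothetical, and the cost of setting up a stream is represented only by a counter. The assumption is that a unary request whose headers and body arrive together (with the HTTP/2 END_STREAM flag already set) can be handed to the handler as one fully buffered value, while anything else still takes the general streaming path.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class IncomingRequest:
    """Hypothetical view of a request as it arrives on an HTTP/2 connection."""
    path: str
    body_chunks: List[bytes]
    complete: bool  # True if END_STREAM arrived together with the first frames


@dataclass
class Dispatcher:
    handler: Callable[[str, bytes], bytes]
    fast_path_hits: int = 0
    streams_materialized: int = 0

    def dispatch(self, req: IncomingRequest) -> bytes:
        if req.complete:
            # Fast path: the whole request is already here, so pass a strict
            # (fully buffered) body straight to the handler; no stream is set up.
            self.fast_path_hits += 1
            return self.handler(req.path, b"".join(req.body_chunks))
        return self._via_stream(req)

    def _via_stream(self, req: IncomingRequest) -> bytes:
        # General path: a real server would allocate per-stream state and wire
        # the handler to a lazily consumed body. Here the costly setup is only
        # counted, and the chunks are drained one by one.
        self.streams_materialized += 1
        body = bytearray()
        for chunk in req.body_chunks:
            body.extend(chunk)
        return self.handler(req.path, bytes(body))


def echo_length(path: str, body: bytes) -> bytes:
    """Hypothetical handler reporting which method was hit and the body size."""
    return f"{path}:{len(body)}".encode()


if __name__ == "__main__":
    d = Dispatcher(echo_length)
    print(d.dispatch(IncomingRequest("/Greeter/SayHello", [b"hello"], complete=True)))
    print(d.dispatch(IncomingRequest("/Greeter/SayHello", [b"he", b"llo"], complete=False)))
    print(d.fast_path_hits, d.streams_materialized)
```

Both paths return the same result; the point of the design is that the common case never pays for stream setup, while streaming requests keep working unchanged.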
Another significant improvement came from not parsing HTTP headers again when they have already been seen on the same connection. For a list of these and all the other, smaller improvements, see the release notes.
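A per-connection cache is one way to skip re-parsing. The sketch below is our own illustration, not Akka HTTP's code: `ParsedHeader`, `parse_header`, `ConnectionHeaderCache`, and the 64-entry LRU bound are hypothetical. It relies on two assumptions: parsed headers are immutable, so one instance can safely be shared across requests, and a client on a persistent connection tends to send the same header values (content type, user agent, and so on) over and over.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ParsedHeader:
    """Hypothetical parsed header; immutable, so it is safe to share."""
    name: str
    params: Tuple[str, ...]


def parse_header(name: str, value: str) -> ParsedHeader:
    # Stand-in for a comparatively expensive typed-header parser.
    return ParsedHeader(name.lower(), tuple(p.strip() for p in value.split(";")))


class ConnectionHeaderCache:
    """Per-connection LRU cache from raw (name, value) pairs to parsed headers."""

    def __init__(self, max_entries: int = 64):
        self.max_entries = max_entries
        self._cache = OrderedDict()
        self.parses = 0  # how often the parser actually ran

    def get(self, name: str, value: str) -> ParsedHeader:
        key = (name, value)
        hit = self._cache.get(key)
        if hit is not None:
            self._cache.move_to_end(key)  # mark as recently used
            return hit
        self.parses += 1
        parsed = parse_header(name, value)
        self._cache[key] = parsed
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return parsed
```

Keeping the cache per connection bounds its memory to the lifetime of that connection and exploits the locality of persistent connections, without any cross-connection synchronization.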
All users of gRPC (and Akka Serverless) will benefit from the optimizations, but simple request/response calls in particular now have a much smaller footprint and leave more processing power for the application. This matters most for applications that need processing times shorter than 1 ms or need to scale to more than a few thousand requests per second per core.
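Some back-of-the-envelope arithmetic shows why per-request footprint matters at these rates. The overhead figures used here (100 µs and 10 µs per request) are purely hypothetical and are not measurements of Akka gRPC: at 5000 requests per second on one core, each request gets 200 µs of CPU in total, so any fixed framework overhead comes straight out of the application's budget.

```python
def overhead_share(requests_per_second: int, overhead_us_per_request: int) -> float:
    """Fraction of one CPU core spent on per-request framework overhead."""
    return requests_per_second * overhead_us_per_request / 1_000_000


def budget_left_us(requests_per_second: int, overhead_us_per_request: int) -> float:
    """CPU microseconds per request that remain for application logic."""
    return 1_000_000 / requests_per_second - overhead_us_per_request


if __name__ == "__main__":
    for overhead_us in (100, 10):  # hypothetical overheads, not measurements
        share = overhead_share(5000, overhead_us)
        left = budget_left_us(5000, overhead_us)
        print(f"{overhead_us} us overhead: {share:.0%} of the core, {left:.0f} us left")
```

With 100 µs of overhead, half of the core goes to the framework; cutting it to 10 µs leaves 190 of the 200 µs for the application.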
The improvements made in Akka HTTP and Akka itself are not specific to gRPC, so other uses of HTTP/2, or of TCP streams in Akka Streams, will benefit as well. We look forward to seeing how this works out for our users, so please join the discussion!