With Scala Days 2015 San Francisco just around the corner (and only 15% of tickets left), I've been thinking quite a bit about how much the ecosystem has expanded since I first became involved with the conference in 2011.
Scala, and its rapidly growing community, has evolved from a language used largely by a very academic, research-oriented crew, with some early champions like Twitter and Foursquare, into a standard for enterprises, start-ups and universities alike.
And as companies and individuals use Scala to build their own new ideas, they also rely on other excellent tools like Play Framework, Akka, Apache Spark and Apache Kafka, which are not only some of the hottest projects on the market right now, but were also intentionally built in Scala (for many reasons…)
So, to pay homage to these excellent technologies created using Scala, we thought we’d highlight a few favorites. Feel free to add more in the comments section, and perhaps we can do a round II of this post. :)
Although MapReduce has been Hadoop's officially supported Big Data engine for writing all compute jobs, its inability to handle event stream processing, its difficult API and recent trends in consumer behavior have driven interest in alternatives. Scala has taken over the world of “Fast” Data, which is what some are calling the next wave of computation engines: engines that emphasize the speed of data processing over the size of the batch, along with the ability to process event streams in real time. Prominent examples of that movement include Spark, Scalding, Kafka and Samza, all of which are rapidly gaining awareness and use.
The Hadoop community has realized over the last several years that a complete replacement for MapReduce is needed. While MapReduce has served the community well, it is a decade old, and the limitations mentioned above must be addressed. In late 2013, Cloudera, the largest Hadoop vendor, officially embraced Apache Spark as the replacement. Most of the other Hadoop vendors have since followed.
Simply put, Spark is a fast and general-purpose engine for large-scale data processing. Developed in the AMPLab at UC Berkeley, it offers an alternative to Hadoop's two-stage MapReduce paradigm. Spark's fine-grained operators, in-memory caching of intermediate data and dataflow optimizations can make certain applications up to 100 times faster. We’re huge Spark fans at Typesafe and offer Spark support and training for our customers. Spark Streaming is very popular, too, meeting the needs of most event streaming scenarios.
Functional programming is a natural fit for Big Data because its mathematical orientation suits typical analysis scenarios. The need for a Hadoop API that provides common functional operations drove Twitter to write Scalding, a Scala API that sits on top of Cascading. Cascading, in turn, is a higher-level Java API on top of MapReduce that exposes useful operations and a dataflow model of processing. Scalding adds the full benefits of Scala syntax and functional operations.
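To make the "common functional operations" idea concrete, here is a plain word count written with ordinary Scala collections, with no Spark or Scalding dependency. It tokenizes with flatMap, groups identical words with groupBy, then reduces each group to a count. Spark's RDD API and Scalding expose similarly named operators over distributed data, but their exact signatures differ, so treat this only as a sketch of the style; WordCount and count are hypothetical names.

```scala
object WordCount {
  // Counts words using only functional collection operators:
  // flatMap (tokenize), filter (drop empty tokens), groupBy (group equal words),
  // and a per-key reduction to the group size.
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\W+").toSeq)
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val lines = Seq("Fast data, big data", "Data streams")
    // Print most frequent words first, ties broken alphabetically.
    count(lines).toSeq
      .sortBy { case (word, n) => (-n, word) }
      .foreach { case (word, n) => println(s"$word\t$n") }
  }
}
```

Because each step is a pure transformation with no shared mutable state, an engine is free to run the same pipeline in parallel across many machines, which is the property the distributed frameworks exploit.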
Apache Kafka, a publish-subscribe messaging system rethought as a distributed commit log, is also written in Scala and really highlights the language’s power. Built by LinkedIn, it is at the center of their infrastructure, handling hundreds of megabytes of read-write traffic per second from thousands of clients.
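To make "distributed commit log" a little more concrete, here is a toy, single-process sketch of the core idea: producers only ever append records, each record gets a sequential offset, and every consumer tracks its own offset, so several subscribers can read the same data independently. CommitLog and Consumer are hypothetical names; this is not Kafka's client API, and it deliberately omits partitioning, replication and persistence.

```scala
import scala.collection.mutable.ArrayBuffer

// A toy, single-partition, in-memory append-only log (hypothetical, not Kafka's API).
final class CommitLog[A] {
  private val entries = ArrayBuffer.empty[A]

  // Appends a record and returns its offset, i.e. its position in the log.
  def append(record: A): Long = synchronized {
    entries += record
    entries.length - 1L
  }

  // Reads up to `max` records starting at `offset`; reads never modify the log.
  def read(offset: Long, max: Int): Vector[A] = synchronized {
    entries.slice(offset.toInt, offset.toInt + max).toVector
  }
}

// Each consumer remembers its own position, so consumers do not interfere.
// Not thread-safe: use one Consumer instance per thread.
final class Consumer[A](log: CommitLog[A]) {
  private var position = 0L

  def poll(max: Int): Vector[A] = {
    val batch = log.read(position, max)
    position += batch.length
    batch
  }
}

object CommitLogDemo {
  def main(args: Array[String]): Unit = {
    val log = new CommitLog[String]
    Seq("page-view", "click", "purchase").foreach(log.append)

    val analytics = new Consumer(log)
    val billing = new Consumer(log)
    println(analytics.poll(2)) // Vector(page-view, click)
    println(analytics.poll(2)) // Vector(purchase)
    println(billing.poll(10))  // Vector(page-view, click, purchase)
  }
}
```

Keeping the read position on the consumer side rather than deleting messages on delivery is what lets a log like this serve many independent subscribers, and lets a slow or restarted consumer replay from any earlier offset.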
Yahoo has built and open-sourced a Kafka Manager console used by many of their teams, including the Media Analytics team. It is also written in Scala, with the web console built using the Play Framework. Behind the scenes, the console interacts with an actor-based, in-memory model built with Akka and Apache Curator.
Finagle highlights what is probably Scala’s best use case: constructing highly scalable services through concurrency and optimal use of system resources. It was built in Scala by our friends at Twitter, and provides both Scala and Java idiomatic APIs.
Finagle is an extensible RPC system for the JVM, used to construct high-concurrency servers. It implements uniform client and server APIs for several protocols and is designed for high performance. Most of Finagle’s code is protocol agnostic, which simplifies the implementation of new protocols.
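Finagle's design treats a service as a function from a request to a future response, with filters that wrap services. The sketch below mimics that idea with plain Scala Futures to show why so much code can stay protocol agnostic: the counting filter never inspects the request or the response, so it wraps a string service and an integer service alike. Service, Filter, counting, shout and square are simplified, hypothetical stand-ins, not Finagle's actual classes (which use Twitter's own Future type).

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object ServiceSketch {
  // Hypothetical, simplified "service as a function" types (not Finagle's).
  type Service[Req, Rep] = Req => Future[Rep]
  type Filter[Req, Rep] = Service[Req, Rep] => Service[Req, Rep]

  // Protocol agnostic: counts requests without looking at Req or Rep.
  def counting[Req, Rep](counter: AtomicLong): Filter[Req, Rep] =
    service => req => {
      counter.incrementAndGet()
      service(req)
    }

  // Two toy "protocols" with unrelated request and response types.
  val shout: Service[String, String] = req => Future.successful(req.toUpperCase)
  val square: Service[Int, Int] = n => Future.successful(n * n)

  def main(args: Array[String]): Unit = {
    val requests = new AtomicLong(0)
    // The same filter wraps both services.
    val textServer: Service[String, String] = counting[String, String](requests)(shout)
    val mathServer: Service[Int, Int] = counting[Int, Int](requests)(square)

    println(Await.result(textServer("hello"), 1.second)) // HELLO
    println(Await.result(mathServer(12), 1.second))      // 144
    println(requests.get)                                // 2
  }
}
```

Returning a Future instead of blocking is what lets a server built this way keep many requests in flight on a small number of threads.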
Speaking of high concurrency tools written in Scala, Akka is an extremely fast, extremely concurrent framework for building distributed applications. Akka takes care of a lot of low-level IO operations and code that have historically caused web app developers to throw up their hands and say “just throw more boxes at it.” Akka was written by Typesafe co-founder, CTO and jazz fanatic Jonas Bonér, and is an integral part of the Typesafe Reactive Platform.
Similar to Finagle, Akka is written in Scala and provides both Scala and Java APIs.
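Akka's programming model is built around actors, like the actor-based model behind Yahoo's Kafka Manager mentioned above. The sketch below illustrates only the core idea, private state that is touched solely by processing one message at a time, using a plain JDK single-threaded executor as the mailbox. It is not Akka's API: CounterActor, Increment, Get and tell are hypothetical names, and real Akka adds supervision, routing, remoting and much more.

```scala
import java.util.concurrent.{ExecutorService, Executors}
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// Messages understood by the toy counter actor (hypothetical, not Akka's API).
sealed trait CounterMsg
case object Increment extends CounterMsg
final case class Get(replyTo: Promise[Int]) extends CounterMsg

// A toy actor: private state plus a mailbox drained by a single thread.
// Because messages are processed one at a time, `count` needs no locks.
final class CounterActor {
  private val mailbox: ExecutorService = Executors.newSingleThreadExecutor()
  private var count = 0

  private def receive(msg: CounterMsg): Unit = msg match {
    case Increment    => count += 1
    case Get(replyTo) => replyTo.success(count)
  }

  // Fire-and-forget send, similar in spirit to Akka's `tell` (`!`).
  def tell(msg: CounterMsg): Unit = mailbox.execute(() => receive(msg))

  def shutdown(): Unit = mailbox.shutdown()
}

object ActorDemo {
  def main(args: Array[String]): Unit = {
    val counter = new CounterActor
    // Several threads send concurrently; the mailbox serializes processing.
    val senders = (1 to 4).map { _ =>
      new Thread(() => (1 to 1000).foreach(_ => counter.tell(Increment)))
    }
    senders.foreach(_.start())
    senders.foreach(_.join())

    // Ask for the current value with a message that carries a reply channel.
    val reply = Promise[Int]()
    counter.tell(Get(reply))
    println(Await.result(reply.future, 5.seconds)) // 4000
    counter.shutdown()
  }
}
```

The design choice worth noticing is that no caller ever reads or writes `count` directly; all interaction goes through messages, which is why thousands of concurrent senders cannot corrupt the state.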
ADAM is a genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Because whole-genome files tend to be very large, ADAM helps crunch the petabytes of population data needed for fast analysis, which matters in the life-or-death scenarios that experts in the field often face.
For chess geeks out there, Lichess is a hobby application that enables thousands of concurrent chess games using just one server. In the words of Typesafer Will Sargent: “It’s a Play application that’s open source and uses almost every single feature that Play has. Most of the time it’s based off a snapshot build. No idea how he keeps it up, but it makes for some fascinating source code reading.”
If you are interested in learning more about Scala-based technologies and how they fit into Reactive, microservice-based architectures, come join us at Scala Days San Francisco, March 16-18. You’ll experience 2.5 days of technical presentations and full-day training sessions (down to the last remaining spaces).
p.s. For you attentive readers, the first 5 people who want to purchase tickets to Scala Days can request a coupon code from firstname.lastname@example.org (referencing this blog URL) to get a non-trivial discount on tickets PLUS a free Scala book of their choice with purchase.