Eight hot technologies that were built in Scala

With Scala Days 2015 San Francisco just around the corner (and only 15% of tickets left), I have been thinking quite a bit about how much the ecosystem has expanded since I first became involved with the conference in 2011.

The rapidly growing Scala community started out as a largely academic and research-oriented crew, with some early champions like Twitter and Foursquare. Today, Scala has become a standard for enterprises, start-ups and universities alike.

But even as companies and individuals use Scala to build their own new ideas, they also rely on other excellent tools like Play Framework, Akka, Apache Spark and Kafka. These are not only some of the hottest tools and projects on the market right now, they were also intentionally built in Scala (for many reasons…).

So, to pay homage to these excellent technologies created using Scala, we thought we’d highlight a few favorites. Feel free to add more in the comments section, and perhaps we can do a round II of this blog. :)

Scala and Big Fast Data

Although Hadoop has used MapReduce as the officially supported Big Data engine for writing all compute jobs, its inability to handle event stream processing, its difficult API and recent trends in consumer behavior have driven interest in alternatives. Scala has taken over the world of “Fast” Data, which is what some are calling the next wave of computation engines. These engines rely more on the speed of data processing than on the size of the batch, and on the ability to process event streams in real time. Several prominent examples of that movement are Spark, Scalding, Kafka, and Samza, which are rapidly gaining awareness and use.

Apache Spark

The Hadoop community has realized over the last several years that a complete replacement for MapReduce is needed. While MapReduce has served the community well, it’s a decade old and the limitations mentioned above must be addressed. In late 2013, Cloudera, the largest Hadoop vendor, officially embraced Apache Spark as the replacement. Most of the other Hadoop vendors have since followed.

Simply put, Spark is a fast and general-purpose engine for large-scale data processing. Developed in the AMPLab at UC Berkeley, it offers an alternative to Hadoop's two-stage MapReduce paradigm. Spark's fine-grained operators, in-memory caching of intermediate data, and dataflow optimizations can improve performance by up to 100 times for certain applications. We’re huge Spark fans at Typesafe and offer Spark support and training for our customers. Spark Streaming is very popular, too, meeting the needs of most event streaming scenarios.

Scalding (by Twitter)

Functional programming is a natural fit for Big Data because its mathematical orientation fits typical analysis scenarios. The need for a Hadoop API that provides common functional operations drove Twitter to write Scalding, a Scala API that sits on top of Cascading. Cascading, in turn, is a higher-level Java API on top of MapReduce that exposes useful operations and a dataflow model of processing. Scalding provides the full benefits of Scala syntax and functional operations.
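To give a feel for why functional operations suit this kind of analysis, here is a minimal sketch of a word count written with nothing but the Scala standard library. It is not Scalding's API: the object and method names are made up for illustration, and everything runs in memory on a single machine. The shape of the pipeline (split, group, count) is the part that carries over to a dataflow model.

```scala
// Illustrative sketch only: plain Scala collections, not the Scalding API.
// `WordCountSketch` and `wordCount` are hypothetical names.
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+")) // split every line into lower-cased words
      .filter(_.nonEmpty)                   // drop empty tokens left by leading whitespace
      .groupBy(identity)                    // group identical words together
      .map { case (word, occurrences) => word -> occurrences.size } // count each group
}
```

Calling `WordCountSketch.wordCount(Seq("to be or not to be"))` yields a map in which "to" and "be" each have a count of 2, while "or" and "not" each have a count of 1.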

Apache Kafka

Apache Kafka, a publish-subscribe messaging system rethought as a distributed commit log, is also written in Scala and really highlights the language’s power. Built by LinkedIn, it is at the center of their infrastructure, handling hundreds of megabytes of read-write traffic per second from thousands of clients.
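The "commit log" idea can be sketched in a few lines of Scala. The class below is a toy, single-process, in-memory model with hypothetical names. It is not Kafka's API and leaves out everything that makes Kafka distributed (partitions, replication, persistence). What it does show is the core contract: records are only ever appended, each one gets an offset, and reading does not remove anything, so every subscriber can keep its own position in the log.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative sketch only: an in-memory append-only log, not Kafka's API.
// `CommitLogSketch`, `append` and `read` are hypothetical names.
final class CommitLogSketch[A] {
  private val records = ArrayBuffer.empty[A]

  // Appends a record to the end of the log and returns its offset (position).
  def append(record: A): Long = {
    records += record
    records.size - 1L
  }

  // Returns up to `max` records starting at `offset`. Reading never removes
  // records, so independent readers can each track their own offset.
  def read(offset: Long, max: Int): Seq[A] =
    records.slice(offset.toInt, offset.toInt + max).toList
}
```

A publisher would call `append`, while each subscriber remembers the last offset it has processed and calls `read` from there, which is why a slow subscriber does not hold up a fast one.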

Yahoo has built and open-sourced a Kafka Manager console used by many of their teams, including the Media Analytics team. It is also written in Scala, with the web console built using the Play Framework. Behind the scenes, the console interacts with an actor-based, in-memory model built with Akka and Apache Curator.

Apache Samza

Apache Samza is a distributed, stream-processing framework. It uses Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management.

Scala for Concurrency and Performance

Finagle (by Twitter)

Finagle highlights what is probably Scala’s best use case: constructing highly scalable services through concurrency and optimal use of system resources. It was built in Scala by our friends at Twitter, and provides idiomatic APIs for both Scala and Java.

Finagle is an extensible RPC system for the JVM, used to construct high-concurrency servers. It implements uniform client and server APIs for several protocols, and is designed for high performance and concurrency. Most of Finagle’s code is protocol agnostic, simplifying the implementation of new protocols.
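One way to picture a protocol-agnostic design is to treat a service as an asynchronous function from a request to a response. The sketch below uses only the Scala standard library. The `Service` alias, the `logging` wrapper and the `echo` service are hypothetical names chosen for illustration and should not be read as Finagle's actual API.

```scala
import scala.concurrent.Future

// Illustrative sketch only: standard-library Futures, not Finagle's API.
object ServiceSketch {
  // Hypothetical alias: a service is an asynchronous function from request to response.
  type Service[Req, Rep] = Req => Future[Rep]

  // A wrapper that knows nothing about the protocol: it works for any
  // request/response types and simply logs before delegating.
  def logging[Req, Rep](name: String, underlying: Service[Req, Rep]): Service[Req, Rep] =
    req => {
      println(s"[$name] handling $req")
      underlying(req)
    }

  // A trivial service that answers with the upper-cased request.
  val echo: Service[String, String] = req => Future.successful(req.toUpperCase)
}
```

Because `logging` is generic in `Req` and `Rep`, the same wrapper could sit in front of a service for any protocol. That is the kind of reuse protocol-agnostic code makes possible.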

Akka

Speaking of high-concurrency tools written in Scala, Akka is an extremely fast, extremely concurrent framework for building distributed applications. Akka takes care of a lot of low-level IO operations and code that have historically caused web app developers to throw up their hands and say “just throw more boxes at it.” Akka was written by Typesafe co-founder, CTO and jazz fanatic Jonas Bonér, and is an integral part of the Typesafe Reactive Platform.

Similar to Finagle, Akka is written in Scala and provides both Scala and Java APIs.

Other cool stuff

ADAM

ADAM is a genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Because whole-genome files tend to be very large, ADAM helps crunch the petabytes of population data quickly enough for fast analysis, which helps address the life-or-death scenarios that experts in the field often face.

Lichess

For chess geeks out there, Lichess is a hobby application that enables thousands of concurrent chess games using just one server. In the words of Typesafer Will Sargent: “It’s a Play application that’s open source and uses almost every single feature that Play has. Most of the time it’s based off a snapshot build. No idea how he keeps it up, but it makes for some fascinating source code reading.”

Discover new tools at Scala Days, March 16-18 in San Francisco

If you are interested in learning more about Scala-based technologies and how they fit into Reactive, microservice-based architectures, come join us at Scala Days - San Francisco, March 16-18. You’ll experience 2.5 days of technical presentations and full-day training sessions (down to the last remaining spaces).

p.s. For you attentive readers, the first 5 people who want to purchase tickets to Scala Days can request a coupon code from info@scaladays.org (referencing this blog URL) to get a non-trivial discount on tickets PLUS a free Scala book of their choice with purchase.

March 5, 2015