At Barclays, our team recently built an application called Insights Engine to execute an arbitrary number N of near-arbitrary SQL-like queries in a way that scales with increasing N.
The queries were non-trivial, each constituting 200-300 lines of SQL, and they ran for hours as Apache Hive scripts over a large dataset. Yet our use case required executing 50 queries in less than an hour.
"The combination of functional programming and Spark made our pretty hard problem pretty easy."
Sam Savage, VP Data Science, Barclays
While we could have achieved huge speed-ups by moving from Hive to Impala, we determined that SQL was a poor fit for this application. Our solution instead was to design a flexible, super-scalable, and highly optimized aggregation engine built in Scala and Apache Spark, with some help from functional programming.
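One common way functional programming helps an engine like this scale with N is to model every aggregation as a value with a zero, a per-row update, and a merge for partial results, so that any number of aggregations can be fused into one and computed in a single pass over the data. The sketch below is hypothetical and is not taken from the Insights Engine; `Txn`, `Agg`, `totalSpend`, `countByCategory`, `zip`, and `run` are illustrative names. It uses plain Scala collections instead of Spark, with nested `Seq`s standing in for partitions; the fold-each-partition-then-merge shape mirrors Spark's `RDD.aggregate(zeroValue)(seqOp, combOp)`.

```scala
// Hypothetical sketch, NOT the Insights Engine's actual code.
// Each aggregation is (zero, update, merge), so several can be fused
// and evaluated together in one pass instead of one pass per query.

final case class Txn(customer: String, category: String, amount: Double)

trait Agg[A] {
  def zero: A
  def update(acc: A, row: Txn): A // fold one row into a partial result
  def merge(x: A, y: A): A        // combine partial results from two partitions
}

// Illustrative aggregation 1: total amount across all rows.
val totalSpend: Agg[Double] = new Agg[Double] {
  def zero: Double = 0.0
  def update(acc: Double, row: Txn): Double = acc + row.amount
  def merge(x: Double, y: Double): Double = x + y
}

// Illustrative aggregation 2: number of rows per category.
val countByCategory: Agg[Map[String, Long]] = new Agg[Map[String, Long]] {
  def zero: Map[String, Long] = Map.empty
  def update(acc: Map[String, Long], row: Txn): Map[String, Long] =
    acc.updated(row.category, acc.getOrElse(row.category, 0L) + 1L)
  def merge(x: Map[String, Long], y: Map[String, Long]): Map[String, Long] =
    y.foldLeft(x) { case (m, (k, n)) => m.updated(k, m.getOrElse(k, 0L) + n) }
}

// Fuse two aggregations into one that updates both on every row.
// Nesting zip calls extends this to N aggregations.
def zip[A, B](a: Agg[A], b: Agg[B]): Agg[(A, B)] = new Agg[(A, B)] {
  def zero: (A, B) = (a.zero, b.zero)
  def update(acc: (A, B), row: Txn): (A, B) =
    (a.update(acc._1, row), b.update(acc._2, row))
  def merge(x: (A, B), y: (A, B)): (A, B) =
    (a.merge(x._1, y._1), b.merge(x._2, y._2))
}

// Fold each simulated partition independently, then merge the partials.
def run[A](agg: Agg[A], partitions: Seq[Seq[Txn]]): A =
  partitions
    .map(_.foldLeft(agg.zero)(agg.update))
    .foldLeft(agg.zero)(agg.merge)

// Usage: two "queries" answered by a single scan of two partitions.
val data: Seq[Seq[Txn]] = Seq(
  Seq(Txn("c1", "food", 10.0), Txn("c2", "travel", 250.0)),
  Seq(Txn("c1", "food", 5.5), Txn("c3", "food", 4.5))
)

val (total, counts) = run(zip(totalSpend, countByCategory), data)
// total == 270.0, counts == Map("food" -> 3, "travel" -> 1)
```

Because `merge` is associative, partitions can be folded in parallel and combined in any grouping, which is what lets a distributed engine split the work; adding another aggregation widens the per-row update rather than adding another scan.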