In the financial services industry, Credit Karma is a juggernaut. With more than 60 million members and a continued presence among the top three downloaded finance apps on Google Play™ and the Apple® App Store℠, the company arguably offers the most coveted of all platforms for financial services.
They're another example of how software--and in this case, data--is truly "eating the world."
Not so long ago, little was truly "free" about free credit reports. The prevailing business model was to charge consumers for ongoing access to very basic credit score detail, much like a gym membership.
Credit Karma won its audience on an entirely different model, presenting advanced data analytics to consumers and greatly enhancing their ability to reason about and improve their credit scores. The company makes personalized recommendations to its members based on financial products that could help them build their credit or save money.
When market innovators like Credit Karma use data as the basis for disruption, they encounter new challenges as they cross thresholds into unprecedented data volumes and requirements. With more than 60 million members and many financial services offers to model against on the back end, Credit Karma’s data infrastructure regularly handles 750,000 events per minute, amounting to more than 600 MB per minute (roughly 12,500 events and 10 MB every second).
Very early in the build-out of the company’s infrastructure, the core engineering team realized that PHP (the language it used in the earliest builds) was not going to scale to meet the demands of its data infrastructure. At a smaller scale, the company had used PHP-specific tools to parallelize workloads across machines, but as the data ingest grew, the infrastructure needed an upgrade to handle the growth.
“From a business need, we had to find a way to reliably deliver a lot more data, with high performance. We were interested in technologies we could add to our stack to help us with concurrency, and provide solid abstractions where developers didn’t have to worry about the intricacies of multi-threaded code.”
- Dustin Lyons, engineering manager at Credit Karma
The company’s first steps down the path to Reactive started with Scala and Futures, using thresholds on a single machine.
Other use cases that evolved out of Credit Karma’s early exploration of Akka for Fast Data, and its move from batch to streaming, have similarly been driven by the need for Reactive systems as defined by the principles of the Reactive Manifesto.
Having started simply, with Scala, Futures and thresholds on a single machine, Credit Karma then found Akka through research and discovered that Scala and Akka together gave its engineers a toolkit for building data pipelines that scale across multiple machines and handle the incoming data rate much better.
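To make that starting point concrete, here is a minimal sketch of Scala Futures running on a single machine under a concurrency limit. It is not Credit Karma's code: the source does not say what its "thresholds" were, so this sketch assumes a cap on how many tasks run at once, and the names `BoundedFutures`, `Event`, `totalBytes` and `maxParallel` are hypothetical.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedFutures {
  // Hypothetical event shape, for illustration only.
  final case class Event(id: Int, payloadBytes: Int)

  // Runs one Future per event on a fixed-size thread pool.
  // `maxParallel` is the assumed "threshold": at most that many
  // events are processed at the same time on this machine.
  def totalBytes(events: Seq[Event], maxParallel: Int): Long = {
    val pool = Executors.newFixedThreadPool(maxParallel)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Stand-in for real per-event work: report the payload size.
      val sizes: Seq[Future[Long]] = events.map(e => Future(e.payloadBytes.toLong))
      // Wait for every Future to finish, then add up the results.
      Await.result(Future.sequence(sizes), 10.seconds).sum
    } finally pool.shutdown()
  }
}
```

Calling `BoundedFutures.totalBytes(events, maxParallel = 4)` keeps the work on one JVM, which is exactly the limit a toolkit like Akka is meant to lift by spreading the pipeline across multiple machines.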
Today Credit Karma is using Scala, Akka and Akka Streams across key Fast Data pipelining scenarios:
Spark and Akka for Machine Learning Models - Credit Karma’s data science team has built sophisticated machine learning models for matching member attributes and needs to relevant partner offers. Spark trains the models, and Akka then runs them against member profiles to generate real-time results in sessions.
Akka Streams for Reliably Handling Massive Batch Jobs - Credit Karma has the ongoing requirement to push terabytes of data from browsers and phones into its datacenters where it can be processed. Akka Streams is a key technology that enables that data flow reliably, without data volumes overwhelming target data stores.
Kafka and Akka for Instrumenting User Data - To optimize the member experience and continuously improve its service, Credit Karma uses Kafka and Akka to monitor user activities in real time, measuring engagement patterns.
Akka and Actors for the General Progression from Batch to Streaming - As Credit Karma moves toward more streaming architectures, it values the reliability and resiliency that Akka and the Actor Model provide through supervision strategies. The engineering team appreciates the ability to control how Actors manage failure, and the ease with which Akka scales across distributed environments.
As Credit Karma’s massive data throughput requirements and advanced use of machine learning and real-time analytics push the envelope, its engineers have recently shared some of the more interesting takeaways from their experience.
In his recent blog post (Solving for High Throughput with Akka Streams), Credit Karma engineer Zack Loebel-Begelman traces the early introduction of Akka and the Actor model into Credit Karma's data architecture to help scale its analytics and reporting. The post describes the "back pressure" problems the company eventually faced as it processed tens of millions of events, and how Akka Streams ultimately brought resilience and predictability to the massive data throughput required for the predictive modeling that matches members with financial offers.
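For readers new to the term, back pressure means a slow consumer can signal a fast producer to slow down rather than be flooded. The sketch below illustrates the idea with a plain bounded queue from the Java standard library. This is a deliberately simpler, blocking technique swapped in for illustration: it is not the Akka Streams API, which signals demand without blocking threads, and it is not code from the post. The names `BackpressureSketch` and `run` are hypothetical.

```scala
import java.util.concurrent.ArrayBlockingQueue

object BackpressureSketch {
  // Pushes `totalEvents` through a queue holding at most `capacity` items.
  // Returns (events consumed, largest queue depth the consumer observed).
  def run(totalEvents: Int, capacity: Int): (Int, Int) = {
    val queue = new ArrayBlockingQueue[Int](capacity)

    val producer = new Thread(new Runnable {
      // put() blocks while the queue is full, so the producer can never
      // run further ahead of the consumer than `capacity` events.
      def run(): Unit = (1 to totalEvents).foreach(i => queue.put(i))
    })
    producer.start()

    var consumed = 0
    var maxDepth = 0
    while (consumed < totalEvents) {
      maxDepth = math.max(maxDepth, queue.size())
      queue.take() // blocks while the queue is empty
      consumed += 1
    }
    producer.join()
    (consumed, maxDepth)
  }
}
```

However fast the producer runs, the observed depth never exceeds `capacity`, which is the property that keeps a downstream data store from being overwhelmed.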
In another recent blog post by the Credit Karma engineering team (Understanding Akka’s Quarantine State), Scott Livingston shares a firsthand account of how his team has managed a multi-node Akka cluster over time. The post chronicles the importance of building systems that can “fail gracefully” and highlights one of Akka's strengths: the ability to quarantine a node when it becomes unreachable.
In 2015 JPMorgan Chase CEO Jamie Dimon warned shareholders, “Silicon Valley is coming.” Hundreds of startups with “a lot of brains and money” are working on creating alternatives to traditional banks, he said.
Credit Karma is a profound example of how Agile development teams with strong visions (empowering consumers to make financial progress, in Credit Karma’s case) can catch competitors sleeping and overtake an industry overnight. At Lightbend we’re seeing Fast Data as one of the main enablers within the overall Reactive trend that’s allowing teams like Credit Karma to seize the day with a data infrastructure that can move as fast as their business.
If you’d like to learn more, read the recent Lightbend report in which 2,100 Java and Scala developers share their insights on how Fast Data is driving their modernization efforts. Or read more case studies about how other companies have disrupted industries on top of Lightbend Reactive Platform technologies.