UniCredit Powers “Fast Data” Customer-Insight Platform With Apache Spark, Scala, Akka And Play
UniCredit is a leading European financial group with an international network spanning 50 markets. With circa 8,500 branches and 147,000 employees serving more than 32 million clients, the Group has commercial banking operations in 17 countries and assets of €900 billion. As one of the strongest banks in Europe, UniCredit has a Common Equity Tier 1 Capital ratio of 10.35 percent pro-forma (fully-loaded Basel III, including Pioneer). It also has the largest presence of banks in Central and Eastern Europe, with nearly 3,500 branches and assets of €165 billion.
In 2012, UniCredit decided to proactively confront an impending challenge: how could it continue to serve an aging customer base with its existing Java stack while also collecting valuable insight from its enormous amounts of historical data in order to continually evolve its modern web and mobile banking services?
In 2012, an executive decision was made to create a new team called the Group Research and Open Innovation department, tasked with researching innovations that would power UniCredit for the future. After reviewing all the requirements, the Group Research and Open Innovation team selected Apache Spark and Lightbend Reactive Platform technologies Akka, Play and Scala in order to create a distributed, resilient, fast data processing platform. Within two weeks, a prototype application was ready to test.
The reason to modernize
Facing an inability to easily access and rapidly analyze decades of historical data, UniCredit’s team started on their “fast data” project. The goal was clear: unlock the value in this massive quantity of data that they had never before been able to see in order to understand the needs of future customers.
One challenge standing in their way of getting a meaningful, expansive view of their business customers was a mix of legacy data repositories and storage. Some of UniCredit’s technologies in use are IBM DB2, Oracle DB, Teradata, Oracle Exadata, and even magnetic tape (as required by the Italian government). The challenge was to find a way to connect everything together meaningfully.
UniCredit was tasked with uncovering and graphing relationships between companies that are clients, looking for patterns or connections that would help them better provide services for interconnected customers.
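To make the idea of uncovering relationships between client companies concrete, here is a minimal sketch in plain Scala (standard library only) that groups companies into interconnected clusters from a list of known relationships. The `CompanyGraph` object, its method names, and the sample company names are hypothetical and chosen purely for illustration. The source does not describe UniCredit's actual algorithm or data model.

```scala
// Illustrative sketch only: names and data are hypothetical, not UniCredit's.
object CompanyGraph {
  type Company = String

  // Build an undirected adjacency map from relationship pairs.
  def adjacency(edges: Seq[(Company, Company)]): Map[Company, Set[Company]] =
    edges
      .flatMap { case (a, b) => Seq(a -> b, b -> a) }
      .groupBy(_._1)
      .map { case (company, pairs) => company -> pairs.map(_._2).toSet }

  // Collect every company reachable from `start` by following relationships.
  def reachable(start: Company, graph: Map[Company, Set[Company]]): Set[Company] = {
    @scala.annotation.tailrec
    def loop(frontier: List[Company], seen: Set[Company]): Set[Company] =
      frontier match {
        case Nil => seen
        case head :: tail =>
          // Only visit neighbours that have not been seen yet.
          val next = graph.getOrElse(head, Set.empty[Company]) -- seen
          loop(next.toList ::: tail, seen ++ next)
      }
    loop(List(start), Set(start))
  }

  // Partition all companies that appear in `edges` into interconnected groups.
  def clusters(edges: Seq[(Company, Company)]): Set[Set[Company]] = {
    val graph = adjacency(edges)
    graph.keySet.map(reachable(_, graph))
  }
}
```

For example, calling `CompanyGraph.clusters` with the relationships Acme-Borealis, Borealis-Cobalt and Delta-Echo places Acme, Borealis and Cobalt in one cluster and Delta and Echo in another. At the data volumes described here, this kind of analysis would presumably run on a distributed engine such as Spark rather than in a single process.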
Going Reactive with “fast data” to understand customer relationships and interactions
The UniCredit team started off by implementing Cloudera’s Hadoop distribution and HBase, mainly as a way to bring these enormous quantities of disparate data into one place. But when it came to bringing this data into motion and making it valuable through algorithmic, graphical data analysis, this solution was insufficient. What was needed was a highly performant data pipeline that would be resource-efficient and resilient, and also fun to work with. The main characteristics were:
- Highly performant: Before Spark was introduced and began making headlines, the team had started using Hadoop MapReduce and Scalding, a Scala library built by Twitter for specifying Hadoop jobs. When Spark was introduced, the team quickly moved to integrate it with Hadoop to get much greater performance and work with the data more easily. In addition to the new functionality needed for their fast data system, UniCredit now uses Spark for their existing Hadoop jobs, such as crunching data from various legacy systems and graphing page rankings.
- Mission-critical resilience: Even though this system is not consumer-facing, it is nonetheless considered mission-critical, and resilience is a huge factor. For UniCredit’s Corporate Relationship Management department, this system is a veritable “Swiss Army knife” that consolidates dozens of different tools previously used, and without it employees cannot harness the value that this data reveals.
- Capable of distributed computing: Akka Cluster proved to be an easy and effective way to distribute data computation for complex processing pipelines, with each “microservice-style” stage of computation handled efficiently by clusters of Akka actors.
- Fun to work with, fast to prototype: An important factor that is difficult to quantify is developer happiness. Scala is an incredibly concise language that reduces boilerplate, and Play Framework’s console and instant code update features made the prototyping process efficient and lean. UniCredit was able to create a prototype in a matter of weeks to share with management.
- Capable of growing to handle streaming: With enormous amounts of data to be processed, the potential is there to eventually incorporate data streaming technologies with built-in back pressure like Akka Streams and, soon, Spark Streaming.
With these requirements in mind, the team began looking into new technologies for building a prototype of this system. Because the team already had some experience with Scala, Apache Spark (written in Scala) was chosen as the data computation component, and after seeing the performance and features of Apache Spark working with Akka, the team decided to go with Lightbend Reactive Platform.
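The back pressure mentioned among the requirements above can be illustrated with a small standard-library sketch: a fast producer and a slower consumer share a bounded buffer, and the producer is forced to wait whenever the buffer is full. This is only a conceptual illustration and is not the Akka Streams or Spark Streaming API. The `BackPressureSketch` object, the `Done` marker, and the parameters are hypothetical names introduced here.

```scala
import java.util.concurrent.ArrayBlockingQueue

// Conceptual sketch of back pressure with a bounded buffer (hypothetical names).
object BackPressureSketch {
  private val Done = -1 // hypothetical end-of-stream marker; real items are positive

  // Sends the numbers 1..items through a buffer that holds at most `capacity`
  // elements and returns what the consumer received, in order.
  def run(items: Int, capacity: Int): Vector[Int] = {
    val buffer = new ArrayBlockingQueue[Int](capacity)

    val producer = new Thread(() => {
      // put() blocks while the buffer is full, so the producer can never
      // run further ahead of the consumer than `capacity` elements.
      (1 to items).foreach(i => buffer.put(i))
      buffer.put(Done)
    })
    producer.start()

    // The consumer runs on the calling thread and simulates slow processing.
    val received = Vector.newBuilder[Int]
    var next = buffer.take()
    while (next != Done) {
      Thread.sleep(1)
      received += next
      next = buffer.take()
    }
    producer.join()
    received.result()
  }
}
```

Calling `BackPressureSketch.run(20, 4)` delivers all twenty items in order while never holding more than four in the buffer. Streaming libraries with built-in back pressure apply the same principle with non-blocking, demand-based signalling instead of blocking threads.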
The results: a Fast Data platform that delivers new insights into client relationships
UniCredit’s platform is based on distributed Akka clusters to maintain their application’s resilience and elasticity. Written in Scala, their “Fast Data” project heavily utilizes Akka, including Akka Persistence and Akka HTTP (formerly Spray), to support distribution and to collect and send data from different sources. Play Framework is used for quick prototyping and RESTful API/HTTP communication. For added performance, the application places Spark alongside Cloudera’s Hadoop distribution, in addition to HBase, CouchDB, and Aerospike.
After the prototype was presented, it was tested in production for a few weeks before being declared production-ready and launched into UniCredit’s infrastructure. Soon, the insights from this project became so valuable that UniCredit decided to build a new “intelligent CRM” that other departments could integrate and utilize for large-scale analysis.
- Revealing new insights never before seen—by selecting a new “Reactive Stack”, based on Spark, Akka, Play and Scala, UniCredit was able to access and analyze data sets that previously were never connected, allowing them to utilize decades of information and develop new services for interconnected corporate clients.
- Seeing the value—UniCredit was able to uncover relationships between their corporate customers in the first several weeks, enabling them to understand and generate more personalized services that weren’t possible before.
- Pay-it-forward—with the proven success of this project, UniCredit plans to use these technologies in more systems across their enterprise.
- Handles growing needs—with these technologies, future additions of streaming technologies like Akka Streams and Spark Streaming are not only possible, but simple.
Convinced of the power of Spark, Scala and Akka, UniCredit has another prototype in the works, utilizing even more of the so-called “Reactive Stack” technologies by combining Scala, Akka and Spark with Apache Kafka. In fact, a new experiment using Akka Streams and Spark Streaming for Natural Language Processing (NLP) has begun in order to analyze different types of content on the web.