- Retain market leadership with personalized, time-sensitive offerings to a user base that includes more than 48 million active customers.
- With evidence showing that delays in outreach impact conversions, Groupon must ensure campaign delivery of millions of daily emails and push notifications to users in 500+ markets across 15 countries.
- Maintain a resilient and responsive user experience, especially when scaling out to handle large peaks in demand during the holidays and special events (e.g., Black Friday, Cyber Monday, Christmas, Super Bowl), when two to three hours of 7-10x normal traffic is common.
- Previous monolithic application architecture was painful and slow to update, difficult and expensive to scale in light of increasing demand, and unable to handle concurrency and parallelism without running into programming errors and vulnerabilities.
- Replaced legacy monolith with a Reactive microservices architecture based on Akka and Play Framework.
- Ensured predictable and timely personalized engagement with millions of customers at just the right time by integrating various data sources.
- Reactive microservices architecture increased developer productivity for daily deployments (compared to every few weeks in the past).
- Increased throughput across multiple VMs deployed on very few physical hosts serving more than 600,000 requests per minute.
- Big data enrichment workflow decreased from 30 to 10 minutes–using 25% less compute resources–enabling faster engagement with millions of customers at just the right time.
- Stable service delivery capable of scaling up for the 7-10x spikes in demand during special events, without any downtime in over 3 years.
Scaling time-sensitive, personalized campaigns to a massive customer base
48 million customers in 15 countries, with millions of communications and hundreds of GBs of data per day
No downtime in 3 years after suffering system failure during Black Friday in the past
7-10x peaks in demand for several hours during holidays and special events
BIG DATA AT SPEED
Akka reduced data upload time from 30 minutes to 10 minutes for time-sensitive campaign delivery pulling from Hadoop, RabbitMQ, and Cassandra
Reduced compute resources by 25% while optimizing performance to handle 600,000 requests per minute on very few physical servers
Some Background: Groupon
Groupon (NASDAQ: GRPN) is building the daily habit in local commerce, offering a vast mobile and online marketplace where people discover and save on amazing things to do, see, eat and buy. By enabling real-time commerce across local businesses, travel destinations, consumer products and live events, shoppers can find the best a city has to offer.
With $2.6B in revenues in 2018, Groupon helps shoppers find the best personalized experiences a city has to offer by enabling real-time commerce across local businesses, travel destinations, consumer products, and live events. Groupon is redefining how small businesses attract and retain customers by providing them with customizable and scalable marketing tools and services to profitably grow their businesses.
With a network of over a million merchants distributed across multiple geographies, the Groupon platform (Groupon and LivingSocial) relies on an elaborate engineering infrastructure at the top of the business funnel to connect those merchants with over 48 million prospective customers daily across 500+ markets in 15 countries.
We talked with Aditya Athalye, Senior Software Engineer at Groupon, about the company’s digital transformation from legacy monolith to Reactive microservices with Akka and Play Framework by Lightbend.
The Challenge: A Brittle, Unscalable Monolithic Architecture
Reaching millions of people each day at the right time, while meeting stringent SLAs, is critical to the functioning of Groupon’s business. All user outreach is made possible by a set of delivery services that send millions of emails and app push notifications daily. These numbers go up even further during the holiday season (Black Friday sale, Cyber Monday, Christmas sale) or special events like the Super Bowl.
Minor delays in emails reaching end users can result in substantial revenue loss to the company. For example, an email promoting after-work activities to 1 million users in Japan should be sent between 7 and 8 a.m., when most are on their way to work and planning the evening’s activities. If it gets sent out at 9 a.m., it’s too late to be effective.
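The timing constraint in this example can be made concrete with a minimal Java sketch. It is not Groupon's code: the class `SendWindowSketch`, the helper `isWithinWindow`, and the sample date are hypothetical and only illustrate checking a send time against a delivery window in the recipient's own time zone, using the JDK's `java.time` API.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Illustrative sketch only; names and values are assumptions, not Groupon's implementation.
final class SendWindowSketch {

    // Returns true when `instant`, viewed in the recipient's time zone,
    // falls inside the half-open window [start, end).
    static boolean isWithinWindow(Instant instant, ZoneId zone, LocalTime start, LocalTime end) {
        LocalTime local = instant.atZone(zone).toLocalTime();
        return !local.isBefore(start) && local.isBefore(end);
    }

    public static void main(String[] args) {
        ZoneId tokyo = ZoneId.of("Asia/Tokyo");
        LocalTime start = LocalTime.of(7, 0); // window opens at 7 a.m. local time
        LocalTime end = LocalTime.of(8, 0);   // window closes at 8 a.m. local time

        // Arbitrary sample date chosen for illustration.
        Instant onTime = ZonedDateTime.of(2018, 11, 23, 7, 30, 0, 0, tokyo).toInstant();
        Instant late = ZonedDateTime.of(2018, 11, 23, 9, 0, 0, 0, tokyo).toInstant();

        System.out.println(isWithinWindow(onTime, tokyo, start, end)); // prints "true"
        System.out.println(isWithinWindow(late, tokyo, start, end));   // prints "false"
    }
}
```

Evaluating the window in the recipient's zone rather than the server's is what makes a single delivery service usable across markets in different countries.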
Like most e-commerce companies that started out in the previous decade, Groupon’s platform was largely monolithic in nature. Most of the legacy services were built with Ruby on Rails, though the marketing campaign systems had been overhauled and were largely built in Java. This presented several challenges to Groupon’s growth, such as:
- Slow to update: Any new feature change, even adding a new email type or template, meant changing the monolith and going through expensive testing and deployment cycles.
- Poor scalability: The monolith was built on Ruby on Rails, which prevented it from scaling to the ever-increasing traffic without sophisticated HTTP caching layers.
- Prone to errors: Even after moving to Java 8 and more modern technologies and tools like Postgres/MySQL, Elasticsearch, and Apache Kafka, it was difficult to balance concurrency and parallelism across the system as a whole, so programming errors became more common.
Groupon set out to solve their scalability and extensibility challenges quickly, and elected to break down their legacy monolithic platform into lighter, granular, Reactive microservices. Groupon achieved this by adopting Akka and Play Framework by Lightbend.
The Solution: Reactive Microservices With Akka And Play Framework
Groupon continues to strategically invest in augmenting its engineering capabilities as well as putting in industry best practices to ensure it has top of the line systems to enable a consistently responsive and available online business.
As Lightbend is the leading contributor to the Reactive Manifesto, they have ensured that their tools are always up to the mark and aligned to the evolving needs of modern businesses.
To break down the monolith, Groupon carried out a rigorous technology selection process to ensure that any new technology choices were popular, battle-tested, well-documented, supported by experts, and designed from the core to serve the needs of Reactive microservices. In the end, Akka and Play Framework by Lightbend were selected based on their ability to keep Groupon’s microservices and other critical systems functioning and to help achieve its business objectives, including:
- Maintaining resilience against failures and handling millions of concurrent requests in a stateless manner while meeting SLAs demanded by the business across geographically distributed data centers.
- Using underlying Reactive design concepts, like fully non-blocking/asynchronous application-layer protocol support (HTTP and HTTP/2), which allows Groupon’s systems to scale much better with greater uptime, resulting in no critical service failures in years.
- Integrating with other state-of-the-art technologies in databases, NoSQL stores, and messaging systems, allowing Groupon engineers to meet the business’s constant demand for new features, like personalized, timely offerings.
- Reduced time to market through improved developer productivity and operational effectiveness delivered by the reliability and stability of these platforms.
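The non-blocking, asynchronous style named in the objectives above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Groupon's code and not the Akka or Play API: the JDK's `CompletableFuture` stands in for those frameworks, and `AsyncDeliverySketch`, `sendAsync`, and `sendAll` are hypothetical names. The caller starts all sends without waiting on any single one, then composes the results once every send has completed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Illustrative sketch only; CompletableFuture is a stand-in, not the Akka or Play API.
final class AsyncDeliverySketch {

    // Hypothetical stand-in for a delivery call: completes asynchronously on the pool.
    static CompletableFuture<String> sendAsync(String recipient, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> "sent:" + recipient, pool);
    }

    // Starts every send up front, then combines the results without blocking the caller.
    static CompletableFuture<List<String>> sendAll(List<String> recipients, ExecutorService pool) {
        List<CompletableFuture<String>> inFlight = recipients.stream()
                .map(r -> sendAsync(r, pool))
                .collect(Collectors.toList());
        return CompletableFuture
                .allOf(inFlight.toArray(new CompletableFuture[0]))
                // join() does not block here: allOf guarantees every future is already done.
                .thenApply(done -> inFlight.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // The demo blocks only at the very end, to print the combined result.
            List<String> results =
                    sendAll(Arrays.asList("a@example.com", "b@example.com"), pool).join();
            System.out.println(results); // prints "[sent:a@example.com, sent:b@example.com]"
        } finally {
            pool.shutdown();
        }
    }
}
```

Because no thread is parked waiting on an individual send, a small fixed pool can keep many requests in flight, which is the general property that lets such services absorb traffic spikes on limited hardware.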
The Impact: Predictable Revenue, Scalability, And No Downtime
Companies like Groupon operating an internet business need to be able to handle peak loads during holiday seasons or flash sales, and the Reactive technologies provided by Lightbend go a long way in accomplishing that.
Lightbend technologies have made a significant impact on the daily business of Groupon, enabling a previously unimaginable level of scale for delivering personalized, targeted customer offerings at the right time–especially during peak demand.
- Meeting revenue targets: Groupon is able to continually meet both business and customer SLAs by reaching the intended recipient at the right time. With frequent peaks in demand for sending timely emails/notifications that are so critical to the business (every marketing campaign has revenue SLAs), Groupon is now able to regularly handle over 600,000 requests per minute on less infrastructure than previously needed.
- Elastic scalability to meet peak demand: In the past, scaling to meet expected demand during the holiday season was stressful and unpredictable. With Groupon’s new modular, scalable Reactive microservices architecture, all mission-critical marketing promotions can easily be served for the two to three hours of 7-10x peaks in traffic on days like Black Friday and the Super Bowl.
- Eliminating expensive downtime: As a 24/7 e-commerce marketplace, eliminating downtime is critical. Just a few minutes of unavailability can affect business performance, especially on key days such as Black Friday and Cyber Monday. Groupon’s marketing engineering team has consistently achieved this goal with Lightbend technologies over the last few years.