
How Dwango Scaled Live Streaming With Play And Akka

Written by Soichiro Yoshimura
Translated by Eugene Yokota
Download Japanese Version PDF (1MB)

About Niconico

Niconico is among the most popular video sharing websites in Japan. At the time of its launch in 2006, it was called Niconico Douga (meaning "smiling videos"). Its most distinctive feature is commenting, which allows users to post comments along the time axis of the broadcast videos. Users are also able to post comments on non-video content such as live-streamed events and electronic books.

To get a feel for the scale of Niconico's service: as of August 2015, it had 50 million registered users and 2.5 million premium users. In terms of access, as of March 2015 the site averaged 178.62 million page views per day and 8.80 million unique visitors per month.

Dwango Co., Ltd., which develops Niconico, also develops over ten products and subsystems in Scala, but in this post we will focus on Niconico Live, the live streaming service, and in particular the large-scale broadcasting system within it.

Previous problems with Niconico Live

There were two major problems with Niconico Live:

  • Performance issues with the large-scale broadcasting system.
  • A code base with a high cost of maintenance.

We'll examine these problems further.

Performance issues with large-scale broadcasting system

The performance problems around large-scale broadcasting split further into two areas:

  • High CPU load caused by the characteristics of the service.
  • Structurally unscalable datastore IO.

Among the factors identified as causing high CPU load were the strict program starting time and the Viewership Admin system, which prioritized premium users over regular users. By streaming 30 minutes ahead of the actual starting time, we could lighten the load at the program start to some extent, but the spikes were unavoidable.

Also, the Viewership Admin system worked by having hundreds of thousands of viewers poll the system to check whether they had viewing rights, and it too was contributing to the high CPU load.

Lastly, the structurally unscalable datastore IO problem was caused by the implementation of the Viewership Admin system, which issued complicated SQL queries against the RDB for each viewership confirmation and update. We knew that the Viewership Admin system's dependence on disk IO was preventing it from further scale-up or scale-out.

Code base with high cost of maintenance

The other huge problem was the maintainability of the code base. Even after a series of improvements, to this date we have over one million lines of PHP code littered with enormous implicit branching created by copying and pasting.

Modifying the code became extremely expensive, as it was difficult to investigate and verify the range of effect of each change. Even looking at single methods, some have a cyclomatic complexity exceeding 600 and lack unit tests. In addition, the basic data in the code was expressed in terms of PHP arrays and associative arrays instead of classes and types. This increased the cost of reading the code and required a large amount of knowledge to comprehend it.

Not addressing the technical debt of the codebase as a team eventually led to developers leaving. The organization was left lacking domain knowledge and holding a codebase with a high reading cost.

The goal of Niconico Live New Player (Relive Project)

To resolve the problems described above, we started a new project for the Niconico Live New Player, or the Relive Project. These were the original goals:

  1. Withstand large-scale broadcasts.
  2. Do not inherit the existing mechanism.
  3. Flexible and stable viewership administration.
  4. Extensibility to add new features.

Strategies for development

To achieve the goals, our basic development strategies were:

  1. Small releases.
  2. Release the features with the highest return on investment first.

As the first milestone for the small releases, we started with the Niconico Live Flash player, which is displayed on the top page of Niconico. After four months of development, in October 2013, we performed load tests in the production environment and broadcast a program.

Even at the time, Niconico Live had accumulated many features over the six years since its initial release in 2007. So we gave up on finishing all of the features in a single release; instead we focused on broadcasting large-scale live events and made incremental releases toward that goal.

Initially, it did not have enough features to be used for actual program broadcasting, so we took the approach of creating a showcase environment for internal use. Whenever we added new features to this showcase environment, we announced them to all of the stakeholders in the company.

System layout

We used the latest versions of the following libraries:

  • Play with Scala
  • Akka Cluster
  • Slick

We adopted Scala, Akka, and Play because of the following architectural priorities:

  • Scalability
  • Stability and fault tolerance
  • Extensibility
  • Performance

Scala, Akka, and Play had the best conceptual fit with our priorities, especially in terms of scalability (both scaling up and scaling out). In particular, we planned to resolve the IO performance issue of the Viewership Admin system by combining a distributed high-speed data store with the parallel processing technology of Akka.

In addition, we planned to lower the high CPU cost of polling by using Play's WebSocket feature, and to implement features with higher interactivity.
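
As a rough illustration of that shift from polling to pushing, the sketch below shows how a viewership check might be exposed over a Play WebSocket backed by an actor. It is a minimal example assuming Play 2.6+ with Akka, not Dwango's actual code; the controller, actor, and message format are all hypothetical.

```scala
import javax.inject.Inject

import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.stream.Materializer
import play.api.libs.streams.ActorFlow
import play.api.mvc.{AbstractController, ControllerComponents, WebSocket}

// One connection per viewer: the server pushes viewership updates over the
// socket instead of the viewer polling for viewing rights.
class ViewershipController @Inject()(cc: ControllerComponents)
                                    (implicit system: ActorSystem, mat: Materializer)
    extends AbstractController(cc) {

  def watch(programId: String): WebSocket = WebSocket.accept[String, String] { _ =>
    ActorFlow.actorRef { out => ViewerActor.props(out, programId) }
  }
}

object ViewerActor {
  def props(out: ActorRef, programId: String): Props = Props(new ViewerActor(out, programId))
}

class ViewerActor(out: ActorRef, programId: String) extends Actor {
  // Hypothetical initial push; a real system would consult the viewership store.
  override def preStart(): Unit =
    out ! s"""{"program":"$programId","canView":true}"""

  def receive: Receive = {
    case _: String => () // ignore client messages in this sketch
  }
}
```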

From the operations side, Play and Scala had a successful track record with Niconico's smartphone API server, so we knew we could operate them without problems.

The data store consists of:

  • MySQL/MHA, and
  • Redis/Twemproxy/Sentinel

Both MySQL and Redis were introduced in redundant configurations. The MySQL database is made redundant with MHA and master-slave replication, and is partitioned both horizontally and vertically.
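
As a simplified sketch of what horizontal partitioning can look like from the application side (the shard names and the modulo routing are purely illustrative, not Dwango's scheme), Slick lets each shard be configured separately and the code pick the database that holds a given user's rows:

```scala
import slick.jdbc.MySQLProfile.api._

// Purely illustrative: route each user's queries to the MySQL shard
// (each an MHA-managed master-slave set) that holds that user's data.
object ShardedDb {
  private val shards = Vector(
    Database.forConfig("db.shard0"),
    Database.forConfig("db.shard1")
  )

  def forUser(userId: Long): Database = shards((userId % shards.size).toInt)
}
```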

Redis is clustered using Twemproxy and made redundant with Sentinel. Additionally, two separate Redis clusters are formed: one for small data under 100 bytes, and one for everything else. This is because Redis is implemented using a single thread, so we need to prevent an overall drop in RPS caused by blocking on large data.
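
Below is a minimal sketch of that split, assuming the Jedis client and two Twemproxy endpoints (both hostnames hypothetical): values under 100 bytes go to one cluster and everything else to the other, so large values cannot stall the single-threaded instances serving small, hot data.

```scala
import redis.clients.jedis.Jedis

// Hypothetical hosts; each proxy fronts its own Twemproxy/Sentinel-managed cluster.
class SizedRedisStore(smallProxyHost: String, largeProxyHost: String, port: Int = 6379) {
  private val small = new Jedis(smallProxyHost, port) // values under 100 bytes
  private val large = new Jedis(largeProxyHost, port) // everything else

  def set(key: String, value: String): Unit = {
    val client = if (value.getBytes("UTF-8").length < 100) small else large
    client.set(key, value)
  }
}
```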

Module layout

For the module layout, we adopted the layered architecture, with which we had the most experience. As illustrated in the following diagram, the domain part forms common JARs, and it is surrounded by Play applications playing different roles.

The above diagram is simplified to just six Play applications (web, webapi, websocketapi, administration, batch, domination) and a JAR for the domain model, but they are broken apart further in reality. The applications communicate with each other using Akka Cluster or the data store.
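
For the Akka Cluster side of that communication, the sketch below shows the standard membership-event subscription one module might use to notice another module's nodes joining. It assumes Akka 2.3+ with clustering configured in application.conf and is illustrative rather than Dwango's code.

```scala
import akka.actor.{Actor, ActorLogging, Props}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.{InitialStateAsEvents, MemberEvent, MemberUp}

// Subscribes to cluster membership so, e.g., the webapi module can react when a
// websocketapi node joins the cluster.
class ModuleListener extends Actor with ActorLogging {
  private val cluster = Cluster(context.system)

  override def preStart(): Unit =
    cluster.subscribe(self, initialStateMode = InitialStateAsEvents, classOf[MemberEvent])

  override def postStop(): Unit = cluster.unsubscribe(self)

  def receive: Receive = {
    case MemberUp(member) =>
      log.info("Member up: {} with roles {}", member.address, member.roles)
    case _: MemberEvent => () // other membership events ignored in this sketch
  }
}

object ModuleListener {
  def props: Props = Props(new ModuleListener)
}
```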

The team layout is aligned with the layout of the modules. This is influenced by the solution to Conway's Law introduced in Organizational Patterns of Agile Software Development by James O. Coplien, where he introduces the idea of thinking about the ideal module layout first, and then shaping the team layout to match it.

As a result, we created the following teams:

  • Web frontend (responsible for the web module)
  • Admin tool (responsible for administration module)
  • Backend (responsible for webapi, websocketapi, batch, and domination modules)
  • The new feature development team of the time

We organized the development so that each team consists of around four to six people.

Each application in the module layout is implemented as a standalone project. In the end, all six applications are built together at build time to form a single piece of software. This allowed us to develop as if we were working on a monolithic application.
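
A rough build.sbt sketch of that arrangement might look like the following; the module names come from the diagram above, but the structure itself is illustrative and not Dwango's actual build definition.

```scala
// Shared domain model, consumed by the Play applications as an ordinary JAR.
lazy val domain = (project in file("domain"))
  .settings(libraryDependencies += "com.typesafe.slick" %% "slick" % "3.0.0")

// Each Play application is its own sub-project that depends on the domain JAR.
lazy val web          = (project in file("web")).enablePlugins(PlayScala).dependsOn(domain)
lazy val webapi       = (project in file("webapi")).enablePlugins(PlayScala).dependsOn(domain)
lazy val websocketapi = (project in file("websocketapi")).enablePlugins(PlayScala).dependsOn(domain)

// Building the root builds everything, so day-to-day development feels monolithic.
lazy val root = (project in file("."))
  .aggregate(domain, web, webapi, websocketapi)
```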

There were also some subsystems with fewer dependencies on the business logic that were developed as microservices. For instance, the system that lets the audience give presents to live broadcasters, also built on Play and Scala, was developed as a microservice communicating with other services over a RESTful API. It was developed by a completely separate team.

On the other hand, due to network overhead and administration cost, we have also had instances of microservices being folded back into the Niconico Live New Player project.

The size of the project

The code base grew to 160 thousand lines of code in the two years between May 2013 and May 2015. The project started with a single team of four people in May 2013, but by December 2014 it had grown to over 20 people across four teams.

New members were normally assigned to the Admin tool team. This comes from the Day Care pattern, also from the Organizational Patterns book. The benefit is that problems with the administration tools are easier to respond to, and it helps train the new employees' communication skills, since the service directors are the users.

Result

At the end of 2013, the platform for large-scale broadcasting was completed successfully. The original goal, to withstand large-scale broadcasts, was met without a hitch.

In terms of CPU load, there is around a 10x difference in performance between our custom framework written in PHP and Play. We compared rendering with Smarty2, which is used by our custom PHP framework, against the Groovy templates we use with Play.

The following is the result of comparing the performance of the Viewership Admin system using Gatling:

The above graph shows that a single instance handles 32,667 simultaneous connections. There are actually two application instances running on a physical machine, so a physical machine is able to process viewership of around 65,000 simultaneous connections, which is approximately 4x the previous application. We have achieved the original plan of using WebSocket to reduce the CPU cost. Note that this result is currently bottlenecked by the per-process file descriptor limit set by the operating system, so we think the performance could improve with further tuning.
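
For reference, a load test of this kind might be expressed roughly as in the sketch below; the Gatling 3 DSL is assumed, and the endpoint, user counts, and durations are illustrative rather than the test Dwango actually ran.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class ViewershipSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:9000")
    .wsBaseUrl("ws://localhost:9000")

  // Each virtual user opens a WebSocket, holds it open, then closes it.
  val scn = scenario("Viewership over WebSocket")
    .exec(ws("Connect").connect("/api/programs/1/watch")) // hypothetical endpoint
    .pause(5.minutes)
    .exec(ws("Close").close)

  setUp(scn.inject(rampUsers(30000).during(10.minutes))).protocols(httpProtocol)
}
```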

Using this large-scale broadcasting system, we were able to stably broadcast the 2013 New Year's Eve programs, the 2014 Japanese general election, and most programs at Chokaigi, a physical event hosted by Niconico that took place in April 2015. As of today (July 2015), we are still adding new features.

Analysis

Problems

There were two problems that we encountered during this project:

  • Learning cost
  • Compilation speed

For the engineers who were using PHP, the learning cost of Scala was high:

  • Data structures and algorithms
  • Parallel programming and the Actor Model
  • The unique way of thinking behind functional programming languages (by actually learning Haskell)

We organized programs to study the above; however, this requires a serious organizational commitment to education and highly skilled people to carry out that education. Luckily Dwango had a few such engineers, but we still felt the lack of skill during the initial year of the project.

The slow compilation problem triggers subsequent operational issues:

  • Turnaround time for CI slows down, delaying the detection of problems.
  • Prevents fast releases.

The limited workarounds we have so far are using fast machines, turning less frequently changing parts into fixed JARs, and moving the parts that require frequent change into data structures that do not require compilation. Thus far, a fundamental solution has not been found.

Merits

Without question, there have been a number of merits from introducing Play and Scala.

  • Performance backed by the JVM
  • Lack of memory leaks
  • Better expression of complicated business logic compared to PHP

For those coming from PHP and Ruby, the performance scale-up backed by the JVM should not be underestimated. Although this was not our original intent, the reduction in the number of servers and being able to ensure stability against spiky access are both very valuable.

In addition, Scala's code style, influenced by functional programming, resulted in the elimination of memory leaks. By making the Play application a pure processing server and offloading the state to the data stores, we were relieved of the operational anxiety caused by the memory leaks that frequented Java applications of the past. This has lowered operation-related stress tremendously.

Finally, on the point of expressing complicated business logic, we think it owes largely to the rich API of the Scala collections, such as the flatMap method. For instance, processing a collection interleaved with exceptional events can be described concisely using flatMap. This greatly contributes to better readability and maintainability of the code.
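
As a toy illustration of that style (the data and parsing are made up, not Dwango's domain), flatMap over an Option-returning parser keeps the well-formed records and drops the exceptional ones, without the nested branching the equivalent PHP would need:

```scala
import scala.util.Try

final case class Viewer(id: Long, premium: Boolean)

// Returns None for records that fail to parse.
def parseViewer(raw: String): Option[Viewer] = raw.split(",") match {
  case Array(id, kind) => Try(id.toLong).toOption.map(Viewer(_, kind == "premium"))
  case _               => None
}

val rawLines = List("1,premium", "broken line", "2,regular")

// flatMap interleaves parsing and filtering: failed records simply disappear.
val viewers: List[Viewer] = rawLines.flatMap(parseViewer)
// viewers == List(Viewer(1, true), Viewer(2, false))
```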

Further project improvements

Looking further into the future of our project, one of the challenges is the gap in the descriptive level of the code. This is due to the fact that some people write Scala as a "Better Java", while others try to use it as an alternative to Haskell. This is something we need to work on by enriching our education system.

There are many operational challenges remaining for Akka Cluster. Especially for a system that needs to stay operational 24/7, we think we are still at the stage of collecting know-how on zero-downtime system restarts, monitoring of the ActorSystem itself, and creating a framework to guarantee delivery of business logic messages to remote actors, as the support from the library alone cannot solve these issues.

Should you replace your project with Scala/Akka/Play?

With the above problems and merits in mind, we've created a checklist to see what type of project would be suited for replacement:

  • Currently using a scripting language, and scaling up performance is in demand.
  • Currently using a scripting language, and scaling out performance is in demand.
  • Deals with complicated logic that changes daily.

If you are in such a situation, it's worth replacing your system with Scala/Akka/Play. However, we think it's not necessary to force the replacement of systems written in scripting languages if there isn't enough merit to justify the cost.

Summary

  • We were able to resolve the performance problems we originally had by using Scala, Akka, and Play, which have technologies like WebSocket and asynchronous IO built in. However, optimizing the performance characteristics required not just the framework but a whole-system perspective, such as making the data store redundant.
  • Adopting an expressive programming language like Scala allowed us to share domain knowledge across the organization through the ubiquitous language and the types, which improved both the readability and maintainability of the code. However, the adoption of Scala came with a high learning cost and the necessity to continuously adjust the internal education system.

Inspired by this story? Contact us to learn more about what Lightbend can do for your organization.
