Unpredictable scaling demands dictated by a groundbreaking, internet-facing cloud provisioning platform required a fresh look at solutions for building massively concurrent applications. Abiquo turned to Lightbend’s Akka middleware to help solve the problem.
Abiquo is the leading enterprise Cloud management software company. With Abiquo, organizations can use business policy to manage an entire, globally deployed computing infrastructure, comprising unlimited physical and Cloud resources, including private, public and hybrid Clouds, through a single pane of glass. As a result, Abiquo customers can significantly reduce the cost and complexity of managing their virtual IT environments while keeping control of the physical infrastructure and gaining the agility to switch hypervisors as needed.
Abiquo’s solution lets users pick from any of the six market-leading hypervisors and select custom storage solutions, network providers and other technologies in a building-block fashion.
From a technical standpoint, 90% of the Abiquo platform is written in Java, with C++ components for integrating with the APIs of some open-source hypervisors, such as Xen. It offers a rich user interface based on Adobe Flex.
About a year ago, it became clear that Abiquo’s massive configurability had turned out to be its Achilles’ heel: the system reached peak throughput at a relatively low load. While it met many of its design goals, it clearly did not meet the key scalability requirements that had originally been set.
Albert Puig, a senior engineer, led the effort to investigate the scalability problem and find an appropriate solution.
A review of the system’s bottlenecks showed that the central Abiquo Node was making synchronous calls to the Virtualization Factory. This was a critical issue, because the Virtualization Factory handled the calls to the systems and services that configured end users’ virtual environments. These calls took a long time to complete, and because each caller had to wait for its call to return, they created a bottleneck. Furthermore, the system could not provide any status updates to end users while their virtual environments were being created, which left them unsure whether things were working correctly.
The obvious solution would be to make asynchronous calls that execute the work in parallel. However, multithreaded Java applications are notoriously difficult to write, and even well-written thread-based code is hard to make scale predictably and reliably.
Albert and his team turned to a message-based solution that uses RabbitMQ in conjunction with Lightbend’s Akka middleware. Akka is a toolkit and runtime for building highly concurrent, distributed, and fault-tolerant event-driven applications on the JVM. Its actor-based model provides a higher level of abstraction than Java’s threads and locks. Although Akka is written in Scala (a highly scalable, general-purpose programming language designed to express common programming patterns in a concise, elegant, and type-safe way), it has both Scala and Java APIs, so it worked seamlessly with Java, Abiquo’s language of choice.
Refactoring the system to bring Akka and RabbitMQ together proved straightforward, because Akka has built-in AMQP integration that lets Akka’s actors easily send and receive RabbitMQ messages.
Ultimately, Abiquo’s use of Akka proved both simple and elegant: a single supervisor actor provides custom routing to several custom “hypervisor actors,” each of which services a specific type of hypervisor. Each hypervisor actor has a dynamic pool of actors that perform specific configuration operations concurrently. Because Abiquo has no control over the number of open sessions at any given time, Akka’s dynamic elasticity, which lets these pools grow and shrink with demand, makes the system very easy to scale.
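The routing structure described above can be sketched in plain Java. This is not Akka’s API: to keep the example self-contained it uses `java.util.concurrent` executors as a stand-in for actors, and the names `ConfigRequest`, `HypervisorWorkerPool`, `HypervisorRouter`, as well as the hypervisor type strings, are hypothetical. A router plays the supervisor’s role and hands each request to the pool for its hypervisor type, which runs the operation asynchronously and returns a `Future` instead of blocking the caller.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical request: which hypervisor family it targets and which operation to run.
record ConfigRequest(String hypervisorType, String operation) {}

// Plays the role of one "hypervisor actor": it owns the workers for a single hypervisor type.
final class HypervisorWorkerPool {
    private final String hypervisorType;
    // A cached pool creates threads on demand and reclaims them after 60 s of idleness,
    // a rough stand-in for a pool that grows and shrinks with load.
    private final ExecutorService workers = Executors.newCachedThreadPool();

    HypervisorWorkerPool(String hypervisorType) {
        this.hypervisorType = hypervisorType;
    }

    Future<String> submit(ConfigRequest request) {
        // The operation runs on a worker thread; the caller gets a Future instead of waiting.
        // A real implementation would call the hypervisor API here (hypothetical).
        return workers.submit(() -> hypervisorType + ":" + request.operation() + " done");
    }

    void shutdown() {
        workers.shutdown();
    }
}

// Plays the role of the supervisor: routes each request to the pool for its hypervisor type.
final class HypervisorRouter {
    private final Map<String, HypervisorWorkerPool> pools = new ConcurrentHashMap<>();

    Future<String> route(ConfigRequest request) {
        // Pools are created lazily, one per hypervisor type seen so far.
        HypervisorWorkerPool pool =
            pools.computeIfAbsent(request.hypervisorType(), HypervisorWorkerPool::new);
        return pool.submit(request);
    }

    int poolCount() {
        return pools.size();
    }

    void shutdown() {
        pools.values().forEach(HypervisorWorkerPool::shutdown);
    }
}
```

In the actor version, the supervisor and the hypervisor actors exchange messages rather than method calls, and the supervisor can also restart failed workers; this sketch only illustrates the routing and the per-type concurrency.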
This new architecture proved an immediate success: the Virtualization Factory can now process many requests in parallel and gives users far richer feedback about the different subtasks it is executing, such as:
- Connecting to Hypervisor
- Virtual Machine Creation
- Template Copying (very time-consuming)
- Network Configuration
- DHCP Configuration
- Powering On Virtual Machine
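One way to produce this kind of feedback is for each deployment task to publish a status event as it starts and finishes every subtask. The sketch below assumes a hypothetical `StatusUpdate` record and a plain `Consumer` listener; in a message-based system like the one described here, the listener would plausibly forward events onto a queue rather than collect them in memory, but that wiring is an assumption and is left out.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical status event, emitted when a subtask starts and when it completes.
record StatusUpdate(String vmId, String step, String state) {}

final class VirtualMachineDeployment {
    // The step names mirror the subtasks listed above.
    static final List<String> STEPS = List.of(
        "Connecting to Hypervisor",
        "Virtual Machine Creation",
        "Template Copying",
        "Network Configuration",
        "DHCP Configuration",
        "Powering On Virtual Machine");

    private final String vmId;
    private final Consumer<StatusUpdate> listener;

    VirtualMachineDeployment(String vmId, Consumer<StatusUpdate> listener) {
        this.vmId = vmId;
        this.listener = listener;
    }

    // Runs the steps in order, reporting progress before and after each one.
    void run() {
        for (String step : STEPS) {
            listener.accept(new StatusUpdate(vmId, step, "STARTED"));
            perform(step);
            listener.accept(new StatusUpdate(vmId, step, "COMPLETED"));
        }
    }

    // Placeholder for the real hypervisor call behind each step (hypothetical).
    private void perform(String step) {
    }
}
```

Because an update is published before each step begins, a user watching a slow step such as template copying can see exactly which stage is in progress.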
Because this approach to scaling out the Virtualization Factory was so successful, Abiquo decided to use Akka in other parts of its system as well.
The Virtual System Monitor is the component that manages and monitors Virtual Machine states (Destroyed, Moved, Power On/Off, Paused, etc.). It does this by polling, querying each hypervisor directly for the status of the Virtual Machines under its control. Although Abiquo had not seen scaling issues with the Virtual System Monitor, the lessons learned from the Virtualization Factory allowed Enric Ruiz and his team to refactor the monitor in the same manner, ensuring that it was both architected to scale and functionally consistent with the new Virtualization Factory.
Abiquo has tested this implementation in its simulation environment and can monitor many thousands of hypervisors concurrently in a single data center instance. Because Abiquo runs in multiple data centers, it can realistically monitor hundreds of thousands of Virtual Machines concurrently.
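A polling monitor refactored along these lines might look like the following sketch. `HypervisorClient`, `VmState` and `VirtualSystemMonitor` are hypothetical names, and a thread pool again stands in for actors: each polling pass queries all hypervisors concurrently, waits for the answers, and reports only the virtual machines whose state changed since the previous pass.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical VM states, following the examples given in the text.
enum VmState { POWERED_ON, POWERED_OFF, PAUSED, MOVED, DESTROYED }

// Hypothetical abstraction over one hypervisor's status API.
interface HypervisorClient {
    Map<String, VmState> queryVmStates() throws Exception;
}

final class VirtualSystemMonitor {
    private final List<HypervisorClient> hypervisors;
    private final BiConsumer<String, VmState> onChange;
    private final ExecutorService pollers; // queries hypervisors in parallel
    private final Map<String, VmState> lastKnown = new HashMap<>();
    private ScheduledExecutorService scheduler;

    VirtualSystemMonitor(List<HypervisorClient> hypervisors, int pollerThreads,
                         BiConsumer<String, VmState> onChange) {
        this.hypervisors = List.copyOf(hypervisors);
        this.pollers = Executors.newFixedThreadPool(pollerThreads);
        this.onChange = onChange;
    }

    // One polling pass: query every hypervisor concurrently, then report state changes.
    synchronized void pollOnce() throws InterruptedException {
        List<Callable<Map<String, VmState>>> queries = new ArrayList<>();
        for (HypervisorClient client : hypervisors) {
            queries.add(client::queryVmStates);
        }
        // invokeAll blocks until every query has finished or failed.
        for (Future<Map<String, VmState>> result : pollers.invokeAll(queries)) {
            try {
                result.get().forEach((vmId, state) -> {
                    VmState previous = lastKnown.put(vmId, state);
                    if (previous != state) {
                        onChange.accept(vmId, state);
                    }
                });
            } catch (ExecutionException e) {
                // An unreachable hypervisor is skipped this round; the next pass retries it.
            }
        }
    }

    // Runs pollOnce periodically on a background thread.
    void start(long periodSeconds) {
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                pollOnce();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
        pollers.shutdownNow();
    }
}
```

Reporting only changes keeps downstream traffic proportional to activity rather than to the number of monitored machines, which matters when thousands of hypervisors are being polled.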
Both Abiquo and their customers are extremely satisfied with the platform as its usage continues to grow.
Inspired by this story? Contact us to learn more about what Lightbend can do for your organization.