Primetalk’s Speech Portal is a sophisticated dialog manager that can speak freely with human beings. The initial version of Speech Portal proved to be a very valuable tool for customers; however, it did not scale well and was rather fragile. These issues were due in part to Speech Portal’s basis in Java and its reliance on Java concurrency. To meet its business goal of branching out into the massive telephony handling market, Primetalk turned to the Lightbend Platform to solve the challenges it faced.
Spoken Dialog Systems are among the most promising technologies for human-computer interaction. The current approach to creating such systems is to use VoiceXML applications that leverage existing technological stacks and engineering knowledge. The VoiceXML standard, however, leads to the creation of rigid, system-driven dialogs that are a far cry from the flexible human-to-human dialogs that people desire.
Primetalk Speech Portal is a new platform for building spoken dialog systems that delivers flexible, interactive, user-initiative dialogs as a human-computer interface for diverse information systems. The platform includes a complete set of components for speech processing plus an artificial intelligence layer that gives the dialog system the ability to think for itself and learn.
Even though Primetalk is a new venture in the natural language processing field, the highly qualified team behind it has substantial experience in software development and scientific research:
- Two years of Scala experience
- More than ten years of experience in Enterprise Java development and project management
- Five years of scientific research in the field of spoken dialog systems
The team's mission is to replace the awkward interactive voice response (IVR) systems that are commonplace today with user-friendly spoken dialog systems (SDS).
The goal of Primetalk is to create a completely new stack that allows the creation of flexible user-initiative dialogs. There are a number of circumstances that make this goal a challenging one, however:
Natural Language Processing (NLP) is an incredibly complex field. Natural language is undoubtedly the most sophisticated communication “protocol” there is. Furthermore, NLP is an interdisciplinary field that requires an understanding of artificial intelligence, neurophysiology, aural physiology, signal processing, computational linguistics, probability theory, and mathematical statistics.
Automatic large-vocabulary continuous speech recognition is a computationally heavy task that would benefit significantly from parallelism. However, making it actually run in parallel is a non-trivial exercise.
Real-time dialog handling is dynamic and event-driven: during a dialog, many events occur that the dialog logic must handle immediately. Typical sources of events include recognition results, audible user interventions, timers, telephony, and system playback. Possible reactions involve many intermediate steps that end in speech synthesis, audible signal playback, an expectation change, or a dialog progress step.
- Multiple Audio Sources
Simultaneous playback of different audible media: synthesized speech, comfort noise, working noise, background music, audio soundlets and prerecorded media streams. It should be possible to quickly change the audio picture in response to some dialog events.
- Multiple Conversations
Parallel conversation with a few participants (“multilog”). In advanced applications, such as teleconferencing, it may be necessary to keep a few conversations going in parallel.
Handling many channels with a single application instance is a much-desired feature in real-world deployments. Handling more independent conversations should require zero administration. Channel management includes prefetching the necessary number of channels and disposing of them after the dialog finishes.
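To make the event-driven challenge described above more concrete, here is a minimal Scala sketch of dialog handling as a pure function from a state and an event to a new state plus a list of reactions. The event and reaction names are taken from the sources and reactions listed above, but every type and method here (`DialogEvent`, `Reaction`, `DialogState`, `Dialog.step`, `Dialog.run`) is a hypothetical illustration and not part of Speech Portal.

```scala
// Hypothetical event and reaction types, named after the sources and
// reactions listed in the text; they are not Speech Portal's actual API.
sealed trait DialogEvent
final case class RecognitionResult(text: String) extends DialogEvent
case object UserBargeIn extends DialogEvent // audible user intervention
case object TimerExpired extends DialogEvent
case object PlaybackFinished extends DialogEvent

sealed trait Reaction
final case class Synthesize(text: String) extends Reaction
case object StopPlayback extends Reaction
final case class Expect(phrases: Set[String]) extends Reaction

// Immutable per-conversation state (an assumption made for this sketch).
final case class DialogState(expected: Set[String], playing: Boolean)

object Dialog {
  // Pure step function: it reads no shared mutable state, so it needs no locks.
  def step(state: DialogState, event: DialogEvent): (DialogState, List[Reaction]) =
    event match {
      case RecognitionResult(text) if state.expected.contains(text) =>
        (state.copy(expected = Set.empty, playing = true),
          List(Synthesize(s"You said: $text")))
      case RecognitionResult(_) =>
        (state.copy(playing = true), List(Synthesize("Sorry, could you repeat that?")))
      case UserBargeIn if state.playing =>
        (state.copy(playing = false), List(StopPlayback))
      case UserBargeIn =>
        (state, Nil)
      case TimerExpired =>
        (state.copy(playing = true), List(Synthesize("Are you still there?")))
      case PlaybackFinished =>
        (state.copy(playing = false), List(Expect(state.expected)))
    }

  // Fold a sequence of events through the step function, collecting reactions.
  def run(initial: DialogState, events: List[DialogEvent]): (DialogState, List[Reaction]) =
    events.foldLeft((initial, List.empty[Reaction])) { case ((state, out), event) =>
      val (next, reactions) = step(state, event)
      (next, out ++ reactions)
    }
}
```

Because `step` only maps values to values, the same conversation can be replayed from a recorded event list, and many conversations can be processed side by side without sharing anything.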
As previously mentioned, the first version of the Primetalk Speech Portal was developed with Java concurrency for parallelism and utilized the Guice dependency injection library for integration. Primetalk found quite a few drawbacks with this solution:
- Memory leaks are difficult to localize. When channels are associated with threads, and resources are allocated per channel, it is difficult to completely recycle the channel
- The communication between threads is error prone, because shared mutable state requires complex locking
- Scalability was limited because every additional channel added more threads to the thread-pool
- Guice requires instrumentation (annotation) of all components and quite a few additional compilation units (configuration modules). Sometimes it couldn't resolve circular dependencies and it was very difficult to find the proper point to break the loop
- Often Guice yields unexpected runtime errors when the configuration is incomplete
The drawbacks of the Java platform concurrency toolkit made it impossible to address the requirements of the new speech-processing stack in a reasonable time. Clearly, a new approach to solving the problem was needed.
Primetalk looked around for a better solution and found one in Lightbend’s Akka Concurrency Framework. Akka is a toolkit and runtime for building highly concurrent, distributed, and fault-tolerant event-driven applications on the JVM. Primetalk implemented a pilot project with Akka in Java and found it quite promising. The team had become intrigued with the Scala programming language during the Akka pilot and subsequently evaluated that too. Scala is a general-purpose programming language designed to express common programming patterns in a concise, elegant, and type-safe way. The team then went on to implement another pilot with Akka in Scala; the results were so good that Primetalk knew they had found the basis of their system.
“The transition to Scala was enabled by its seamless integration with Java. New Scala modules could be used immediately in the current version of Speech Portal. All subsequent modules since then have been developed with Scala.”
Arseniy Zhizhelev, CEO, Primetalk
Scala and Akka gave Primetalk an indispensable opportunity to rapidly create robust and scalable applications. There are a number of factors that make Scala and Akka a better choice for Primetalk:
- Separation between state and thread pool in Akka eliminates memory leaks
- Immutable data processing excludes locking completely
- Functional Programming encourages the creation of reusable components. When some task is implemented as a pure function it becomes very easy to reuse it
- Scalability is perfect; adding more channels does not influence the thread-pool size
- The Cake Pattern gives early detection of configuration problems. When the dependencies are misconfigured we get compilation errors. Runtime errors are almost impossible
- The Cake Pattern is very easy to use: there is no need for instrumentation or “modules”
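The compile-time checking attributed to the Cake Pattern above can be illustrated with a small Scala sketch. Dependencies are declared with self-types, so a wiring that omits a required component fails to compile instead of failing at runtime. The component names (`RecognizerComponent`, `SynthesizerComponent`, `DialogComponent`, `Portal`) and the stub behavior are assumptions made for illustration and are not Speech Portal's actual modules.

```scala
// Component names are illustrative assumptions, not Speech Portal's modules.
trait RecognizerComponent {
  def recognizer: Recognizer
  trait Recognizer { def recognize(audio: Array[Byte]): String }
}

trait SynthesizerComponent {
  def synthesizer: Synthesizer
  trait Synthesizer { def say(text: String): String }
}

// The self-type declares the dependencies: this trait can only be mixed
// into something that also provides a recognizer and a synthesizer.
trait DialogComponent { self: RecognizerComponent with SynthesizerComponent =>
  object dialog {
    def reply(audio: Array[Byte]): String =
      synthesizer.say("You said: " + recognizer.recognize(audio))
  }
}

// Stub implementations used for wiring in this sketch.
trait StubRecognizerComponent extends RecognizerComponent {
  val recognizer: Recognizer = new Recognizer {
    def recognize(audio: Array[Byte]): String = s"${audio.length} bytes"
  }
}

trait StubSynthesizerComponent extends SynthesizerComponent {
  val synthesizer: Synthesizer = new Synthesizer {
    def say(text: String): String = s"<speech>$text</speech>"
  }
}

// Complete wiring: compiles because every declared dependency is mixed in.
object Portal extends DialogComponent
  with StubRecognizerComponent
  with StubSynthesizerComponent

// Incomplete wiring: uncommenting the next line is a compile-time error,
// because the self-type of DialogComponent is not satisfied.
// object Broken extends DialogComponent with StubRecognizerComponent
```

No annotations or separate configuration modules are involved; the wiring is the `extends ... with ...` clause itself.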
As part of the work that Primetalk has done, they have open sourced a framework named “SynapseGrid”, a function integration framework that can connect ordinary Scala functions into a system by declaring the directed graph of interconnections. It can also interconnect actors the same way as it interconnects functions. All systems and actors can be nested to create a supersystem at any level. SynapseGrid demonstrates some significant functionality:
- SynapseGrid allows function composition far more flexibly than monads and iteratees
- Strictly typed message handling in Akka actors (a bit more natural than in Typed actors)
- Multiple input/multiple output functions can easily be implemented
- Systems process portions of information as soon as possible
- Declarative composition in the form of a DataFlow diagram
- Gives actors the opportunity to configure the consumer of their output from the outside
“SynapseGrid is probably one of the finest grained frameworks out there. The building block is as simple as a function!”
Arseniy Zhizhelev, CEO, Primetalk
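The idea of connecting ordinary functions by declaring a directed graph can be sketched in a few lines of plain Scala. The code below is a hand-rolled illustration only. It is not SynapseGrid's API, and the names `Contact`, `Signal`, `Link`, `Graph`, and `Example` are hypothetical choices for this sketch. It assumes the graph is acyclic and uses one unchecked cast internally to keep the example short.

```scala
// A hand-rolled illustration of wiring plain functions into a directed
// graph. This is NOT SynapseGrid's API; all names here are hypothetical.
final case class Contact[T](name: String)

final case class Signal[T](contact: Contact[T], data: T)

// A link moves data from one contact to another through a function that
// may emit zero, one, or many outputs per input.
final case class Link[A, B](from: Contact[A], to: Contact[B], f: A => Seq[B]) {
  // Apply the link only to signals that arrived on its input contact.
  def fire(signal: Signal[_]): Seq[Signal[_]] =
    if (signal.contact == from) f(signal.data.asInstanceOf[A]).map(Signal(to, _))
    else Seq.empty
}

final class Graph(links: List[Link[_, _]]) {
  // Push one signal through every link attached to its contact.
  private def stepOne(signal: Signal[_]): List[Signal[_]] =
    links.flatMap(_.fire(signal))

  // Propagate depth-first until signals rest on contacts with no outgoing
  // links, then return the data collected on `output`. Assumes no cycles.
  def run[A, B](input: Contact[A], output: Contact[B], data: A): List[B] = {
    @annotation.tailrec
    def loop(pending: List[Signal[_]], done: List[Signal[_]]): List[Signal[_]] =
      pending match {
        case Nil => done.reverse
        case s :: rest =>
          val next = stepOne(s)
          if (next.isEmpty) loop(rest, s :: done) else loop(next ++ rest, done)
      }
    loop(List(Signal(input, data)), Nil)
      .collect { case Signal(c, d) if c == output => d.asInstanceOf[B] }
  }
}

object Example {
  val utterance = Contact[String]("utterance")
  val words     = Contact[String]("words")
  val lengths   = Contact[Int]("lengths")
  val shouted   = Contact[String]("shouted")

  val graph = new Graph(List(
    Link(utterance, words, (s: String) => s.split(" ").toSeq), // one in, many out
    Link(words, lengths, (w: String) => Seq(w.length)),
    Link(words, shouted, (w: String) => Seq(w.toUpperCase))    // fan-out from `words`
  ))
}
```

For example, `Example.graph.run(Example.utterance, Example.lengths, "hello big world")` pushes each word through the graph as soon as it is split off and collects the values that arrive on the `lengths` contact.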
With the solid foundation of Scala, Akka and SynapseGrid, the newest release of Speech Portal has many desirable new properties:
- Elastic-scale on multichannel requirements
- Flexible integration of natural speech processing components
- Parallelism of automatic speech recognition
- Dialog manager for dynamic real-time dialogs
- Real-time mixer of many media sources that are switched in real time
- Stateless functional processing leads to almost zero debugging
- Painless concurrency
- Resilience to exceptions both in the Speech Portal and in custom dialog logic
The transition to the Lightbend Platform has clearly given Primetalk the strong foundation they were looking for in building a new-generation, scalable, and robust dialog management platform.