From real-time data streaming expert and VP of Fast Data Engineering at Lightbend, Dean Wampler, Ph.D., comes the 2nd Edition of his popular O'Reilly eBook Fast Data Architectures for Streaming Applications. Get the latest updates on how to design, build, deploy, and manage Fast Data applications for handling real-time streaming datasets that never end!
The emergence of the Internet created datasets of unprecedented size, leading to new tools and techniques that now constitute the Big Data ecosystem. Until recently, Big Data has been oriented toward capturing data and analyzing it in batch mode. In the last several years, compressing the time between data arrival and information extraction has become a competitive advantage, leading to the emergence of stream-oriented architectures for Big Data, or so-called Fast Data systems.
Streaming tools are evolving at a furious pace, making it difficult to keep up and to know what to trust. The word “streaming” is heavily overloaded, too, but its essence for our purposes is infinite datasets. In other words, how would you process data that never ends? Classic batch-mode processing is the special case of a data “stream” that’s actually finite.
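To make the idea of processing data that never ends concrete, here is a minimal Python sketch. It is an illustration only, not taken from the eBook: the function name `running_mean` and the sample values are assumptions chosen for the example. The function consumes events one at a time and emits an updated result after each, so it never needs the whole dataset in hand.

```python
from itertools import count, islice
from typing import Iterable, Iterator


def running_mean(events: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all events seen so far, one result per event.

    Hypothetical helper for illustration; it keeps only a running total
    and a count, so memory use does not grow with the stream.
    """
    total = 0.0
    n = 0
    for value in events:
        total += value
        n += 1
        yield total / n


# An unbounded source: count(1) never terminates, so we can only ever
# inspect a prefix of the results.
infinite_means = running_mean(count(1))
first_five = list(islice(infinite_means, 5))

# A finite "batch" goes through the same function unchanged; the last
# value emitted is the ordinary batch answer.
batch_means = list(running_mean([2.0, 4.0, 6.0]))
```

The same code handles both inputs, which is one way to read the claim that batch processing is the special case of a stream that happens to be finite.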
Independent of particular tools, some common principles underlie the processing of infinite datasets, including data as event logs, message queues as integration points, sophisticated semantics for stream-based analytics, and the need to integrate up-to-date results with the rest of your applications. Implementations must be scalable, resilient, and responsive, the hallmarks of Reactive systems. Understanding these principles and how they compose to build Fast Data architectures allows us to work with the tools available today and to evolve our systems as our tools evolve.