In 2013, a group of cloud engineers joined Samsung to pursue a vision that the consumer Internet of Things (IoT) required a new data platform.
The explosion of devices that produce data is a well-documented trend, with Gartner predicting that 25 billion connected “things” will be in use by 2020. But for Jerome Dubreuil, Dan Serfaty and a number of other technologists deep inside of Samsung’s IoT efforts, the challenge at the data layer was as interesting as the explosion at the hardware layer.
“Internet of Things is really all about the data. Connecting devices is great, but the explosion of data is what will profoundly change consumers’ lives, and we see this proliferation of data as a ‘next big thing’ at the order of magnitude that Internet and Mobile changed everyday lives.”
Jerome Dubreuil, Senior Director Engineering, SAMI Platform at Samsung Strategy and Innovation Center
Dubreuil and his colleagues at Samsung saw a number of technical obstacles holding IoT developers back. For starters, developers creating IoT solutions are burdened with building complex back-end systems for receiving and processing device data. Beyond the time required to build the full stack from device to cloud to services, the end result is typically a device with its own proprietary cloud interfaces and siloed data: an IoT solution that has isolated itself from opportunities to correlate and fuse data with other IoT devices.
“We believe that the future of IoT is about data fusion,” said Dubreuil. “We see a future where consumers’ lives are affected by hundreds of connected devices that they can control and harness the data from. If they all operate without any interaction, not only is that unsustainable from a consumer usability point of view - but we miss the potential of correlating this universe of data that makes IoT meaningful to consumers.”
“A simple example is the smart home. Imagine all the correlation that can be made around your life and wellness, and patterns that can be derived from appliances, energy usage, ambient data on things like air quality and noise disruptions. There is really no limit to what the data will allow us to achieve in this IoT opportunity. There are companies with deep expertise everywhere on the planet that are ready to build such services. But for that, they need access to the data, in a way that is safe and secure for the user, as well as reliable and easy to work with from a developer’s standpoint.”
Jerome Dubreuil, Senior Director Engineering, SAMI Platform at Samsung Strategy and Innovation Center
The Samsung SAMI team saw an opportunity to create an open ecosystem for IoT developers, aimed at gathering data and making it more accessible, and at liberating IoT developers from building the underlying plumbing. They set out to create a data broker in the cloud that could handle any data, from any device or application. They saw SAMI functioning as a private data bank, with full security and privacy controls for developers. And they set out to create SAMI as a developer-oriented platform, with simple and powerful APIs and tools that shorten developers’ path to building new IoT user experiences, services and business models.
When developers think of multinationals with hundred-billion-dollar annual revenues, they typically do not think of fast, efficient release cycles at startup speed. But in just two years, the SAMI team hired top talent from Silicon Valley and grew from zero to 40 people, including multiple developer teams, DevOps personnel, security specialists and QA engineers, and launched an IoT data platform with a very ambitious scope of features.
At the highest level, SAMI provides an abstraction layer for ingesting, processing and routing device data, and for accessing it through developer-friendly APIs. SAMI deals with everything from data security, to real-time data, to transformations, aggregations and storage. By developing on top of a platform that allows easy and secure access to any kind of data, from any kind of device, and that interconnects with other device platforms as well, IoT developers using SAMI can focus on building their added value to the ecosystem.
Under the hood, SAMI enlists a broad range of today’s hottest programming languages, build tools, distributed frameworks and technologies (Java, Scala, Play, Akka, Kafka, Zookeeper, Cassandra, ElasticSearch, MongoDB, TitanDB, Redis, MySQL, Spark, Samza, Mesos, Docker, Consul, etc.) to provide a number of advanced features and functionality:
The data coming into SAMI can be whatever the device, developer or application decides to send. In the middle, a normalization process takes that data and produces a standard JSON format that is available through the API.
SAMI is developer friendly, with an emphasis on elegant APIs and SDKs. One key differentiator of SAMI is that there is no limitation as to what kind of data can be sent: developers define data fields, their type and unit, with no restriction. This data can then be sent in any format, including binary, JSON, XML, text, or any custom format for which the developer provides custom normalization code.
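To make the idea of developer-supplied normalization concrete, here is a minimal sketch using only the Java standard library. It assumes two made-up payload formats (a `key=value` text line and a positional comma-separated line) and converts both into the same flat JSON object. The class name `NormalizerSketch`, the field names `temp` and `humidity`, and both formats are hypothetical; none of this is SAMI's actual API or wire format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.function.Function;

public class NormalizerSketch {

    // Hypothetical parser for text payloads such as "temp=21.5;humidity=40".
    public static Map<String, Double> parseKeyValue(String raw) {
        Map<String, Double> fields = new LinkedHashMap<>();
        for (String pair : raw.split(";")) {
            String[] kv = pair.split("=", 2);
            fields.put(kv[0].trim(), Double.parseDouble(kv[1].trim()));
        }
        return fields;
    }

    // Hypothetical parser for positional payloads such as "21.5,40",
    // where the device always sends temperature first and humidity second.
    public static Map<String, Double> parseCsv(String raw) {
        String[] parts = raw.split(",");
        Map<String, Double> fields = new LinkedHashMap<>();
        fields.put("temp", Double.parseDouble(parts[0].trim()));
        fields.put("humidity", Double.parseDouble(parts[1].trim()));
        return fields;
    }

    // Render the parsed fields as a flat JSON object.
    // Field names are assumed to contain no characters that need escaping.
    public static String toJson(Map<String, Double> fields) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        fields.forEach((name, value) -> json.add("\"" + name + "\":" + value));
        return json.toString();
    }

    // The parser plays the role of the custom normalization code:
    // whatever the input format, the output has one standard shape.
    public static String normalize(String raw, Function<String, Map<String, Double>> parser) {
        return toJson(parser.apply(raw));
    }

    public static void main(String[] args) {
        System.out.println(normalize("temp=21.5;humidity=40", NormalizerSketch::parseKeyValue));
        System.out.println(normalize("21.5,40", NormalizerSketch::parseCsv));
    }
}
```

Both calls in `main` produce the same JSON object, which is the point of the exercise: consumers of the API see one shape regardless of how each device chose to encode its data.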
One of the other goals for SAMI is for developers to describe the data they plan to put through the platform. That’s done through what’s referred to as the “Manifest.” Other developers can then understand the data that is being captured by a certain device and write applications or algorithms against it.
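A manifest of this kind can be pictured as a list of declared fields, each with a name, a type and a unit. The sketch below is one possible in-memory representation, written under that assumption only; `ManifestSketch`, `FieldSpec`, `declare`, `accepts` and `describe` are hypothetical names, and the real SAMI Manifest format is not shown in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ManifestSketch {

    // One declared field: its name, value type and unit, all chosen by the developer.
    public static final class FieldSpec {
        final String name;
        final Class<?> type;
        final String unit;

        FieldSpec(String name, Class<?> type, String unit) {
            this.name = name;
            this.type = type;
            this.unit = unit;
        }
    }

    private final Map<String, FieldSpec> fields = new LinkedHashMap<>();

    // Declare a field; returns this so declarations can be chained.
    public ManifestSketch declare(String name, Class<?> type, String unit) {
        fields.put(name, new FieldSpec(name, type, unit));
        return this;
    }

    // A message conforms if every value it carries belongs to a declared
    // field and has the declared type. Missing fields are allowed.
    public boolean accepts(Map<String, Object> message) {
        for (Map.Entry<String, Object> entry : message.entrySet()) {
            FieldSpec spec = fields.get(entry.getKey());
            if (spec == null || !spec.type.isInstance(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    // Human-readable summary that another developer could read
    // to learn what a device reports.
    public String describe() {
        StringJoiner lines = new StringJoiner("\n");
        for (FieldSpec spec : fields.values()) {
            lines.add(spec.name + ": " + spec.type.getSimpleName() + " (" + spec.unit + ")");
        }
        return lines.toString();
    }

    public static void main(String[] args) {
        ManifestSketch manifest = new ManifestSketch()
                .declare("heartRate", Integer.class, "bpm")
                .declare("bodyTemp", Double.class, "celsius");
        System.out.println(manifest.describe());
        System.out.println(manifest.accepts(Map.of("heartRate", 72)));
        System.out.println(manifest.accepts(Map.of("heartRate", "fast")));
    }
}
```

The design choice worth noting is that the description travels with the data: a second developer only needs the manifest, not the device's source code, to write an application against its readings.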
Once data reaches the platform, it's easy to get it back in real-time through Firehose APIs that can be customized based on needs. Similarly, older data is immediately available through historical APIs that support time-based retrieval as well as search queries. Aggregates that are computed automatically as data are ingested are also available through APIs. These combined APIs are the backbone of applications implemented on top of SAMI, and abstract the complexity of where the data is coming from and how it was sent.
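The combination of time-based retrieval and aggregates computed at ingestion can be illustrated with a small in-memory stand-in. The sketch below is an assumption-laden toy, not SAMI's storage layer or API: `HistorySketch`, `ingest`, `between` and `aggregates` are hypothetical names, a sorted map stands in for the historical store, and each timestamp is assumed to be unique.

```java
import java.util.DoubleSummaryStatistics;
import java.util.NavigableMap;
import java.util.TreeMap;

public class HistorySketch {

    // Readings kept sorted by timestamp so range queries are cheap.
    // Assumes unique timestamps; a duplicate would overwrite the earlier reading.
    private final NavigableMap<Long, Double> byTimestamp = new TreeMap<>();

    // Running count, min, max, sum and average, updated as each reading arrives.
    private final DoubleSummaryStatistics stats = new DoubleSummaryStatistics();

    public void ingest(long timestampMillis, double value) {
        byTimestamp.put(timestampMillis, value);
        stats.accept(value);
    }

    // Time-based retrieval: all readings with from <= timestamp <= to.
    public NavigableMap<Long, Double> between(long from, long to) {
        return byTimestamp.subMap(from, true, to, true);
    }

    // Aggregates are already computed, so reading them requires no scan.
    public DoubleSummaryStatistics aggregates() {
        return stats;
    }

    public static void main(String[] args) {
        HistorySketch history = new HistorySketch();
        history.ingest(1000L, 20.0);
        history.ingest(2000L, 22.0);
        history.ingest(3000L, 21.0);

        System.out.println(history.between(1500L, 3000L));
        System.out.println(history.aggregates().getAverage());
    }
}
```

Updating the statistics inside `ingest` mirrors the article's description of aggregates being computed as data arrives, rather than on demand when a client asks for them.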
Just two years from its initial ideation, SAMI is delivering on its promise to abstract data from the physical hardware, allowing developers to concentrate their efforts on their devices and software, rather than the movement and normalization of data.
“The typical arc that we see and hear about from IoT developers at meetups is that they had a great idea, they worked for a couple of months on a prototype, and then they had to build a platform to move the data between the client devices and the business. No one wants to have to build out the data pipelining or platforming, and SAMI is allowing IoT developers to bypass all those steps and just plug into something that works out of the box and supports any type of device.”
Dan Serfaty, Senior Staff Software Engineer at Samsung Electronics
The SAMI group made some key architecture decisions in the earliest days of building out this innovative IoT data platform. The team follows a Lean Engineering philosophy with a razor-sharp focus on efficiency, automation and testing, always pushing to use or create cutting-edge technologies. Central to their efforts were Scala and Play Framework for rapid prototyping and developer productivity, and Akka for the data transformation and WebSocket layers. Let’s take a look at some of the key decisions that allowed SAMI to ship such an ambitiously scoped platform in less than two years.
From day one, the SAMI team knew it would be building the platform following Reactive principles. A data system must isolate failures, and SAMI’s mission required resilience under any load at any scale. Key to the resilience of the SAMI platform is its design as a microservices-based platform with Akka. Using Akka for fine-grained component isolation and Docker for deployment, SAMI is built for concurrency, scalability and resilience on the JVM. The isolation properties that Akka provides help protect the system against individual failures, and also allow an efficient continuous deployment model. This approach frees the dev team to rapidly build out new features without having to maintain a monolith.
SAMI was written to run asynchronously and on the Java Virtual Machine (JVM). The team believed in the JVM’s reliability and design for concurrency, and felt that it provided major advantages for the type of massive streaming required of SAMI. With IoT devices, some send one data point per day, while others send thousands of data points per hour - and the JVM has helped SAMI support millions of events per day.
Approximately 60% of SAMI is written in Scala, with the rest written in Java. The SAMI team chose Scala because it believes the language is concise, elegant and typesafe, and that it smoothly integrates object-oriented and functional programming. The SAMI team also appreciated Scala’s strengths in handling multiple concurrent users without waiting for a response. This “non-blocking” characteristic of Scala made it a natural fit for SAMI’s “always-on” requirements for handling any number of concurrent devices sending millions of device messages back to the platform.
Before joining Samsung, the SAMI team had experience with Play Framework, and liked it for fast and easy prototyping. Because Play is web-oriented, it allowed SAMI to use a single framework for both APIs and front-end development. Play has also allowed easy implementation of stateless, Reactive, asynchronous request handling. Play’s handling of WebSockets is very important to the SAMI platform, and its flexibility for developing in either Scala or Java gives the SAMI team maximum choice as it continues to build out new features.
“Play is very important because it allows us to go fast. We are really a startup inside of Samsung where we want to go fast. With Play we know we can do fast prototypes, fast changes. It's also good for us because there is no application server required. SAMI's architecture emphasizes containerized workloads, and Play really supports that model beautifully.”
Jerome Dubreuil, Senior Director Engineering, SAMI Platform at Samsung Strategy and Innovation Center
Akka is built to be a runtime and middleware platform for concurrency and scalability on the JVM. SAMI uses Akka at two critical levels: (1) at the WebSocket handling level, where it dedicates an Actor to the client of each WebSocket, each with a mailbox that is processed and queued into Kafka; (2) at the transformation layer, where data is streamed into the system and normalized. The SAMI team also heavily leverages Apache Cassandra, the popular NoSQL database, which is distributed, asynchronous and fault tolerant by default, and a powerful pairing with Akka.
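The actor-per-WebSocket pattern described above can be approximated without Akka or Kafka to show its essential property: each client gets its own mailbox, messages from one client are processed one at a time and in order, and the results are forwarded to a shared downstream queue. The sketch below uses only the Java standard library as a stand-in. `MailboxSketch`, `ClientHandler`, `connect` and `downstream` are hypothetical names, a single-threaded executor plays the role of the actor mailbox, and an in-memory queue plays the role of Kafka. It is an illustration of the idea, not SAMI's implementation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class MailboxSketch {

    // Stand-in for the downstream log (Kafka in the article): an in-memory queue.
    private final BlockingQueue<String> downstream = new LinkedBlockingQueue<>();

    // One handler per connected client, loosely mirroring one actor per WebSocket.
    public final class ClientHandler {
        private final String clientId;

        // A single-threaded executor runs submitted tasks one at a time, in
        // submission order, which is the guarantee an actor mailbox gives.
        private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

        ClientHandler(String clientId) {
            this.clientId = clientId;
        }

        // Called when the client sends a message; returns immediately.
        public void receive(String message) {
            mailbox.execute(() -> downstream.add(clientId + ":" + message));
        }

        // Stop accepting messages and wait briefly for queued ones to drain.
        public void close() {
            mailbox.shutdown();
            try {
                mailbox.awaitTermination(1, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public ClientHandler connect(String clientId) {
        return new ClientHandler(clientId);
    }

    public BlockingQueue<String> downstream() {
        return downstream;
    }

    public static void main(String[] args) {
        MailboxSketch hub = new MailboxSketch();
        ClientHandler a = hub.connect("device-a");
        ClientHandler b = hub.connect("device-b");

        a.receive("1");
        b.receive("1");
        a.receive("2");

        a.close();
        b.close();

        // Order is preserved per client; interleaving across clients may vary.
        System.out.println(hub.downstream());
    }
}
```

Because each handler owns its own mailbox, a slow or failing client only delays its own messages, which is the isolation benefit the article attributes to the actor model.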
“Play and Akka handling WebSockets is a really powerful combination for SAMI. If you did WebSockets with something else, you'd have to pick your dependencies and pick technologies. Whereas we can just use Play and we’re ready to go with WebSockets, and have Akka behind it to handle those WebSockets efficiently.”
Dan Serfaty, Senior Staff Software Engineer at Samsung Strategy and Innovation Center
SAMI is pushing the boundaries of IoT platforms and is the ideal choice for developers building solutions with connected devices. The platform is being used by a variety of developers, from small startups that decided to fully build their product on top of SAMI to accelerate their go-to-market time, to larger Samsung internal groups or large external companies that are testing new services based on SAMI, for the scale, security and robustness it brings to the table.
Today, many SAMI developers come up with brilliant ideas for new devices that digitize our physical world and capture existing or new signals that have never been available in digital form. The SAMI team has also seen many experts come up with new services that compute and mash up that new data, and transform it into “small data” that they put in the hands of consumers, thereby providing them with insights and predictions never seen before.
Earlier this summer, the SAMI team blogged about its embrace of containers and its highly available Mesos cluster that abstracts resources at the datacenter level. The SAMI team continues to aggressively build new platform functionality with Play, which fits this model well: its services are stateless and easy to deploy on a microservices architecture.
“We can run Play as-is, it’s very simple to deploy, and we don’t have to install Tomcat or any server container. In the world of containers and resource allocation with Mesos, the fact that it’s a simple drop of a package that you can start right away without any complexity is a good point because it simplifies your operations for a container.”
Jerome Dubreuil, Senior Director Engineering, SAMI Platform at Samsung Strategy and Innovation Center
Today developer adoption of the SAMI platform continues to grow rapidly. Individual devices in the Smart Home category submit messages in the hundreds-per-day range, while clients in the Smart Health category submit up to millions of messages per day (from devices capturing ECG, for example). The Reactive characteristics of SAMI’s application infrastructure, enabled by Lightbend technologies at the core, have allowed the SAMI team to continue to execute its original vision for abstracting a data layer for IoT developers. By allowing SAMI to do the heavy lifting with large volumes of device data, IoT developers are freed to focus on the unique value they are creating for their users.