spark spark-streaming akka

Four Things to Know about Reliable Spark Streaming with Typesafe and Databricks

Oliver White Director of Developer Marketing

View on Slideshare

Last week, we were happy to host a Typesafe co-webinar with Databricks, the company founded by the creators of Apache Spark. Our Big Data Architect, Dean Wampler, and Databricks' Lead Engineer for Spark Streaming, Tathagata Das (TD), gave a one-hour presentation with Q/A on Spark Streaming, which makes it easy to build scalable, fault-tolerant streaming applications with Apache Spark. In this webinar, we reviewed:

  • The Stream Processing Landscape: Stream Processors (e.g. Akka, Spark), Stream Storage (e.g. Cassandra, Riak) and Stream Sources (e.g. Kafka, Amazon S3)
  • How Spark Streaming works in a quick review of the technology
  • Features in Spark Streaming that help prevent potential data loss with executors and drivers
  • Complementary tools in a streaming pipeline, with examples using Akka and Apache Kafka
  • Design and tuning tips for Reactive Spark Streaming applications

Watch the full 60-min video with Q/A

Watch on YouTube


What attendees wanted to know about Spark and Spark Streaming

Some of the questions attendees had (many of which were answered in the recording above) include:

  1. How do you compare Spark Streaming vs Akka Streams vs Java 8 Streams?
  2. Explain the differences between Akka, Samza, Spark, Storm
  3. How suitable is Spark Streaming for real-time transactional services such as stock trading? Would Kafka + Storm be more suitable in terms of higher throughput?
  4. What about the latency implications of executor failures? Will recovery occur immediately without delaying the results?
  5. Is there any security policy for executors?
  6. Any benefits to using something else like MySQL as opposed to HDFS for checkpointing?
  7. Are there any plans to implement dynamic creation of DStreams, for example resulting from transformation of previous DStreams, after the streaming context has already started running?
  8. What is the mean ingestion rate of Spark Streaming?

What to know about Typesafe, Databricks and Mesosphere

Recently, we at Typesafe expanded our existing Databricks partnership, which offers Spark development support, to include commercial support for enterprises deploying Spark on Apache Mesos/Mesosphere DCOS. We're happy to be certified support partners of both Databricks and now Mesosphere. If you're looking to build and deploy commercial applications using Apache Spark standalone, or on Apache Mesos and/or Mesosphere DCOS, now you know who to call!

Feel free to email me personally at oliver(@)typesafe.com, or ask for a Typesafe representative to get in touch here:

ASK TO BE CONTACTED
