Spark learning materials

Here are the materials I’ve found most useful for learning Spark:

1. Learning Spark

Research papers


I’ve always thought that reading books is one of the most effective ways to learn, thanks to their well-structured content and the ease of getting an overview and following along step by step. As Spark is still a young and promising project (first released on October 15, 2012), only two books are available as of this writing (11-2014):

  • Learning Spark, by Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia, O’Reilly - highly recommended
  • Fast Data Processing with Spark - not recommended


  • Advanced Analytics with Spark - recommended
  • Spark in Action



2. Spark internals for developers

If you want to read Spark’s source code, understand the system, or tweak Spark’s internals:

  • Spark’s Wiki: to understand how contributing works and to get the developer tools
  • AMP Camp 4 lab: a gentle presentation of Spark’s ecosystem, with hands-on exercises for understanding the features that Spark supports.
  • Programming in Scala: Spark is written in Scala, so getting familiar with Scala is a must.

