Posts by Year

2017

Blog Timeline

My blog timeline since it was first created in November 2016.

2015

Spark SQL Internals

This post continues my previous article, Introduction to Spark SQL, and aims to understand Spark SQL at a deeper level.

Understand the Spark Deployment Modes

Spark deployment modes: Besides running a Spark application in local mode (used only for testing), Spark applications can run on different cluster managers: ...

2014

Introduction to Spark SQL

With Spark and the RDD core API, we can do almost everything with datasets. Developers define the steps of how to retrieve the data by applying functional transf...

How can two applications share RDDs

Problem: Application isolation in Spark’s current architecture makes it impossible to share data (mostly RDDs) between different applications w...