Posts by Category

Jupyter

Scala

Spark

Spark SQL Internals

5 minute read

This post continues my previous article, Introduction to Spark SQL, and aims to understand Spark SQL at a deeper level.

Understand the Spark Deployment Modes

5 minute read

Spark deployment modes: Besides running a Spark application in local mode (used only for testing), Spark applications can run on different cluster managers: ...

Top reasons why you should shift to Spark

less than 1 minute read

Fast: in memory (100x faster) or on disk (2-10x faster). See the Daytona GraySort contest and official result. Usability: rich APIs (Scala, Java, Python), conc...

Introduction to Spark SQL

3 minute read

With Spark and the core RDD API, we can do almost everything with datasets. Developers define the steps to retrieve the data by applying functional transf...

How can two applications share RDDs

1 minute read

Problem: Application isolation in Spark's current architecture makes it impossible to share data (mostly RDDs) between different applications w...

Sublime Text

Terminal

Uncategorized

Blog Timeline

1 minute read

My blog timeline since it was first created in November 2016.