Understand the Spark Deployment Modes

Spark deployment modes

Besides running a Spark application in local mode (typically used only for testing), Spark applications can run on different cluster managers:

  • Standalone cluster: we can deploy a standalone cluster very quickly, with just a few steps and minimal configuration. I wrote the tutorial here: How to install Apache Spark 1.2.1 as Standalone cluster
  • Apache Mesos: a cluster manager developed at UC Berkeley
  • Hadoop YARN: a popular cluster manager, introduced with Hadoop MapReduce version 2

Running applications in Cluster mode

In overview, running an application on Spark proceeds as follows:

  • First, the user submits an application with the bin/spark-submit command, providing:
      • --master: the URL of the cluster manager (YARN, Mesos, or a Standalone cluster)
      • --deploy-mode: cluster or client. This option specifies where the Driver Program runs: on a Worker Node inside the cluster (cluster mode; the ApplicationMaster in YARN) or on an external client machine (client mode)
      • the application jar (passed as the last argument): the packaged code of the Driver Program and its dependencies

  • The Driver Program runs and creates a SparkContext. It then registers the application with the Cluster Manager. The Driver Program, with its pre-configured parameters, asks the Cluster Manager for resources, i.e. a list of Executors. The Cluster Manager is in charge of allocating the available resources among the submitted applications. Finally, the Driver Program sends the application code to the allocated Executors, which run it as Tasks.
  • Note that different applications run in different Executors (in other words, Tasks from different applications run in different JVMs).
  • Within an application, the Driver Program schedules Jobs and Tasks, and communicates with its Executors to obtain the status of those Jobs and Tasks. By "schedule" I mean: run Jobs and Tasks in order, and re-submit Tasks if they fail. The results of a Task can be sent back to the Driver Program (e.g. rdd.count(), rdd.collect(), …) or written to storage (e.g. rdd.saveAsTextFile()). A minimal submission and driver sketch follows this list.
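
To make this concrete, here is a minimal sketch of such a submission and Driver Program, assuming a standalone master; the class name, jar name, host, and input path are hypothetical placeholders.

```scala
// Hypothetical submission in cluster mode on a standalone master
// (class name, jar name, host, and path are placeholders):
//
//   bin/spark-submit \
//     --class example.WordCount \
//     --master spark://master-host:7077 \
//     --deploy-mode cluster \
//     wordcount.jar hdfs:///data/input.txt

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // The Driver Program creates the SparkContext, which registers the
    // application with the Cluster Manager and acquires Executors.
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)

    // Transformations only build the RDD lineage; the action (saveAsTextFile)
    // triggers a Job whose Tasks are shipped to the Executors.
    val counts = sc.textFile(args(0))
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile(args(0) + ".counts")
    sc.stop()
  }
}
```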

Some terminology:

  • A Worker Node is a machine that hosts one or more Workers.
  • A Worker runs in a single JVM (one process) and has access to a configured share of the node's resources, e.g. the number of cores and the amount of RAM per Worker. A Worker can spawn and own multiple Executors.
  • An Executor also runs in a single JVM (one process), is created by the Worker, and is in charge of running multiple Tasks in a thread pool (a configuration sketch follows this list).
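
As a rough illustration of how these resources are declared, the sketch below sets the standard Spark properties that bound an Executor's memory and cores; the values are placeholders, and the exact semantics vary slightly between cluster managers.

```scala
import org.apache.spark.SparkConf

// Values are placeholders; keys are standard Spark configuration properties.
val conf = new SparkConf()
  .setAppName("ResourceDemo")
  .set("spark.executor.memory", "2g") // RAM available to each Executor JVM
  .set("spark.executor.cores", "2")   // cores per Executor, i.e. Tasks it can run in parallel
  .set("spark.cores.max", "8")        // total cores the application may claim (standalone/Mesos)
```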

In more detail:

  • Using the SparkContext (sc), our application creates some RDDs, applies transformations to them, and then triggers an action. At that point, sc submits a Job to the DAGScheduler of the Driver. The DAGScheduler applies some optimizations; one notable step is splitting the DAG (Directed Acyclic Graph) into Stages (sets of Tasks) by analyzing the types of RDD dependencies (see the previous post on pipelining narrow dependencies, and the sketch after this list). The DAGScheduler decides what to run: the Tasks, grouped into Stages and mapped onto partitions.
  • Tasks are then scheduled onto the acquired Executors by the driver-side TaskScheduler, according to resource and locality constraints (it decides where each Task runs).
  • Within a Task, the lineage DAG of the corresponding Stage is serialized together with the closures (functions) of the transformations, then sent to and executed on the scheduled Executors.
  • The Executors run the Tasks in multiple threads and finally send the results back to the Driver for aggregation.
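
As a small, self-contained example of how a lineage is cut into Stages, consider the sketch below (the path is illustrative): the narrow dependencies (flatMap, map) are pipelined into one Stage, while reduceByKey introduces a shuffle and therefore a second Stage.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("StageDemo"))

val pairs = sc.textFile("hdfs:///data/books")   // Stage 1 begins here
  .flatMap(_.split("\\s+"))                     // narrow dependency: pipelined
  .map(w => (w.toLowerCase, 1))                 // narrow dependency: pipelined

val counts = pairs.reduceByKey(_ + _)           // wide (shuffle) dependency: Stage 2

// The action triggers one Job; the DAGScheduler cuts it into two Stages
// at the shuffle boundary. counts.toDebugString prints the lineage.
counts.take(10).foreach(println)
```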

(Figure: Master - Workers)

Refer to the Spark-core 1.3 overall architecture for more details.

Notable characteristics

Compared with Hadoop MapReduce

Hadoop MapReduce runs each Task in its own process; when a Task completes, that process goes away (unless JVM reuse is turned on). Tasks can also run concurrently in threads, using advanced techniques such as the MultithreadedMapper class.

In Spark, by contrast, many Tasks run concurrently by default in multiple threads within a single process (an Executor). This process stays alive for the lifetime of the application, even when no Jobs are running.

We can summarize the difference between their execution models:

  • Each MapReduce Executor is short-lived and expected to run only one large task.
  • Each Spark Executor is long-running and expected to run many small tasks.
  • Thus, be careful when porting source code from MapReduce to Spark, because Spark requires thread-safe execution (see the sketch after this list).
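
For instance, a pattern that was harmless in MapReduce (one Task per JVM) but breaks under Spark's threaded Executors is sharing a mutable helper such as java.text.SimpleDateFormat. The sketch below, with hypothetical object names, shows the pitfall and a thread-safe alternative.

```scala
import java.text.SimpleDateFormat

// Pitfall: in MapReduce each Task had its own process, so a single static
// formatter was safe. In Spark, Tasks on the same Executor share the JVM,
// so they would share this one SimpleDateFormat, which is not thread-safe.
object Unsafe {
  val formatter = new SimpleDateFormat("yyyy-MM-dd") // shared, mutable state
}

// Safer alternative: give each thread (i.e. each running Task) its own instance.
object Safe {
  val formatter = new ThreadLocal[SimpleDateFormat] {
    override def initialValue(): SimpleDateFormat = new SimpleDateFormat("yyyy-MM-dd")
  }
}

// Usage inside a transformation (rdd assumed to be an RDD[java.util.Date]):
//   rdd.map(d => Safe.formatter.get().format(d))
```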

So, what are the benefits of Spark's implementation?

  • Different applications are isolated in different JVMs, and task scheduling is isolated too: each driver schedules its own tasks [1].
  • Launching a new process costs much more than launching a new thread; creating a thread is roughly 10-100 times faster than creating a process. Additionally, running many small Tasks rather than a few big Tasks reduces the impact of data skew.
  • Very suitable for interactive applications, where quick computation matters.

Besides these gains, however, there are also drawbacks:

  • Applications cannot share data (mostly RDDs in a SparkContext) without writing it to external storage [1].
  • Resource allocation can be inefficient: for its entire lifetime, an application holds the full, fixed amount of resources it requested from and was granted by the Cluster Manager, even while it is not running any Task.
  • The dynamic resource allocation feature, introduced in Spark 1.2, addresses this issue: if a subset of the resources allocated to an application becomes idle, it can be returned to the cluster's pool of resources and acquired by other applications [2]. A minimal configuration sketch is shown below.
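
A minimal configuration sketch for enabling dynamic allocation might look like the following; the property keys are the documented ones, while the values and app name are illustrative, and the external shuffle service must also be set up on the cluster.

```scala
import org.apache.spark.SparkConf

// Keys are the documented dynamic-allocation properties; values are illustrative.
val conf = new SparkConf()
  .setAppName("DynamicAllocationDemo")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")             // required so executors can be released safely
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "20")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60") // release executors idle for this long (seconds)
```
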
  1. https://spark.apache.org/docs/1.3.0/cluster-overview.html

  2. http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
