Apache Spark Modules
As you can see, the module spark-core is the foundation framework for all the others. This module provides the implementation of the Spark computing engine: rdd, scheduler, deploy, executor, storage, shuffle, …
spark-sql lets you query structured data as a distributed dataset by using SQL queries. The module
spark-hive provides the capability of interacting with Hive, and the module
spark-catalyst is used as the query optimization framework behind Spark SQL.
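As a quick illustration of the SQL path, the sketch below registers a tiny in-memory DataFrame as a temporary view and queries it with SQL; the query plan is optimized by Catalyst before execution. This is a minimal local example, assuming a standard Spark installation (the app name, view name, and data are made up for the demo).

```scala
import org.apache.spark.sql.SparkSession

object SqlExample {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark on all local cores; no cluster needed for the demo.
    val spark = SparkSession.builder()
      .appName("sql-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A small in-memory collection turned into a distributed DataFrame.
    val people = Seq(("Alice", 34), ("Bob", 45)).toDF("name", "age")
    people.createOrReplaceTempView("people")

    // Catalyst analyzes and optimizes this SQL query before it runs.
    spark.sql("SELECT name FROM people WHERE age > 40").show()

    spark.stop()
  }
}
```

Running `explain()` on the resulting DataFrame prints the Catalyst-optimized physical plan, which is a handy way to see the optimizer at work.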
spark-mllib is a scalable machine learning library that leverages the computing power of Spark.
It can even run on streaming data or use SQL queries to extract data.
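To make the machine learning module concrete, here is a minimal sketch that fits a logistic regression model on a toy DataFrame using the `spark.ml` API. The labels and feature vectors are invented for the example.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object MllibExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mllib-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy labeled training data: (label, feature vector).
    val training = Seq(
      (1.0, Vectors.dense(0.0, 1.1)),
      (0.0, Vectors.dense(2.0, 1.0)),
      (1.0, Vectors.dense(0.1, 1.2))
    ).toDF("label", "features")

    // Fit the model on the distributed DataFrame.
    val model = new LogisticRegression().setMaxIter(10).fit(training)
    println(s"Coefficients: ${model.coefficients}")

    spark.stop()
  }
}
```

Because the training data is a DataFrame, it can just as well come from a SQL query or a streaming source, which is the interplay mentioned above.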
spark-streaming and spark-graphx make it easy to build scalable fault-tolerant streaming applications and graph-parallel computations, respectively.
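For the graph side, the following sketch builds a tiny property graph with GraphX and computes each vertex's in-degree; the vertex IDs and labels are made up for the demo.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object GraphxExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("graphx-example")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Vertices are (id, attribute) pairs; edges connect vertex IDs.
    val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
    val graph = Graph(vertices, edges)

    // inDegrees is itself an RDD, so the computation is graph-parallel.
    println(graph.inDegrees.collect().mkString(", "))

    spark.stop()
  }
}
```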