From the course: Apache Spark Essential Training: Big Data Engineering


Spark execution plan

- Spark execution plans play an important role in optimizing pipelines. When a job is submitted to Apache Spark, Spark first analyzes all the code given to it and comes up with an execution plan. Spark has an optimizer that analyzes the steps needed to process data and optimizes them for performance and resource utilization. Spark only executes code when an action like reduce or collect is performed. At that point, the optimizer kicks in, analyzes all the previous steps required to achieve this action, and comes up with a physical execution plan. The optimizer looks for ways to reduce IO, shuffling, and memory usage. If the data sources support parallel IO, Spark accesses them directly from the executors and parallelizes these operations. This improves performance and reduces memory requirements on the driver. It is recommended to print and analyze execution plans to understand what Spark is doing underneath…
