Role of num-executors in Spark

I am running Spark in Microsoft Azure. The cluster size I configured in Azure is -
6 Nodes (2 HEAD + 4 WORKER) 40 CORES.

In my spark submit job, I give -

spark-submit --master=yarn --num-executors=9 $script $task2_outdir $task3_outdir

Question - What is the role of num-executors? How does it work with these worker nodes when an application job is submitted?

--num-executors sets the maximum number of executors (YARN containers, each running a JVM process) that the application will request for processing. Executors are not whole machines: YARN can schedule several executors on a single worker node, which is how 9 executors can run on your 4 worker nodes.
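For context, --num-executors is usually set together with the per-executor resources. A fuller invocation might look like the sketch below (these are real spark-submit flags, but the values are illustrative, not a recommendation for your cluster):

```shell
# Illustrative values; tune cores/memory for your own workload.
spark-submit \
  --master yarn \
  --num-executors 9 \
  --executor-cores 4 \
  --executor-memory 4g \
  "$script" "$task2_outdir" "$task3_outdir"
```

Here --executor-cores is the number of tasks each executor can run concurrently, and --executor-memory is the heap size allocated per executor container.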

If you are processing an RDD, its partitions are processed in parallel across the executors, with each executor running one task per core at a time.
So num-executors * executor-cores is the maximum number of partitions that can be processed in parallel.
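The arithmetic can be sketched in plain shell (the 4 cores per executor and 100 partitions are made-up numbers for illustration):

```shell
# Max tasks running at once = executors * cores per executor.
num_executors=9
executor_cores=4
max_parallel=$((num_executors * executor_cores))
echo "$max_parallel"   # 36 partitions can be processed at the same time

# With more partitions than that, Spark processes them in "waves".
partitions=100
waves=$(( (partitions + max_parallel - 1) / max_parallel ))   # ceiling division
echo "$waves"          # 3 waves of tasks
```

So with these settings, a 100-partition RDD would be chewed through in roughly three waves of 36 concurrent tasks.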

Hope this helps.
