COALESCE, REPARTITION, and REPARTITION_BY_RANGE

Amit_Padhi · November 3, 2020, 4:25am

Please explain the concept when why and where to use this

COALESCE , REPARTITION , and REPARTITION_BY_RANGE

note : – i checked the apache spark documentations for this but did not get completely

sgiri · November 3, 2020, 7:26am

coalesce

Coalesce litterally means come together. It is in SQL as well as SparkSQL means the same.
In case of Spark SQL, it gives the first non-null value.

You can test by first launching spark-shell and running the following command

spark.sql("SELECT coalesce(NULL, 1, NULL) col").show()

This query SELECT coalesce(NULL, 1, NULL) basically displays the first non-null i.e. 1.

Amit_Padhi · November 4, 2020, 2:26am

My query is not what is Coalesce this is known is SQL ; here question is about performance tuning section of SPARK SQL ; why and when we go for repartitioning and
REPARTITION_BY_RANGE ; as in your lecture we studied about partitioning about RDD ; but here i am not able to understand

this is the below link
https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html

Prakhar_Katiyar · November 5, 2020, 10:45am

I have found an article which might help. Please check it out.