Jupyter notebook running an old version of Spark

Hi @sgiri

The Jupyter notebook with the Scala kernel runs a very old version of Spark (v2.1.1). This causes most of the DataFrame and Dataset operations introduced after v2.1 to fail.
Is it possible to update the Spark version used by the Jupyter notebook to 2.3 or 2.4?

In a Python Jupyter notebook, though, you can use any Spark version you wish. See: https://cloudxlab.com/blog/running-pyspark-jupyter-notebook/
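For what it's worth, here is a minimal sketch of one way to do this from inside a Python notebook, assuming the findspark package is installed; the Spark path and version below are just placeholders for whatever builds actually exist on the lab machines:

```python
import findspark

# Hypothetical path to a newer Spark build; replace it with the
# actual location of a Spark 2.3/2.4 install on the machine.
findspark.init("/usr/spark2.4")

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("version-check").getOrCreate()
print(spark.version)  # should now report the version of the chosen install
```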

But in the case of the Scala kernel, it is not possible to switch the Spark version, as far as I understand. If you have any ideas, please let me know.

Alternatively, you can use spark-shell, or maybe launch your own Jupyter notebook server on CloudxLab.

I will try to change the Spark version for the Scala kernel and see if it runs in the notebook.

Thanks for the prompt reply @sgiri. The Apache Toree kernel configuration is located in the file:
/usr/local/share/jupyter/kernels/apache_toree_scala/kernel.json

Changing the line "SPARK_HOME": "/usr/hdp/current/spark2-client" to point to either Spark 2.3 or 2.4 located at /usr/spark2.x should, I guess, change the Spark version used by the Jupyter notebook kernel.
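If that works, the edited env section of kernel.json would look roughly like this; the new path below is only a guess at where a 2.3/2.4 build might live, and all other keys in the file stay untouched:

```json
{
  "env": {
    "SPARK_HOME": "/usr/spark2.4"
  }
}
```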