Spark SQL - SparkSession Error

Hi,

I am trying to run a simple Spark SQL script (PySpark) on CloudxLab using spark-submit. I logged in to the e.cloudxlab.com web console, navigated to /usr/spark2.0.1/bin/ and ran the following command, but got the error below.

spark-submit HousePriceSolution.py

Error:
from pyspark.sql import SparkSession
ImportError: cannot import name SparkSession

It looks like the code is still being executed by the spark-submit of a Spark 1.x version. May I know how to get to the spark-submit of Spark 2.0?
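For context: SparkSession was introduced in Spark 2.0, so this import fails under any 1.x PySpark. Below is a minimal sketch of a version guard, assuming the version string has the usual major.minor.patch form. The helper name supports_spark_session is made up for illustration and is not part of PySpark.

```python
def supports_spark_session(version):
    """Return True if a Spark version string is 2.0 or newer."""
    # Only the major component matters: SparkSession first shipped in 2.0.
    major = int(version.split(".")[0])
    return major >= 2


if __name__ == "__main__":
    # Version strings taken from this thread.
    for v in ("1.5.2", "2.0.1", "2.2.1"):
        print(v, supports_spark_session(v))
```

Inside a running job the version string could come from the SparkContext (sc.version), which is available in both 1.x and 2.x.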

These are the versions of spark that are available on the cluster.

[uutkarsh@ip-172-31-20-58 ~]$ ls -la /usr/spark*
spark1.2.1/
spark1.6/
spark2.0.1/
spark2.0.2/
spark2.2.1/

They can be accessed at the location
/usr/spark<versionnumber>
So there is no plain 2.0 install, only the point releases listed above. You can check whether your script works with one of the other versions.
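The same listing can be produced from Python. This is only a sketch, assuming the /usr/spark<versionnumber>/bin/spark-submit layout described above; the function name installed_spark_submits is hypothetical.

```python
import glob
import os


def installed_spark_submits(root="/usr"):
    """Map each spark<version> directory under root to its spark-submit path."""
    found = {}
    pattern = os.path.join(root, "spark*", "bin", "spark-submit")
    for path in sorted(glob.glob(pattern)):
        # .../spark2.2.1/bin/spark-submit -> install directory "spark2.2.1"
        install_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        # Strip the "spark" prefix to get the version, e.g. "2.2.1".
        found[install_dir[len("spark"):]] = path
    return found


if __name__ == "__main__":
    for version, path in installed_spark_submits().items():
        print(version, path)
```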
Thanks.


You can also go about solving your problem in this way.
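The spark-submit that is executing your code is version 1.5.2, because that is the one found on the PATH variable (if you are using UNIX). Changing into /usr/spark2.0.1/bin does not change this, since the shell does not search the current directory unless it is listed in PATH.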

Rather than typing the bare command, give the full path on the command line:
/usr/spark2.2.1/bin/spark-submit YOUR_FILE_ARGUMENT
This uses the spark-submit from version 2.2.1 to run your file.
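To see which spark-submit a bare command resolves to, the PATH lookup can be reproduced from Python. A small sketch, assuming a Unix-like system; shutil.which is in the standard library, while resolve_command is only an illustrative wrapper.

```python
import os
import shutil


def resolve_command(name, path=None):
    """Return the full path that a PATH lookup finds for name, or None."""
    # shutil.which walks the PATH entries in order and returns the first
    # executable match; path=None means "use the PATH environment variable".
    return shutil.which(name, path=path)


if __name__ == "__main__":
    print("PATH =", os.environ.get("PATH", ""))
    print("spark-submit resolves to:", resolve_command("spark-submit"))
```

If the printed path is not under /usr/spark2.2.1/bin, the bare command is running a different Spark version.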
Hope that helps
Thanks.


Hi Utkarsh,

I did exactly the same, but see the result below. It still goes to 1.5.2.

Read my second answer. I can access the 2.2.1 version of spark-submit.

YOUR PROBLEM
When you type spark-submit directly at the prompt, it picks up 1.5.2, as that is the one found via the $PATH variable.

You need to type the following command to GET YOUR ANSWER:
/usr/spark2.2.1/bin/spark-submit "THE_FILENAME_THAT_YOU_WANT_TO_PROCESS"
This sends the execution to the Spark 2.2.1 installation.
Thanks
TELL ME IF THIS WORKED.


That's right… it picks up 2.2.1 now… thanks a lot
