PySpark issues after migration


Hi Team,

After the migration, none of my modules are working.

Spark got downgraded to 1.6. I ran the same module that was shared in

So I am not able to import SparkSession, since it is supported only from Spark 2.0 and the current version is 1.6.

Can you please help me get this set up for Python 3 + Spark 2.3?

import os
import sys
os.environ["SPARK_HOME"] = "/usr/hdp/current/spark2-client"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
# In the two lines below, use /usr/bin/python2.7 instead if you want Python 2
os.environ["PYSPARK_PYTHON"] = "/usr/local/anaconda/bin/python"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/anaconda/bin/python"
# Put the bundled PySpark and Py4J sources on the path
# (adjust the py4j version to whatever actually ships under $PYLIB)
sys.path.insert(0, os.environ["PYLIB"] + "/py4j-0.10.7-src.zip")
sys.path.insert(0, os.environ["PYLIB"] + "/pyspark.zip")
# from pyspark.sql import SparkSession
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("appName")
sc = SparkContext(conf=conf)
rdd = sc.textFile("/data/mr/wordcount/input/")
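For context on why the import fails: SparkSession only exists from Spark 2.0 onward, so on a 1.6 install you are limited to SparkContext/SQLContext. A tiny standalone check illustrates the version constraint (`has_sparksession` here is a made-up helper for illustration, not a pyspark API):

```python
# Hypothetical helper: SparkSession is importable only on Spark >= 2.0,
# so a major-version check tells you which entry point to use.
def has_sparksession(spark_version):
    major = int(spark_version.split(".")[0])
    return major >= 2

print(has_sparksession("1.6"))    # False: use SparkContext/SQLContext
print(has_sparksession("2.3.0"))  # True: from pyspark.sql import SparkSession
```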


Hi @Sachin_P

You can use Spark 2.3 from this directory: /usr/spark2.3

Hope this helps.
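Building on that, a minimal sketch of the setup from the question, repointed at the Spark 2.3 directory above. The `python/lib` layout and the py4j zip name are assumptions about the install; adjust them to whatever is actually on disk:

```python
import os
import sys

# Point the environment at the Spark 2.3 install from the reply above
os.environ["SPARK_HOME"] = "/usr/spark2.3"
os.environ["PYSPARK_PYTHON"] = "/usr/local/anaconda/bin/python"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/anaconda/bin/python"

# Put the bundled PySpark and Py4J sources on sys.path
# (zip names assumed; check $SPARK_HOME/python/lib for the real ones)
pylib = os.path.join(os.environ["SPARK_HOME"], "python", "lib")
for zip_name in ("pyspark.zip", "py4j-0.10.7-src.zip"):
    sys.path.insert(0, os.path.join(pylib, zip_name))

# With the paths in place, the Spark 2.x entry point should import:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("appName").getOrCreate()
# rdd = spark.sparkContext.textFile("/data/mr/wordcount/input/")
```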



Please check this:
and this notebook: