PySpark issues after migration


#1

Hi Team,

After the migration, I can see that none of my modules are working.

Spark got downgraded to 1.6. I ran the same module that is shared in https://cloudxlab.com/blog/running-pyspark-jupyter-notebook/

So I am not able to import SparkSession, as it is only supported from Spark 2.0 onwards and the current version is 1.6.

Can you please help me get this set up for Python 3 + Spark 2.3?

import os
import sys
os.environ["SPARK_HOME"] = "/usr/hdp/current/spark2-client"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
# In the two lines below, use /usr/bin/python2.7 if you want to use Python 2
os.environ["PYSPARK_PYTHON"] = "/usr/local/anaconda/bin/python"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/anaconda/bin/python"
sys.path.insert(0, os.environ["PYLIB"] +"/py4j-0.10.6-src.zip")
sys.path.insert(0, os.environ["PYLIB"] +"/pyspark.zip")
# from pyspark.sql import SparkSession
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("appName")
sc = SparkContext(conf=conf)
rdd = sc.textFile("/data/mr/wordcount/input/")
print(rdd.take(10))
sc.version

#2

Hi @Sachin_P

You can use Spark 2.3 from this directory: /usr/spark2.3
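
A minimal sketch of how the notebook setup might look with that directory. It assumes /usr/spark2.3 follows the standard Spark layout (python/lib/pyspark.zip plus a py4j-*-src.zip). The helper names configure_pyspark and start_session are only illustrative, and the py4j zip is found by pattern because its version may not be 0.10.6 in this install:

```python
import glob
import os
import sys


def configure_pyspark(spark_home, python_bin):
    """Point this process at a Spark install; return the paths added to sys.path."""
    pylib = os.path.join(spark_home, "python", "lib")
    os.environ["SPARK_HOME"] = spark_home
    os.environ["PYSPARK_PYTHON"] = python_bin
    os.environ["PYSPARK_DRIVER_PYTHON"] = python_bin
    # The py4j zip name changes between Spark releases, so look it up
    # instead of hard-coding a version such as py4j-0.10.6-src.zip.
    added = sorted(glob.glob(os.path.join(pylib, "py4j-*-src.zip")))
    added.append(os.path.join(pylib, "pyspark.zip"))
    for path in added:
        sys.path.insert(0, path)
    return added


def start_session(spark_home="/usr/spark2.3",
                  python_bin="/usr/local/anaconda/bin/python"):
    """Hypothetical helper: configure the paths, then create a SparkSession."""
    configure_pyspark(spark_home, python_bin)
    # Import only after sys.path is set up; SparkSession exists from Spark 2.0 on.
    from pyspark.sql import SparkSession
    return SparkSession.builder.appName("appName").getOrCreate()
```

In the notebook, spark = start_session() followed by spark.version should then report 2.3.x rather than 1.6, and spark.sparkContext gives the SparkContext used in the original code. If pyspark was already imported in the running kernel, restart the kernel first so the new path takes effect.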

Hope this helps.

Thanks


#3

Please check this: https://cloudxlab.com/blog/running-pyspark-jupyter-notebook/
and this notebook: https://github.com/cloudxlab/bigdata/blob/master/spark/python/SparkStart.ipynb