PySpark issues after migration

Sachin_P · July 29, 2019, 6:54pm

Hi Team,

After the migration, none of my modules are working.

Spark got downgraded to 1.6. I ran the same module that is shared in https://cloudxlab.com/blog/running-pyspark-jupyter-notebook/

I am not able to import SparkSession, since it is only supported from Spark 2.0 onward and the current version is 1.6.

Can you please help me get this set up for Python 3 + Spark 2.3?

import os
import sys
os.environ["SPARK_HOME"] = "/usr/hdp/current/spark2-client"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
# In the two lines below, use /usr/bin/python2.7 if you want to use Python 2
os.environ["PYSPARK_PYTHON"] = "/usr/local/anaconda/bin/python"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/anaconda/bin/python"
sys.path.insert(0, os.environ["PYLIB"] + "/py4j-0.10.6-src.zip")
sys.path.insert(0, os.environ["PYLIB"] + "/pyspark.zip")
# from pyspark.sql import SparkSession
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("appName")
sc = SparkContext(conf=conf)
rdd = sc.textFile("/data/mr/wordcount/input/")
print(rdd.take(10))
sc.version
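
Since SparkSession only exists from Spark 2.0 onward, a small guard on the version string can make this kind of mismatch obvious before the import fails. This is only a sketch; has_spark_session is a hypothetical helper name, not part of PySpark.

```python
def has_spark_session(version):
    """Return True if a Spark version string such as "2.3.0" is 2.0 or newer."""
    # Only the major component matters for SparkSession availability.
    major = int(version.split(".")[0])
    return major >= 2
```

In the notebook this could be called as has_spark_session(sc.version) right after the SparkContext is created.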

abhinav · July 30, 2019, 9:40am

Hi @Sachin_P

You can use Spark 2.3 from this directory: /usr/spark2.3
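
As a rough sketch of what that could look like in the notebook, assuming /usr/spark2.3 has the usual python/lib layout (not verified here): the helper below sets the environment and looks up the py4j zip with a glob, because the py4j file name can differ between Spark builds. configure_pyspark and start_session are hypothetical names used only for illustration.

```python
import glob
import os
import sys


def configure_pyspark(spark_home, python_bin):
    """Point this process at a Spark install; return the paths added to sys.path."""
    pylib = os.path.join(spark_home, "python", "lib")
    os.environ["SPARK_HOME"] = spark_home
    os.environ["PYSPARK_PYTHON"] = python_bin
    os.environ["PYSPARK_DRIVER_PYTHON"] = python_bin
    # Glob for the py4j zip instead of hard-coding py4j-0.10.6-src.zip.
    paths = sorted(glob.glob(os.path.join(pylib, "py4j-*-src.zip")))
    paths.append(os.path.join(pylib, "pyspark.zip"))
    for path in paths:
        if path not in sys.path:
            sys.path.insert(0, path)
    return paths


def start_session(app_name="appName"):
    """Create a SparkSession; call only after configure_pyspark()."""
    # Imported lazily so that pyspark is resolved from the paths set above.
    from pyspark.sql import SparkSession
    return SparkSession.builder.appName(app_name).getOrCreate()
```

Calling configure_pyspark("/usr/spark2.3", "/usr/local/anaconda/bin/python") and then start_session() in a fresh kernel should give a session whose .version reports the Spark build actually in use.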

Hope this helps.

Thanks

sgiri · August 6, 2019, 5:28pm

Please check this: https://cloudxlab.com/blog/running-pyspark-jupyter-notebook/
and this notebook: https://github.com/cloudxlab/bigdata/blob/master/spark/python/SparkStart.ipynb