Unable to run Spark from Jupyter notebook


#1

On running the below code as instructed:
""import os
import sys
os.environ[“SPARK_HOME”] = “/usr/hdp/current/spark2-client”
os.environ[“PYLIB”] = os.environ[“SPARK_HOME”] + “/python/lib”

In below two lines, use /usr/bin/python2.7 if you want to use Python 2

os.environ[“PYSPARK_PYTHON”] = “/usr/local/anaconda/bin/python”
os.environ[“PYSPARK_DRIVER_PYTHON”] = “/usr/local/anaconda/bin/python”
sys.path.insert(0, os.environ[“PYLIB”] +"/py4j-0.10.6-src.zip")
sys.path.insert(0, os.environ[“PYLIB”] +"/pyspark.zip")

from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName(“appName”)
sc = SparkContext(conf=conf)
rdd = sc.textFile("/data/mr/wordcount/input/")
print(rdd.take(10))
sc.version"""

I get the following error:

ModuleNotFoundError: No module named 'py4j'


#2

Hi @Soutam,

Can you please check if py4j-0.10.6-src.zip exists in the path?
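
A quick way to check (a minimal sketch, assuming the same PYLIB path as in post #1):

import os
# List the archives under SPARK_HOME/python/lib to see which Py4J version is actually installed
pylib = "/usr/hdp/current/spark2-client/python/lib"
print(os.listdir(pylib))
print(os.path.exists(pylib + "/py4j-0.10.6-src.zip"))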


#3

Hi @abhinav
I am also getting the same error; I am not able to access PySpark from the Jupyter notebook.
It was working fine before the recent upgrade of the lab.
Could you please check?


#4

For the current version of Spark on the lab, the Py4J file is py4j-0.10.4-src.zip, not py4j-0.10.6-src.zip.
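
So the two sys.path lines from post #1 would become something like the sketch below. This assumes PYLIB is set as in post #1; the glob is just one way to avoid hard-coding the Py4J version, so the notebook keeps working after future lab upgrades:

import glob
import os
import sys
# Resolve whatever Py4J version ships with the installed Spark instead of hard-coding it
py4j_zips = glob.glob(os.path.join(os.environ["PYLIB"], "py4j-*-src.zip"))
sys.path.insert(0, py4j_zips[0])  # e.g. .../py4j-0.10.4-src.zip on the current lab
sys.path.insert(0, os.path.join(os.environ["PYLIB"], "pyspark.zip"))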