Unable to run Spark from Jupyter notebook


On running the code below, as instructed:
import os
import sys
os.environ["SPARK_HOME"] = "/usr/hdp/current/spark2-client"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"

# In the two lines below, use /usr/bin/python2.7 if you want to use Python 2

os.environ["PYSPARK_PYTHON"] = "/usr/local/anaconda/bin/python"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/anaconda/bin/python"
sys.path.insert(0, os.environ["PYLIB"] + "/py4j-0.10.6-src.zip")
sys.path.insert(0, os.environ["PYLIB"] + "/pyspark.zip")

from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("appName")
sc = SparkContext(conf=conf)
rdd = sc.textFile("/data/mr/wordcount/input/")

I get the following error:

ModuleNotFoundError: No module named 'py4j'


Hi @Soutam,

Can you please check if py4j-0.10.6-src.zip exists in the path?
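One way to check this from the notebook itself is to list whatever py4j archives sit under the Spark lib directory. This is only a rough sketch: the helper name `find_py4j_zips` is made up for illustration, and the fallback `SPARK_HOME` value is simply the path from the post above.

```python
import glob
import os

def find_py4j_zips(spark_home):
    """Return the py4j source archives under <spark_home>/python/lib, sorted."""
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    return sorted(glob.glob(pattern))

# SPARK_HOME as used in the original post; adjust if yours differs.
spark_home = os.environ.get("SPARK_HOME", "/usr/hdp/current/spark2-client")
print(find_py4j_zips(spark_home))
```

If the printed list is empty, or the version in the file name differs from the one passed to sys.path.insert, the import of py4j will fail as shown in the error.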


Hi @abhinav
I am also getting the same error and am not able to access pyspark from the Jupyter notebook.
It was working fine before the recent upgrade of the lab.
Could you please check?


For the current version of Spark on the lab, the py4j file is py4j-0.10.4-src.zip, not py4j-0.10.6-src.zip, so the sys.path.insert line for py4j needs to use the 0.10.4 file name.
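To avoid hardcoding the py4j version at all, the archive can be looked up by pattern before it is added to sys.path. The following is a minimal sketch, not the lab's official setup: the function name `add_pyspark_to_path` is hypothetical, and it assumes the py4j archive follows the py4j-*-src.zip naming seen in this thread.

```python
import glob
import os
import sys

def add_pyspark_to_path(spark_home):
    """Put pyspark.zip and whichever py4j archive is present on sys.path."""
    pylib = os.path.join(spark_home, "python", "lib")
    py4j_zips = sorted(glob.glob(os.path.join(pylib, "py4j-*-src.zip")))
    if not py4j_zips:
        raise FileNotFoundError("no py4j-*-src.zip found in " + pylib)
    # If several versions are present, take the last one in sorted order
    # (a plain string sort, not a true version comparison).
    py4j_zip = py4j_zips[-1]
    sys.path.insert(0, py4j_zip)
    sys.path.insert(0, os.path.join(pylib, "pyspark.zip"))
    return py4j_zip
```

Calling add_pyspark_to_path(os.environ["SPARK_HOME"]) in place of the two sys.path.insert lines, before the pyspark import, should then keep working if the lab's py4j version changes again.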