Run PySpark within a Jupyter notebook?

Hi Sandeep,

Following your instructions, I ran the commands below in the web console, but I still cannot get a Jupyter notebook to run PySpark.

"Start Jupyter using the following commands:
export SPARK_HOME="/usr/spark2.0.1/"

export PYTHONPATH=$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-0.10.3-src.zip:$SPARK_HOME/python/lib/pyspark.zip:$PYTHONPATH

export PATH=/usr/local/anaconda/bin:$PATH

jupyter notebook --no-browser --ip 0.0.0.0 --port 8888"
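
(A quick way to confirm that these variables actually reach the notebook kernel is to run something like the following in a cell; a minimal sketch using only the paths from the commands above:)

import os
import sys

# SPARK_HOME should match the directory exported before launching Jupyter
print(os.environ.get("SPARK_HOME"))  # expected: /usr/spark2.0.1/

# the pyspark and py4j entries from PYTHONPATH should show up on sys.path
print([p for p in sys.path if "spark" in p.lower() or "py4j" in p.lower()])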

So I opened another Jupyter notebook and tried to run the following code, which fails with an error:
"from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("appName")
sc = SparkContext(conf=conf)
rdd = sc.textFile("/data/mr/wordcount/input/")
print rdd.take(10)
sc.version"

"ModuleNotFoundError: No module named 'pyspark'"

How can I run PySpark within Jupyter? Thanks.

Hi Bintao,

Please select either of the two Python environment notebooks when you click on “New”:

  • Python [default]
  • Python [conda root]

(screenshot: pyspark-python)
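
If pyspark still cannot be imported from a plain Python kernel, the findspark package is another commonly used workaround (a minimal sketch, assuming findspark is installed in the Anaconda environment; it is not something specific to our setup):

import findspark

# put the pyspark and py4j libraries under SPARK_HOME onto sys.path for this kernel
findspark.init("/usr/spark2.0.1/")

from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("appName")
sc = SparkContext(conf=conf)
print(sc.version)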

Hi Sandeep,

Neither of the two Python environment notebooks appears when I click on “New”:

Python [default]
Python [conda root]

There is only Python 3 available in ‘New’.

Regards,
Bintao Li

@Bintao_Li,

Can you please try again and let me know? I can see those kernels when I try.

Hi @sgiri,

Could you please provide an “Apache Toree - Python” kernel for PySpark directly in JupyterHub, instead of requiring all these manual steps?

@raviteja,

Yes, this is in the pipeline. PySpark does not work with Python 3, while JupyterHub requires Python 3, so we will have to create a Python 2 environment for launching PySpark notebooks.
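
In the meantime, a separate Python 2 kernel can be registered manually along these lines (a sketch only; the environment and display names here are placeholders, not the ones we will ship):

# create a Python 2.7 conda environment that includes the IPython kernel
conda create -n py27 python=2.7 ipykernel

# register it with Jupyter so it appears under "New"
source activate py27
python -m ipykernel install --user --name py27 --display-name "Python 2 (PySpark)"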

Hi @abhinav,

Please note which Python versions Spark 1.5.2 supports:

Spark 1.5.2 works with Python 2.6+ or Python 3.4+

Please refer to the official Spark documentation for this.

Coming to Spark 2.3:

Spark 2.3.0 works with Python 2.7+ or Python 3.4+
http://spark.apache.org/docs/latest/rdd-programming-guide.html

So I think we don’t need just Python 2; Python 3 is officially supported as well.
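
For example, the snippet from the start of this thread only needs the print statement changed to the print() function to run on a Python 3 kernel (a minimal sketch, assuming pyspark is importable from that kernel):

from pyspark import SparkContext, SparkConf

# same application name and input path as in the original post
conf = SparkConf().setAppName("appName")
sc = SparkContext(conf=conf)
rdd = sc.textFile("/data/mr/wordcount/input/")

# print() works on both Python 2.7 and Python 3
print(rdd.take(10))
print(sc.version)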