PySpark application error


#1

Hi,
I am trying to run a PySpark application from the command line using the spark-submit command, and I received the following error:

Error from python worker:
/usr/bin/python: No module named pyspark
PYTHONPATH was:
/hadoop/yarn/local/filecache/12/spark2-hdp-yarn-archive.tar.gz/spark-core_2.11-2.3.0.2.6.5.0-292.jar
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:184)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:107)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:83)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:118)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:86)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


#2

Hi @abhinavsingh,
Could you please look into this issue?

Thanks