PySpark application error

Hi,
I am trying to run a PySpark application from the command line using the spark-submit command (a rough sketch of the invocation is included after the stack trace below). I received the following error:

Error from python worker:
/usr/bin/python: No module named pyspark
PYTHONPATH was:
/hadoop/yarn/local/filecache/12/spark2-hdp-yarn-archive.tar.gz/spark-core_2.11-2.3.0.2.6.5.0-292.jar
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:184)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:107)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:83)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:118)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:86)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
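
For reference, the spark-submit invocation I am using looks roughly like the following; the script name, queue, and resource settings here are placeholders rather than the exact values from my job:

```
# Placeholder sketch of the submission command; my_app.py, the queue name,
# and the resource sizes stand in for the real values.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue default \
  --num-executors 2 \
  --executor-memory 2g \
  my_app.py
```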

Hi @abhinavsingh,
Please do look into this issue.

Thanks

Hi Team,

Please answer the above query. While submitting a PySpark job on a YARN cluster I am hitting the same error. I have tried setting various configuration options (for example, along the lines shown below) and several other things, but I am still not able to resolve it. Please assist us promptly.
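
The kind of configuration I have been experimenting with is sketched below. The pyspark.zip and py4j paths are taken from a typical HDP Spark2 client layout and the py4j version is a guess, so they may not match your cluster:

```
# Sketch only: the PYTHONPATH entries and the py4j version (0.10.6 here)
# are assumptions based on a standard /usr/hdp/current/spark2-client layout;
# my_app.py stands in for the real application script.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.PYTHONPATH=/usr/hdp/current/spark2-client/python/lib/pyspark.zip:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip \
  --conf spark.executorEnv.PYTHONPATH=/usr/hdp/current/spark2-client/python/lib/pyspark.zip:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip \
  --py-files /usr/hdp/current/spark2-client/python/lib/pyspark.zip,/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip \
  my_app.py
```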

Thanks
shakti