Sir, I am unable to run spark-submit. I get the error below and the job does not complete.

[prassadht2483@cxln5 ~]$ cat s_b1.py
import sys

from pyspark import SparkContext

if __name__ == "__main__":

    sc = SparkContext("local", "word count")
    sc.setLogLevel("ERROR")

    lines = sc.textFile("file.txt")

    words = lines.flatMap(lambda s: s.split(" "))


    wordCounts = words.countByValue()
    for word, count in wordCounts.items():
            print("{} : {}".format(word, count))

[prassadht2483@cxln5 ~]$ spark-submit s_b1.py
SPARK_MAJOR_VERSION is set to 2, using Spark2
21/02/11 03:11:22 INFO SparkContext: Running Spark version 2.1.1.2.6.2.0-205
21/02/11 03:11:22 INFO SecurityManager: Changing view acls to: prassadht2483
21/02/11 03:11:23 INFO SecurityManager: Changing modify acls to: prassadht2483
21/02/11 03:11:23 INFO SecurityManager: Changing view acls groups to:
21/02/11 03:11:23 INFO SecurityManager: Changing modify acls groups to:
21/02/11 03:11:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(prassadht2483); groups with view permissions: Set(); users with modify permissions: Set(prassadht2483); groups with modify permissions: Set()
21/02/11 03:11:23 INFO Utils: Successfully started service 'sparkDriver' on port 40527.
21/02/11 03:11:23 INFO SparkEnv: Registering MapOutputTracker
21/02/11 03:11:23 INFO SparkEnv: Registering BlockManagerMaster
21/02/11 03:11:23 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/02/11 03:11:23 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/02/11 03:11:23 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-ad25ba0e-91d4-4244-8510-30a45e30c8ff
21/02/11 03:11:23 INFO MemoryStore: MemoryStore started with capacity 93.3 MB
21/02/11 03:11:23 INFO SparkEnv: Registering OutputCommitCoordinator
21/02/11 03:11:23 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/02/11 03:11:23 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.142.0.5:4040
21/02/11 03:11:23 INFO SparkContext: Added file file:/home/prassadht2483/s_b1.py at file:/home/prassadht2483/s_b1.py with timestamp 1613013083976
21/02/11 03:11:23 INFO Utils: Copying /home/prassadht2483/s_b1.py to /tmp/spark-3b965f23-dc8b-4734-8ece-086e9e86c79c/userFiles-c9785ad1-dd00-4161-950e-ede990557125/s_b1.py
21/02/11 03:11:24 INFO Executor: Starting executor ID driver on host localhost
21/02/11 03:11:24 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43874.
21/02/11 03:11:24 INFO NettyBlockTransferService: Server created on 10.142.0.5:43874
21/02/11 03:11:24 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/02/11 03:11:24 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.142.0.5, 43874, None)
21/02/11 03:11:24 INFO BlockManagerMasterEndpoint: Registering block manager 10.142.0.5:43874 with 93.3 MB RAM, BlockManagerId(driver, 10.142.0.5, 43874, None)
21/02/11 03:11:24 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.142.0.5, 43874, None)
21/02/11 03:11:24 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.142.0.5, 43874, None)
21/02/11 03:11:24 INFO EventLoggingListener: Logging events to hdfs:///spark2-history/local-1613013084023
Traceback (most recent call last):
  File "/home/prassadht2483/s_b1.py", line 16, in <module>
    wordCounts = words.countByValue()
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/rdd.py", line 1246, in countByValue
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/rdd.py", line 834, in reduce
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/rdd.py", line 808, in collect
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://cxln1.c.thelab-240901.internal:8020/user/prassadht2483/file.txt
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.api.python.PythonRDD.getPartitions(PythonRDD.scala:53)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1968)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:453)
at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)

[prassadht2483@cxln5 ~]$ nano newss.py
[prassadht2483@cxln5 ~]$ cat newss.py
from pyspark import SparkContext, SparkConf

if __name__ == "__main__":
    conf = SparkConf().setAppName("word count").setMaster("local")
    sc = SparkContext(conf=conf)

    lines = sc.textFile("file.text")

    words = lines.flatMap(lambda line: line.split(" "))

    wordCounts = words.countByValue()

    for word, count in wordCounts.items():
        print("{} : {}".format(word, count))

[prassadht2483@cxln5 ~]$ spark-submit newss.py
SPARK_MAJOR_VERSION is set to 2, using Spark2
21/02/11 04:00:28 INFO SparkContext: Running Spark version 2.1.1.2.6.2.0-205
21/02/11 04:00:28 INFO SecurityManager: Changing view acls to: prassadht2483
21/02/11 04:00:28 INFO SecurityManager: Changing modify acls to: prassadht2483
21/02/11 04:00:28 INFO SecurityManager: Changing view acls groups to:
21/02/11 04:00:28 INFO SecurityManager: Changing modify acls groups to:
21/02/11 04:00:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(prassadht2483); groups with view permissions: Set(); users with modify permissions: Set(prassadht2483); groups with modify permissions: Set()
21/02/11 04:00:29 INFO Utils: Successfully started service 'sparkDriver' on port 45996.
21/02/11 04:00:29 INFO SparkEnv: Registering MapOutputTracker
21/02/11 04:00:29 INFO SparkEnv: Registering BlockManagerMaster
21/02/11 04:00:29 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/02/11 04:00:29 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/02/11 04:00:29 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b494cbe6-04d5-407c-b9fe-6c6c059594ea
21/02/11 04:00:29 INFO MemoryStore: MemoryStore started with capacity 93.3 MB
21/02/11 04:00:29 INFO SparkEnv: Registering OutputCommitCoordinator
21/02/11 04:00:29 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/02/11 04:00:29 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.142.0.5:4040
21/02/11 04:00:29 INFO SparkContext: Added file file:/home/prassadht2483/newss.py at file:/home/prassadht2483/newss.py with timestamp 1613016029692
21/02/11 04:00:29 INFO Utils: Copying /home/prassadht2483/newss.py to /tmp/spark-e69e66b8-ec0b-4d75-87f3-99318c02781a/userFiles-9cf65d11-64f6-42b3-8dd3-01422e75e67e/newss.py
21/02/11 04:00:29 INFO Executor: Starting executor ID driver on host localhost
21/02/11 04:00:29 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46675.
21/02/11 04:00:29 INFO NettyBlockTransferService: Server created on 10.142.0.5:46675
21/02/11 04:00:29 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/02/11 04:00:29 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.142.0.5, 46675, None)
21/02/11 04:00:29 INFO BlockManagerMasterEndpoint: Registering block manager 10.142.0.5:46675 with 93.3 MB RAM, BlockManagerId(driver, 10.142.0.5, 46675, None)
21/02/11 04:00:29 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.142.0.5, 46675, None)
21/02/11 04:00:29 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.142.0.5, 46675, None)
21/02/11 04:00:30 INFO EventLoggingListener: Logging events to hdfs:///spark2-history/local-1613016029740
21/02/11 04:00:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 357.0 KB, free 93.0 MB)
21/02/11 04:00:31 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 31.8 KB, free 92.9 MB)
21/02/11 04:00:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.142.0.5:46675 (size: 31.8 KB, free: 93.3 MB)
21/02/11 04:00:31 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:0
Traceback (most recent call last):
  File "/home/prassadht2483/newss.py", line 11, in <module>
    wordCounts = words.countByValue()
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/rdd.py", line 1246, in countByValue
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/rdd.py", line 834, in reduce
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/rdd.py", line 808, in collect
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://cxln1.c.thelab-240901.internal:8020/user/prassadht2483/file.text
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.api.python.PythonRDD.getPartitions(PythonRDD.scala:53)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1968)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:453)
at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
21/02/11 04:00:31 INFO SparkContext: Invoking stop() from shutdown hook
21/02/11 04:00:31 INFO SparkUI: Stopped Spark web UI at http://10.142.0.5:4040
21/02/11 04:00:31 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/02/11 04:00:31 INFO MemoryStore: MemoryStore cleared
21/02/11 04:00:31 INFO BlockManager: BlockManager stopped
21/02/11 04:00:31 INFO BlockManagerMaster: BlockManagerMaster stopped
21/02/11 04:00:31 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/02/11 04:00:31 INFO SparkContext: Successfully stopped SparkContext
21/02/11 04:00:31 INFO ShutdownHookManager: Shutdown hook called
21/02/11 04:00:31 INFO ShutdownHookManager: Deleting directory /tmp/spark-e69e66b8-ec0b-4d75-87f3-99318c02781a
21/02/11 04:00:31 INFO ShutdownHookManager: Deleting directory /tmp/spark-e69e66b8-ec0b-4d75-87f3-99318c02781a/pyspark-f51c408b-8f28-461e-9e05-dfc953f7756d

In the above case the same thing happened.
Please guide me on where I went wrong.

The main error is: Input path does not exist: hdfs://cxln1.c.thelab-240901.internal:8020/user/prassadht2483/file.text

It is searching for 'file.text' in your HDFS home directory. Please upload it to HDFS using 'hadoop fs -put'.
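
For example, something like this from the console should work (a sketch; it assumes the input file is named 'file.text' and sits in your local home directory, so adjust the name if yours differs):

[prassadht2483@cxln5 ~]$ hadoop fs -put file.text /user/prassadht2483/file.text
[prassadht2483@cxln5 ~]$ hadoop fs -ls /user/prassadht2483/file.text

'hadoop fs -put' copies the local file into your HDFS home directory, and 'hadoop fs -ls' confirms it is there before you rerun spark-submit.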

Thanks, sir.

[prassadht2483@cxln5 ~]$ cat newss.py
from pyspark import SparkContext, SparkConf
if __name__ == "__main__":
    conf = SparkConf().setAppName("word count").setMaster("local")
    sc = SparkContext(conf=conf)

    lines = sc.textFile("file.text")

    words = lines.flatMap(lambda line: line.split(" "))

    wordCounts = words.countByValue()

    for word, count in wordCounts.items():
        print("{} : {}".format(word, count))

With the above newss.py code I am still getting the error, even though I copied the file into HDFS.
Please tell me what is wrong in the code.

[prassadht2483@cxln5 ~]$ cat newss.py
from pyspark import SparkContext, SparkConf
if __name__ == "__main__":
    conf = SparkConf().setAppName("word count").setMaster("local")
    sc = SparkContext(conf=conf)

    lines = sc.textFile("file.text")

    words = lines.flatMap(lambda line: line.split(" "))

    wordCounts = words.countByValue()

    for word, count in wordCounts.items():
        print("{} : {}".format(word, count))

[prassadht2483@cxln5 ~]$ spark-submit newss.py
SPARK_MAJOR_VERSION is set to 2, using Spark2
21/02/11 04:50:11 INFO SparkContext: Running Spark version 2.1.1.2.6.2.0-205
21/02/11 04:50:12 INFO SecurityManager: Changing view acls to: prassadht2483
21/02/11 04:50:12 INFO SecurityManager: Changing modify acls to: prassadht2483
21/02/11 04:50:12 INFO SecurityManager: Changing view acls groups to:
21/02/11 04:50:12 INFO SecurityManager: Changing modify acls groups to:
21/02/11 04:50:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(prassadht2483); groups with view permissions: Set(); users with modify permissions: Set(prassadht2483); groups with modify permissions: Set()
21/02/11 04:50:12 INFO Utils: Successfully started service 'sparkDriver' on port 35846.
21/02/11 04:50:12 INFO SparkEnv: Registering MapOutputTracker
21/02/11 04:50:12 INFO SparkEnv: Registering BlockManagerMaster
21/02/11 04:50:12 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/02/11 04:50:12 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/02/11 04:50:12 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-15540af9-39d9-4d80-94ec-a97cafe51f04
21/02/11 04:50:12 INFO MemoryStore: MemoryStore started with capacity 93.3 MB
21/02/11 04:50:12 INFO SparkEnv: Registering OutputCommitCoordinator
21/02/11 04:50:12 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/02/11 04:50:12 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.142.0.5:4040
21/02/11 04:50:13 INFO SparkContext: Added file file:/home/prassadht2483/newss.py at file:/home/prassadht2483/newss.py with timestamp 1613019013079
21/02/11 04:50:13 INFO Utils: Copying /home/prassadht2483/newss.py to /tmp/spark-5cd55c3a-61f1-4cc6-a2db-2a2428feb17d/userFiles-cdfc7306-c482-4ee7-aa75-17464c0b4922/newss.py
21/02/11 04:50:13 INFO Executor: Starting executor ID driver on host localhost
21/02/11 04:50:13 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39710.
21/02/11 04:50:13 INFO NettyBlockTransferService: Server created on 10.142.0.5:39710
21/02/11 04:50:13 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/02/11 04:50:13 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.142.0.5, 39710, None)
21/02/11 04:50:13 INFO BlockManagerMasterEndpoint: Registering block manager 10.142.0.5:39710 with 93.3 MB RAM, BlockManagerId(driver, 10.142.0.5, 39710, None)
21/02/11 04:50:13 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.142.0.5, 39710, None)
21/02/11 04:50:13 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.142.0.5, 39710, None)
21/02/11 04:50:13 INFO EventLoggingListener: Logging events to hdfs:///spark2-history/local-1613019013123
21/02/11 04:50:14 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 357.0 KB, free 93.0 MB)
21/02/11 04:50:14 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 31.8 KB, free 92.9 MB)
21/02/11 04:50:14 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.142.0.5:39710 (size: 31.8 KB, free: 93.3 MB)
21/02/11 04:50:14 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:0
Traceback (most recent call last):
  File "/home/prassadht2483/newss.py", line 11, in <module>
    wordCounts = words.countByValue()
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/rdd.py", line 1246, in countByValue
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/rdd.py", line 834, in reduce
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/rdd.py", line 808, in collect
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://cxln1.c.thelab-240901.internal:8020/user/prassadht2483/file.text
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.api.python.PythonRDD.getPartitions(PythonRDD.scala:53)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1968)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:453)
at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)

21/02/11 04:50:14 INFO SparkContext: Invoking stop() from shutdown hook
21/02/11 04:50:14 INFO SparkUI: Stopped Spark web UI at http://10.142.0.5:4040
21/02/11 04:50:14 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/02/11 04:50:14 INFO MemoryStore: MemoryStore cleared
21/02/11 04:50:14 INFO BlockManager: BlockManager stopped
21/02/11 04:50:14 INFO BlockManagerMaster: BlockManagerMaster stopped
21/02/11 04:50:14 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/02/11 04:50:14 INFO SparkContext: Successfully stopped SparkContext
21/02/11 04:50:14 INFO ShutdownHookManager: Shutdown hook called
21/02/11 04:50:14 INFO ShutdownHookManager: Deleting directory /tmp/spark-5cd55c3a-61f1-4cc6-a2db-2a2428feb17d
21/02/11 04:50:14 INFO ShutdownHookManager: Deleting directory /tmp/spark-5cd55c3a-61f1-4cc6-a2db-2a2428feb17d/pyspark-840c2618-8c06-4ba6-b09b-25a581f55791

With the above code I am still getting the error. Please help.

Please note that the error is still the same. Can you check if the file 'file.text' exists in your home directory in HDFS?
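
For example, you can verify it from the console (a sketch; note that newss.py reads 'file.text' while the earlier s_b1.py read 'file.txt', so the name you uploaded to HDFS must match exactly):

[prassadht2483@cxln5 ~]$ hadoop fs -ls /user/prassadht2483/
[prassadht2483@cxln5 ~]$ hadoop fs -ls file.text

If 'hadoop fs -ls file.text' reports that the file does not exist, re-upload or rename it to the exact name the script expects, e.g. 'hadoop fs -mv file.txt file.text'.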

Dear Admin, please restart the Jupyter server as it shows ERROR 500.

Your storage could be beyond the quota. Please check https://cloudxlab.com/faq/6/what-are-the-limits-on-the-usage-of-lab-or-what-is-the-fair-usage-policy-fup
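
If it helps, a quick way to check your usage from the console (a sketch; it assumes the quota covers both your local home directory and your HDFS home directory):

[prassadht2483@cxln5 ~]$ du -sh ~
[prassadht2483@cxln5 ~]$ hadoop fs -du -s -h /user/prassadht2483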