Unable to launch pyspark on console

[sarithadsr217850@cxln5 ~]$ export PATH=/usr/local/anaconda/bin:$PATH
[sarithadsr217850@cxln5 ~]$ pyspark
SPARK_MAJOR_VERSION is set to 2, using Spark2
File "/bin/hdp-select", line 232
    print "ERROR: Invalid package - " + name
                                      ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("ERROR: Invalid package - " + name)?
Fatal Python error: Py_Initialize: can't initialize sys standard streams
Traceback (most recent call last):
  File "/usr/local/anaconda/lib/python3.6/io.py", line 52, in <module>
  File "/home/sarithadsr217850/abc.py", line 2, in <module>
  File "/usr/hdp/current/spark2-client/python/pyspark/__init__.py", line 40, in <module>
  File "/usr/local/anaconda/lib/python3.6/functools.py", line 20, in <module>
ImportError: cannot import name 'get_cache_token'
ls: cannot access /usr/hdp//hadoop/lib: No such file or directory
Fatal Python error: Py_Initialize: can't initialize sys standard streams
Traceback (most recent call last):
  File "/usr/local/anaconda/lib/python3.6/io.py", line 52, in <module>
  File "/home/sarithadsr217850/abc.py", line 2, in <module>
  File "/usr/hdp/current/spark2-client/python/pyspark/__init__.py", line 40, in <module>
  File "/usr/local/anaconda/lib/python3.6/functools.py", line 20, in <module>
ImportError: cannot import name 'get_cache_token'
Aborted

Hi, Saritha.

Can you check it now? pyspark is running perfectly.
Kindly ignore the error messages; they are caused by some bin scripts that run in the backend.

Just run the last 6 lines of the PySpark snippet from the article below.

All the other environment variables have already been set.
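
Once you are at the pyspark prompt, a quick end-to-end sanity check (a minimal sketch; it assumes nothing beyond the sc SparkContext that pyspark creates for you) is:

    >>> sc.parallelize(range(10)).sum()   # exercises the cluster end to end
    45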

All the best!

Spark Streaming's Kafka libraries not found in class path. Try one of the following.

  1. Include the Kafka library and its dependencies with in the
     spark-submit command as

     $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8:2.4.3 ...

  2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
     Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-0-8-assembly, Version = 2.4.3.
     Then, include the jar in the spark-submit command as

     $ bin/spark-submit --jars <spark-streaming-kafka-0-8-assembly.jar> ...
TypeError                                 Traceback (most recent call last)
<ipython-input-...> in <module>
     12 ssc = StreamingContext(sc, 5)
     13
---> 14 lines = KafkaUtils.createStream(ssc, 'localhost:2181', "spark-streaming-consumer", {'saritha_kafka_test':1})
     15
     16 # Split each line in each batch into words

/usr/spark2.4.3/python/lib/pyspark.zip/pyspark/streaming/kafka.py in createStream(ssc, zkQuorum, groupId, topics, kafkaParams, storageLevel, keyDecoder, valueDecoder)
     76             raise TypeError("topics should be dict")
     77         jlevel = ssc._sc._getJavaStorageLevel(storageLevel)
---> 78         helper = KafkaUtils._get_helper(ssc._sc)
     79         jstream = helper.createStream(ssc._jssc, kafkaParams, topics, jlevel)
     80         ser = PairDeserializer(NoOpSerializer(), NoOpSerializer())

/usr/spark2.4.3/python/lib/pyspark.zip/pyspark/streaming/kafka.py in _get_helper(sc)
    215     def _get_helper(sc):
    216         try:
--> 217             return sc._jvm.org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper()
    218         except TypeError as e:
    219             if str(e) == "'JavaPackage' object is not callable":

TypeError: 'JavaPackage' object is not callable

Jupyter notebook is working, but pyspark is not working in the CLI.

Hi Saritha,

pyspark doesn't work with Python 3 here. So, while running pyspark, you don't need to run "export PATH=/usr/local/anaconda/bin:$PATH", because that makes Python 3 the default.
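
If you have already run that export in your current session, either reconnect to get a fresh shell or strip the Anaconda entry back out of PATH. A minimal sketch, assuming the export above is the only Anaconda entry in PATH:

    $ export PATH=$(echo "$PATH" | sed 's|/usr/local/anaconda/bin:||')   # drop the Anaconda entry
    $ which python    # should now resolve to the system Python 2
    $ pyspark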

Check this for Spark Streaming (a condensed sketch of it follows below): https://github.com/cloudxlab/bigdata/tree/master/spark/examples/streaming/word_count_kafka
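
For quick reference, the heart of that word-count-over-Kafka example looks roughly like the following. This is a sketch, not the exact repository code: the ZooKeeper address, consumer group, and topic name are copied from your traceback above, and the job must be launched with the Kafka package on the classpath via one of the two spark-submit options quoted earlier.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="KafkaWordCount")
    ssc = StreamingContext(sc, 5)  # 5-second batch interval, as in your snippet

    # createStream(ssc, zkQuorum, groupId, {topic: numPartitions})
    lines = KafkaUtils.createStream(ssc, 'localhost:2181',
                                    'spark-streaming-consumer',
                                    {'saritha_kafka_test': 1})

    # Each element is a (key, message) pair; count the words in the messages.
    counts = lines.map(lambda kv: kv[1]) \
                  .flatMap(lambda line: line.split(' ')) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()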

How to make a spark-submit program using Scala: https://github.com/cloudxlab/bigdata/tree/master/spark/examples/streaming/word_count_sbt