Unable to run Spark

I am getting an error while running Spark on my local machine.

I am running it from a Jupyter notebook. I have set the environment variables for Spark, and I have placed winutils.exe in the Hadoop bin directory, but it is still throwing an exception.
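
For reference, this is roughly how I set the environment variables at the top of the notebook (the paths below are placeholders, not my exact install locations):

import os

# Placeholder paths -- replace with the actual Spark and Hadoop install directories
os.environ["SPARK_HOME"] = "D:\\spark"
os.environ["HADOOP_HOME"] = "D:\\hadoop"   # winutils.exe is placed in D:\hadoop\bin
os.environ["PYSPARK_PYTHON"] = "python"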

Below is my code:

from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("testdata").setMaster("local")
sc = SparkContext(conf=conf)
lines = sc.textFile("file:///D:/sparkexample")
ratings = lines.map(lambda x: x.split()[2])
result = ratings.countByValue()

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.RuntimeException: Error while running command to get file permissions : ExitCodeException exitCode=-1073741701:

Please let me know how to overcome this issue.

It looks like an installation issue.

Please try the following:

  1. Try printing a few elements of lines as soon as you create the RDD, to confirm that the file is actually being read:

    from pyspark import SparkContext, SparkConf
    conf = SparkConf().setAppName("testdata").setMaster("local")
    sc = SparkContext(conf=conf)
    lines = sc.textFile("file:///D:/sparkexample")
    # Print a few records to verify that the file is being read
    print(lines.take(5))

  2. Try re-installing Spark.

  3. Install Spark on Linux inside VirtualBox (or another virtualization tool) on Windows, instead of running Spark directly on Windows. Although Spark does work on Windows, in most server environments you will find that Spark runs on Linux.

  4. Use CloudxLab or any other cloud-hosted environment.