Error using AWS S3 with pyspark

Hi there

New here, trying things out using the lab.

I am trying to load some data from S3 using pyspark:

path = 's3://nyc-tlc/trip data/yellow_tripdata_2014-04.csv'
df = spark.read.options(header='True', inferSchema='True').csv(path)

and I get this error:

Py4JJavaError: An error occurred while calling o83.csv.
: java.io.IOException: No FileSystem for scheme: s3

Can anyone help me?

Hi Rafel,

Could you try it on the CloudxLab console the following way?

pyspark --packages=org.apache.hadoop:hadoop-aws:2.7.3
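The "No FileSystem for scheme: s3" error means no filesystem implementation is registered for the plain s3 scheme. The hadoop-aws package pulled in by the command above provides S3AFileSystem, which handles the s3a scheme, so the read should use an s3a:// path. A minimal sketch of the corrected read, assuming pyspark was launched with the hadoop-aws package as shown and that AWS credentials are available to the cluster (the nyc-tlc bucket is public, so anonymous access may also need to be configured):

```python
from pyspark.sql import SparkSession

# In the pyspark shell a session named `spark` already exists;
# building one here keeps the sketch self-contained.
spark = SparkSession.builder.appName("s3-read").getOrCreate()

# Note the s3a:// scheme instead of s3:// -- hadoop-aws registers
# a filesystem for s3a, not for plain s3.
path = "s3a://nyc-tlc/trip data/yellow_tripdata_2014-04.csv"
df = spark.read.options(header="True", inferSchema="True").csv(path)

df.printSchema()
```

If the path still fails with an access error rather than a scheme error, the connector is loaded and only credentials remain to be sorted out.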