Error using AWS S3 with pyspark

Hi there

New here, trying things out using the lab.

I am trying to load some data from S3 using pyspark:

path = 's3://nyc-tlc/trip data/yellow_tripdata_2014-04.csv'
df = spark.read.options(header='True', inferSchema='True').csv(path)

and I get this error:

Py4JJavaError: An error occurred while calling o83.csv.
: java.io.IOException: No FileSystem for scheme: s3

Can anyone help me?

Hi Rafel,

Could you try it on the CloudxLab console the following way?

pyspark --packages=org.apache.hadoop:hadoop-aws:2.7.3
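The "No FileSystem for scheme: s3" error means no filesystem implementation is registered for the plain s3 scheme. The hadoop-aws package pulled in by the command above provides S3AFileSystem, which handles the s3a scheme, so the read should use an s3a:// path. A minimal sketch of the corrected read, assuming pyspark was launched with the hadoop-aws package as shown and that AWS credentials are available to the cluster (the nyc-tlc bucket is public, so anonymous access may also need to be configured):

```python
from pyspark.sql import SparkSession

# In the pyspark shell a session named `spark` already exists;
# building one here keeps the sketch self-contained.
spark = SparkSession.builder.appName("s3-read").getOrCreate()

# Note the s3a:// scheme instead of s3:// -- hadoop-aws registers
# a filesystem for s3a, not for plain s3.
path = "s3a://nyc-tlc/trip data/yellow_tripdata_2014-04.csv"
df = spark.read.options(header="True", inferSchema="True").csv(path)

df.printSchema()
```

If the path still fails with an access error rather than a scheme error, the connector is loaded and only credentials remain to be sorted out.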