I created a txt file and am trying to read it in PySpark, but it throws the error ‘Input file path does not exist’. Please help.
Can you share the code?
Hi @sandeepgiri ,
This is my code; it's not working. It looks like the server is down.
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DecimalType

# Schema for the incoming sales records
dataschema = StructType([
    StructField("transaction_id", IntegerType(), True),
    StructField("customer_id", IntegerType(), True),
    StructField("product", StringType(), True),
    StructField("category", StringType(), True),
    StructField("quantity", IntegerType(), True),
    StructField("price", DecimalType(10, 2), True),
    StructField("city", StringType(), True),
    StructField("event_time", StringType(), True),
])
Filepath = "hdfs://ip-172-31-53-48.ec2.internal:8020/user/antony343357464/sales.json/"
# Read the directory as a stream, one file per trigger
StreamingInputDF = (
    spark
    .readStream
    .schema(dataschema)
    .option("maxFilesPerTrigger", 1)
    .json(Filepath)
)
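# show() is not supported on a streaming DataFrame; start the query with writeStream instead
query = StreamingInputDF.writeStream.format("console").start()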
Error: IllegalArgumentException: java.net.UnknownHostException: ip-172-31-53-48.ec2.internal
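An UnknownHostException usually means the host name in the hdfs:// URI could not be resolved from the machine running the driver, so one thing worth trying is to drop the hard-coded host and let the path resolve against the cluster's default filesystem. The sketch below is only an illustration under that assumption (it presumes fs.defaultFS already points at the right NameNode). The helper names host_resolves, strip_authority, and start_console_stream are hypothetical and not part of PySpark; the last one also starts the query with writeStream, since a streaming DataFrame cannot be displayed with show().

```python
import socket
from urllib.parse import urlparse


def host_resolves(uri):
    """Return True if the host in the URI resolves via DNS (hypothetical helper)."""
    host = urlparse(uri).hostname
    if host is None:
        return True  # no host in the path, so there is nothing to resolve
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False


def strip_authority(uri):
    """Drop the scheme and host:port, keeping only the path (hypothetical helper).

    Assumption: the cluster's fs.defaultFS points at the correct NameNode,
    so a bare path resolves to the same HDFS location.
    """
    return urlparse(uri).path


def start_console_stream(spark, schema, path):
    """Start a streaming query that prints each micro-batch to the console."""
    streaming_df = (
        spark.readStream
        .schema(schema)
        .option("maxFilesPerTrigger", 1)
        .json(path)
    )
    # Streaming DataFrames must be started through writeStream, not show()
    return (
        streaming_df.writeStream
        .format("console")
        .outputMode("append")
        .start()
    )


filepath = "hdfs://ip-172-31-53-48.ec2.internal:8020/user/antony343357464/sales.json/"
# Fall back to the bare path when the host cannot be resolved
path_to_use = filepath if host_resolves(filepath) else strip_authority(filepath)
```

In a PySpark session you could then call start_console_stream(spark, dataschema, path_to_use) and stop the returned query with query.stop() when done. If the bare path still fails, the next thing to check is whether the file actually exists at that location, for example with hdfs dfs -ls on the directory.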