Error while looking for metadata directory

kkraj · June 12, 2020, 7:15am

textdata = spark.read.text("/user/kkgprest27288/myfirstfile.txt")
When I run the command above, I get the following message:
20/06/12 07:10:53 WARN DataSource: Error while looking for metadata directory.

abhinav · June 16, 2020, 4:24am

Hi @kkraj,

We have provided instructions in the course video for accessing Hive from Spark. In short, you have to provide the hive.xml environment variable to Spark. Please follow the configuration suggested in the course video.

kkraj · June 17, 2020, 2:55am

Abhinav,

Which video are you referring to? Please share the link.

Sanchit_Khandelwal · October 6, 2022, 7:19pm

@abhinav, @Shubh_Tripathi, @sandeepgiri: I am facing the same issue. Could you please share the link to the video, or post the instructions in this thread?

sandeepgiri · October 6, 2022, 8:36pm

Sanchit,

Could you share the detailed steps to reproduce this error?

Sanchit_Khandelwal · October 6, 2022, 9:03pm

Launch the pyspark shell, then run the code below:

df_orders = (spark.readStream.format("csv")
    .option("path", "input_folder")
    .option("header", "true")
    .option("inferSchema", "true")
    .load())

df_orders.createOrReplaceTempView("df_orders_file")

df_all_orders = spark.sql("select * from df_orders_file")

order_stream = (df_all_orders.writeStream
    .format("csv")
    .outputMode("append")
    .option("checkpointLocation", "checkpoint-location10")
    .option("path", "output_folder")
    .start())

Sanchit_Khandelwal · October 7, 2022, 4:55pm

Any update here @sandeepgiri ?

Sanchit_Khandelwal · October 8, 2022, 3:47pm

Any update @sandeepgiri @Shubh_Tripathi

sandeepgiri · October 10, 2022, 9:11am

Hi Sanchit,

Try this:

df_orders = spark.read.format('csv').option('path','input_folder').option('header','true').option('inferSchema','true').load()

Also, is there a particular reason for using readStream?
Note: You can learn more about these concepts in My courses.
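
To make the difference between spark.read and spark.readStream concrete, here is a minimal sketch that builds the same CSV reader either way. The function name load_orders, the constant ORDERS_SCHEMA, and the column names in it are hypothetical and not taken from this thread; replace them with whatever matches your files. One point worth knowing: by default, streaming file sources expect an explicit schema (unless spark.sql.streaming.schemaInference is enabled), so the sketch accepts an optional schema and only falls back to inferSchema when none is given. This is an illustration under those assumptions, not a confirmed fix for the warning above.

```python
# Hypothetical DDL schema string; the column names are placeholders,
# not taken from the thread. Replace them with the real CSV columns.
ORDERS_SCHEMA = "order_id INT, order_date STRING, order_status STRING"


def load_orders(spark, path="input_folder", streaming=False, schema=None):
    """Build a CSV DataFrame from `path` with the batch or the streaming reader.

    `spark` is an existing SparkSession (the pyspark shell provides one).
    """
    # spark.read gives a batch reader, spark.readStream a streaming reader;
    # both expose format(), option(), schema() and load().
    reader = spark.readStream if streaming else spark.read
    reader = reader.format("csv").option("header", "true")
    if schema is not None:
        # An explicit schema (a StructType or a DDL string) avoids inference.
        reader = reader.schema(schema)
    else:
        # Fall back to inference, as in the commands posted above.
        reader = reader.option("inferSchema", "true")
    return reader.load(path)
```

In the pyspark shell, load_orders(spark) corresponds to the batch read suggested above, while load_orders(spark, streaming=True, schema=ORDERS_SCHEMA) builds the streaming version with an explicit schema.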