Permission denied

Not able to create a parquet file due to a "Permission denied" error.
Currently I have given "rwxrwxrwx" permission to the folders in HDFS.

Could you share the code? Chances are that you are trying to access a file that you don't have permission for.

DF.write.mode("Overwrite").parquet("home/username/")

Please respond to the above query.

DF.write.mode("Overwrite").parquet("home/username/")

When a path does not start with "/", it is called a relative path; if it starts with "/", it is absolute.
A relative path is resolved against the home directory of the user.
So "home/username" is equivalent to /home/sandeepgiri/home/username if my home directory is /home/sandeepgiri.
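
As a quick illustration in plain Scala (local paths only; the directory names here are hypothetical):

import java.nio.file.Paths

val home = Paths.get("/home/sandeepgiri")
println(home.resolve("home/username"))  // relative: resolves to /home/sandeepgiri/home/username
println(Paths.get("/home/username"))    // absolute: used as-is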

I would strongly suggest going through the Linux basics: https://cloudxlab.com/assessment/playlist-intro/2/linux-basics

Thanks for the quick response.

But the error I am facing is "Permission denied", not "directory not found".

And I tried all the other combinations as well, such as with a slash, without a slash, etc.

These combinations do not throw an exception, but the parquet file is not generated at any of the mentioned paths.

Can you try the following:

newdf.write.mode("overwrite").parquet("mynewdf");

Also, "./home/gautam" and "home/gautam" are the same as "/home/yourlogin/home/gautam".

Again, I would suggest understanding the difference between relative and absolute paths and the meaning of ".", "/" and "./".

Try experimenting with files/folders in Linux using the following set of commands (a rough Scala equivalent is sketched after the list):

mkdir
cd
pwd
ls
touch
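
If you would rather experiment from the Scala REPL, here are rough equivalents of pwd and ls (note: these touch the local filesystem, not HDFS):

println(new java.io.File(".").getCanonicalPath)   // like pwd: prints the current working directory
new java.io.File(".").listFiles.foreach(println)  // like ls: lists the entries in it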

Not working. I have tried all kinds of paths.

Please share the full code and the steps. Let me look into it.

import org.apache.spark.sql.SQLContext
val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
val dataframe_mysql = sqlcontext.read.format("jdbc").option("url", "jdbc:mysql://ip-172-31-20-247/retail_db").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "uber").option("user", "sqoopuser").option("password", "NHkkP876rp").load()

var newdf = dataframe_mysql.withColumnRenamed("sum(trips)", "Trips")

newdf.write.mode("overwrite").parquet("/Parquet/Retail_db_Parquet/Uber/")
newdf.write.mode("overwrite").parquet("./Parquet/Retail_db_Parquet/Uber/")
newdf.write.mode("overwrite").parquet(".Parquet/Retail_db_Parquet/Uber/")
newdf.write.mode("overwrite").parquet(".Parquet/Retail_db_Parquet/Uber")
newdf.write.mode("overwrite").parquet("home/gautamjain452920/Parquet/Retail_db_Parquet/Uber")
newdf.write.mode("overwrite").parquet("/home/gautamjain452920/Parquet/Retail_db_Parquet/Uber")

Try a dataframe_mysql.show() here to ensure that you are able to access the database. Also, I am assuming you are using SBT, because on the spark-shell we don't need to create a SQLContext.

Try newdf.show() to test whether you are able to access the data. Also, please use the spark-shell before creating the full-fledged Scala program.
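
For example, in the spark-shell, using the objects from the code above:

dataframe_mysql.show(5)  // confirms the JDBC connection and the table read work
newdf.printSchema()      // confirms the column rename took effect
newdf.show(5)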

This one looks for "Parquet" in the root directory, which you would not have access to.

This one expects that the parent folder path "Parquet/Retail_db_Parquet" has already been created in your HDFS, but "Uber" must not already exist. So, either just try:

newdf.write.mode("overwrite").parquet("Uber")

or first create the parent folder structure like this:

hadoop fs -mkdir Parquet/Retail_db_Parquet
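
If you prefer to stay inside the spark-shell, a rough equivalent using the Hadoop FileSystem API (a sketch; assumes sc is in scope):

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)
fs.mkdirs(new Path("Parquet/Retail_db_Parquet"))  // created relative to your HDFS home directory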

Please get comfortable with Linux/Unix file path nomenclature.