Writing to Hive


I am writing a simple program in Spark to write to Hive from a Jupyter notebook, but I am not able to see the records inserted. Can someone please let me know what is wrong?


Actually, this would try to write to the /home/alokdeosingh1995 folder in HDFS, and /home does not exist in HDFS.

I would suggest that in .config, you should simply pass a string like this:

.config("spark.sql.warehouse.dir", "spark-warehouse")
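For context, a minimal sketch of how the full session setup might look (the app name is illustrative, and `enableHiveSupport` assumes Hive classes are on the classpath):

```scala
import org.apache.spark.sql.SparkSession

// A relative "spark-warehouse" path resolves against the working directory
// (or the user's home directory on the default filesystem), avoiding the
// nonexistent /home path in HDFS.
val spark = SparkSession.builder()
  .appName("WriteToHive") // illustrative name
  .config("spark.sql.warehouse.dir", "spark-warehouse")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("show databases").show()
```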


Thanks Sandeep for the quick reply! I tried it and it worked. However, when I add some Kafka jars to read from Kafka in my Spark program, it does not find the database again:
%AddJar http://central.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-10-assembly_2.11/2.4.3/spark-streaming-kafka-0-10-assembly_2.11-2.4.3.jar
%AddJar http://central.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.11/2.0.2/spark-sql-kafka-0-10_2.11-2.0.2.jar

Code that works and writes to Hive:

Code that does not work and gives an error when writing to Hive:

Are the jars being added to a different Spark session than the one Jupyter starts?
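As an alternative to `%AddJar`, the Kafka dependency can be declared when the session is created, so it reaches the same Spark context the notebook uses. A sketch under that assumption (the Maven coordinate mirrors the 2.11/2.4.3 artifact above; adjust to your Spark version, and note this conf only takes effect before the context starts, so an already-running kernel would need a restart):

```scala
import org.apache.spark.sql.SparkSession

// spark.jars.packages resolves the artifact from a Maven repository and
// distributes it to the driver and executors at context startup.
val spark = SparkSession.builder()
  .appName("KafkaToHive") // illustrative name
  .config("spark.sql.warehouse.dir", "spark-warehouse")
  .config("spark.jars.packages",
          "org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.3")
  .enableHiveSupport()
  .getOrCreate()
```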



The new Spark was not setting the Hive warehouse directory correctly, so I have fixed it. You don't need to set the location. I tried the following and it worked well in spark-shell:

def q(s: String) = spark.sql(s).show()
q("create database sg")
q("show tables")
q("create table x(a int, b varchar(10))")
q("show tables")
q("insert into x values(1, 'sandeep')")
q("insert into x values(2, 'giri')")
q("select * from x")


Thanks Sandeep! It works on the web console but not on Jupyter.


Also, I noticed that this does not work in cluster mode:
spark-submit --class com.alok.projects.entry --master yarn --deploy-mode cluster target/scala-2.11/kafkaconsumer_2.11-0.1.jar

But it works in local mode:
spark-submit --class com.alok.projects.entry --master local[*] kafkaconsumer_2.11-0.1.jar

There is something wrong in what I am doing; can you please point it out to me? Thanks!


Thanks Sandeep! I got the answer from the reply on this SO post: https://stackoverflow.com/questions/34034488/hive-tables-not-found-when-running-in-yarn-cluster-mode


spark-submit --class com.alok.projects.entry --master yarn --deploy-mode cluster --files /usr/hdp/current/spark-client/conf/hive-site.xml kafka

But this still does not work on Jupyter. How do I set the --files parameter there?
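One possibility, as a hedged sketch: Spark's `spark.yarn.dist.files` property is the configuration counterpart of spark-submit's `--files`, so the same hive-site.xml might be shipped from the notebook by setting it at session creation (the path below is the one from the spark-submit example; verify it matches your install):

```scala
import org.apache.spark.sql.SparkSession

// spark.yarn.dist.files ships the listed files to the YARN containers,
// equivalent to passing --files on spark-submit.
val spark = SparkSession.builder()
  .config("spark.yarn.dist.files",
          "/usr/hdp/current/spark-client/conf/hive-site.xml")
  .enableHiveSupport()
  .getOrCreate()
```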