Writing to Hive


#1

I am writing a simple program in Spark to write to Hive from a Jupyter notebook, but I am not able to see the records inserted. Can someone please let me know what is wrong?
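
Since the post does not include the code itself, here is a minimal sketch of the kind of program being described, assuming a SparkSession with Hive support; the table name demo_table and the sample rows are purely illustrative:

import org.apache.spark.sql.SparkSession

// Sketch only: a session with Hive support enabled; names below are made up.
val spark = SparkSession.builder()
  .appName("WriteToHive")
  .enableHiveSupport()
  .getOrCreate()

import spark.implicits._

// Hypothetical sample data written to a hypothetical table.
val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
df.write.mode("append").saveAsTable("demo_table")

// Read back to verify the insert from the same session.
spark.sql("select * from demo_table").show()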


#2

Actually, this would try to write to the /home/alokdeosingh1995 folder in HDFS, and /home does not exist in HDFS.

I would suggest that in .config you simply pass a relative path like this:

.config("spark.sql.warehouse.dir", "spark-warehouse")
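
In context, the session would then be built along these lines (a sketch; the app name is illustrative):

import org.apache.spark.sql.SparkSession

// A relative path keeps the warehouse somewhere Spark can actually create,
// instead of a non-existent /home path on HDFS.
val spark = SparkSession.builder()
  .appName("WriteToHive")
  .config("spark.sql.warehouse.dir", "spark-warehouse")
  .enableHiveSupport()
  .getOrCreate()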


#3

Thanks Sandeep for the quick reply! I tried it and it worked. However, when I add some Kafka jars to read from Kafka in my Spark program, it cannot find the database again:
%AddJar http://central.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-10-assembly_2.11/2.4.3/spark-streaming-kafka-0-10-assembly_2.11-2.4.3.jar
%AddJar http://central.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.11/2.0.2/spark-sql-kafka-0-10_2.11-2.0.2.jar

Code that works and writes to Hive:

Code that does not work and gives an error when writing to Hive:

Are the jars being added to a different Spark session that Jupyter starts?
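
One way to check, offered here as a suggestion rather than something from the original thread, is to print the warehouse setting and the visible databases and tables from whichever session the notebook actually gives you:

// Inspect the session the notebook is actually using.
println(spark.conf.get("spark.sql.warehouse.dir"))
spark.catalog.listDatabases().show(false)
spark.catalog.listTables().show(false)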


#4

Hi,

The new Spark was not setting the Hive warehouse directory correctly, so I have fixed it. You don't need to set the location. I tried the following and it worked well in spark-shell:

def q(s: String) = spark.sql(s).show()
q("create database sg")
q("show tables")
q("create table x(a int, b varchar(10))")
q("show tables")
q("insert into x values(1, 'sandeep')")
q("insert into x values(2, 'giri')")
q("select * from x")


#5

Thanks Sandeep! It works in the web console but not in Jupyter.


#6

I also noticed that this does not work in cluster mode:
spark-submit --class com.alok.projects.entry --master yarn --deploy-mode cluster target/scala-2.11/kafkaconsumer_2.11-0.1.jar

But it works in local mode:
spark-submit --class com.alok.projects.entry --master local[*] kafkaconsumer_2.11-0.1.jar

There is something wrong in what I am doing; can you please point it out to me? Thanks!


#7

Thanks Sandeep! I got the answer. The reply on this SO post worked: https://stackoverflow.com/questions/34034488/hive-tables-not-found-when-running-in-yarn-cluster-mode

spark-submit --class com.alok.projects.entry --master yarn --deploy-mode cluster --files /usr/hdp/current/spark-client/conf/hive-site.xml kafkaconsumer_2.11-0.1.jar

But this still does not work in Jupyter. How do I set the --files parameter there?
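
For what it is worth, one possible notebook-side equivalent of --files, offered as an assumption rather than a confirmed fix, is to distribute hive-site.xml through the session config before the session is created (the app name is illustrative):

import org.apache.spark.sql.SparkSession

// spark.yarn.dist.files ships local files to the YARN containers,
// which is roughly what --files does on the spark-submit command line.
// Assumption for Jupyter: this only takes effect if set before the session
// is created, so it may not help if the kernel starts Spark by itself.
val spark = SparkSession.builder()
  .appName("KafkaConsumer")
  .config("spark.yarn.dist.files", "/usr/hdp/current/spark-client/conf/hive-site.xml")
  .enableHiveSupport()
  .getOrCreate()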