I am trying out Spark code. I ran it first on my local machine and then on the cluster. The local machine did not throw an error, but the remote cluster did.
These are the commands:
ON THE REMOTE MACHINE I FIRST STARTED THE DESIRED spark-shell:
/usr/spark2.2.1/bin/spark-shell
Just to be sure, I have placed the same file on HDFS as well as on the local filesystem.
WHEN ON HDFS:
val flightdata = spark.read.option("inferSchema","true").option("header","true").csv("/user/uutkarshsingh7351/2015-summary.csv")
WHEN ON THE LOCAL FILESYSTEM of the remote machine:
val flightdata = spark.read.option("inferSchema","true").option("header","true").csv("/home/uutkarshsingh7351/2015-summary.csv")
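For reference, the same reads can also be written with explicit URI schemes, so spark-shell does not have to guess the filesystem from fs.defaultFS. This is only a sketch using the paths from above; the hdfs:// and file:// prefixes are assumptions about the cluster setup, and it still needs a working SparkSession (the `spark` value provided by spark-shell):

```scala
// Explicit filesystem schemes for the same file in both locations.
// (Paths taken from the question; schemes are assumptions about the cluster.)
val hdfsPath  = "hdfs:///user/uutkarshsingh7351/2015-summary.csv"   // file stored on HDFS
val localPath = "file:///home/uutkarshsingh7351/2015-summary.csv"   // file on the node's local disk

// Same read as above, just with an unambiguous path:
val flightdata = spark.read
  .option("inferSchema", "true")
  .option("header", "true")
  .csv(hdfsPath)
```

Note that a file:// path only works if the file exists at that location on every node that runs the read, which is why HDFS is usually the safer choice on a cluster.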
Error that I am getting:
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
at org.apache.spark.sql.DataFrameReader.&lt;init&gt;(DataFrameReader.scala:689)
at org.apache.spark.sql.SparkSession.read(SparkSession.scala:645)
MY QUESTIONS
Are these commands correct?
Do I need to keep the file on the remote machine's local filesystem or on HDFS?
Just for information, I am following the book Spark: The Definitive Guide.