Spark-shell related problem: HiveSessionStateBuilder

This is my code

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, StringType, LongType}

val myManualSchema = new StructType(Array(
  new StructField("some", StringType, true),
  new StructField("col", StringType, true),
  new StructField("names", LongType, false)))

val myRows = Seq(Row("Hello", null, 1L))
val myRDD = spark.sparkContext.parallelize(myRows)
val myDf = spark.createDataFrame(myRDD, myManualSchema)

When I run the last line, spark.createDataFrame(myRDD, myManualSchema), the following error is thrown:

spark.createDataFrame(myRDD, myManualSchema)
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:66)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:587)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:344)
… 48 elided

On my local setup it works correctly.

Has anyone faced this problem?

Hi @utkarsh_rathor

I tried with an older version of Spark, i.e. 1.5.2, and it works fine, but with the latest version I get the error; maybe some jars need to be added for it.

Which Spark version are you using locally?

The book I am referring to, "Spark: The Definitive Guide", asks you to execute this on a particular version.

Try executing it once on the older version in the lab.

Not sure how it did not give an error for you. In version 1.5.2 of spark-shell I am getting an error that the spark variable is not found; maybe it was introduced in a later version.

scala> val myRows = Seq(Row("Hello", null, 1L))
myRows: Seq[org.apache.spark.sql.Row] = List([Hello,null,1])

scala> val myRDD = spark.sparkContext.parallelize(myRows)
<console>:26: error: not found: value spark
val myRDD = spark.sparkContext.parallelize(myRows)
^

scala> val myDf = spark.createDataFrame(myRDD,myManualSchema)
<console>:26: error: not found: value spark
val myDf = spark.createDataFrame(myRDD,myManualSchema)
^
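
For context: the spark variable is a SparkSession, which only exists from Spark 2.0 onwards; in a 1.5.2 shell the pre-created entry points are sc and sqlContext. If sqlContext were not already defined, a minimal sketch of building one from sc would be:

// Spark 1.x only: create a SQLContext from the existing SparkContext (sc)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)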

I will ping you in an hour about how we should execute it; there is one more import you need to add.


Code snippet in the book:

// in Scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, StringType, LongType}

val myManualSchema = new StructType(Array(
  new StructField("some", StringType, true),
  new StructField("col", StringType, true),
  new StructField("names", LongType, false)))
val myRows = Seq(Row("Hello", null, 1L))
val myRDD = spark.sparkContext.parallelize(myRows) // this needs to be changed to sc
val myDf = spark.createDataFrame(myRDD, myManualSchema) // use sqlContext to create the DataFrame
myDf.show()

Modified snippet:

// in Scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, StringType, LongType}

val myManualSchema = new StructType(Array(
  new StructField("some", StringType, true),
  new StructField("col", StringType, true),
  new StructField("names", LongType, false)))
val myRows = Seq(Row("Hello", null, 1L))
val myRDD = sc.parallelize(myRows)
val myDf = sqlContext.createDataFrame(myRDD, myManualSchema)
myDf.show()
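
With that schema, myDf.show() should print something like this (exact formatting may differ slightly):

+-----+----+-----+
| some| col|names|
+-----+----+-----+
|Hello|null|    1|
+-----+----+-----+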

Try this once and let me know if you still face the issue. :slight_smile:
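
If you would rather stay on the newer Spark version, the HiveSessionStateBuilder error generally means the Hive-backed session state could not be initialized in that environment (often a metastore or /tmp/hive permission problem). A minimal sketch of a workaround, assuming you can create your own session (for example in a standalone app) rather than relying on the shell's pre-built one, is to request the in-memory catalog so Hive support is never touched:

import org.apache.spark.sql.SparkSession

// Sketch only: build a session backed by the in-memory catalog instead of Hive,
// so HiveSessionStateBuilder is never instantiated.
val spark = SparkSession.builder()
  .appName("schema-example")   // example name
  .master("local[*]")          // assumes a local run
  .config("spark.sql.catalogImplementation", "in-memory")
  .getOrCreate()

In spark-shell itself a session already exists, so the equivalent would be restarting the shell with the same setting, passed as --conf spark.sql.catalogImplementation=in-memory.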