Spark-shell related problem: HiveSessionStateBuilder

This is my code

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, StringType, LongType}

val myManualSchema = new StructType(Array(
  new StructField("some", StringType, true),
  new StructField("col", StringType, true),
  new StructField("names", LongType, false)))

val myRows = Seq(Row("Hello", null, 1L))
val myRDD = spark.sparkContext.parallelize(myRows)
val myDf = spark.createDataFrame(myRDD, myManualSchema)

When I run the last line, spark.createDataFrame(myRDD, myManualSchema), the following error is thrown:

spark.createDataFrame(myRDD, myManualSchema)
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:66)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:587)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:344)
… 48 elided

On my local setup it works correctly.

Has anyone faced this problem?

Hi @utkarsh_rathor

I tried with an older version of Spark, i.e. 1.5.2, and it works fine, but with the latest version I get the error; maybe some jars need to be added for it.

Which Spark version are you using locally?

The book I am referring to, "Spark: The Definitive Guide", asks you to execute this on a particular version.

Try executing it once on the older version in the lab.

Not sure how it did not give an error for you. In version 1.5.2 of spark-shell I am getting an error that the spark variable is not found; maybe it was introduced in a later version.

scala> val myRows = Seq(Row("Hello", null, 1L))
myRows: Seq[org.apache.spark.sql.Row] = List([Hello,null,1])

scala> val myRDD = spark.sparkContext.parallelize(myRows)
<console>:26: error: not found: value spark
val myRDD = spark.sparkContext.parallelize(myRows)
^

scala> val myDf = spark.createDataFrame(myRDD,myManualSchema)
<console>:26: error: not found: value spark
val myDf = spark.createDataFrame(myRDD,myManualSchema)
^
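
For context: the spark variable is a SparkSession, which only exists from Spark 2.0 onwards; in a 1.5.2 shell the pre-created entry points are sc and sqlContext. If sqlContext were not already defined, a minimal sketch of building one from sc would be:

// Spark 1.x only: create a SQLContext from the existing SparkContext (sc)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)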

I will ping you in an hour about how we should execute it; there is one more import you need to add.


Code snippet in the book:

// in Scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, StringType, LongType}

val myManualSchema = new StructType(Array(
  new StructField("some", StringType, true),
  new StructField("col", StringType, true),
  new StructField("names", LongType, false)))
val myRows = Seq(Row("Hello", null, 1L))
val myRDD = spark.sparkContext.parallelize(myRows) // this needs to be changed to sc
val myDf = spark.createDataFrame(myRDD, myManualSchema) // use sqlContext to create the DataFrame
myDf.show()

Modified snippet:

// in Scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, StringType, LongType}

val myManualSchema = new StructType(Array(
  new StructField("some", StringType, true),
  new StructField("col", StringType, true),
  new StructField("names", LongType, false)))
val myRows = Seq(Row("Hello", null, 1L))
val myRDD = sc.parallelize(myRows)
val myDf = sqlContext.createDataFrame(myRDD, myManualSchema)
myDf.show()
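
With that schema, myDf.show() should print something like this (exact formatting may differ slightly):

+-----+----+-----+
| some| col|names|
+-----+----+-----+
|Hello|null|    1|
+-----+----+-----+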

Try this once and let me know if you still face the issue. :slight_smile:
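
If you would rather stay on the newer Spark version, the HiveSessionStateBuilder error generally means the Hive-backed session state could not be initialized in that environment (often a metastore or /tmp/hive permission problem). A minimal sketch of a workaround, assuming you can create your own session (for example in a standalone app) rather than relying on the shell's pre-built one, is to request the in-memory catalog so Hive support is never touched:

import org.apache.spark.sql.SparkSession

// Sketch only: build a session backed by the in-memory catalog instead of Hive,
// so HiveSessionStateBuilder is never instantiated.
val spark = SparkSession.builder()
  .appName("schema-example")   // example name
  .master("local[*]")          // assumes a local run
  .config("spark.sql.catalogImplementation", "in-memory")
  .getOrCreate()

In spark-shell itself a session already exists, so the equivalent would be restarting the shell with the same setting, passed as --conf spark.sql.catalogImplementation=in-memory.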