Spark Scala error

mearupmukherjee · March 6, 2020, 5:53am

Hi, I am getting an error while executing the Spark code below. The code is from the Apache Spark sample code.

Could you please point me to the reason for this error?

scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
// Create an RDD
val peopleRDD = spark.sparkContext.textFile("/user/mearupmukherjee17025/employees.txt")
// The schema is encoded in a string
val schemaString = "name age"
// Generate the schema based on the string of schema
val fields = schemaString.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)
// Convert records of the RDD (people) to Rows
val rowRDD = peopleRDD
  .map(_.split(","))
  .map(attributes => Row(attributes(0), attributes(1).trim))
// Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)
// Creates a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")
// SQL can be run over a temporary view created using DataFrames
val results = spark.sql("SELECT name FROM people")

// The results of SQL queries are DataFrames and support all the normal RDD operations
// The columns of a row in the result can be accessed by field index or by field name
//results.map(attributes => "Name: " + attributes(0)).show()
results.show()
// Exiting paste mode, now interpreting.
20/03/06 05:39:03 ERROR Executor: Exception in task 0.0 in stage 19.0 (TID 59)
java.lang.ArrayIndexOutOfBoundsException: 1
at $line72.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$3.apply(:58)
at $line72.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$3.apply(:58)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)

sgiri · March 6, 2020, 6:15am

Please notice the main error:

java.lang.ArrayIndexOutOfBoundsException: 1

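Most likely, the error comes from accessing index 1 of an array that has only one element. That happens when at least one line in the file (for example a blank line, or a line with no second comma-separated field) splits into a single element: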

.map(attributes => Row(attributes(0), attributes(1).trim))
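
To check this outside Spark, here is a minimal sketch in plain Scala. The sample lines are made up, since the actual contents of employees.txt are not shown in this thread, and `SplitCheck` and `parseLine` are hypothetical names used only for illustration. It shows one possible way to skip lines that do not split into at least two fields instead of indexing into them directly.

```scala
object SplitCheck {
  // Returns Some((name, age)) only when the line splits into at least two
  // comma-separated fields; otherwise returns None instead of throwing.
  def parseLine(line: String): Option[(String, String)] =
    line.split(",") match {
      case Array(name, age, _*) => Some((name.trim, age.trim))
      case _                    => None
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical input: a blank line and a line without a comma both
    // split into a single element, so attributes(1) would fail on them.
    val lines = Seq("Michael, 29", "", "Andy", "Justin, 19")

    // Keep only the well-formed lines.
    val parsed = lines.flatMap(parseLine)
    println(parsed)

    // Show which lines would have triggered the exception.
    val bad = lines.filter(l => parseLine(l).isEmpty)
    println(s"Skipped ${bad.size} malformed line(s)")
  }
}
```

In the pasted code, a similar guard could be a `.filter(_.length >= 2)` placed between the two `.map` calls on `peopleRDD`. Whether to skip such lines or fix the input file depends on what the data is supposed to contain.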