Error while running Spark Scala code

Hi,

I am running basic Scala code,

val re = sc.textFile("path")
val header = re.first
val re1 = re.filter(_ != header)
val a = re1.map(x => x.split("\\|"))
case class realestate(PropertyID: Int, Location: String, Price: Int, Bedrooms: Int, Bathrooms: Int, Size: Int, PriceperSqft: Double, Status: String)
val b = a.map(i => realestate(i(0).toInt, i(1), i(2).toInt, i(3).toInt, i(4).toInt, i(5).toInt, i(6).toDouble, i(7)))
val c = b.filter(_.Location.contains("Thomas County"))
c.collect

The real problem comes after I enter c.collect; it gives me the following error:
ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 2)
java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:592)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
at $line26.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:34)
at $line26.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:34)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936) … and the trace goes on.

Can you please check and let me know what I am missing here?

Regards,
Abinesh.

Some of the records are malformed, and the line doing the toInt/toDouble conversions is probably what throws this error. Before converting the fields, validate the records and use a filter to remove the invalid ones, as in the sketch below.
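
For example, a minimal sketch against the RDD a from your code (the Try-based checks and the field-count check are my assumptions; adjust them to your actual schema):

import scala.util.Try

// Keep only rows that have all 8 fields and whose numeric columns parse cleanly
val clean = a.filter(i =>
  i.length == 8 &&
  Try(i(0).toInt).isSuccess && Try(i(2).toInt).isSuccess &&
  Try(i(3).toInt).isSuccess && Try(i(4).toInt).isSuccess &&
  Try(i(5).toInt).isSuccess && Try(i(6).toDouble).isSuccess
)

// The case-class mapping can no longer hit a NumberFormatException
val b = clean.map(i => realestate(i(0).toInt, i(1), i(2).toInt, i(3).toInt,
  i(4).toInt, i(5).toInt, i(6).toDouble, i(7)))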

Hi Giri,

Thanks for the response. I will try that, but when I run the same code in the Cloudera VM, it works fine. The data is loaded and I can view it.

Regards,
Abinesh.

Maybe the data is different between the two environments.
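
One quick way to confirm, assuming the same pipe-delimited layout, is to count the malformed lines in each environment:

// Rows that don't have all 8 fields, or that have an empty field
re1.map(_.split("\\|", -1))
   .filter(i => i.length != 8 || i.exists(_.isEmpty))
   .count

If that count is zero in the Cloudera VM but not here, the file you are reading is not the same.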