Error while running Spark Scala code

Hi,

I am running basic Scala code,

val re = sc.textFile("path")
val header = re.first
val re1 = re.filter(_ != header)
val a = re1.map(x => x.split("\\|"))
case class realestate(PropertyID: Int, Location: String, Price: Int, Bedrooms: Int, Bathrooms: Int, Size: Int, PriceperSqft: Double, Status: String)
val b = a.map(i => realestate(i(0).toInt, i(1), i(2).toInt, i(3).toInt, i(4).toInt, i(5).toInt, i(6).toDouble, i(7)))
val c = b.filter(_.Location.contains("Thomas County"))
c.collect

The real problem comes after I enter c.collect; it gives me the following error:
ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 2)
java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:592)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
at $line26.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:34)
at $line26.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(<console>:34)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936) … and the trace goes on.

Can you please check and let me know what I am missing here?

Regards,
Abinesh.

Some of the records are malformed, and the line doing the toInt/toDouble conversions is probably what throws this error. Before converting the fields, validate the records and use a filter to remove the invalid ones, as in the sketch below.
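
For example, a minimal sketch against the RDD a from your code (the Try-based checks and the field-count check are my assumptions; adjust them to your actual schema):

import scala.util.Try

// Keep only rows that have all 8 fields and whose numeric columns parse cleanly
val clean = a.filter(i =>
  i.length == 8 &&
  Try(i(0).toInt).isSuccess && Try(i(2).toInt).isSuccess &&
  Try(i(3).toInt).isSuccess && Try(i(4).toInt).isSuccess &&
  Try(i(5).toInt).isSuccess && Try(i(6).toDouble).isSuccess
)

// The case-class mapping can no longer hit a NumberFormatException
val b = clean.map(i => realestate(i(0).toInt, i(1), i(2).toInt, i(3).toInt,
  i(4).toInt, i(5).toInt, i(6).toDouble, i(7)))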

Hi Giri,

Thanks for the response. I will try that, but when I run the same code in the Cloudera VM, it works fine. The data is loaded and I can view it.

Regards,
Abinesh.

Maybe the data is different between the two environments.
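
One quick way to confirm, assuming the same pipe-delimited layout, is to count the malformed lines in each environment:

// Rows that don't have all 8 fields, or that have an empty field
re1.map(_.split("\\|", -1))
   .filter(i => i.length != 8 || i.exists(_.isEmpty))
   .count

If that count is zero in the Cloudera VM but not here, the file you are reading is not the same.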