Spark - Project - Apache log parsing - Find HTTP codes


#1

I get an error when running the code below to find the unique HTTP codes; the error output follows the code. Please help me resolve it.

var accesslogs = sc.textFile("/data/spark/project/NASA_access_log_Aug95.gz")
// function to retrieve the HTTP code
def unqHttp(line: String): String = {
  var arr = line.split(" ")
  arr(8).trim
}

var httpcodekeyval = accesslogs.map(line => (unqHttp(line), 1))
var httpcounts = httpcodekeyval.reduceByKey((a, b) => a + b)
httpcounts.take(10)

19/11/07 09:21:13 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
java.lang.ArrayIndexOutOfBoundsException
19/11/07 09:21:13 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 1, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException
19/11/07 09:21:13 ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 1, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)


#2

@Bhabani_Sankar_Mishr,

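It's a java.lang.ArrayIndexOutOfBoundsException. It is most likely thrown by arr(8) in unqHttp when a log line splits into fewer than nine space-separated fields. Can you please fix the code so that it handles such lines?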
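
One possible way to guard against short lines is sketched below. It is only a sketch, assuming the exception comes from lines with fewer than nine space-separated fields. The helper name unqHttpSafe and the sample lines are hypothetical and made up for illustration; they are not taken from the NASA log. The example uses a local Seq so it runs without a SparkContext.

```scala
// Hypothetical helper: returns the field at index 8 (the HTTP code in the
// original unqHttp) wrapped in Some, or None when the line is too short.
def unqHttpSafe(line: String): Option[String] = {
  val arr = line.split(" ")
  if (arr.length > 8) Some(arr(8).trim) else None
}

// Made-up sample lines for illustration only (not real log data).
val sample = Seq(
  """host1 - - [01/Aug/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1839""",
  "malformed line",
  """host2 - - [01/Aug/1995:00:00:02 -0400] "GET /missing.html HTTP/1.0" 404 -"""
)

// Local equivalent of map + reduceByKey: drop lines without a code, then count.
val counts: Map[String, Int] =
  sample
    .flatMap(line => unqHttpSafe(line))
    .groupBy(identity)
    .map { case (code, hits) => (code, hits.size) }

println(counts)
```

In the spark-shell, the same helper could presumably be applied to the RDD from the question with accesslogs.flatMap(line => unqHttpSafe(line)).map(code => (code, 1)).reduceByKey(_ + _), so that lines too short to contain a code are skipped instead of failing the task. Note that this still assumes the code always sits at index 8; lines whose request string contains extra spaces would need a more robust parser, such as a regular expression.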