Hello,
Following Sandeep's advice, the approach works, but not with my actual regex. Please find the code below. (Here I want to extract the URL, for example "/shuttle/missions/sts-68/news/sts-68-mcc-05.txt".)
val pattern = """(\ /)(\S+)(\S*)"""
val line1 = """in24.inetnebr.com - - [01/Aug/1995:00:00:01 -0400] "GET /shuttle/missions/sts-68/news/sts-68-mcc-05.txt HTTP/1.0" 200 1839"""
val pattern(ip,x,y)= line1
This is the error I get:
scala.MatchError: in24.inetnebr.com - - [01/Aug/1995:00:00:01 -0400] "GET /shuttle/missions/sts-68/news/sts-68-mcc-05.txt HTTP/1.0" 200 1839 (of class java.lang.String)
… 48 elided
I managed to get the URL with the code below, but I don't think this is good practice. Please help.
// the function below returns only the URL
def extractURL(line: String): String = {
  // split on spaces; the URL is the 7th field (index 6) of the log line
  val arr = line.split(" ")
  arr(6).trim
} // the above function works fine and returns only the URL
scala> var nurlkeyval = urlaccesslogs.map(line=>(extractURL(line),1))
n_url: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[14] at map at <console>:36
var urlcounts = nurlkeyval.reduceByKey((a,b) => (a+b))
var urlcountsOrdered = urlcounts.sortBy(f => f._2, false);
urlcountsOrdered.take(10)
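For comparison, here is a rough sketch of a regex-based alternative to the split approach. As far as I understand, a Scala Regex used as an extractor (as in val pattern(ip,x,y) = line1) has to match the whole string, which would explain the MatchError; calling .unanchored lets it match a substring instead. The names requestPattern and extractURLRegex, and the pattern itself, are my own assumptions and not something from Sandeep's advice.

```scala
import scala.util.matching.Regex

// Hypothetical pattern: HTTP method, request path, then the protocol version.
// .unanchored makes the extractor search inside the line instead of
// requiring the entire line to match.
val requestPattern: Regex = """([A-Z]+) (\S+) HTTP/[\d.]+""".r.unanchored

// Returns Some(url) when the line contains a request, None otherwise,
// so a malformed line does not throw a MatchError.
def extractURLRegex(line: String): Option[String] = line match {
  case requestPattern(_, url) => Some(url)
  case _                      => None
}

val sample = """in24.inetnebr.com - - [01/Aug/1995:00:00:01 -0400] "GET /shuttle/missions/sts-68/news/sts-68-mcc-05.txt HTTP/1.0" 200 1839"""
println(extractURLRegex(sample))
```

If this holds up, it could probably replace extractURL in the job with something like urlaccesslogs.flatMap(line => extractURLRegex(line)).map(url => (url, 1)), which would also skip lines that do not contain a request.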