How to access an input file on HDFS from IntelliJ IDEA


#1

Hi

I have written a word count program in IntelliJ, and the code is below:

package SparkApplications

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("WordCount")
    val sc = new SparkContext(conf)
    val inputPath = args(0)
    val outputPath = args(1)
    // create an RDD with one element per line of the input file
    val rdd1 = sc.textFile(inputPath)
    // split each line into words
    val rdd2 = rdd1.flatMap(_.split(" "))
    // pair each word with a count of 1
    val rdd3 = rdd2.map(word => (word, 1))
    // sum the counts for each word
    val rdd4 = rdd3.reduceByKey(_ + _)
    rdd4.saveAsTextFile(outputPath)
  }
}

In the run configuration (Edit Configurations) I have given the input path as "/home/shravanthirushi3238/big.txt" and the output path as "/home/shravanthirushi3238/big2.txt". I used these paths because big.txt is in my HDFS, and I am able to access the file from Linux. But when I run the above code in IntelliJ with the same input path, I get the error "input path doesn't exist". How should I specify a file that is in HDFS as the input when running from IntelliJ?

Could you please explain? Thank you.


#2

Hi @shravanthi_rushi,

/home/shravanthirushi3238 is your home directory on the Linux machine. Please use /user/shravanthirushi3238, which is your home directory on HDFS (for example /user/shravanthirushi3238/big.txt, if the file sits directly in that directory).
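One caveat: a path without a scheme is resolved against whatever fs.defaultFS is configured, and when no cluster configuration is on the classpath Hadoop falls back to the local filesystem. If that happens when you run from IntelliJ, spelling out the full hdfs:// URI may help. Below is a minimal sketch of a helper that does this. The object name HdfsPaths, the function qualify, and the NameNode address hdfs://namenode.example.com:8020 are hypothetical and only for illustration; take the real address from fs.defaultFS in your cluster's core-site.xml.

```scala
import java.net.URI

object HdfsPaths {
  // Hypothetical NameNode address (an assumption, not taken from this thread);
  // replace it with the fs.defaultFS value from your cluster's core-site.xml.
  val defaultNameNode = "hdfs://namenode.example.com:8020"

  /** Returns `path` unchanged if it already carries a scheme (hdfs://, file://, ...);
    * otherwise prefixes it with the NameNode address.
    * Assumes `path` is an absolute path without spaces or other characters
    * that are illegal in a URI. */
  def qualify(path: String, nameNode: String = defaultNameNode): String =
    if (new URI(path).getScheme != null) path
    else nameNode.stripSuffix("/") + "/" + path.stripPrefix("/")
}
```

In the word count program you could then call sc.textFile(HdfsPaths.qualify(inputPath)), or simply type the full hdfs:// URI into the program arguments in Edit Configurations.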

Hope this helps.

Thanks