Not able to retrieve data

Hello,

I am trying to load the data into an RDD and display it on the screen, but I get the error below.

These are the commands I used:

val logFile = sc.textFile("/data/spark/project/NASA_access_log_Aug95.gz")

logFile.take(5)

Error:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/data/spark/project/NASA_access_log_Aug95.gz
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)

I have checked the path in HUE, and the file is there.

Please help.

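If you do not set these two environment variables before starting spark-shell, Spark does not pick up the cluster's Hadoop configuration, so it resolves the path against the local filesystem (hence the file: prefix in the error):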

export YARN_CONF_DIR=/etc/hadoop/conf/
export HADOOP_CONF_DIR=/etc/hadoop/conf/
/usr/spark2.0.2/bin/spark-shell
val logFile = sc.textFile("/data/spark/project/NASA_access_log_Aug95.gz")
logFile.take(10)
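The file: prefix in the error comes from how a path without a scheme is qualified: it is resolved against the default filesystem (the fs.defaultFS setting), and Hadoop's built-in default is file:///, the local disk. The sketch below is a simplified model of that rule using java.net.URI.resolve, not Hadoop's actual code; the object name PathResolution is hypothetical, and hdfs://ip-172-31-53-48 is only assumed to be the cluster's fs.defaultFS for illustration (a real value may also include a port).

```scala
import java.net.URI

// Simplified model (not Hadoop's implementation): a path with no scheme
// is resolved against the default filesystem URI; a fully qualified path
// (one that already has a scheme) is returned unchanged.
object PathResolution {
  def qualify(defaultFs: String, path: String): String =
    new URI(defaultFs).resolve(path).toString

  def main(args: Array[String]): Unit = {
    val path = "/data/spark/project/NASA_access_log_Aug95.gz"

    // Without HADOOP_CONF_DIR: Hadoop's built-in default is the local filesystem.
    println(qualify("file:///", path)) // file:/data/spark/project/NASA_access_log_Aug95.gz

    // With HADOOP_CONF_DIR: assumed cluster default pointing at the NameNode.
    println(qualify("hdfs://ip-172-31-53-48", path)) // hdfs://ip-172-31-53-48/data/spark/project/NASA_access_log_Aug95.gz
  }
}
```

The first result matches the path in the error message, which is why exporting HADOOP_CONF_DIR (or giving a full hdfs:// URI) changes where Spark looks.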

Alternatively, you can give the fully qualified HDFS URI of the file:

/usr/spark2.0.2/bin/spark-shell
val logFile = sc.textFile("hdfs://ip-172-31-53-48/data/spark/project/NASA_access_log_Aug95.gz")
logFile.take(10)

Here, ip-172-31-53-48 is the internal hostname (based on the private IP address) of the NameNode.
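Before calling sc.textFile, you can also ask Hadoop which filesystem a path resolves to and whether the file exists there. This is a minimal sketch, assuming it is pasted into spark-shell (for example with :paste) and called with sc.hadoopConfiguration; InputCheck and describe are hypothetical names, while Path.getFileSystem, FileSystem.makeQualified and FileSystem.exists are standard Hadoop APIs.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper: reports the fully qualified path Hadoop would read
// and whether a file exists at that location.
object InputCheck {
  def describe(conf: Configuration, pathStr: String): (String, Boolean) = {
    val path = new Path(pathStr)
    // The filesystem is chosen from the path's scheme, or from fs.defaultFS if it has none.
    val fs = path.getFileSystem(conf)
    (fs.makeQualified(path).toString, fs.exists(path))
  }
}
```

Usage in spark-shell: InputCheck.describe(sc.hadoopConfiguration, "/data/spark/project/NASA_access_log_Aug95.gz"). In a shell started without the exports this would be expected to report a file: path and false; with HADOOP_CONF_DIR set it should report an hdfs:// path on the NameNode.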

Hello

Yes, I realised that. I had made a mistake with the export commands, but it is fixed now.

Thanks!
