How to run mrjob Python programs on a Hadoop cluster?

I am very new to this; please tell me how to run mrjob Python programs.

The program and data file are in HDFS.

This is my command to run:

python /sanjeev/Rating.py -r hadoop --hadoop-streaming-jar /hdp/apps/2.3.4.0-3485/mapreduce/hadoop-streaming.jar /sanjeev/u.data

The error is: python: can't open file 'Rating.py': [Errno 2] No such file or directory.

Please help me out. This Rating.py has a single class with a mapper and a reducer.

I have run it successfully.

Could you put the final command that you used?

Hello,

I am able to run it now. Thank you!


Hello,

I am having the same issue, can you post how you resolved it?

Thank you,
Priyanka

General syntax for running a MapReduce streaming job:

hadoop jar <path to hadoop-streaming.jar> -input <HDFS input path> -output <HDFS output path> -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py

(The -file options ship the local mapper and reducer scripts to the cluster nodes.)

Kindly try the commands below:

  1. hadoop jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-streaming.jar -input /data/mr/wordcount/big.txt -output mapreduce-programming/character_frequency -mapper 'sed "s/ /\n/g"' -reducer "uniq -c"

  2. hadoop jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-streaming.jar -input /data/mr/wordcount/big.txt -output mapreduce-programming/character_frequency -mapper mapper.py -file mapper.py -reducer reducer.py -file reducer.py
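For command 2, mapper.py and reducer.py are your own scripts. A word-count sketch of what they might contain is below (file names and logic are placeholders; both functions would live in separate files in practice, with each file's `__main__` block reading `sys.stdin`). The reducer relies on Hadoop streaming delivering mapper output sorted by key, so equal words arrive adjacent.

```python
#!/usr/bin/env python
import sys


# mapper.py -- emit "word<TAB>1" for every word on stdin.
def map_lines(lines):
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


# reducer.py -- sum the counts per word; input arrives sorted by word,
# so a running total per key is enough.
def reduce_lines(lines):
    current, total = None, 0
    for line in lines:
        word, count = line.rstrip("\n").rsplit("\t", 1)
        if word != current:
            if current is not None:
                yield "%s\t%d" % (current, total)
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield "%s\t%d" % (current, total)


if __name__ == "__main__":
    # In mapper.py drive map_lines(sys.stdin); in reducer.py drive
    # reduce_lines(sys.stdin).
    for out in map_lines(sys.stdin):
        print(out)
```

Remember to make both files executable (or give them a proper shebang), since streaming launches them as subprocesses.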

May I know what specific error you are getting?

All the best!

Hi,

I have tried the above and it has worked for me. I want to know if I can run multiple MapReduce steps in one Python script using the mrjob package?

Thank you,
Priyanka