How to run mrjob Python programs on a Hadoop cluster?

I am very new to this; please tell me how to run mrjob Python programs.

The program and data file are in HDFS.

This is my command to run:

python /sanjeev/Rating.py -r hadoop --hadoop-streaming-jar /hdp/apps/2.3.4.0-3485/mapreduce/hadoop-streaming.jar /sanjeev/u.data

The error is: python: can't open file 'Rating.py': [Errno 2] No such file or directory.

Please help me out. This Rating.py has a single class with a mapper and a reducer.

I have run it successfully.

Could you put the final command that you used?

Hello,

I am able to run it now. Thank you!


Hello,

I am having the same issue, can you post how you resolved it?

Thank you,
Priyanka

General syntax for running a MapReduce streaming job:

hadoop jar <path to hadoop-streaming.jar> -input <HDFS input path> -output <HDFS output path> -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py

(The -file options ship the local mapper and reducer scripts to the cluster nodes.)

Kindly try the commands below:

  1. hadoop jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-streaming.jar -input /data/mr/wordcount/big.txt -output mapreduce-programming/character_frequency -mapper 'sed "s/ /\n/g"' -reducer "uniq -c"

  2. hadoop jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-streaming.jar -input /data/mr/wordcount/big.txt -output mapreduce-programming/character_frequency -mapper mapper.py -file mapper.py -reducer reducer.py -file reducer.py
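For command 2, mapper.py and reducer.py are your own scripts. A word-count sketch of what they might contain is below (file names and logic are placeholders; both functions would live in separate files in practice, with each file's `__main__` block reading `sys.stdin`). The reducer relies on Hadoop streaming delivering mapper output sorted by key, so equal words arrive adjacent.

```python
#!/usr/bin/env python
import sys


# mapper.py -- emit "word<TAB>1" for every word on stdin.
def map_lines(lines):
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


# reducer.py -- sum the counts per word; input arrives sorted by word,
# so a running total per key is enough.
def reduce_lines(lines):
    current, total = None, 0
    for line in lines:
        word, count = line.rstrip("\n").rsplit("\t", 1)
        if word != current:
            if current is not None:
                yield "%s\t%d" % (current, total)
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield "%s\t%d" % (current, total)


if __name__ == "__main__":
    # In mapper.py drive map_lines(sys.stdin); in reducer.py drive
    # reduce_lines(sys.stdin).
    for out in map_lines(sys.stdin):
        print(out)
```

Remember to make both files executable (or give them a proper shebang), since streaming launches them as subprocesses.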

May I know what specific error you are getting?

All the best!

Hi,

I have tried the above and it has worked for me. I want to know if I can run multiple MapReduce steps in one Python script using the mrjob package?

Thank you,
Priyanka