How to run mrjob Python programs on a Hadoop cluster?


I am very new to this; please tell me how to run mrjob Python programs.

The program and data file are in HDFS.

This is my command to run:

python /sanjeev/ -r hadoop --hadoop-streaming-jar /hdp/apps/ /sanjeev/

The error is: python: can't open file '': [Errno 2] No such file or directory.

Please help me out. This has a single class with Mapper and Reducer.


I have run it successfully.


Could you put the final command that you used?



I am able to run it now. Thank you :slight_smile:



I am having the same issue, can you post how you resolved it?

Thank you,


General syntax for running a MapReduce program with Hadoop Streaming:

hadoop jar &lt;path to hadoop-streaming.jar&gt; -input &lt;input path in HDFS&gt; -output &lt;output path in HDFS&gt; -mapper &lt;mapper command or script&gt; -reducer &lt;reducer command or script&gt; -file &lt;local script file to ship to the cluster&gt;

Kindly try the commands below:

  1. hadoop jar /usr/hdp/ -input /data/mr/wordcount/big.txt -output mapreduce-programming/character_frequency -mapper 'sed "s/ /\n/g"' -reducer "uniq -c"

  2. hadoop jar /usr/hdp/ -input /data/mr/wordcount/big.txt -output mapreduce-programming/character_frequency -mapper -file -reducer -file

May I know what specific error you are getting?

All the best!



I have tried the above and it worked for me. I want to know: can I run multiple MapReduce steps using the mrjob package in one Python script?

Thank you,