What is the command to run mrjob on Hadoop?
You can see it here: https://cloudxlab.com/assessment/displayslide/279/run-mapreduce-jobs-using-hadoop-streaming?course_id=1&playlist_id=8&audit=true
I am asking about mrjob specifically, where a single file contains both the mapper and the reducer.
https://mrjob.readthedocs.io/en/latest/guides/configs-hadoopy-runners.html
I think you will have to create a virtual environment and install mrjob in it.
OK, great, thanks. Also, how do we handle the missing streaming jar error?
The streaming jar should be located here: /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar
As the error message says, pass the jar's path on the command line with mrjob's --hadoop-streaming-jar option.
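To make the full command concrete, here is a small sketch that assembles it, assuming the jar lives at the path given above. The helper build_mrjob_command, the script name word_count.py, and the input path are made up for illustration; only -r hadoop and --hadoop-streaming-jar are mrjob options.

```python
import shlex
import sys

# Jar location quoted in this thread; adjust if your cluster differs.
STREAMING_JAR = "/usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar"


def build_mrjob_command(script, input_path, streaming_jar=STREAMING_JAR):
    """Return the argv list for running an mrjob script on the Hadoop runner."""
    return [
        sys.executable, script,
        "-r", "hadoop",
        "--hadoop-streaming-jar", streaming_jar,
        input_path,
    ]


if __name__ == "__main__":
    # word_count.py and the HDFS path are placeholders.
    cmd = build_mrjob_command("word_count.py", "hdfs:///path/to/input.txt")
    # Print the command for copy-paste; to launch it directly,
    # pass cmd to subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

You can of course type the printed command straight into the shell; the helper is only there to show where each argument goes.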