How to run a Python program in CloudXLab?


I am looking for guidance on running a Python program on CloudXLab, given the following setup:

  1. Assuming that my Python code, which can be a custom map/reduce process, let's call it

  2. Assuming that my input folder is the sub-directory /myinputfolder

  3. Assuming that my input file is /myinputfolder/myinputdata.txt

Can someone give me the right command to run this Python program using the Hadoop streaming jar file?

The following is an example of a command that I used; the system came back with a "file not found" error:

python /user/drarmankanooni3849/ -r hadoop --hadoop-streaming-jar /hdp/apps/ /user/drarmankanooni3849/movielens/

Thank you,

This should help you:

After that, go through this one:

Hi Sandeep,

I appreciate your video link on running a MapReduce job using a generic library.

My Python program (posted at the end of this message) uses the MRJob class from mrjob.job, which is different from the libraries that you mentioned. This library also lets me write programs other than a word count. :)

I am looking for a step-by-step process to do this. Does CloudXLab already have the mrjob library installed?
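One way to find out whether mrjob is available is to ask the Python interpreter itself. The following is only a minimal sketch using the standard library; the helper name is_installed is made up for illustration, and the result depends on which Python interpreter you run it with on the cluster.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found by this interpreter."""
    # find_spec returns None when the module is not on the import path
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Prints True or False depending on the current environment
    print("mrjob available:", is_installed("mrjob"))
```

If this reports False, the library would have to be installed for that interpreter before the job can run.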

Please HELP


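from mrjob.job import MRJob
from mrjob.step import MRStep

class RatingsBreakdown(MRJob):
    def steps(self):
        # One step: map each rating to 1, then sum the counts per rating
        return [
            MRStep(mapper=self.mapper_get_ratings,
                   reducer=self.reducer_count_ratings)
        ]

    def mapper_get_ratings(self, _, line):
        # Input lines are tab-separated: userID, movieID, rating, timestamp
        (userID, movieID, rating, timestamp) = line.split('\t')
        yield rating, 1

    def reducer_count_ratings(self, key, values):
        # Total number of times each rating value occurs
        yield key, sum(values)

if __name__ == '__main__':
    RatingsBreakdown.run()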
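Before submitting anything to Hadoop, it may help to check the mapper and reducer logic on a few lines locally. The sketch below does not use mrjob at all. It imitates the map, group, and reduce phases in plain Python, on invented sample rows that assume the same tab-separated userID, movieID, rating, timestamp layout as the program above. The names SAMPLE_LINES and run_locally are hypothetical.

```python
from collections import defaultdict

# Invented sample rows: userID<TAB>movieID<TAB>rating<TAB>timestamp
SAMPLE_LINES = [
    "1\t10\t3\t1000",
    "2\t11\t3\t1001",
    "3\t12\t1\t1002",
    "4\t13\t2\t1003",
    "5\t14\t1\t1004",
]


def mapper_get_ratings(line):
    # Emit (rating, 1) for every input line
    user_id, movie_id, rating, timestamp = line.split("\t")
    yield rating, 1


def reducer_count_ratings(key, values):
    # Sum the 1s emitted for each rating value
    yield key, sum(values)


def run_locally(lines):
    """Imitate map -> group by key -> reduce in memory."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper_get_ratings(line):
            grouped[key].append(value)
    results = {}
    for key in sorted(grouped):
        for out_key, total in reducer_count_ratings(key, grouped[key]):
            results[out_key] = total
    return results


if __name__ == "__main__":
    for rating, count in run_locally(SAMPLE_LINES).items():
        print(rating, count)
```

Once the logic looks right, an mrjob script is usually launched in the form python <script>.py -r hadoop --hadoop-streaming-jar <path to streaming jar> <input path>, where every part in angle brackets is a placeholder. Note that python reads the script from the local file system, so a script path that exists only in HDFS could plausibly produce a "file not found" error. The input, by contrast, can be given as an hdfs:// URI.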