Help required with MapReduce Java program execution

Hi there,

As practice for MapReduce programming, I tried implementing the Max Temperature problem as discussed in class. The problem statement is to reduce the data to the maximum temperature for a state on a particular day. I have successfully implemented the solution.

The problem is that I was not able to run my solution with the following command:

hadoop jar build/jar/hdpexamples.jar com.cloudxlab.<myproject>.StubDriver

Well, I was able to run the project by changing the build.xml as follows:

The existing entry:

<target name="jar" depends="compile">
    <mkdir dir="build/jar"/>
    <jar destfile="build/jar/hdpexamples.jar" basedir="build/classes">
        <manifest>
            <attribute name="Main-Class" value="com.cloudxlab.**wordcount**.StubDriver"/>
        </manifest>
    </jar>
</target>

To:

<target name="jar" depends="compile">
    <mkdir dir="build/jar"/>
    <jar destfile="build/jar/hdpexamples.jar" basedir="build/classes">
        <manifest>
            <attribute name="Main-Class" value="com.cloudxlab.**maxtemp**.StubDriver"/>
        </manifest>
    </jar>
</target>

The project ran successfully and produced the desired results. The only issue is that this runs the ‘maxtemp’ project; to run ‘wordcount’ again, I have to change build.xml and rebuild the whole thing.

What is the solution so that both programs can run without the need for re-compilation?

The code for maxtemp can be found here.

P.s.

Thanks
Noor

Hi Noor,

Do you need to run both programs one after another? In that case, you can just copy the jar element and paste it below the current one, as shown below:


<target name="jar" depends="compile">
    <mkdir dir="build/jar"/>
    <jar destfile="build/jar/hdpexamples.jar" basedir="build/classes">
        <manifest>
            <attribute name="Main-Class" value="com.cloudxlab.**maxtemp**.StubDriver"/>
        </manifest>
    </jar>
   <jar destfile="build/jar/hdpexamples.jar" basedir="build/classes">
        <manifest>
            <attribute name="Main-Class" value="com.cloudxlab.**wordcount**.StubDriver"/>
        </manifest>
    </jar>
</target>

Hi Rajan,

Thanks for your reply.

I have already tried that. However, I am not sure why, but even after doing this it ran only the first program, even when I passed the second program in the run command.

hadoop jar build/jar/hdpexamples.jar com.cloudxlab.<myproject>.StubDriver

Here’s my build.xml:

<target name="jar" depends="compile">
<mkdir dir="build/jar"/>
<jar destfile="build/jar/hdpexamples.jar" basedir="build/classes">
    <manifest>
        <attribute name="Main-Class" value="com.cloudxlab.maxtemp.StubDriver"/>
    </manifest>
</jar>

<jar destfile="build/jar/hdpexamples.jar" basedir="build/classes">
    <manifest>
        <attribute name="Main-Class" value="com.cloudxlab.wordcount.StubDriver"/>
    </manifest>
</jar>

and even though I called the maxtemp program, it actually ran the wordcount program:

[noorrocks1796@ip-172-31-38-183 java]$ hadoop jar build/jar/hdpexamples.jar com.cloudxlab.maxtemp.StubDriver
WARNING: Use "yarn jar" to launch YARN applications.
18/03/06 05:25:22 INFO impl.TimelineClientImpl: Timeline service address: http://ip-172-31-13-154.ec2.internal:8188/ws/v1/timeline/
18/03/06 05:25:22 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-53-48.ec2.internal/172.31.53.48:8050
18/03/06 05:25:23 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/03/06 05:25:23 INFO input.FileInputFormat: Total input paths to process : 1
18/03/06 05:25:23 INFO mapreduce.JobSubmitter: number of splits:1

Hi Noor,

I was going through the code, and I noticed that in your mapper you are trying to create a HashMap. Why?

The mapper could just emit city-temp pairs (not the max temp), and the reducer could just compute the max temp.
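
Something like this is what I have in mind — a minimal sketch only, assuming each input line is city<TAB>temperature and using illustrative class names (TempMapper, MaxTempReducer), not the exact code from your repository:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTempSketch {

    // Emits (city, temperature) for every input line; no aggregation in the mapper.
    public static class TempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t");
            if (parts.length >= 2) {
                context.write(new Text(parts[0]),
                        new IntWritable(Integer.parseInt(parts[1].trim())));
            }
        }
    }

    // Computes the maximum temperature per city from all the values it receives.
    public static class MaxTempReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int max = Integer.MIN_VALUE;
            for (IntWritable v : values) {
                max = Math.max(max, v.get());
            }
            context.write(key, new IntWritable(max));
        }
    }
}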

Also, when I checked out your repository and created a fresh package, I think I was able to run it, because I got an error saying temps.txt was not found:

hadoop jar build/jar/hdpexamples.jar com.cloudxlab.maxtemp.StubDriver
WARNING: Use "yarn jar" to launch YARN applications.
18/03/11 06:30:31 INFO impl.TimelineClientImpl: Timeline service address: http://ip-172-31-13-154.ec2.internal:8188/ws/v1/timeline/
18/03/11 06:30:31 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-53-48.ec2.internal/172.31.53.48:8050
18/03/11 06:30:32 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/03/11 06:30:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/sandeepgiri9034/.staging/job_1517296050843_6387
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-172-31-53-48.ec2.internal:8020/user/sandeepgiri9034/temps.txt
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmi

Hi Sandeep,

Thanks for the answer. The reason for creating the HashMap here is to get the max temp at the block level and pass it to the reducer, which then takes the max of the maxes and stores the result in a file. In my opinion, this distributes the work and leaves less processing at the reduce level.
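
Roughly, the idea in my mapper is along these lines (a simplified sketch with illustrative names and a city<TAB>temperature line format, not the exact code from my repository):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Keeps a per-split HashMap of running maxima and emits one (city, max) pair
// per city in cleanup(), so each block sends only its local max to the reducer.
public class InMapperMaxTempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Map<String, Integer> localMax = new HashMap<>();

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        String[] parts = value.toString().split("\t");
        if (parts.length >= 2) {
            localMax.merge(parts[0], Integer.parseInt(parts[1].trim()), Math::max);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        for (Map.Entry<String, Integer> e : localMax.entrySet()) {
            context.write(new Text(e.getKey()), new IntWritable(e.getValue()));
        }
    }
}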

You can do this by using the reducer as a combiner.

Mapper1 -> [(k1, 1), (k1, 2), (k2, 3)] -> combiner (same code as reducer) -> [(k1, 2), (k2, 3)] --> to reducer
Mapper2 -> [(k1, 1), (k1, 4), (k2, 4)] -> combiner (same code as reducer) -> [(k1, 4), (k2, 4)] --> to reducer

reducer --> [(k1, 4), (k2, 4)]
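
In the driver this is essentially one extra line. A rough sketch, reusing the illustrative TempMapper/MaxTempReducer classes from the earlier sketch (input/output paths and job submission are omitted; wire those the same way your StubDriver already does):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class MaxTempDriverSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "max-temp");
        job.setJarByClass(MaxTempDriverSketch.class);
        job.setMapperClass(MaxTempSketch.TempMapper.class);
        // The combiner runs the same reduce logic on each mapper's output,
        // so only one (key, local-max) pair per key leaves each map task.
        job.setCombinerClass(MaxTempSketch.MaxTempReducer.class);
        job.setReducerClass(MaxTempSketch.MaxTempReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // FileInputFormat/FileOutputFormat paths and job.waitForCompletion(true) omitted here.
    }
}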

Thanks, Sandeep, for the reply.

Yes, it’s running my program, but not wordcount. The problem is that I can run only one of these programs at a time by changing build.xml.

My original question is: how would I be able to run both programs? What should the contents of build.xml be to allow that?
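
One idea I am wondering about, though I have not tested it here: build a single jar with no Main-Class attribute at all. As far as I understand, hadoop jar only takes the driver class from the command line when the manifest does not already name a main class, so a target along these lines might let both drivers run from the same jar:

<target name="jar" depends="compile">
    <mkdir dir="build/jar"/>
    <!-- No <manifest>/Main-Class entry: the class passed to "hadoop jar" is used instead. -->
    <jar destfile="build/jar/hdpexamples.jar" basedir="build/classes"/>
</target>

Then, without rebuilding in between:

hadoop jar build/jar/hdpexamples.jar com.cloudxlab.wordcount.StubDriver
hadoop jar build/jar/hdpexamples.jar com.cloudxlab.maxtemp.StubDriver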

Thanks… this is great. So we should use a combiner for block-level aggregations.

Yes, if you want the aggregation on the same node.

Cool… This is awesome