Merge Multiple Output Files

Is it possible to merge the outputs from multiple reducers into a single file, so that the result can be viewed in one place instead of across several output files?

A MapReduce job generates multiple output files if:

  • There is a reduce phase and it runs multiple reducer tasks
  • The job is map-only and it runs multiple mapper tasks

So, how do you concatenate them?

HDFS does not provide a way to concatenate files directly. You can either download the files and concatenate them with the Unix cat command (hadoop fs -getmerge does the download and the concatenation in one step), or use the hadoop fs -cat command to print all the files one after another.
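
The download-and-concatenate route can be sketched in Python. This is only an illustration under stated assumptions: the part files have already been copied to a local directory (for example with hadoop fs -get), and the function name concatenate_parts, the part-* pattern and the merged file name are hypothetical choices, not anything Hadoop prescribes.

```python
import shutil
from pathlib import Path


def concatenate_parts(parts_dir, merged_path, pattern="part-*"):
    """Append every file in parts_dir that matches pattern to merged_path.

    Files are appended in sorted name order (part-r-00000, part-r-00001, ...),
    which is the same order a shell uses when it expands `cat part-*`.
    merged_path should not itself match the pattern.
    """
    part_files = sorted(p for p in Path(parts_dir).glob(pattern) if p.is_file())
    with open(merged_path, "wb") as merged:
        for part in part_files:
            with open(part, "rb") as src:
                # Stream the bytes so large part files are never held in memory
                shutil.copyfileobj(src, merged)
    return [p.name for p in part_files]
```

Calling concatenate_parts("job-output", "merged.txt") would skip marker files such as _SUCCESS, because they do not match the part-* pattern.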

A bigger question is why you would want to concatenate these files in the first place.

If you want to concatenate them only to feed the result to another MapReduce job, there is no need: you can pass the whole output folder, or the individual files, as input to the next job.

Also, please note that although each individual file is sorted, the concatenated result may not be sorted, because the key ranges of the files can overlap. To fix this, use a partitioner that gives each reducer a non-overlapping key range, and then concatenate the files in range order.
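
The effect of overlapping ranges can be seen in a small simulation. This is a sketch in plain Python, not Hadoop code: run_reducers, modulo_partition and make_range_partition are hypothetical names, the modulo function merely stands in for a hash-style partitioner, and the split point 5 is chosen by hand.

```python
import bisect


def run_reducers(keys, num_reducers, partition):
    """Route each key to a reducer, then sort inside each reducer's output."""
    outputs = [[] for _ in range(num_reducers)]
    for key in keys:
        outputs[partition(key, num_reducers)].append(key)
    return [sorted(part) for part in outputs]


def concatenate(outputs):
    """Join the reducer outputs in reducer order, like cat part-*."""
    return [key for part in outputs for key in part]


def modulo_partition(key, num_reducers):
    # Stand-in for a hash-style partitioner: neighbouring keys go to different reducers
    return key % num_reducers


def make_range_partition(boundaries):
    # boundaries[i] is the smallest key that belongs to reducer i + 1
    def range_partition(key, num_reducers):
        return bisect.bisect_right(boundaries, key)
    return range_partition


keys = [7, 2, 9, 4, 1, 8, 3, 6, 5, 0]
hashed = concatenate(run_reducers(keys, 2, modulo_partition))
ranged = concatenate(run_reducers(keys, 2, make_range_partition([5])))

print(hashed)  # [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]: each half sorted, whole not sorted
print(ranged)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]: globally sorted
```

In a real job the split points would have to come from the data, for example by sampling the keys, so that the ranges are non-overlapping and reasonably balanced across reducers.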