Error in MapReduce Code - JVM OutOfMemoryError

Hi,

I have written an MR job to count the values of a specific column in a CSV file. When I execute my code with two lines of sample input data, the MR job works fine. But when I execute the same code against the actual file, the job fails with
java.lang.OutOfMemoryError: GC overhead limit exceeded.

Could you please let me know what the issue is? The same data set loads fine when I use Pig.

Mapper class:
package test;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Logger;


public class Map extends Mapper<Object, Text, Text, IntWritable> {

	private static final IntWritable one = new IntWritable(1);
	static final Logger logger = Logger.getLogger(Driver.class.getName());

	public void map(Object key, Text value, Context output) throws IOException, InterruptedException {
		BasicConfigurator.configure();
		String[] column = value.toString().split(",");
		//logger.warn("fbi code is " + column[14].trim());
		output.write(new Text(column[14].trim()), one);
	}
}

Execution:

hadoop jar proj1.jar test.Driver /user/charanrajlv3971/Crimes.csv /user/charanrajlv3971/julytest

Job id: 1498404677707_2985

Log Type: stderr

  Log Upload Time: Fri Jul 07 04:58:26 +0000 2017
  
  Log Length: 349
  Halting due to Out Of Memory Error...
  
  Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
  
  Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "communication thread"

Log Type: syslog

  Log Upload Time: Fri Jul 07 04:58:26 +0000 2017
  
  Log Length: 5497
  2017-07-07 04:56:34,492 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
  2017-07-07 04:56:34,744 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
  2017-07-07 04:56:34,744 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
  2017-07-07 04:56:34,753 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
  2017-07-07 04:56:34,753 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1498404677707_2985, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@6db9f5a4)
  2017-07-07 04:56:34,847 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
  2017-07-07 04:56:35,130 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /hadoop/yarn/local/usercache/charanrajlv3971/appcache/application_1498404677707_2985
  2017-07-07 04:56:35,411 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
  2017-07-07 04:56:35,974 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
  2017-07-07 04:56:35,974 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
  2017-07-07 04:56:36,006 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
  2017-07-07 04:56:36,025 WARN [main] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 2284
  2017-07-07 04:56:36,026 WARN [main] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 2304
  2017-07-07 04:56:36,174 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://ip-172-31-53-48.ec2.internal:8020/user/charanrajlv3971/Crimes.csv:0+67523910
  2017-07-07 04:56:36,669 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 225181692(900726768)
  2017-07-07 04:56:36,669 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 859
  2017-07-07 04:56:36,669 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 630508736
  2017-07-07 04:56:36,669 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 900726784
  2017-07-07 04:56:36,669 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 225181692; length = 56295424
  2017-07-07 04:56:36,679 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  2017-07-07 04:56:44,038 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
  2017-07-07 04:56:45,632 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
  2017-07-07 04:56:46,758 FATAL [main] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[main,5,main] threw an Error.  Shutting down now...
  java.lang.OutOfMemoryError: GC overhead limit exceeded
  at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:133)
  at java.io.OutputStreamWriter.write(OutputStreamWriter.java:220)
  at java.io.Writer.write(Writer.java:157)
  at org.apache.log4j.helpers.QuietWriter.write(QuietWriter.java:48)
  at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
  at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
  at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
  at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
  at org.apache.log4j.Category.callAppenders(Category.java:206)
  at org.apache.log4j.Category.forcedLog(Category.java:391)
  at org.apache.log4j.Category.log(Category.java:856)
  at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:176)
  at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stop(MetricsSystemImpl.java:211)
  at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.shutdown(MetricsSystemImpl.java:594)
  at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdownInstance(DefaultMetricsSystem.java:72)
  at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdown(DefaultMetricsSystem.java:68)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:220)
  2017-07-07 04:56:48,608 INFO [main] org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException
  2017-07-07 04:56:49,553 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
  
  2017-07-07 04:56:51,574 ERROR [communication thread] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[communication thread,5,main] threw an Throwable, but we are shutting down, so ignoring this
  java.lang.OutOfMemoryError: GC overhead limit exceeded
  2017-07-07 04:56:53,094 WARN [Thread-5] org.apache.hadoop.util.ShutdownHookManager: ShutdownHook '' failed, java.lang.OutOfMemoryError: Java heap space
  java.lang.OutOfMemoryError: Java heap space
  2017-07-07 04:56:53,281 ERROR [Thread-5] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Thread-5,5,main] threw an Throwable, but we are shutting down, so ignoring this
  java.lang.OutOfMemoryError: GC overhead limit exceeded

Hi @charanrajlv3971,

At times, when load on the servers is high, you may get java.lang.OutOfMemoryError.
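
If it is purely a sizing problem, the first thing I would look at is how much heap the map tasks get. Your syslog shows mapreduce.task.io.sort.mb: 859, which means the in-memory sort buffer alone claims roughly 859 MB of the map task's heap, leaving little room for anything else. A sketch of how you could rebalance that from the command line (the values are illustrative, and the -D options are only picked up if test.Driver is run through ToolRunner/GenericOptionsParser):

hadoop jar proj1.jar test.Driver \
    -D mapreduce.task.io.sort.mb=256 \
    -D mapreduce.map.memory.mb=2048 \
    -D "mapreduce.map.java.opts=-Xmx1638m" \
    /user/charanrajlv3971/Crimes.csv /user/charanrajlv3971/julytest

If your driver does not go through ToolRunner, you can set the same three properties on the job's Configuration object in the driver instead.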

Just curious: were you able to run the code?
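
One thing that stands out in the mapper you posted: BasicConfigurator.configure() is called inside map(), so it runs once per input record. In log4j 1.x, every call to BasicConfigurator.configure() adds another ConsoleAppender to the root logger, so the appender count grows with every record processed, and each log message (including Hadoop's own INFO messages) is then written once per accumulated appender. That fits your stack trace, which shows the OutOfMemoryError being thrown from inside log4j's WriterAppender. A minimal sketch of the mapper with that call removed (the reused Text instance and the column-count guard are my additions, not part of your original code):

package test;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<Object, Text, Text, IntWritable> {

	private static final IntWritable one = new IntWritable(1);
	// Reuse one Text instance instead of allocating a new one per record.
	private final Text outKey = new Text();

	@Override
	public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
		// No BasicConfigurator.configure() here: the Hadoop task JVM already
		// configures log4j, and calling configure() once per record keeps
		// stacking console appenders onto the root logger.
		String[] column = value.toString().split(",");
		// Skip rows with fewer than 15 fields (e.g. a header or malformed
		// line) instead of throwing ArrayIndexOutOfBoundsException.
		if (column.length > 14) {
			outKey.set(column[14].trim());
			context.write(outKey, one);
		}
	}
}

One more caveat: split(",") will shift the columns on any row whose fields contain quoted commas, so if the file has quoted fields a real CSV parser (for example OpenCSV) would be safer.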