Hi,
I have written a MapReduce job to count occurrences of values in a specific column of a CSV file. When I run it against a two-line sample input, the job completes fine, but when I run it against the actual file it fails with:
java.lang.OutOfMemoryError: GC overhead limit exceeded
Could you please let me know what the issue is? The same data set loads without problems using Pig.
Mapper class:

package test;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Logger;

public class Map extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable one = new IntWritable(1);
    static final Logger logger = Logger.getLogger(Driver.class.getName());

    public void map(Object key, Text value, Context output) throws IOException, InterruptedException {
        BasicConfigurator.configure();
        String[] column = value.toString().split(",");
        //logger.warn("fbi code is " + column[14].trim());
        output.write(new Text(column[14].trim()), one);
    }
}
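
For context, the per-record logic in map() boils down to splitting the line on commas and taking the 15th field (index 14). Here is a standalone sketch of just that parsing, runnable outside Hadoop (ColumnExtract and extractColumn14 are names I made up for this post, and I added a length guard here that my real mapper does not have):

```java
public class ColumnExtract {
    // Mirrors the parsing in map(): split a CSV line on commas and
    // return the trimmed 15th field (index 14), or null when the row
    // has fewer than 15 fields.
    static String extractColumn14(String line) {
        String[] column = line.split(",");
        if (column.length <= 14) {
            return null; // guard for short or malformed rows
        }
        return column[14].trim();
    }

    public static void main(String[] args) {
        String row = "a,b,c,d,e,f,g,h,i,j,k,l,m,n,06,p";
        System.out.println(extractColumn14(row));     // prints 06
        System.out.println(extractColumn14("a,b,c")); // prints null
    }
}
```

Note that a plain split(",") also breaks on quoted CSV fields containing embedded commas, which the real Crimes.csv rows may have.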
Execution:

hadoop jar proj1.jar test.Driver /user/charanrajlv3971/Crimes.csv /user/charanrajlv3971/julytest
Job id: 1498404677707_2985
Log Type: stderr
Log Upload Time: Fri Jul 07 04:58:26 +0000 2017
Log Length: 349
Halting due to Out Of Memory Error...
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "communication thread"
Log Type: syslog
Log Upload Time: Fri Jul 07 04:58:26 +0000 2017
Log Length: 5497
2017-07-07 04:56:34,492 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
2017-07-07 04:56:34,744 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-07-07 04:56:34,744 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2017-07-07 04:56:34,753 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2017-07-07 04:56:34,753 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1498404677707_2985, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@6db9f5a4)
2017-07-07 04:56:34,847 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2017-07-07 04:56:35,130 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /hadoop/yarn/local/usercache/charanrajlv3971/appcache/application_1498404677707_2985
2017-07-07 04:56:35,411 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2017-07-07 04:56:35,974 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2017-07-07 04:56:35,974 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2017-07-07 04:56:36,006 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2017-07-07 04:56:36,025 WARN [main] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 2284
2017-07-07 04:56:36,026 WARN [main] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 2304
2017-07-07 04:56:36,174 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://ip-172-31-53-48.ec2.internal:8020/user/charanrajlv3971/Crimes.csv:0+67523910
2017-07-07 04:56:36,669 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 225181692(900726768)
2017-07-07 04:56:36,669 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 859
2017-07-07 04:56:36,669 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 630508736
2017-07-07 04:56:36,669 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 900726784
2017-07-07 04:56:36,669 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 225181692; length = 56295424
2017-07-07 04:56:36,679 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2017-07-07 04:56:44,038 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2017-07-07 04:56:45,632 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2017-07-07 04:56:46,758 FATAL [main] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[main,5,main] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: GC overhead limit exceeded
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:133)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:220)
at java.io.Writer.write(Writer.java:157)
at org.apache.log4j.helpers.QuietWriter.write(QuietWriter.java:48)
at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:176)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stop(MetricsSystemImpl.java:211)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.shutdown(MetricsSystemImpl.java:594)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdownInstance(DefaultMetricsSystem.java:72)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdown(DefaultMetricsSystem.java:68)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:220)
2017-07-07 04:56:48,608 INFO [main] org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException
2017-07-07 04:56:49,553 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
2017-07-07 04:56:51,574 ERROR [communication thread] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[communication thread,5,main] threw an Throwable, but we are shutting down, so ignoring this
java.lang.OutOfMemoryError: GC overhead limit exceeded
2017-07-07 04:56:53,094 WARN [Thread-5] org.apache.hadoop.util.ShutdownHookManager: ShutdownHook '' failed, java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
2017-07-07 04:56:53,281 ERROR [Thread-5] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Thread-5,5,main] threw an Throwable, but we are shutting down, so ignoring this
java.lang.OutOfMemoryError: GC overhead limit exceeded