Monday, 31 December 2012

Adding GC logs to Hadoop child processes and analyzing the GC logs

Add the child JVM options to the mapred-site.xml file. The description below matches the stock Hadoop 1.x description of the mapred.child.java.opts property, so that is the property to set:

Java opts for the task tracker child processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc

Additional options: -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
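
As a rough sketch, the entry in mapred-site.xml could look like the fragment below. The property name mapred.child.java.opts is an assumption based on the Hadoop 1.x defaults (the post itself does not name it), and the value simply combines the example value with two of the additional options listed above. Adjust the heap size and log path for your cluster.

```xml
<!-- Assumed property name: mapred.child.java.opts (Hadoop 1.x). -->
<!-- @taskid@ is replaced with the task ID, so each task writes its own GC log. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps</value>
</property>
```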


Analyzing GC logs:

The line [GC [PSYoungGen: 230400K->19135K(268800K)] reads as follows (sizes are in KB, so divide by 1024 to get MB):

  • The total size of the young generation is about 262MB (268800K).
  • Before the garbage collection, the young generation was using 225MB (230400K).
  • After the garbage collection, usage dropped to about 19MB (19135K).
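
To check the arithmetic on other log lines, a small Python sketch like the one below can pull the three numbers out of a PSYoungGen entry and convert them to MB. The function name parse_young_gen and the regular expression are illustrative choices of mine, not part of any Hadoop or JVM tool, and the pattern only covers the PSYoungGen format shown above.

```python
import re

# Matches entries such as: [PSYoungGen: 230400K->19135K(268800K)]
YOUNG_GEN = re.compile(r"\[PSYoungGen: (\d+)K->(\d+)K\((\d+)K\)\]")


def parse_young_gen(line):
    """Return young generation usage in MB, or None if the line has no PSYoungGen entry."""
    match = YOUNG_GEN.search(line)
    if match is None:
        return None
    before_k, after_k, capacity_k = (int(group) for group in match.groups())
    return {
        "before_mb": before_k / 1024,      # usage before the collection
        "after_mb": after_k / 1024,        # usage after the collection
        "capacity_mb": capacity_k / 1024,  # total young generation size
    }


sample = "[GC [PSYoungGen: 230400K->19135K(268800K)]"
stats = parse_young_gen(sample)
print(stats)
```

For the sample line this gives 225 MB before, roughly 18.7 MB after, and 262.5 MB total.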

