Archive for June, 2007

hprof for diagnosing memory leaks

June 29, 2007

The last few days I’ve been trying to diagnose a possible memory leak in one of our java web services and have been using hprof the built in java profiler that comes with J2SE. hprof is easy to use just enter

java -Xrunhprof:help

to see a list of options for usage. I wanted the heap profiling option so used

-Xrunhprof:heap=sites,depth=10

in the command to launch the web server.

heap=sites will break down memory usage according to the amount of memory allocated to particular objects and will also generate stack traces showing the methods which allocated this memory. The depth option sets the depth of the stack trace; I’ve set it to 10 but the default is 4.

hprof generates an output file, java.hprof.txt, on program exit which starts with the stack traces and finishes with a breakdown of memory usage. Here is a snippet of the memory usage part of the file showing the objects using the most amount of memory:

SITES BEGIN (ordered by live bytes) Tue Jun 26 16:04:27 2007
  	percent          live          alloc'ed  stack class
rank   self  accum     bytes objs     bytes  objs trace name
   1 30.27% 30.27% 123015224 1260358 123015224 1260358 331325 char[]
   2 11.11% 41.38%  45130056 667796  45130056 667796 331138 char[]
   3  7.97% 49.35%  32396784    1  32396784     1 331303 java.lang.String[]
   4  7.97% 57.32%  32396784    1  32396784     1 331302 int[]
   5  7.44% 64.77%  30248592 1260358  30248592 1260358 331324 java.lang.String
   6  5.98% 70.74%  24297624    3  24297624     3 331443 byte[]
   7  5.26% 76.00%  21369472 667796  21369472 667796 331140 org.apache.lucene.index.TermInfo
   8  3.94% 79.95%  16027104 667796  16027104 667796 331137 java.lang.String
   9  2.63% 82.58%  10684736 667796  10684736 667796 331139 org.apache.lucene.index.Term
  10  1.31% 83.89%   5342384    1   5342384     1 331134 long[]

There seems to be a fair amount of information regarding hprof on the web, but I haven’t managed to find a definitive explanation of exactly what all these columns mean, however, I have a fair idea by now so here goes:

Sites – are particular stack traces.
rank – ranking is in order of amount of memory taken up by particular objects in a stack trace
self – this is the percentage of space allocated to particular objects
accum – not sure of this one, but guessing it could be the percentage of memory ever accumulated by these objects before garbage collection
live bytes – number of live bytes taken up by currently live objects
live objs – number of currently live objects
alloc’ed bytesI think this is the number of bytes allocated so far for particular objects
alloc’ed objs – likewise I think the number of objects of this type so far allocated
stack trace – stack trace number
class name – class of object

As can be seen most memory is used by the char[] (live bytes = 123015224/objs = 1260358). Live bytes/objs will usually be less than alloc'ed bytes/objs due to garbage collection taking place, but as the web server was only running for a short time before this profile was taken it is likely that garbage collection had not happened by the time I stopped the server.

Where live bytes/objs = alloc'ed bytes/objs this could possibly signify a memory leak. One of the sources I looked at stated that low level objects such as char[] tend to float to the top and advised looking further down the ranking for heads of leaking data structures. (See here).

Here is stack trace 331325 for our highest ranking char[] objects:

TRACE 331325:

java.lang.String.<init>(<Unknown Source>:Unknown line) org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:104)
org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:155)
org.apache.lucene.search.FieldCacheImpl$6.createValue(FieldCacheImpl.java:282)
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:260)
org.apache.lucene.search.FieldCacheImpl$7.createValue(FieldCacheImpl.java:371)
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
org.apache.lucene.search.FieldCacheImpl.getAuto(FieldCacheImpl.java:334)
org.apache.lucene.search.FieldSortedHitQueue.comparatorAuto(FieldSortedHitQueue.java:338)

which were allocated by the section of the code that performs a lucene search with a request to sort the result set.

For the moment it looks as though our “memory leak” issue is due to the sorting functionality in lucene using up a lot of resources rather than a memory leak as such. We are sorting strings which the lucene docs state are the most expensive types to sort in terms of resources because each unique term is cached for each document. This could be a problem for us and we may have to rethink how we do our sorting, if so that will be another topic.

hprof links I found useful:

http://java.sun.com/developer/technicalArticles/Programming/HPROF.html

http://www.skywayradio.com/tech/WAS51/appserver_hangs.php
http://www.skywayradio.com/tech/WAS51/HProf.php
http://publib.boulder.ibm.com/infocenter/javasdk/v1r4m2/index.jsp?
topic=/com.ibm.java.doc.diagnostics.142/html/id1590.html

http://www.javalobby.org/java/forums/t19612.html
http://www.javaworld.com/javaworld/jw-12-2001/jw-1207-hprof.html