For the last few days I’ve been trying to diagnose a possible memory leak in one of our Java web services, using hprof, the built-in Java profiler that comes with J2SE. hprof is easy to use: just enter
java -Xrunhprof:help
to see a list of usage options. I wanted heap profiling, so I used
-Xrunhprof:heap=sites,depth=10
in the command to launch the web server.
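For example, for a service launched directly with java, the full command would look something like this (the jar name here is made up for illustration):

java -Xrunhprof:heap=sites,depth=10 -jar our-webservice.jar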
heap=sites
breaks down memory usage by allocation site, showing how much memory is allocated to particular objects, and also generates stack traces showing the methods that allocated that memory. The depth option sets the depth of those stack traces; I’ve set it to 10, but the default is 4.
hprof generates an output file, java.hprof.txt, on program exit, which starts with the stack traces and finishes with a breakdown of memory usage. Here is a snippet from the memory usage part of the file showing the objects using the most memory:
SITES BEGIN (ordered by live bytes) Tue Jun 26 16:04:27 2007
          percent          live            alloc'ed    stack class
 rank   self  accum      bytes    objs      bytes    objs trace name
    1 30.27% 30.27% 123015224 1260358  123015224 1260358 331325 char[]
    2 11.11% 41.38%  45130056  667796   45130056  667796 331138 char[]
    3  7.97% 49.35%  32396784       1   32396784       1 331303 java.lang.String[]
    4  7.97% 57.32%  32396784       1   32396784       1 331302 int[]
    5  7.44% 64.77%  30248592 1260358   30248592 1260358 331324 java.lang.String
    6  5.98% 70.74%  24297624       3   24297624       3 331443 byte[]
    7  5.26% 76.00%  21369472  667796   21369472  667796 331140 org.apache.lucene.index.TermInfo
    8  3.94% 79.95%  16027104  667796   16027104  667796 331137 java.lang.String
    9  2.63% 82.58%  10684736  667796   10684736  667796 331139 org.apache.lucene.index.Term
   10  1.31% 83.89%   5342384       1    5342384       1 331134 long[]
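A quick bit of arithmetic makes the top row more concrete: 123015224 bytes over 1260358 arrays is roughly 98 bytes per char[] on average. It is also worth noticing that ranks 1 and 5 have exactly the same object count (1260358) and adjacent trace numbers (331325 and 331324), and that rank 5’s Strings work out at exactly 24 bytes each (30248592 / 1260358), the size of the String object itself, which suggests those char[]s are the backing arrays of those java.lang.Strings.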
There seems to be a fair amount of information about hprof on the web, but I haven’t managed to find a definitive explanation of exactly what all these columns mean. By now I have a fair idea, though, so here goes:
Sites – particular allocation points in the code, each identified by a stack trace.
rank – position in the table, ordered by the amount of live memory taken up by the objects allocated at this site.
self – the percentage of total live bytes accounted for by this site’s objects.
accum – the running total of the self column down the table (see the quick check after this list).
live bytes – number of bytes taken up by the currently live objects allocated at this site.
live objs – number of currently live objects allocated at this site.
alloc’ed bytes – total number of bytes ever allocated at this site, including objects that have since been garbage collected.
alloc’ed objs – likewise, the total number of objects ever allocated at this site.
stack trace – the number of the stack trace, listed earlier in the file, that identifies this site.
class name – the class of the allocated objects.
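A quick sanity check of that reading of accum against the table above: 30.27% + 11.11% = 41.38%, 41.38% + 7.97% = 49.35%, and 49.35% + 7.97% = 57.32%, which is exactly what the accum column shows.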
As can be seen, most memory is used by char[] (live bytes = 123015224, live objs = 1260358). Live bytes/objs will usually be less than alloc'ed bytes/objs because garbage collection reclaims objects along the way, but as the web server had only been running for a short time before this profile was taken, it is likely that no garbage collection had happened by the time I stopped the server.
Where live bytes/objs equals alloc'ed bytes/objs, this could possibly signify a memory leak, since everything ever allocated at that site is still live. One of the sources I looked at stated that low-level objects such as char[] tend to float to the top of the ranking and advised looking further down for the heads of leaking data structures (see the links at the end of this post).
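To make the live vs. alloc'ed distinction concrete, here is a small toy program (my own sketch, nothing to do with our service) that can be run under hprof; the leaking site should report live figures equal to its alloc'ed figures, while the churning site's live figures should drop well below alloc'ed once a collection has run:

import java.util.ArrayList;
import java.util.List;

// Run with: java -Xrunhprof:heap=sites,depth=10 LeakVsChurn
// then compare the two allocation sites in java.hprof.txt.
public class LeakVsChurn {
    // "Leak": strings added here are never released, so this site's
    // live bytes/objs will equal its alloc'ed bytes/objs.
    private static final List<String> cache = new ArrayList<String>();

    public static void main(String[] args) {
        for (int i = 0; i < 200000; i++) {
            cache.add("leaked-" + i);
        }
        for (int i = 0; i < 200000; i++) {
            // "Churn": this string is garbage as soon as the loop iterates,
            // so after a GC this site's live figures fall well below alloc'ed.
            String s = "churned-" + i;
        }
        System.gc(); // only a hint to the JVM, but usually enough for a toy like this
        System.out.println("cached " + cache.size() + " strings");
    }
}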
Here is stack trace 331325 for our highest ranking char[] objects:
TRACE 331325:
    java.lang.String.<init>(<Unknown Source>:Unknown line)
    org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:104)
    org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:155)
    org.apache.lucene.search.FieldCacheImpl$6.createValue(FieldCacheImpl.java:282)
    org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
    org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:260)
    org.apache.lucene.search.FieldCacheImpl$7.createValue(FieldCacheImpl.java:371)
    org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
    org.apache.lucene.search.FieldCacheImpl.getAuto(FieldCacheImpl.java:334)
    org.apache.lucene.search.FieldSortedHitQueue.comparatorAuto(FieldSortedHitQueue.java:338)
These char[]s were allocated by the section of our code that performs a Lucene search with a request to sort the result set: reading the trace from the bottom up, building the sort comparator populates Lucene’s FieldCache, which creates a String (and its backing char[]) for the term in each document.
For the moment it looks as though our “memory leak” issue is down to the sorting functionality in Lucene using a lot of resources rather than a memory leak as such. We are sorting on strings, which the Lucene docs state are the most expensive types to sort in terms of resources, because each unique term is cached for each document. This could be a problem for us and we may have to rethink how we do our sorting; if so, that will be another topic.
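For reference, the kind of call that triggers this cache population looks something like the sketch below (the field names and index path are hypothetical, and this assumes the Lucene 2.x API we are on, where a Sort built from just a field name auto-detects the type, hence the getAuto/comparatorAuto frames in the trace):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;

public class SortedSearch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Query query = new QueryParser("contents", new StandardAnalyzer()).parse("foo");

        // Sorting on a string field: the first sorted search populates the
        // FieldCache with an entry per document, which is where the char[]
        // and String allocations in trace 331325 come from.
        Sort sort = new Sort("title");
        Hits hits = searcher.search(query, sort);
        System.out.println(hits.length() + " hits");
        searcher.close();
    }
}

As far as I understand it, the cache is keyed per IndexReader and per field, so this cost is paid the first time each field is sorted on for a given reader; with many unique terms, that one-off cost is exactly the large char[] and String footprint in the profile.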
hprof links I found useful:
http://java.sun.com/developer/technicalArticles/Programming/HPROF.html
http://www.skywayradio.com/tech/WAS51/appserver_hangs.php
http://www.skywayradio.com/tech/WAS51/HProf.php
http://publib.boulder.ibm.com/infocenter/javasdk/v1r4m2/index.jsp?topic=/com.ibm.java.doc.diagnostics.142/html/id1590.html
http://www.javalobby.org/java/forums/t19612.html
http://www.javaworld.com/javaworld/jw-12-2001/jw-1207-hprof.html