Java application causes native memory leak


Environment

  • Red Hat JBoss Enterprise Application Platform (EAP)
  • JBoss Enterprise Web Server (EWS) Tomcat
  • Red Hat Enterprise Linux (RHEL) Tomcat
  • Red Hat AMQ

Issue

  • We are seeing at the OS level a slow memory leak over time.
  • The heap utilization stays at 4GB, yet physical memory use rises from 4GB to 10GB over the course of a week.
  • The JVM size is growing large enough that it is killed by OOM killer.
  • The Java process size is much larger than expected.

Resolution

See the resolution for the relevant root cause.

Root Cause

Diagnostic Steps

Quantify how large the Java process is growing:

  • Capture OS level data to show the Java process memory increase over time. For example:

          top -b -d 3600 -H >> top.out
    

    This will capture top output every hour in a file called top.out.

  • On Linux, collect a series of pmap output over time. Run the attached pmap_linux.sh script, passing in the JBoss PID as an argument. For example:

          sh ./pmap_linux.sh JBOSS_PID
    

    The script will capture pmap output every hour in a file called pmap.out. As the process memory grows beyond the expected JVM process size the leak will become more prominent in the output.

    Be sure to test the script before use to make sure it runs properly in your environment.
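The same idea can be sketched in Java by sampling a process's resident set size (VmRSS) from /proc on Linux. This is an illustrative sketch, not the attached script: the class name, the command-line argument, and the one-hour interval are assumptions, and it only works where /proc/PID/status exists.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustrative Linux-only sketch: periodically sample a process's
// resident set size (VmRSS) from /proc, similar in spirit to pmap_linux.sh.
public class RssSampler {
    static long rssKb(long pid) throws IOException {
        for (String line : Files.readAllLines(Paths.get("/proc/" + pid + "/status"))) {
            if (line.startsWith("VmRSS:")) {
                // The line looks like: "VmRSS:     123456 kB"
                return Long.parseLong(line.replaceAll("\\D+", ""));
            }
        }
        throw new IllegalStateException("VmRSS not found; is this Linux?");
    }

    public static void main(String[] args) throws Exception {
        long pid = Long.parseLong(args[0]); // e.g. the JBoss PID
        while (true) {
            System.out.println(System.currentTimeMillis() + " VmRSS=" + rssKb(pid) + " kB");
            Thread.sleep(3600_000L); // one hour, matching the script's interval
        }
    }
}
```

A steadily growing VmRSS while the Java heap stays flat points at a native leak rather than a heap leak.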

Verify the Java process size is much larger than expected:

Heap analysis:

  • Get a heap dump when the process size is very large and look for known objects that allocate native memory (e.g. java.nio.ByteBuffer with direct access):
    • How do I create a Java heap dump?
    • How do I analyze a Java heap dump?
    • Check the maximum amount of direct memory that can be allocated by inspecting sun.misc.VM.directMemory.
    • View the java.nio.DirectByteBuffer objects using native memory:
      • SELECT d.capacity FROM java.nio.DirectByteBuffer d WHERE (d.cleaner != null)
    • Determine the amount of native memory that can be reclaimed when the Cleaner queue runs:
      • SELECT c.capacity FROM OBJECTS ( SELECT OBJECTS referent FROM INSTANCEOF sun.misc.Cleaner ) c WHERE (c.capacity != null)
      • Export the query result as a txt file (top menu item)
      • Open the file in LibreOffice and sum the capacity column
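Alongside heap-dump analysis, direct buffer usage can also be checked at runtime through the standard BufferPoolMXBean. This is a minimal sketch (the class name is illustrative); it reports the JVM's own direct pool, so it complements rather than replaces the OQL queries above.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Query the JVM's "direct" buffer pool at runtime instead of from a heap dump.
public class DirectPoolCheck {
    public static void main(String[] args) {
        // Allocate some direct memory so the pool has something to report.
        ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024);

        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                System.out.println("direct buffers: " + pool.getCount());
                System.out.println("bytes used:     " + pool.getMemoryUsed());
            }
        }
    }
}
```

If the bytes used by the direct pool track the native growth seen in pmap, direct ByteBuffers are a likely culprit.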

Understand the environment to see if it matches any known issues:

  • Is it a new deployment, or an application that has been running fine for a long time and now has an issue?
  • If it has been running a long time in production, what has changed recently (e.g. OS, JDK, application)?
  • Can the issue be reproduced predictably and consistently?
    • Can functionality be removed until the issue is not reproduced to narrow down the issue?
  • Review environment information in boot.log.
  • Is Java being run in a virtual environment in a guest OS or on a physical operating system?
  • Are there any Java JNI native components being used (e.g. the APR native connectors)? Can the issue be reproduced with the JNI components removed?
    • Set the -verbose:jni flag for more details on JNI calls that could be related to the issue.
  • Are a lot of JSPs being redeployed many times over?
  • Are deployments in a compressed format? If so, try unzipped deployments instead.
  • Gather thread dumps when the process size is large and check for an unusual number of threads eating up space with their thread stacks.
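Each live thread reserves native (non-heap) memory for its stack, so a high thread count alone can account for significant native growth. The following sketch estimates that footprint from the live thread count; the 1 MB per-thread figure is an assumption standing in for the actual -Xss / ThreadStackSize setting, which should be checked in your environment.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Rough estimate of native memory held by thread stacks.
public class ThreadStackEstimate {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int live = threads.getThreadCount();
        // Assumed 1024 KB per thread; verify the real value with:
        //   java -XX:+PrintFlagsFinal -version | grep ThreadStackSize
        long stackKb = 1024;
        System.out.printf("%d live threads -> up to ~%d MB of native stack%n",
                live, live * stackKb / 1024);
    }
}
```

Comparing this estimate against the gap between heap size and process size helps decide whether thread stacks explain the growth.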

If the issue is reproducible in a test environment, it may be possible to use valgrind on Red Hat Enterprise Linux (RHEL) with OpenJDK.


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.