JON Agent is failing with error OutOfMemoryError: PermGen space or Java heap space


Environment

  • JBoss Operations Network (JON)

    • 2.3.1
    • 2.4
  • JON Agent managing 15 or more server resources

Issue

  • The agent seems to be running out of memory.

  • The PermGen size has been increased from the default to 256m, but the PermGen OutOfMemoryError (OOME) still persists.

  • The following messages appear in the agent log:

          WARN  [RHQ VM Health Check Thread] (org.rhq.enterprise.agent.VMHealthCheckThread)- {VMHealthCheckThread.mem-low}VM health check thread has detected [VM nonheap] memory has crossed the threshold [0.9] and is low: memory-usage=[init = 272629760(266240K) used = 300211520(293175K) committed = 301236224(294176K) max = 301989888(294912K)]
          WARN  [RHQ VM Health Check Thread] (org.rhq.enterprise.agent.VMHealthCheckThread)- {VMHealthCheckThread.gc}VM health check thread is invoking the garbage collector to see if more memory can be freed
          WARN  [RHQ VM Health Check Thread] (org.rhq.enterprise.agent.VMHealthCheckThread)- {VMHealthCheckThread.mem-low}VM health check thread has detected [VM nonheap] memory has crossed the threshold [0.9] and is low: memory-usage=[init = 272629760(266240K) used = 300211520(293175K) committed = 301236224(294176K) max = 301989888(294912K)]
          FATAL [RHQ VM Health Check Thread] (org.rhq.enterprise.agent.VMHealthCheckThread)- {VMHealthCheckThread.mem-problem}VM health check thread sees that memory is critically low and will try to reboot the agent
          INFO  [RHQ VM Health Check Thread] (org.rhq.enterprise.communications.ServiceContainer)- {ServiceContainer.global-concurrency-limit-disabled}Global concurrency limit has been disabled - there is no limit to the number of incoming commands allowed
          FATAL [RHQ VM Health Check Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.startup-error}The agent encountered an error
    
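
The VM health check messages above show the agent comparing non-heap usage against a 0.9 threshold. A minimal sketch of that kind of check using the standard java.lang.management API (the class name and output format here are illustrative, not the agent's actual code):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class NonHeapHealthCheck {
    // Mirrors the [0.9] threshold reported in the agent log
    private static final double THRESHOLD = 0.9;

    public static void main(String[] args) {
        MemoryUsage nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
        long used = nonHeap.getUsed();
        long max = nonHeap.getMax(); // -1 when no maximum is defined for the pool
        if (max > 0 && (double) used / max > THRESHOLD) {
            System.out.println("VM nonheap memory is low: " + nonHeap);
        } else {
            System.out.println("VM nonheap memory OK: " + nonHeap);
        }
    }
}
```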
  • Agent PermGen issue

  • Several of the agents in our environment have run out of Perm space.

  • The following messages appear in the agent log:

          ERROR [RHQ Server Polling Thread] (enterprise.communications.command.client.ClientCommandSenderTask)- {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.agent-name=myjonagent.domain.com, rhq.externalizable-strategy=AGENT, rhq.security-token=1276890967324-3432556677-9876345129753124680, rhq.send-throttle=true}]; params=[{targetInterfaceName=org.rhq.core.clientapi.server.core.CoreServerService, invocation=NameBasedInvocation[getLatestPlugins]}]]. Cause: java.lang.Exception:java.lang.OutOfMemoryError: PermGen space -> java.lang.OutOfMemoryError:PermGen space. Cause: java.lang.Exception: java.lang.OutOfMemoryError: PermGen space
          ERROR [RHQ Agent Registration Thread] (enterprise.communications.command.client.ClientCommandSenderTask)- {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.agent-name=myjonagent.domain.com, rhq.externalizable-strategy=AGENT, rhq.send-throttle=true}]; params=[{targetInterfaceName=org.rhq.core.clientapi.server.core.CoreServerService, invocation=NameBasedInvocation[registerAgent]}]]. Cause: java.lang.Exception:java.lang.OutOfMemoryError: PermGen space -> java.lang.OutOfMemoryError:PermGen space. Cause: java.lang.Exception: java.lang.OutOfMemoryError: PermGen space
          WARN  [RHQ Server Polling Thread] (enterprise.communications.command.client.ServerPollingThread)- {ServerPollingThread.server-offline}The server has gone offline; client has been told to stop sending commands
          WARN  [RHQ Agent Registration Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.agent-registration-failure}Agent failed to register with the server. retry=[true], retry interval=[60,000]. Cause: java.lang.reflect.UndeclaredThrowableException:null -> java.lang.Exception:java.lang.OutOfMemoryError: PermGen space -> java.lang.OutOfMemoryError:PermGen space. Cause: java.lang.reflect.UndeclaredThrowableException
    
  • Agent out of memory issues

  • Agent is unable to communicate with server and a message similar to the following appears in the agent log:

          ERROR [InventoryManager.discovery-1] (enterprise.communications.command.client.ClientCommandSenderTask)- {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.agent-name=myagent.mydomain.com, rhq.externalizable-strategy=AGENT, rhq.security-token=1265698764567-2016581122-2331818342121239395, rhq.timeout=1800000, rhq.send-throttle=true}]; params=[{targetInterfaceName=org.rhq.core.clientapi.server.discovery.DiscoveryServerService, invocation=NameBasedInvocation[mergeInventoryReport]}]]. Cause: java.lang.Exception:java.lang.OutOfMemoryError: Java heap space -> java.lang.OutOfMemoryError:Java heap space. Cause: java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
    
  • Auto discovery is failing and the following appears in the agent log:

          WARN  [InventoryManager.discovery-1] (rhq.core.pc.inventory.AutoDiscoveryExecutor)- Exception caught while running server discovery
          java.lang.reflect.UndeclaredThrowableException
               at $Proxy4.mergeInventoryReport(Unknown Source)
               at org.rhq.core.pc.inventory.InventoryManager.handleReport(InventoryManager.java:873)
               at org.rhq.core.pc.inventory.AutoDiscoveryExecutor.call(AutoDiscoveryExecutor.java:121)
               at org.rhq.core.pc.inventory.AutoDiscoveryExecutor.run(AutoDiscoveryExecutor.java:92)
               at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
               at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:280)
               ...
               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
               at java.lang.Thread.run(Thread.java:595)
          Caused by: java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
               at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:112)
               at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:55)
               at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
               at java.util.concurrent.FutureTask.run(FutureTask.java:123)
               ... 3 more
          Caused by: java.lang.OutOfMemoryError: Java heap space
    
  • Availability is not being reported correctly by an agent and the following appears in the agent's log:

          WARN  [InventoryManager.availability-1] (rhq.core.pc.inventory.InventoryManager)- Could not transmit availability report to server
          java.lang.reflect.UndeclaredThrowableException
               at $Proxy4.mergeAvailabilityReport(Unknown Source)
               at org.rhq.core.pc.inventory.InventoryManager.handleReport(InventoryManager.java:832)
               at org.rhq.core.pc.inventory.AvailabilityExecutor.run(AvailabilityExecutor.java:90)
               at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
               at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:280)
               ...
               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
               at java.lang.Thread.run(Thread.java:595)
          Caused by: java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
               at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:112)
               at org.rhq.enterprise.communications.command.client.ClientCommandSenderTask.call(ClientCommandSenderTask.java:55)
               at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
               at java.util.concurrent.FutureTask.run(FutureTask.java:123)
               ... 3 more
          Caused by: java.lang.OutOfMemoryError: Java heap space
    
  • Discovery is failing and the agent logs:

          WARN  [ResourceDiscoveryComponent.invoker.daemon-2] (rhq.core.pluginapi.inventory.ResourceContext)- Cannot get native process for resource [/opt/jboss/eap/jboss-eap-4.3] - discovery failed
          java.lang.Exception: Discovery component invocation failed.
               at org.rhq.core.pc.util.DiscoveryComponentProxyFactory$ComponentInvocationThread.call(DiscoveryComponentProxyFactory.java:283)
               at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
               ...
               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
               at java.lang.Thread.run(Thread.java:595)
          Caused by: java.lang.OutOfMemoryError: Java heap space
               at com.sun.org.apache.xpath.internal.VariableStack.reset(VariableStack.java:135)
               at com.sun.org.apache.xpath.internal.VariableStack.<init>(VariableStack.java:45)
               at com.sun.org.apache.xpath.internal.XPathContext.<init>(XPathContext.java:419)
               at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
               at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
               at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:365)
               at org.jboss.on.common.jbossas.JmxInvokerServiceConfiguration.parseDocument(JmxInvokerServiceConfiguration.java:109)
               at org.jboss.on.common.jbossas.JmxInvokerServiceConfiguration.<init>(JmxInvokerServiceConfiguration.java:55)
               at org.jboss.on.common.jbossas.JBossASDiscoveryUtils.getJmxInvokerSecurityDomain(JBossASDiscoveryUtils.java:86)
               at org.jboss.on.common.jbossas.JBossASDiscoveryUtils.getJmxInvokerUserInfo(JBossASDiscoveryUtils.java:42)
               at org.rhq.plugins.jbossas.JBossASDiscoveryComponent.processAutoDiscoveredProcesses(JBossASDiscoveryComponent.java:194)
               at org.rhq.plugins.jbossas.JBossASDiscoveryComponent.discoverResources(JBossASDiscoveryComponent.java:89)
               at sun.reflect.GeneratedMethodAccessor164.invoke(Unknown Source)
               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
               at java.lang.reflect.Method.invoke(Method.java:592)
               at org.rhq.core.pc.util.DiscoveryComponentProxyFactory$ComponentInvocationThread.call(DiscoveryComponentProxyFactory.java:279)
               ... 5 more
          WARN  [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Failure during discovery for [RHQ Server Communications Subsystem] Resources - failed after 589 ms.
          java.lang.Exception: Discovery component invocation failed.
               at org.rhq.core.pc.util.DiscoveryComponentProxyFactory$ComponentInvocationThread.call(DiscoveryComponentProxyFactory.java:283)
               at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
               ...
               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
               at java.lang.Thread.run(Thread.java:595)
          Caused by: org.mc4j.ems.connection.EmsConnectException: Connection failure Java heap space
               at org.mc4j.ems.impl.jmx.connection.support.providers.proxy.GenericMBeanServerProxy.invoke(GenericMBeanServerProxy.java:160)
               at $Proxy67.queryNames(Unknown Source)
               at org.mc4j.ems.impl.jmx.connection.DConnection.queryBeans(DConnection.java:301)
               at org.mc4j.ems.impl.jmx.connection.DConnection.queryBeans(DConnection.java:326)
               at org.rhq.plugins.jmx.MBeanResourceDiscoveryComponent.performDiscovery(MBeanResourceDiscoveryComponent.java:147)
               at org.rhq.plugins.jmx.MBeanResourceDiscoveryComponent.discoverResources(MBeanResourceDiscoveryComponent.java:96)
               at org.rhq.plugins.jmx.MBeanResourceDiscoveryComponent.discoverResources(MBeanResourceDiscoveryComponent.java:84)
               at sun.reflect.GeneratedMethodAccessor164.invoke(Unknown Source)
               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
               at java.lang.reflect.Method.invoke(Method.java:592)
               at org.rhq.core.pc.util.DiscoveryComponentProxyFactory$ComponentInvocationThread.call(DiscoveryComponentProxyFactory.java:279)
               ... 5 more
          Caused by: java.lang.OutOfMemoryError: Java heap space
    
  • Metrics are not being collected for one or more resources and the agent log contains:

          ERROR [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Failed to run measurement collection
          java.lang.OutOfMemoryError: Java heap space
          WARN  [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Failure to collect measurement data for Resource[id=760681, type=ConnectionFactory, key=jboss.jca:name=jms/cbsconfig/queue/MAPPNOCEVENTERR,service=ConnectionFactoryBinding, name=jms/cbsconfig/queue/MAPPNOCEVENTERR Connection Factory, parent=lxdpaa01 JBossEAP 4.3.0.GA_CP03 lxdpar01 (www.example.com:1099)] - cause: java.lang.RuntimeException:Unable to load attributes on bean [jboss.jca:name=jms/myjmsconfig/queue/EVENTERR,service=ManagedConnectionPool] Connection failure Java heap space -> org.mc4j.ems.connection.EmsConnectException:Connection failure Java heap space -> java.lang.OutOfMemoryError:Java heap space
          ERROR [ResourceContainer.invoker.daemon-3] (org.rhq.plugins.jbossas.JBossASServerComponent)- Failed to obtain measurement [jboss.system:type=ServerInfo:ActiveThreadCount]
          org.mc4j.ems.connection.EmsException: Could not load attribute value Connection failure Java heap space
               at org.mc4j.ems.impl.jmx.connection.bean.attribute.DAttribute.refresh(DAttribute.java:235)
               at org.rhq.plugins.jbossas.JBossASServerComponent.getValues(JBossASServerComponent.java:392)
               ...
               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
               at java.lang.Thread.run(Thread.java:595)
          Caused by: org.mc4j.ems.connection.EmsConnectException: Connection failure Java heap space
               at org.mc4j.ems.impl.jmx.connection.support.providers.proxy.GenericMBeanServerProxy.invoke(GenericMBeanServerProxy.java:160)
               at $Proxy67.getAttribute(Unknown Source)
               at org.mc4j.ems.impl.jmx.connection.bean.attribute.DAttribute.refresh(DAttribute.java:199)
               ... 10 more
          Caused by: java.lang.OutOfMemoryError: Java heap space
    

Resolution

Due to Bug 615377, do not restart the agent's plug-in container multiple times without restarting the entire agent. The agent's plug-in container is restarted when plug-ins are updated, or when the restart plug-in container operation is invoked from the agent's command prompt or from the agent resource in the JON inventory.

Increase the maximum heap size of the JON Agent's JVM. For example, in rhq-agent-env.[sh|bat], uncomment RHQ_AGENT_JAVA_OPTS and add appropriate values for -Xms and -Xmx. If the agent's log indicates that PermGen space is the issue, also increase the initial and maximum permanent generation sizes using the -XX:PermSize and -XX:MaxPermSize JVM options. For example:

RHQ_AGENT_JAVA_OPTS="-Xms1024m -Xmx1024m -XX:PermSize=256M -XX:MaxPermSize=256M -Djava.net.preferIPv4Stack=true"
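
After restarting the agent with the new options, the effective pool limits can be confirmed from inside a JVM started with the same flags. This is a sketch using the standard MemoryPoolMXBean API (pool names vary by JVM version, e.g. "PS Perm Gen" on older HotSpot JVMs):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class ShowPoolLimits {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            long max = pool.getUsage().getMax(); // -1 means no limit configured
            System.out.printf("%-25s max=%s%n", pool.getName(),
                    max < 0 ? "unbounded" : (max / (1024 * 1024)) + "MB");
        }
    }
}
```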

Root Cause

If the heap dump shows a very large number of classes loaded in the JVM, this can be attributed to how the JON plug-in container isolates the classes for each server connection. This is done so that multiple versions and patch levels of managed resources can coexist and be managed by a single JON Agent. For example, different JBoss versions are not serially compatible, so the JON Agent uses the JARs installed in each JBoss server installation to make remote connections to that specific server. Because each set of JARs is loaded in a separate classloader, high PermGen utilization is expected when connecting to many servers.

Memory usage is dependent on the number of resources an agent needs to monitor and how aggressive the agent is with availability reporting and metric collection for its managed resources. For example, an agent that is managing 15 EAP 4.3 instances requires more than 130MB of memory just for the class loaders necessary to communicate with each instance independently.
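
The per-connection classloader isolation described above can be sketched as follows. The jar paths are hypothetical; the point is that a null parent stops delegation (above the bootstrap classes), so each loader defines its own copy of every class it loads, and PermGen usage grows with the number of managed servers:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class PerServerLoaders {
    public static void main(String[] args) throws Exception {
        // Hypothetical client jars shipped with two different JBoss installations
        URL[] jarsA = { new URL("file:///opt/jboss-a/client/jbossall-client.jar") };
        URL[] jarsB = { new URL("file:///opt/jboss-b/client/jbossall-client.jar") };

        // A null parent prevents delegation (except to the bootstrap loader),
        // so the same class name loaded through each loader yields two distinct
        // Class objects, each with its own PermGen footprint.
        try (URLClassLoader loaderA = new URLClassLoader(jarsA, null);
             URLClassLoader loaderB = new URLClassLoader(jarsB, null)) {
            System.out.println("independent loaders: " + (loaderA != loaderB));
        }
    }
}
```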

A memory leak at the JVM level has also been identified in Bug 615377: restarting the plug-in container causes classloaders to leak and will eventually exhaust PermGen. This leak is only seen when the plug-in container is restarted, which does not occur under normal conditions. However, it can appear in a development environment where plug-ins are updated frequently on the JON server, causing the JON agents to restart their own plug-in containers to pick up the updates.

The following observations come from engineering tests in which 30 JBoss Enterprise Application Platform (EAP) 4.3 instances were monitored for over 20 hours.

Using the out-of-the-box configuration, the initial OutOfMemoryError was reproduced very quickly, which is expected when monitoring such a large number of JBoss AS servers. The agent was then restarted with the following settings:

    -Xms1024M -Xmx1024M -XX:PermSize=256M -XX:MaxPermSize=256M

Observations from the test:

  1. Eden space usage spiked to almost 400MB. Eden space is taken from the heap (-Xmx), NOT from non-heap memory (which is where PermGen resides).
  2. PermGen spiked to the full 256M, but the JVM never ran out of PermGen space.
  3. Heap usage never reached the full 1024M: maximum heap usage was about 560MB, with an average of about 350MB.

Diagnostic Steps

  • Enable garbage collection (GC) logging to identify GC patterns - See How do I enable Java garbage collection logging?

  • Obtain a heap dump when the problem occurs - See How do I create a Java heap dump?

    • Review the heap dump for a class loader which is using a large amount of memory:
      • Is the JON Agent JVM consuming a lot of memory due to too many binding sets configured in bindings.xml?

      • One instance of "org.rhq.core.pc.inventory.InventoryManager" loaded by "sun.misc.Launcher$AppClassLoader" occupies 98,066,384 (55.09%) bytes. The memory is accumulated in one instance of "java.util.HashMap$Entry[]" loaded by "<system class loader>".

        • Accumulated Objects: org.rhq.core.pc.inventory.InventoryManager
            * java.util.Collections$SynchronizedMap
              * java.util.HashMap
                * java.util.HashMap$Entry[4096]
                  * org.mc4j.ems.impl.jmx.connectio
                  * org.mc4j.ems.impl.jmx.connectio
                  * org.mc4j.ems.impl.jmx.connectio
                  * org.mc4j.ems.impl.jmx.connectio
                  * org.mc4j.ems.impl.jmx.connectio
                    ...
  • How many resources are being managed by the JON Agent?
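
One way to capture the heap dump mentioned above is via jmap against the agent's PID; it can also be done programmatically from inside the JVM using the HotSpot diagnostic MXBean (a HotSpot-specific API; the output file name below is just an example):

```java
import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

public class DumpHeap {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // true = dump only objects reachable from GC roots ("live" objects);
        // dumpHeap fails if the target file already exists
        diag.dumpHeap("rhq-agent-heap.hprof", true);
        System.out.println("Heap dump written to rhq-agent-heap.hprof");
    }
}
```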


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.