JBoss ON event purge job results in OutOfMemoryError when there are several million event entries

Solution Verified - Updated

Environment

  • Red Hat JBoss Operations Network (ON) 3.3
  • JBoss ON has stored over 10 million events that are now eligible for purge

Issue

  • The JBoss ON server log reports a GC overhead limit exceeded error:

      ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/jboss-remoting-servlet-invoker].[ServerInvokerServlet]] (http-/hostname:7080-127) JBWEB000236: Servlet.service() for servlet ServerInvokerServlet threw exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
    
  • The thread RHQScheduler_Worker is exhausting memory when executing org.rhq.enterprise.server.purge.PurgeTemplate.loadKeys while purging old event data

Resolution

This issue is resolved in JBoss ON 3.3 Update-05.

To work around the issue:

  1. Increase the JBoss ON system property Delete Events Older Than to a value that will result in fewer than 5 million rows being eligible for purge. NOTE: Depending on your memory configuration and system load, you may be able to process more or fewer than 5 million.
  2. Wait an hour. If the purge job is successful, the Delete Events Older Than property can be slowly decreased back to its original value, waiting an hour between each change.
  3. Once the backlog is cleared, keep Delete Events Older Than at a value that ensures that no more than 5 million events will be purged at any given time.

It may also be necessary to increase the JBoss ON server JVM's maximum heap memory setting.
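
Choosing the value for step 1 amounts to finding the smallest retention period for which the rows older than that period stay under the limit. The following is a minimal sketch of that calculation, assuming you have already obtained a count of stored events per day of age (for example with a database query, which is not shown). The class RetentionPicker, its method, and the sample counts are hypothetical and are not part of JBoss ON.

```java
/** Illustrative helper only; not part of JBoss ON. */
final class RetentionPicker {

    /**
     * eventsByAgeDays[i] is the number of stored events that are i days old
     * (hypothetical input gathered outside this sketch).
     * Returns the smallest retention period, in days, for which the events
     * older than that period do not exceed maxPurgeRows.
     */
    static int smallestSafeRetentionDays(long[] eventsByAgeDays, long maxPurgeRows) {
        long eligible = 0;
        // Walk from the oldest day towards today, accumulating the rows that
        // would be purged if events of that age were also deleted.
        for (int age = eventsByAgeDays.length - 1; age >= 0; age--) {
            if (eligible + eventsByAgeDays[age] > maxPurgeRows) {
                // Events of this age must be kept, so only events older
                // than 'age' days are purged.
                return age;
            }
            eligible += eventsByAgeDays[age];
        }
        return 0; // the whole backlog fits within the limit
    }

    public static void main(String[] args) {
        // Made-up counts for events that are 0, 1, 2 and 3 days old.
        long[] counts = {1_000_000L, 2_000_000L, 3_000_000L, 4_000_000L};
        int days = smallestSafeRetentionDays(counts, 5_000_000L);
        System.out.println("Set Delete Events Older Than to at least " + days + " days");
    }
}
```

If the oldest day alone exceeds the limit, the sketch returns the largest age, meaning nothing can be purged safely at day granularity and more heap is likely to be needed.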

Root Cause

The data purge job is attempting to delete old event entries as per the Delete Events Older Than JBoss ON system property setting. During the purge job, the keys for the entries that will be deleted are retained in memory. If there are millions of events, the JBoss ON server JVM may temporarily require several gigabytes of heap to perform the purge.
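
Because every key is held at once, the memory cost grows linearly with the number of eligible rows. A common way to bound that cost is to process keys in fixed-size batches. The sketch below illustrates that general technique only. It is not the PurgeTemplate code and is not presented as the change delivered in Update-05. BatchedPurge and its parameters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative only; not the JBoss ON PurgeTemplate implementation. */
final class BatchedPurge {

    /**
     * Reads keys from the iterator and hands them to deleteBatch in groups
     * of at most batchSize, so that no more than batchSize keys are held in
     * memory at once. Returns the total number of keys processed.
     */
    static long purgeInBatches(Iterator<Long> keys, int batchSize,
                               Consumer<List<Long>> deleteBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        List<Long> batch = new ArrayList<>(batchSize);
        while (keys.hasNext()) {
            batch.add(keys.next());
            if (batch.size() == batchSize) {
                deleteBatch.accept(batch);
                total += batch.size();
                // Start a fresh list so the processed keys can be collected.
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            // Flush the final, partially filled batch.
            deleteBatch.accept(batch);
            total += batch.size();
        }
        return total;
    }
}
```

With this shape, peak memory depends on batchSize rather than on the total number of events that are eligible for purge.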

This issue has been captured in Red Hat Bugzilla 1255196 and has been fixed in JBoss ON 3.3 Update-05.

Diagnostic Steps

  • Review the JBoss ON server log at the time of the first OutOfMemoryError. This will normally be within a few minutes of the start of the event data purge job:

      INFO  [org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob] (RHQScheduler_Worker-1) Purging event data older than Mon Aug 03 16:22:55 UTC 2015
      ERROR [org.jboss.as.ejb3.invocation] (RHQScheduler_Worker-1) JBAS014134: EJB Invocation failed on component PurgeManagerBean for method public abstract int org.rhq.enterprise.server.purge.PurgeManagerLocal.purgeEventData(long): javax.ejb.EJBException: java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
          at org.jboss.as.ejb3.tx.BMTInterceptor.handleException(BMTInterceptor.java:80) [jboss-as-ejb3-7.4.0.Final-redhat-19.jar:7.4.0.Final-redhat-19]
          at org.jboss.as.ejb3.tx.EjbBMTInterceptor.checkStatelessDone(EjbBMTInterceptor.java:92) [jboss-as-ejb3-7.4.0.Final-redhat-19.jar:7.4.0.Final-redhat-19]
          ...
          at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation-1.1.2.Final-redhat-1.jar:1.1.2.Final-redhat-1]
          at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:73) [jboss-as-ee-7.4.0.Final-redhat-19.jar:7.4.0.Final-redhat-19]
          at org.rhq.enterprise.server.purge.PurgeManagerLocal$$$view157.purgeEventData(Unknown Source) [rhq-server.jar:4.12.0.JON330GA]
          at org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob.purgeEventData(DataPurgeJob.java:210) [rhq-server.jar:4.12.0.JON330GA]
          at org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob.purgeEverything(DataPurgeJob.java:97) [rhq-server.jar:4.12.0.JON330GA]
          at org.rhq.enterprise.server.scheduler.jobs.DataPurgeJob.executeJobCode(DataPurgeJob.java:84) [rhq-server.jar:4.12.0.JON330GA]
          at org.rhq.enterprise.server.scheduler.jobs.AbstractStatefulJob.execute(AbstractStatefulJob.java:48) [rhq-server.jar:4.12.0.JON330GA]
          at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [quartz-1.6.5.jar:1.6.5]
          at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525) [quartz-1.6.5.jar:1.6.5]
      Caused by: java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
          ... 37 more
      Caused by: java.lang.OutOfMemoryError: Java heap space
          at java.io.ObjectOutputStream$HandleTable.growEntries(ObjectOutputStream.java:2350) [rt.jar:1.7.0_80]
          at java.io.ObjectOutputStream$HandleTable.assign(ObjectOutputStream.java:2275) [rt.jar:1.7.0_80]
          at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1427) [rt.jar:1.7.0_80]
          at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) [rt.jar:1.7.0_80]
          at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) [rt.jar:1.7.0_80]
          at org.rhq.enterprise.server.purge.PurgeTemplate.loadKeys(PurgeTemplate.java:152) [rhq-server.jar:4.12.0.JON330GA]
          at org.rhq.enterprise.server.purge.PurgeTemplate.execute(PurgeTemplate.java:93) [rhq-server.jar:4.12.0.JON330GA]
          at org.rhq.enterprise.server.purge.PurgeManagerBean.purgeEventData(PurgeManagerBean.java:79) [rhq-server.jar:4.12.0.JON330GA]
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_80]
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [rt.jar:1.7.0_80]
          ...
    
  • It may be necessary to capture a snapshot of the Java heap and use a heap memory analysis tool to look for the offending thread and object(s).

      Class Name                                                                                | Shallow Heap | Retained Heap | Percentage
      --------------------------------------------------------------------------------------------------------------------------------------
      org.quartz.simpl.SimpleThreadPool$WorkerThread @ 0x78e02ae18  RHQScheduler_Worker-4 Thread|          120 | 1,393,229,320 |     83.70%
      --------------------------------------------------------------------------------------------------------------------------------------
    
      Class Name                                                                                                | Shallow Heap | Retained Heap
      -----------------------------------------------------------------------------------------------------------------------------------------
      org.quartz.simpl.SimpleThreadPool$WorkerThread @ 0x78e02ae18  RHQScheduler_Worker-4 Thread                |          120 | 1,393,229,320
      |- <Java Local> org.quartz.simpl.SimpleThreadPool$WorkerThread @ 0x78e02ae18  RHQScheduler_Worker-4 Thread|          120 | 1,393,229,320
      |- <Java Local> java.util.ArrayList @ 0x794eb3c28                                                         |           24 | 1,393,210,168
      |  |- elementData java.lang.Object[31151587] @ 0x7c6d1a520                                                |  124,606,368 | 1,393,210,144
      -----------------------------------------------------------------------------------------------------------------------------------------
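
The figures in this sample heap dump give a rough idea of the cost per key. Dividing the retained heap of the ArrayList by the length of its backing array yields roughly 45 bytes per slot. The sketch below performs that arithmetic and extrapolates it linearly to 5 million keys. It assumes every array slot holds a key, which may overstate the key count because an ArrayList usually has spare capacity, and it ignores other memory used during the purge. HeapEstimate is a hypothetical name.

```java
/** Back-of-the-envelope arithmetic on the heap dump figures; illustrative only. */
final class HeapEstimate {

    /** Average retained bytes per array slot. */
    static double bytesPerSlot(long retainedBytes, long slots) {
        return (double) retainedBytes / slots;
    }

    /** Linear extrapolation of retained heap for a given number of keys. */
    static long estimateBytes(long keys, double bytesPerSlot) {
        return Math.round(keys * bytesPerSlot);
    }

    public static void main(String[] args) {
        long retained = 1_393_210_168L; // retained heap of the ArrayList in the sample dump
        long slots = 31_151_587L;       // length of its elementData array
        double perSlot = bytesPerSlot(retained, slots);
        System.out.printf("%.1f bytes per slot%n", perSlot);
        System.out.printf("%d bytes estimated for 5 million keys%n",
                estimateBytes(5_000_000L, perSlot));
    }
}
```

On these assumptions, 5 million keys would retain a few hundred megabytes, which is consistent with the guidance that the safe limit depends on the available heap.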
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.