Continuous high GC activity and heap memory accumulation with high thread contention in JBeret
Environment
- Red Hat JBoss Enterprise Application Platform 7.3.x
Issue
We migrated another application from EAP 6 to EAP 7 and, during performance testing, observed an increase in heap memory usage. On further debugging with a heap dump and a thread dump, we see that 40+ threads are holding a huge chunk of memory.
The thread dump was collected on the server at 2021-04-26 17:41:23 and the heap dump was collected on the same VM at 2021-04-26 17:56.
Following is our analysis of the data collected from the thread dump and heap dump:
- From the thread dump, we observed 46 threads in the BLOCKED state; 44 of them are pool-14 threads.
- We verified the stack traces of these threads and saw that they go through the org.jboss.as.jmx.model.RootResourceIterator class and are blocked on it:
java.lang.Thread.State: BLOCKED (on object monitor)
at org.jboss.as.controller.registry.AbstractModelResource.getProvider(AbstractModelResource.java:233)
- waiting to lock <0x00000006e9ff2808> (a java.util.LinkedHashMap)
at org.jboss.as.controller.registry.AbstractModelResource.getChild(AbstractModelResource.java:94)
at org.jboss.as.controller.OperationContextImpl.getAuthorizationResource(OperationContextImpl.java:1900)
- In the heap dump, we observed that more than 60% of the heap is occupied by the pool-14 threads.
- We would like to know why these objects are holding a huge amount of heap memory while the applications are not using or triggering those threads.
- Thread dumps show server start-up persistently stuck in jberet query activity like the following:
"MSC service thread 1-8" #21 prio=5 os_prio=0 tid=0x0000555c4b21b000 nid=0x219eb9 runnable [0x00007f1403623000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at oracle.net.ns.Packet.receive(Packet.java:311)
at oracle.net.ns.DataPacket.receive(DataPacket.java:105)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:305)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:249)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:171)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:89)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
at oracle.jdbc.driver.T4CMAREngineStream.unmarshalUB1(T4CMAREngineStream.java:426)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:390)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:249)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:566)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:215)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:58)
at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:943)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:929)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1034)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3820)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3867)
- locked <0x0000000773922b60> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1502)
at org.jboss.jca.adapters.jdbc.CachedPreparedStatement.executeQuery(CachedPreparedStatement.java:113)
at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.executeQuery(WrappedPreparedStatement.java:504)
at org.jberet.repository.JdbcRepository.getJobInstances(JdbcRepository.java:345)
at org.wildfly.extension.batch.jberet.job.repository.JobRepositoryService.getJobInstances(JobRepositoryService.java:109)
at org.wildfly.extension.batch.jberet.job.repository.JdbcJobRepositoryService.getJobInstances(JdbcJobRepositoryService.java:44)
at org.jberet.operations.AbstractJobOperator.getJobInstances(AbstractJobOperator.java:195)
at org.wildfly.extension.batch.jberet.deployment.JobOperatorService.getJobInstances(JobOperatorService.java:185)
at org.wildfly.extension.batch.jberet.deployment.BatchJobExecutionResource.lambda$refreshChildren$0(BatchJobExecutionResource.java:221)
at org.wildfly.extension.batch.jberet.deployment.BatchJobExecutionResource$$Lambda$1023/1184953376.get(Unknown Source)
at org.wildfly.extension.batch.jberet.deployment.WildFlyJobOperator.allowMissingJob(WildFlyJobOperator.java:68)
at org.wildfly.extension.batch.jberet.deployment.BatchJobExecutionResource.refreshChildren(BatchJobExecutionResource.java:221)
at org.wildfly.extension.batch.jberet.deployment.BatchJobExecutionResource.getChildrenNames(BatchJobExecutionResource.java:140)
- locked <0x0000000773184a30> (a java.util.LinkedHashSet)
at org.wildfly.extension.batch.jberet.deployment.BatchJobExecutionResource.hasChildren(BatchJobExecutionResource.java:113)
at org.jboss.as.controller.registry.AbstractModelResource$DelegateResource.hasChildren(AbstractModelResource.java:396)
at org.wildfly.extension.microprofile.metrics.MetricCollector.collectResourceMetrics0(MetricCollector.java:161)
at org.wildfly.extension.microprofile.metrics.MetricCollector.collectResourceMetrics0(MetricCollector.java:165)
at org.wildfly.extension.microprofile.metrics.MetricCollector.collectResourceMetrics0(MetricCollector.java:165)
at org.wildfly.extension.microprofile.metrics.MetricCollector.collectResourceMetrics(MetricCollector.java:96)
at org.wildfly.extension.microprofile.metrics.deployment.DeploymentMetricService.start(DeploymentMetricService.java:55)
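The stack traces above show the contention pattern: one thread holds the batch resource's monitor while it runs a long JDBC query against the jberet job repository, and the other threads queue up behind it on the same lock while the query results accumulate on the heap. The following minimal Java sketch (all class and thread names are hypothetical) illustrates how a single slow operation under a shared monitor leaves every other thread touching that resource in the BLOCKED state:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical sketch: one monitor guards the child map, so a slow
    // refresh serializes every reader, like the blocked pool-14 threads above.
    public class ResourceContentionSketch {
        private final Map<String, Object> children = new LinkedHashMap<>();

        // Called by the metrics/JMX threads; blocks while refreshChildren() holds the lock.
        synchronized Object getChild(String name) {
            return children.get(name);
        }

        // Stand-in for the slow getJobInstances() JDBC query run at deployment.
        synchronized void refreshChildren() throws InterruptedException {
            Thread.sleep(60_000); // simulates a long-running database query
            children.put("execution-1", new Object());
        }

        public static void main(String[] args) {
            ResourceContentionSketch resource = new ResourceContentionSketch();
            new Thread(() -> {
                try { resource.refreshChildren(); } catch (InterruptedException ignored) {}
            }, "slow-refresh-sketch").start();
            for (int i = 0; i < 44; i++) {
                new Thread(() -> resource.getChild("execution-1"), "pool-sketch-" + i).start();
            }
            // A thread dump taken now shows the pool-sketch threads BLOCKED
            // on the ResourceContentionSketch monitor.
        }
    }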
Resolution
- Update to EAP 7.3.8+ or 7.4.1+. 7.4.1+ includes additional improvements compared to 7.3.8, so it is preferred.
- With an Oracle datasource in particular, the driver may use a low default fetch size, further degrading this. You may test the following connection property for better scaling (the equivalent XML configuration is shown after this list):
/subsystem=datasources/data-source=Your_Datasource_Name/connection-properties=defaultRowPrefetch:add(value=50000)
- Moving to EAP 8 provides further improvement with a configurable limit on the number of jobs queried by jberet (see the example after this list).
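For reference, the defaultRowPrefetch command above adds a connection property to the datasource definition; the resulting standalone.xml fragment would look roughly like the following (the datasource name is a placeholder, and a server reload is typically required for the property to take effect):

    <datasource jndi-name="java:jboss/datasources/Your_Datasource_Name" pool-name="Your_Datasource_Name">
        ...
        <connection-property name="defaultRowPrefetch">50000</connection-property>
        ...
    </datasource>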
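As a sketch of the EAP 8 improvement mentioned above, newer batch-jberet versions expose an attribute to cap how many execution records jberet loads from the job repository. The repository name, attribute name, and value below are illustrative; check the management model of your exact version before applying:

    /subsystem=batch-jberet/jdbc-job-repository=Your_Repository_Name:write-attribute(name=execution-records-limit, value=1000)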
Root Cause
- WFLY-14275
- JBEAP-21993
- JBEAP-21992
- JBERET-506
- JBEAP-22172
Diagnostic Steps
Analysing the thread and heap dumps showed the issue was coming from JMX calls, specifically MBeanServer.queryMBeans. Disabling application monitoring via JMX confirmed this.
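To reproduce the trigger, a JMX query like the following against the server's MBeanServer forces the same management model traversal (a minimal sketch; it assumes it runs inside the EAP JVM, or can be adapted to a remote JMX connection):

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    // Minimal sketch: querying the jboss.as JMX domain walks the management
    // model, the same traversal the blocked threads above perform.
    public class QueryModelMBeans {
        public static void main(String[] args) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // "jboss.as:*" matches the MBeans EAP exposes for its management model.
            System.out.println(server.queryMBeans(new ObjectName("jboss.as:*"), null).size());
        }
    }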
Further investigation showed that the JBeret subsystem holds job executions in memory longer than expected.
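For completeness, thread and heap dumps like the ones analysed here can be captured with the standard JDK tools (replace <pid> with the EAP process ID):

    jcmd <pid> Thread.print > thread-dump.txt
    jcmd <pid> GC.heap_dump /tmp/heap-dump.hprof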