EJB timer cancellations or removals face slowness when many EJB timers are present

Solution Verified - Updated

Environment

  • JBoss Enterprise Application Platform (EAP) 7.4.19 and earlier

Issue

  • When our counts of EJB timers increase, new EJB timer creations or cancellations become slowed. As they face contention in org.jboss.as.controller.registry.AbstractModelResource on a LinkedHashMap lock:
"default task-24" #590 prio=5 os_prio=0 cpu=3381.44ms elapsed=77.80s tid=0x000056341cd35800 nid=0x7ee waiting for monitor entry  [0x00007f28a82ce000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.jboss.as.controller.registry.AbstractModelResource.removeChild(AbstractModelResource.java:195)
	- waiting to lock <0x00000006fa116430> (a java.util.LinkedHashMap)
	at org.jboss.as.ejb3.subsystem.deployment.TimerServiceResource.timerRemoved(TimerServiceResource.java:198)
	at org.jboss.as.ejb3.timerservice.TimerServiceImpl.unregisterTimerResource(TimerServiceImpl.java:1172)
	at org.jboss.as.ejb3.timerservice.TimerServiceImpl.removeTimer(TimerServiceImpl.java:708)
	at org.jboss.as.ejb3.timerservice.TimerServiceImpl.cancelTimer(TimerServiceImpl.java:688)
	at org.jboss.as.ejb3.timerservice.TimerImpl.cancel(TimerImpl.java:156)
  • And the lock owner often looks like:
   java.lang.Thread.State: RUNNABLE
	at java.util.HashMap.putVal(java.base@11.0.20/HashMap.java:627)
	at java.util.HashMap.put(java.base@11.0.20/HashMap.java:608)
	at java.util.HashSet.add(java.base@11.0.20/HashSet.java:220)
	at java.util.AbstractCollection.addAll(java.base@11.0.20/AbstractCollection.java:352)
	at java.util.LinkedHashSet.<init>(java.base@11.0.20/LinkedHashSet.java:169)
	at org.jboss.as.controller.registry.AbstractModelResource$DefaultResourceProvider.children(AbstractModelResource.java:272)
	- locked <0x000000072115f158> (a java.util.LinkedHashMap)
	at org.jboss.as.controller.registry.AbstractModelResource$DefaultResourceProvider.hasChildren(AbstractModelResource.java:292)
	at org.jboss.as.controller.registry.AbstractModelResource.removeChild(AbstractModelResource.java:201)
	- locked <0x00000006fa116430> (a java.util.LinkedHashMap)
	at org.jboss.as.ejb3.subsystem.deployment.TimerServiceResource.timerRemoved(TimerServiceResource.java:198)
	at org.jboss.as.ejb3.timerservice.TimerServiceImpl.unregisterTimerResource(TimerServiceImpl.java:1172)
	at org.jboss.as.ejb3.timerservice.TimerServiceImpl.removeTimer(TimerServiceImpl.java:708)
	at org.jboss.as.ejb3.timerservice.TimerServiceImpl.cancelTimer(TimerServiceImpl.java:688)
	at org.jboss.as.ejb3.timerservice.TimerImpl.cancel(TimerImpl.java:156)
  • When using database persistence for EJB timers, we then see even more exceptional pauses in EJB invocations as they are periodically blocked by very long org.jboss.as.ejb3.timerservice.persistence.database.DatabaseTimerPersistence$RefreshTask

Resolution

  • Update to EAP 7.4.20+ or 8.0.0+

Root Cause

  • The EJB timers are backed by a map datastructure with a code call that improperly scales in overhead with the child EJB timer count:
  • The DatabaseTimerPersistence$RefreshTask persistently holds a map lock while processing any timers to remove and that become a much longer task run from the above as well as timer instances increase:
Category

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.