EAP 6 NIO deadlocks during blocking executor exhaustion

Environment

  • JBoss Enterprise Application Platform (EAP) 6.x

Issue

  • We use an HTTP NIO connector on EAP 6 with a blocking-bounded-queue-thread-pool.
  • JBoss became unresponsive, and thread dumps showed most threads blocked as follows in ChannelProcessor.run:
"http-executor-threads - 4003" prio=10 tid=0x00007fb688bcd800 nid=0x6bd5 waiting for monitor entry [0x00007fb65ca74000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.tomcat.util.net.NioEndpoint$ChannelProcessor.run(NioEndpoint.java:939)
	- waiting to lock <0x00000007970dd278> (a java.lang.Object)
	at org.jboss.threads.SimpleDirectExecutor.execute(SimpleDirectExecutor.java:33)
	at org.jboss.threads.QueueExecutor.runTask(QueueExecutor.java:808)
	at org.jboss.threads.QueueExecutor.access$100(QueueExecutor.java:45)
	at org.jboss.threads.QueueExecutor$Worker.run(QueueExecutor.java:849)
	at java.lang.Thread.run(Thread.java:745)
	at org.jboss.threads.JBossThread.run(JBossThread.java:122)
  • The remaining threads and the lock holders looked like the following:
"http-executor-threads - 4004" prio=10 tid=0x00007fb6884fc000 nid=0x6bd6 waiting on condition [0x00007fb6802d1000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000078f1f6d88> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
	at org.jboss.threads.QueueExecutor.execute(QueueExecutor.java:191)
	at org.jboss.threads.DelegatingBlockingExecutorService.execute(DelegatingBlockingExecutorService.java:42)
	at org.jboss.as.threads.ManagedExecutorService.execute(ManagedExecutorService.java:64)
	at org.apache.tomcat.util.net.NioEndpoint.processChannel(NioEndpoint.java:468)
	at org.apache.tomcat.util.net.NioEndpoint$EventPoller.add(NioEndpoint.java:1153)
	at org.apache.tomcat.util.net.NioEndpoint.addEventChannel(NioEndpoint.java:411)
	at org.apache.tomcat.util.net.NioEndpoint.addEventChannel(NioEndpoint.java:385)
	at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.event(Http11NioProtocol.java:875)
	- locked <0x0000000794a02610> (a org.apache.coyote.Request)
	at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.event(Http11NioProtocol.java:872)
	- locked <0x0000000794a02610> (a org.apache.coyote.Request)
	at org.apache.tomcat.util.net.NioEndpoint$ChannelProcessor.run(NioEndpoint.java:940)
	- locked <0x0000000797097670> (a java.lang.Object)
	at org.jboss.threads.SimpleDirectExecutor.execute(SimpleDirectExecutor.java:33)
	at org.jboss.threads.QueueExecutor.runTask(QueueExecutor.java:808)
	at org.jboss.threads.QueueExecutor.access$100(QueueExecutor.java:45)
	at org.jboss.threads.QueueExecutor$Worker.run(QueueExecutor.java:849)
	at java.lang.Thread.run(Thread.java:745)
	at org.jboss.threads.JBossThread.run(JBossThread.java:122)

Resolution

  • Increase the executor thread pool size and queue length so that expected peak load cannot exhaust both
  • Alternatively, the deadlock can be avoided entirely by switching from a blocking to a non-blocking executor, for example:
            <bounded-queue-thread-pool name="http-executor">
                <core-threads count="32"/>
                <queue-length count="32"/>
                <max-threads count="250"/>
                <keepalive-time time="10" unit="seconds"/>
            </bounded-queue-thread-pool>
  • With a non-blocking executor, the Http11ConnectionHandler.event calls cannot block on pool/queue exhaustion, so the deadlock cannot form; if exhaustion does occur, the task is rejected and the request is discarded instead.
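The behavioral difference can be sketched with the JDK's own bounded queue. This is an illustrative stand-in only (the class name and queue size are not EAP internals): a blocking executor parks the submitting thread on a full queue (BlockingQueue.put), while a non-blocking one fails fast (BlockingQueue.offer) and the task is simply discarded:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch: a stand-in for the executor's bounded task queue.
public class NonBlockingSubmitSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(2);
        Runnable task = () -> { /* request processing */ };

        // Simulate pool/queue exhaustion under load: fill the queue.
        queue.put(task);
        queue.put(task);

        // Blocking executor behavior: put() would park this thread until
        // a worker drains the queue -- the WAITING stacks in the dump above.
        // queue.put(task);  // would block indefinitely here

        // Non-blocking executor behavior: offer() returns false immediately,
        // the task is discarded, and the submitting thread keeps running.
        boolean accepted = queue.offer(task);
        System.out.println("accepted=" + accepted); // prints accepted=false
    }
}
```

The trade-off is that under exhaustion a non-blocking executor drops work instead of applying back-pressure, which is why sizing the pool and queue for peak load remains the first recommendation.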

Root Cause

  • The blocking executor pool effectively became deadlocked in NIO: Http11NioProtocol$Http11ConnectionHandler.event calls cannot complete until space becomes available in the pool/queue, but space will not become available until some ChannelProcessor.run call completes, which in turn cannot complete until its Http11ConnectionHandler.event call does.
  • At some point before the deadlock, load was evidently high enough to exhaust both the pool and the queue, causing Http11ConnectionHandler.event calls to begin blocking on the full queue. Once those calls blocked while holding their Request monitors, ChannelProcessor.run calls began blocking on those monitors in turn, and once the blocked ChannelProcessor.run calls consumed the remaining pool threads, the system reached an unrecoverable deadlock.
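The circular wait can be reproduced in miniature with plain JDK primitives. The names below (requestLock, the queue of size 1, the two threads) are illustrative stand-ins for the coyote Request monitor, the executor's bounded queue, and the event/worker threads from the dumps above, not EAP code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Miniature reproduction of the cycle: a submitter holds a monitor while
// parked on a full queue; the worker that could drain the queue needs
// that same monitor first. Names are illustrative, not EAP internals.
public class DeadlockCycleSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(1);
        queue.put(() -> {}); // queue already full from earlier load
        Object requestLock = new Object(); // stands in for the Request monitor

        // Like Http11ConnectionHandler.event -> QueueExecutor.execute:
        // takes the request monitor, then parks on the full queue.
        Thread submitter = new Thread(() -> {
            synchronized (requestLock) {
                try {
                    queue.put(() -> {}); // parks: the WAITING stacks in the dump
                } catch (InterruptedException ignored) { }
            }
        });

        // Like ChannelProcessor.run in the remaining workers: must acquire
        // the request monitor before it can finish and free queue space.
        Thread worker = new Thread(() -> {
            synchronized (requestLock) {
                queue.poll(); // never reached
            }
        });

        submitter.start();
        TimeUnit.MILLISECONDS.sleep(200); // let the submitter take the lock and park
        worker.start();
        TimeUnit.MILLISECONDS.sleep(200);

        System.out.println("submitter=" + submitter.getState()); // WAITING
        System.out.println("worker=" + worker.getState());       // BLOCKED
        System.exit(0); // the cycle never resolves on its own
    }
}
```

The two printed states match the two stack patterns in the dumps: WAITING (parked on the executor's full-queue condition) and BLOCKED (waiting on an object monitor).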

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.