EAP 6 NIO deadlocks during blocking executor exhaustion

Environment

  • JBoss Enterprise Application Platform (EAP) 6.x

Issue

  • We use an HTTP NIO connector on EAP 6 with a blocking-bounded-queue-thread-pool.
  • JBoss became unresponsive, and thread dumps showed most threads blocked as follows in ChannelProcessor.run:
"http-executor-threads - 4003" prio=10 tid=0x00007fb688bcd800 nid=0x6bd5 waiting for monitor entry [0x00007fb65ca74000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.tomcat.util.net.NioEndpoint$ChannelProcessor.run(NioEndpoint.java:939)
	- waiting to lock <0x00000007970dd278> (a java.lang.Object)
	at org.jboss.threads.SimpleDirectExecutor.execute(SimpleDirectExecutor.java:33)
	at org.jboss.threads.QueueExecutor.runTask(QueueExecutor.java:808)
	at org.jboss.threads.QueueExecutor.access$100(QueueExecutor.java:45)
	at org.jboss.threads.QueueExecutor$Worker.run(QueueExecutor.java:849)
	at java.lang.Thread.run(Thread.java:745)
	at org.jboss.threads.JBossThread.run(JBossThread.java:122)
  • The remaining threads and the lock holders looked like the following:
"http-executor-threads - 4004" prio=10 tid=0x00007fb6884fc000 nid=0x6bd6 waiting on condition [0x00007fb6802d1000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000078f1f6d88> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
	at org.jboss.threads.QueueExecutor.execute(QueueExecutor.java:191)
	at org.jboss.threads.DelegatingBlockingExecutorService.execute(DelegatingBlockingExecutorService.java:42)
	at org.jboss.as.threads.ManagedExecutorService.execute(ManagedExecutorService.java:64)
	at org.apache.tomcat.util.net.NioEndpoint.processChannel(NioEndpoint.java:468)
	at org.apache.tomcat.util.net.NioEndpoint$EventPoller.add(NioEndpoint.java:1153)
	at org.apache.tomcat.util.net.NioEndpoint.addEventChannel(NioEndpoint.java:411)
	at org.apache.tomcat.util.net.NioEndpoint.addEventChannel(NioEndpoint.java:385)
	at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.event(Http11NioProtocol.java:875)
	- locked <0x0000000794a02610> (a org.apache.coyote.Request)
	at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.event(Http11NioProtocol.java:872)
	- locked <0x0000000794a02610> (a org.apache.coyote.Request)
	at org.apache.tomcat.util.net.NioEndpoint$ChannelProcessor.run(NioEndpoint.java:940)
	- locked <0x0000000797097670> (a java.lang.Object)
	at org.jboss.threads.SimpleDirectExecutor.execute(SimpleDirectExecutor.java:33)
	at org.jboss.threads.QueueExecutor.runTask(QueueExecutor.java:808)
	at org.jboss.threads.QueueExecutor.access$100(QueueExecutor.java:45)
	at org.jboss.threads.QueueExecutor$Worker.run(QueueExecutor.java:849)
	at java.lang.Thread.run(Thread.java:745)
	at org.jboss.threads.JBossThread.run(JBossThread.java:122)

Resolution

  • Increase the executor thread pool size and queue length so that expected peak load cannot exhaust both
  • Alternatively, the deadlock can be avoided entirely by switching from a blocking to a non-blocking executor, for example:
            <bounded-queue-thread-pool name="http-executor">
                <core-threads count="32"/>
                <queue-length count="32"/>
                <max-threads count="250"/>
                <keepalive-time time="10" unit="seconds"/>
            </bounded-queue-thread-pool>
  • With a non-blocking executor, the Http11ConnectionHandler.event calls cannot block on pool/queue exhaustion, so the deadlock cannot form; if exhaustion does occur, the task is rejected and the request is discarded instead.
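The behavioral difference can be sketched with the JDK's own bounded queue. This is an illustrative stand-in only (the class name and queue size are not EAP internals): a blocking executor parks the submitting thread on a full queue (BlockingQueue.put), while a non-blocking one fails fast (BlockingQueue.offer) and the task is simply discarded:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch: a stand-in for the executor's bounded task queue.
public class NonBlockingSubmitSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(2);
        Runnable task = () -> { /* request processing */ };

        // Simulate pool/queue exhaustion under load: fill the queue.
        queue.put(task);
        queue.put(task);

        // Blocking executor behavior: put() would park this thread until
        // a worker drains the queue -- the WAITING stacks in the dump above.
        // queue.put(task);  // would block indefinitely here

        // Non-blocking executor behavior: offer() returns false immediately,
        // the task is discarded, and the submitting thread keeps running.
        boolean accepted = queue.offer(task);
        System.out.println("accepted=" + accepted); // prints accepted=false
    }
}
```

The trade-off is that under exhaustion a non-blocking executor drops work instead of applying back-pressure, which is why sizing the pool and queue for peak load remains the first recommendation.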

Root Cause

  • The blocking executor pool effectively became deadlocked in NIO: Http11NioProtocol$Http11ConnectionHandler.event calls cannot complete until space becomes available in the pool/queue, but space will not become available until some ChannelProcessor.run call completes, which in turn cannot complete until its Http11ConnectionHandler.event call does.
  • At some point before the deadlock, load was evidently high enough to exhaust both the pool and the queue, causing Http11ConnectionHandler.event calls to begin blocking on the full queue. Once those calls blocked while holding their Request monitors, ChannelProcessor.run calls began blocking on those monitors in turn, and once the blocked ChannelProcessor.run calls consumed the remaining pool threads, the system reached an unrecoverable deadlock.
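The circular wait can be reproduced in miniature with plain JDK primitives. The names below (requestLock, the queue of size 1, the two threads) are illustrative stand-ins for the coyote Request monitor, the executor's bounded queue, and the event/worker threads from the dumps above, not EAP code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Miniature reproduction of the cycle: a submitter holds a monitor while
// parked on a full queue; the worker that could drain the queue needs
// that same monitor first. Names are illustrative, not EAP internals.
public class DeadlockCycleSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(1);
        queue.put(() -> {}); // queue already full from earlier load
        Object requestLock = new Object(); // stands in for the Request monitor

        // Like Http11ConnectionHandler.event -> QueueExecutor.execute:
        // takes the request monitor, then parks on the full queue.
        Thread submitter = new Thread(() -> {
            synchronized (requestLock) {
                try {
                    queue.put(() -> {}); // parks: the WAITING stacks in the dump
                } catch (InterruptedException ignored) { }
            }
        });

        // Like ChannelProcessor.run in the remaining workers: must acquire
        // the request monitor before it can finish and free queue space.
        Thread worker = new Thread(() -> {
            synchronized (requestLock) {
                queue.poll(); // never reached
            }
        });

        submitter.start();
        TimeUnit.MILLISECONDS.sleep(200); // let the submitter take the lock and park
        worker.start();
        TimeUnit.MILLISECONDS.sleep(200);

        System.out.println("submitter=" + submitter.getState()); // WAITING
        System.out.println("worker=" + worker.getState());       // BLOCKED
        System.exit(0); // the cycle never resolves on its own
    }
}
```

The two printed states match the two stack patterns in the dumps: WAITING (parked on the executor's full-queue condition) and BLOCKED (waiting on an object monitor).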

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.