JBoss becomes slow/unresponsive with many threads waiting on a BaseClassLoader

Solution Verified - Updated

Environment

  • JBoss Enterprise Application Platform

Issue

  • Our JBoss server becomes slow and unresponsive.  Thread dumps show that many threads are backed up waiting on a BaseClassLoader:
    
    Thread: http-0.0.0.0-8080-1 : priority:5, demon:true, threadId:3295, threadState:TIMED_WAITING
        - waiting on <0x32666d51> (a org.jboss.classloader.spi.base.BaseClassLoader)
        java.lang.Object.wait(Native Method)
        org.jboss.classloader.spi.base.BaseClassLoader.acquireLockFairly(BaseClassLoader.java:1110)
        org.jboss.classloader.spi.base.BaseClassLoader.loadClassFromDomain(BaseClassLoader.java:838)
        org.jboss.classloader.spi.base.BaseClassLoader.doLoadClass(BaseClassLoader.java:502)
        org.jboss.classloader.spi.base.BaseClassLoader.loadClass(BaseClassLoader.java:447)
        java.lang.ClassLoader.loadClass(ClassLoader.java:295)
        - locked <0x2743e345> (a org.jboss.web.tomcat.service.WebCtxLoader$ENCLoader)
        java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        java.lang.Class.forName0(Native Method)
        java.lang.Class.forName(Class.java:247)
    ...
    
  • We see errors caused by the following exceptions during these times as well:
    
    java.lang.IllegalStateException: Waiting too long to get the classloader lock: BaseClassLoader@2d2050ea{vfszip:/jboss-eap-5.1/jboss-as/server/default/deploy/app.war/}
    

Resolution

  • Identify threads in the middle of a BaseClassLoader.loadClassFromDomain call and determine why they are taking so long to complete this call and release the lock.

    • This has been seen previously on AIX/IBM JDK, where threads progressed slowly or hung in a subsequent java.lang.ClassLoader.defineClassImpl call.
  • Address any deep recursion bugs and StackOverflowErrors that could prevent the lock from being released properly.  See java.lang.StackOverflowError.
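To automate the first step, threads currently inside BaseClassLoader.loadClassFromDomain can be flagged programmatically with the JDK's ThreadMXBean. The sketch below is our own illustration, not part of JBoss (the names LockHolderScan and inLoadClassFromDomain are ours); it could run from a small diagnostic servlet or an attached tool:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class LockHolderScan
{
   // Returns true if the stack shows the thread inside loadClassFromDomain,
   // i.e. it may be holding (or waiting on) the BaseClassLoader lock.
   static boolean inLoadClassFromDomain(StackTraceElement[] stack)
   {
      for (StackTraceElement frame : stack)
      {
         if ("org.jboss.classloader.spi.base.BaseClassLoader".equals(frame.getClassName())
               && "loadClassFromDomain".equals(frame.getMethodName()))
            return true;
      }
      return false;
   }

   public static void main(String[] args)
   {
      // Dump all live threads (without monitor/synchronizer detail) and
      // report any that are currently inside loadClassFromDomain
      ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
      for (ThreadInfo info : mbean.dumpAllThreads(false, false))
      {
         if (info != null && inLoadClassFromDomain(info.getStackTrace()))
            System.out.println("Possible lock holder: " + info.getThreadName());
      }
   }
}
```

Threads reported by this scan are the ones to examine in the full thread dump: whatever they are doing below the loadClassFromDomain frame is what is delaying the lock release.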

Root Cause

  • These threads are looping through 10 second wait intervals to periodically test availability of a lock object belonging to the BaseClassLoader:

    
       private void acquireLockFairly(boolean trace)
       {
    ...
             while (true)
             {
                try
                {
                   if (lock.tryLock(0, TimeUnit.MICROSECONDS) == false)
                   {
                      // Two minutes should be long enough?
                      if (waits++ == 12)
                         throw new IllegalStateException("Waiting too long to get the classloader lock: " + this);
                      // Wait 10 seconds
                      if (trace)
                         log.trace(this + " waiting for lock " + thread);
                      this.wait(10000);
                   }
                   else
                   {
                      if (trace)
                         log.trace(this + " aquiredLock " + thread + " holding=" + lock.getHoldCount());
                      break;
                   }
                }
                catch (InterruptedException ignored)
                {
                   interrupted = true;
                }
             }
    ...
       }
    

    A thread can wait like this for up to two minutes before giving up and throwing the IllegalStateException shown above. While it waits, any threads queued behind it are also blocked, which can cause serious performance problems.

  • Thread dumps won't directly show which thread holds the lock these threads are waiting for, but this lock is only acquired and released through BaseClassLoader.loadClassFromDomain calls.  Check the thread dump for threads in the middle of this method that have already acquired the lock, and determine what they are doing that delays their return from BaseClassLoader.loadClassFromDomain and the release of the lock.
  • If no threads in the middle of a BaseClassLoader.loadClassFromDomain call are associated with the hang, it's likely that a thread acquired the lock but returned from BaseClassLoader.loadClassFromDomain without properly releasing it.  The unlock is in a finally block, so this shouldn't happen, but it has been observed with a StackOverflowError.  In this case, check the TRACE logging for a thread that holds the lock for an excessively long time with a hold count that continuously increases.
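The retry pattern above can be exercised in isolation to see the failure mode. The sketch below re-implements the same poll-wait-retry loop against a plain ReentrantLock, with the 10-second wait and 12-wait limit turned into parameters so it runs quickly; it is our own illustration of the pattern, not the JBoss source (acquire, maxWaits, and waitMillis are our names):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class FairLockRetry
{
   // Mirrors the acquireLockFairly pattern: poll the lock, and if it is
   // busy, wait waitMillis on the monitor and retry up to maxWaits times
   // before giving up with an IllegalStateException.
   static boolean acquire(ReentrantLock lock, Object monitor, int maxWaits, long waitMillis)
         throws InterruptedException
   {
      int waits = 0;
      synchronized (monitor)
      {
         while (true)
         {
            if (lock.tryLock(0, TimeUnit.MICROSECONDS))
               return true;                        // got the lock
            if (waits++ == maxWaits)
               throw new IllegalStateException("Waiting too long to get the classloader lock");
            monitor.wait(waitMillis);              // park, then re-test the lock
         }
      }
   }

   public static void main(String[] args) throws InterruptedException
   {
      // Uncontended case: the lock is free, so the first tryLock succeeds
      ReentrantLock lock = new ReentrantLock(true);  // BaseClassLoader uses a fair lock
      if (acquire(lock, new Object(), 12, 10000))
         System.out.println("lock acquired, holding=" + lock.getHoldCount());
   }
}
```

With the JBoss values (12 waits of 10 seconds), a thread stuck behind a slow or leaked lock holder cycles through TIMED_WAITING for the full two minutes before the exception is thrown, which is exactly the pattern in the thread dump above.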

Diagnostic Steps

  • Troubleshoot using thread dumps and additional steps mentioned in Java application unresponsive
  • Enable TRACE logging for the BaseClassLoader for a more historical, in-depth look at ownership of these locks.  You'd likely want to direct this to a separate, dedicated log file:
    
       <appender name="BASECL" class="org.jboss.logging.appender.DailyRollingFileAppender">
          <errorHandler class="org.jboss.logging.util.OnlyOnceErrorHandler"/>
          <param name="File" value="${jboss.server.log.dir}/basecl.log"/>
          <param name="Append" value="true"/>
          <param name="DatePattern" value="'.'yyyy-MM-dd"/>
          <layout class="org.apache.log4j.PatternLayout">
             <param name="ConversionPattern" value="%d %-5p [%c] (%t) %m%n"/>
          </layout>
       </appender>
       <category name="org.jboss.classloader.spi.base" additivity="false">
          <priority value="TRACE"/>
          <appender-ref ref="BASECL"/>
       </category>
    
    The %t option in the ConversionPattern is important as it identifies the thread associated with the message, allowing you to correlate this data with thread dumps.
  • Check for deep thread stacks and StackOverflowErrors
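Once TRACE logging from the appender above has been captured, it can be mined for threads whose hold count keeps climbing. Below is a rough sketch of such a scan; it assumes the (%t) thread field from the ConversionPattern above and the "aquiredLock ... holding=N" message wording visible in the acquireLockFairly source (note the "aquiredLock" spelling used by the JBoss code). TraceLogScan and maxHoldCounts are our own names:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TraceLogScan
{
   // Matches the lock-acquisition trace line, capturing the thread name
   // from the (%t) field and the hold count from "holding=N"
   static final Pattern ACQUIRED =
         Pattern.compile("\\((.+?)\\).*aquiredLock.*holding=(\\d+)");

   // Record the highest hold count seen per thread; a count that keeps
   // climbing suggests a lock that is never fully released.
   static Map<String, Integer> maxHoldCounts(Iterable<String> lines)
   {
      Map<String, Integer> max = new HashMap<String, Integer>();
      for (String line : lines)
      {
         Matcher m = ACQUIRED.matcher(line);
         if (m.find())
         {
            int count = Integer.parseInt(m.group(2));
            Integer old = max.get(m.group(1));
            if (old == null || count > old)
               max.put(m.group(1), count);
         }
      }
      return max;
   }

   public static void main(String[] args)
   {
      // Example trace line in the format produced by the appender above
      String sample = "2012-01-01 00:00:00,000 TRACE [org.jboss.classloader.spi.base.BaseClassLoader]"
            + " (http-0.0.0.0-8080-1) BaseClassLoader@2d2050ea aquiredLock"
            + " Thread[http-0.0.0.0-8080-1,5,main] holding=2";
      System.out.println(maxHoldCounts(java.util.Collections.singletonList(sample)));
   }
}
```

A thread that appears here with a steadily increasing hold count and no matching release is a candidate for the leaked-lock scenario described in the Root Cause section.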
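Deep stacks can likewise be spotted from inside the JVM with ThreadMXBean. The following is a minimal sketch (DeepStackCheck is our name and the 500-frame threshold is an arbitrary starting point, not a JBoss constant) that flags threads whose stack depth suggests runaway recursion headed for a StackOverflowError:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

public class DeepStackCheck
{
   // A stack deeper than the threshold is worth a closer look
   static boolean isDeep(StackTraceElement[] stack, int threshold)
   {
      return stack.length > threshold;
   }

   public static void main(String[] args)
   {
      // Flag any live thread with a suspiciously deep stack
      for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false))
      {
         if (info != null && isDeep(info.getStackTrace(), 500))
            System.out.println("Deep stack: " + info.getThreadName()
                  + " (" + info.getStackTrace().length + " frames)");
      }
   }
}
```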

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.