Changing jboss-log4j.xml caused JBoss to go unresponsive
Environment
- JBoss Enterprise Application Platform
- 4.3.x
- 5.x
Issue
-
We made a change to the jboss-log4j.xml file and then JBoss went unresponsive.
-
Many threads are stuck waiting to lock a org.apache.log4j.spi.Rootlogger. This RootLogger is locked by the Timer-Log4jService thread, which is in a waiting state itself:
"Timer-Log4jService" daemon prio=10 tid=0x00002aaae02e5800 nid=0x50ac in Object.wait() [0x000000004287a000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1186) - locked <0x00000004232ca6f8> (a java.lang.Thread) at java.lang.Thread.join(Thread.java:1239) at org.apache.log4j.AsyncAppender.close(AsyncAppender.java:240) at org.apache.log4j.helpers.AppenderAttachableImpl.removeAllAppenders(AppenderAttachableImpl.java:140) at org.apache.log4j.Category.removeAllAppenders(Category.java:868) - locked <0x000000040f464d78> (a org.apache.log4j.spi.RootLogger) at org.apache.log4j.xml.DOMConfigurator.parseChildrenOfLoggerElement(DOMConfigurator.java:410) at org.apache.log4j.xml.DOMConfigurator.parseRoot(DOMConfigurator.java:394) - locked <0x000000040f464d78> (a org.apache.log4j.spi.RootLogger) at org.apache.log4j.xml.DOMConfigurator.parse(DOMConfigurator.java:829) at org.apache.log4j.xml.DOMConfigurator.doConfigure(DOMConfigurator.java:712) at org.apache.log4j.xml.DOMConfigurator.doConfigure(DOMConfigurator.java:618) at org.apache.log4j.xml.DOMConfigurator.configure(DOMConfigurator.java:743) at org.jboss.logging.Log4jService$URLWatchTimerTask.reconfigure(Log4jService.java:697) at org.jboss.logging.Log4jService$URLWatchTimerTask.run(Log4jService.java:636) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462)
Resolution
-
Avoid log4j restarts/config changes when using AsyncAppenders.
-
Set the Log4jService's RefreshPeriod attribute to the max of 2147483 in conf/jboss-service.xml, which would only scan for log4j config changes once every 24 days, to help avoid such log4j restarts.
-
Remove AsyncAppenders and/or long, expensive operations through these AsyncAppenders, such as SMTPAppenders.
-
If the root issue is SMTPAppenders hanging, try the following flags to set timeouts:
-
-Dmail.smtp.connectiontimeout=<milliseconds>
-
-Dmail.smtp.timeout=<milliseconds>
-
Root Cause
By default, JBoss scans jboss-log4j.xml for changes every 60 seconds. The config change prompts log4j to restart and reconfigure itself, thus the Timer-Log4jService thread's call to AsyncAppender.close. But this thread is stuck itself at Thread.join in the AsyncAppender.close call for. The join here means it must wait for its dispatcher thread to end before it can proceed. The dispatcher thread will run so long as isActive is true, which is reset at the beginning of each loop iteration based upon the value of the AsyncAppender's closed variable:
public void run()
{
boolean isActive = true;
try
{
while (isActive) {
LoggingEvent[] events = null;
synchronized (this.buffer) {
int bufferSize = this.buffer.size();
isActive = !this.parent.closed;
while ((bufferSize == 0) && (isActive)) {
this.buffer.wait();
bufferSize = this.buffer.size();
isActive = !this.parent.closed;
}
if (bufferSize > 0) {
events = new LoggingEvent[bufferSize + this.discardMap.size()];
this.buffer.toArray(events);
int index = bufferSize;
Iterator iter = this.discardMap.values().iterator();
while (iter.hasNext()) {
events[(index++)] = ((AsyncAppender.DiscardSummary)iter.next()).createEvent();
}
this.buffer.clear();
this.discardMap.clear();
this.buffer.notifyAll();
}
}
if (events != null)
for (int i = 0; i < events.length; i++)
synchronized (this.appenders) {
this.appenders.appendLoopOnAppenders(events[i]);
}
}
}
catch (InterruptedException ex)
{
Thread.currentThread().interrupt();
}
}
The closed variable should now be true per the AsyncAppender.closed() call, meaning the dispatcher thread should terminate on the next completed loop iteration, allowing the AsyncAppender to proceed past the Thread.join() call. But its likely that the dispatcher has a built up number of logging events in its buffer. The dispatcher must finish logging all these events before it will complete its last loop iteration and terminate. This can cause a long hang as all threads using log4j wait for this dispatcher to finish logging its buffered events. This hang can become especially long if the dispatcher's buffered events are expensive logging operations such as email/SMTPAppender operations (which could in turn hang themselves due to connection issues).
Diagnostic Steps
- Troubleshoot using thread dumps and additional steps mentioned in Java application unresponsive
- Check for relevant dispatcher threads that AsyncAppenders are likely waiting on through Thread.join
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.