java.net.SocketException: Too many open files


Environment

  • Red Hat JBoss Enterprise Application Platform (EAP)
    • 6.x
    • 7.x
    • 8.x

Issue

  • Receiving the 'Too many open files' exception in server logs:
        Socket accept failed java.net.SocketException: Too many open files
                at java.net.PlainSocketImpl.socketAccept(Native Method)
                at java.net.PlainSocketImpl.accept(Unknown Source)
                at java.net.ServerSocket.implAccept(Unknown Source)
                ...
                ...
        java.io.FileNotFoundException: filename (Too many open files)
                at java.io.FileInputStream.open(Native Method)
                at java.io.FileInputStream.<init>(FileInputStream.java:106)
                at java.io.FileInputStream.<init>(FileInputStream.java:66)
                at java.io.FileReader.<init>(FileReader.java:41)
  • java.net.SocketException: Too many open files -- Need RCA
        <Critical> <Server> <BEA-002616> <Failed to listen on channel "Default[2]" on 127.0.0.1:7001, failure count: 1, failing for 0 seconds, java.net.SocketException: Too many open files>
  • Facing the below error on the server:
        19:57:22,988 ERROR [DiskStore] ticketCache: Could not create disk store. Initial cause was /tmp/path/ticketCache.data (Too many open files)
        java.io.FileNotFoundException: /tmp/path/ticketCache.data (Too many open files)
            at java.io.RandomAccessFile.open(Native Method)
            at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
            at net.sf.ehcache.store.DiskStore.initialiseFiles(DiskStore.java:208)
            at net.sf.ehcache.store.DiskStore.<init>(DiskStore.java:153)
            at net.sf.ehcache.Cache.createDiskStore(Cache.java:526)
            at net.sf.ehcache.Cache.initialise(Cache.java:503)
            at net.sf.ehcache.CacheManager.addCacheNoCheck(CacheManager.java:634)
            at net.sf.ehcache.CacheManager.addCache(CacheManager.java:624)
            at org.springframework.cache.ehcache.EhCacheFactoryBean.afterPropertiesSet(EhCacheFactoryBean.java:259)
  • WAR/EAR deployment fails:
        "/deployment=deployment.ear:add(runtime-name="deployment.ear", content=[{"path"=>"/path/to/deployment/deployment.ear","archive"=>false}], enabled=true)"
        {
            "outcome" => "failed",
            "failure-description" => {"WFLYCTL0080: Failed services" => {"jboss.deployment.unit.\"deployment.ear\".STRUCTURE" => "org.jboss.msc.service.StartException in service jboss.deployment.unit.\"deployment.ear\".STRUCTURE: WFLYSRV0153: Failed to process phase STRUCTURE of deployment \"deployment.ear\"
            Caused by: org.jboss.as.server.deployment.DeploymentUnitProcessingException: WFLYEE0054: Failed to process children for EAR [\"/path/to/deployment/deployment.ear\"]
            Caused by: java.io.FileNotFoundException: /jboss-eap-7/standalone/tmp/vfs/deployment/deployment917425a0df51e2b7/library.jar-c42571ac700c1222/library.jar (Too many open files)"}},
            "rolled-back" => true

Resolution

  • See the resolutions in the documents listed in the root cause section.
  • Tune the ulimit "open files" setting. See this knowledge article for details.
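As a quick sketch (the limits.conf values below are illustrative, not a recommendation), the current limits can be checked and raised per shell, then persisted for the user running JBoss:

```shell
# Show the current soft and hard limits on open file descriptors
ulimit -S -n
ulimit -H -n

# Raise the soft limit up to the hard limit for this shell session only
ulimit -n "$(ulimit -H -n)"

# To persist for the user running JBoss (illustrative user/values), add to
# /etc/security/limits.conf and log in again:
#   jboss  soft  nofile  65536
#   jboss  hard  nofile  65536
```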

Root Cause

The following are known causes:

Diagnostic Steps

  • See the diagnostic steps in the documents listed in the root cause section.
  • Get lsof output and check for open files/sockets:
#PID specific
lsof -p <pid> > lsof.out

#User specific
lsof -u <Jboss_User> > user_lsof.out
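To see how close a process is to its limit, the descriptors can also be counted directly from /proc (a Linux-only sketch; the lsof summary pipeline is an assumption about typical lsof output columns):

```shell
# Count open descriptors for a PID straight from /proc (Linux only);
# $$ (this shell) is used as an example -- substitute the JBoss PID
pid=$$
ls "/proc/$pid/fd" | wc -l

# Summarize an existing lsof capture by descriptor type (column 5, TYPE)
# to see whether sockets, regular files, or pipes dominate:
#   awk 'NR>1 {print $5}' lsof.out | sort | uniq -c | sort -rn
```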
  • Collect server.log from the affected instance (e.g. focusing on the time when Too many open files first appears, and checking which subsystem reports the error).
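One way to locate that first occurrence, sketched as a small helper (the EAP standalone log path in the usage comment is an assumption and may differ per installation):

```shell
# Print the first "Too many open files" line, with its line number,
# from a given server.log
check_log() {
    grep -n -m1 "Too many open files" "$1"
}

# Example (path is the EAP standalone default; adjust as needed):
#   check_log "$JBOSS_HOME/standalone/log/server.log"
```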
  • Get netstat output to see the state of the sockets (e.g. to see if any connections are hung in CLOSE_WAIT state):
# Linux / Unix 
ss -atnp        # Modern replacement (RHEL 7+)
netstat -atnp   # Deprecated, may not be installed by default 

# Windows
netstat -an
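A quick tally of socket states makes a CLOSE_WAIT buildup obvious (a Linux sketch; it falls back to decoding /proc/net/tcp where ss is unavailable):

```shell
# Tally TCP sockets by state; a large CLOSE_WAIT count means the local
# application received the peer's FIN but never closed its own socket
if command -v ss >/dev/null 2>&1; then
    ss -ant | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn
else
    # Fallback: field 4 of /proc/net/tcp is the hex state (08 = CLOSE_WAIT)
    awk 'NR>1 {print $4}' /proc/net/tcp | sort | uniq -c
fi
```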
  • To debug files not being closed by the code that opened them, which causes extra file descriptors to remain in use until the garbage collector finalizes them, see Steps to locate leaked files with Byteman.
  • Check whether the issue is related to weak references not being reclaimed by seeing whether the open sockets/files go away after a full garbage collection.
    For example, the -XX:+PrintClassHistogram JVM option initiates a full collection when the histogram is requested, so you could add this option and then issue a kill -3 JBOSS_PID. Note that the server must be restarted for the JVM option to take effect.
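That check can be sketched as a small helper (assumes a JDK's jcmd on the PATH and a Linux /proc filesystem; the PID in the usage comment is hypothetical):

```shell
# fd_delta: print the open-descriptor count before and after forcing a
# full collection, to see whether the GC reclaims leaked descriptors
fd_delta() {
    pid=$1
    before=$(ls "/proc/$pid/fd" | wc -l)
    jcmd "$pid" GC.run >/dev/null   # ask the target JVM for a full GC
    sleep 5                         # give finalizers time to run
    after=$(ls "/proc/$pid/fd" | wc -l)
    echo "fds before=$before after=$after"
}

# Usage (hypothetical PID):
#   fd_delta 12345
```

If the count drops sharply after the forced collection, descriptors are being held by unreferenced objects awaiting finalization rather than being closed explicitly.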

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.