Ceph librados client-side logs report *EMFILE* (Too many open files), but why doesn't QEMU/KVM?

Solution Verified - Updated

Environment

  • Red Hat Ceph Storage 1.3.z
  • Red Hat Enterprise Linux OpenStack Platform 7
  • QEMU/KVM: qemu-kvm-rhev-2.3.0-31.el7_2.7.x86_64

Issue

  • Ceph librados client-side logs report EMFILE (Too many open files). Why does QEMU/KVM not report it?
  • Ceph librados client-side log:
7f0b769e7700 -1 -- 192.168.128.30:0/2021513 >> 192.168.128.35:6800/24374 pipe(0x7f0bcabc0000 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0bc55e1ce0).connect couldn't created socket (24) Too many open files

Resolution

Root Cause

This issue has two causes:

  • QEMU does not report the error code: after a failed rados_connect(), block/rbd.c calls error_setg() instead of error_setg_errno() with the return code.
  • librados does not propagate the proper error code (EMFILE) from Pipe::connect() to rados_connect().

Diagnostic Steps

Steps to reproduce:

  • Create a QEMU instance.
  • Reduce its file descriptor limit with the prlimit command.
  • Try to attach an RBD image to the instance.
  • The Ceph client-side log file will contain entries like the following:
7f0b769e7700 -1 -- 192.168.128.30:0/2021513 >> 192.168.128.35:6800/24374 pipe(0x7f0bcabc0000 sd=-1 :0 s=1 pgs=0 cs=0 l=1 c=0x7f0bc55e1ce0).connect couldn't created socket (24) Too many open files
  • However, the QEMU logs contain nothing about this error, because rados_connect(), called from QEMU, does not receive the proper return code from librados Pipe::connect().
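The underlying failure does not depend on QEMU or Ceph: once a process reaches its open-file limit, socket() fails with EMFILE. Below is a minimal sketch, assuming Linux and the POSIX getrlimit()/setrlimit() calls, which adjust the same RLIMIT_NOFILE limit that prlimit changes for a running process. The hypothetical helper exhaust_sockets() lowers the soft limit, creates sockets until socket() fails, restores the limit, and formats the error as "(errno) message", matching the "(24) Too many open files" text in the Ceph log.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: lower the soft open-file limit to `limit`, create
 * sockets until socket() fails, then close them and restore the limit.
 * Returns the errno of the failing socket() call (expected: EMFILE), or 0.
 * On failure, `msg` receives a Ceph-style "(N) text" description. */
static int exhaust_sockets(rlim_t limit, char *msg, size_t msglen)
{
    struct rlimit old, low;
    int fds[256];
    int n = 0, err = 0;

    assert(limit <= 256);
    if (getrlimit(RLIMIT_NOFILE, &old) != 0)
        return errno;
    low = old;
    low.rlim_cur = limit;      /* same resource that prlimit --nofile changes */
    if (setrlimit(RLIMIT_NOFILE, &low) != 0)
        return errno;

    while (n < 256) {
        /* Any socket family works here; the limit is per process. */
        int sd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (sd < 0) {
            err = errno;       /* save errno before making any other call */
            break;
        }
        fds[n++] = sd;
    }

    /* Clean up and restore the original limit. */
    while (n > 0)
        close(fds[--n]);
    setrlimit(RLIMIT_NOFILE, &old);

    if (err != 0)
        snprintf(msg, msglen, "(%d) %s", err, strerror(err));
    return err;
}
```

On Linux, EMFILE is 24, so a call such as exhaust_sockets(16, msg, sizeof msg) is expected to return EMFILE and, with glibc, leave "(24) Too many open files" in msg.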
  • Ceph source file: src/msg/simple/Pipe.cc
int Pipe::connect()
{
  bool got_bad_auth = false;

  ldout(msgr->cct,10) << "connect " << connect_seq << dendl;
  assert(pipe_lock.is_locked());

  ....................

  // create socket?
  sd = ::socket(peer_addr.get_family(), SOCK_STREAM, 0);
  if (sd < 0) {
    lderr(msgr->cct) << "connect couldn't created socket " << cpp_strerror(errno) << dendl;
    goto fail;
  }
  • The message "connect couldn't created socket (24) Too many open files" is logged when the ::socket() call in Pipe::connect() fails.
  • The code then jumps to the fail label, but the fail path always returns -1 instead of the error code from errno (EMFILE), so the cause is lost to callers.
 fail:
  if (conf->ms_inject_internal_delays) {
    ldout(msgr->cct, 10) << " sleep for " << msgr->cct->_conf->ms_inject_internal_delays << dendl;
    utime_t t;
    t.set_from_double(msgr->cct->_conf->ms_inject_internal_delays);
    t.sleep();
  }

  pipe_lock.Lock();
 fail_locked:
  if (state == STATE_CONNECTING)
    fault();
  else
    ldout(msgr->cct,3) << "connect fault, but state = " << get_state_name()
                       << " != connecting, stopping" << dendl;

 stop_locked:
  delete authorizer;
  return -1;
}
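Below is a minimal sketch of the error-propagation pattern the analysis calls for, written in C rather than Ceph's C++. It saves errno as soon as socket() fails and returns the negated value from the shared cleanup path instead of a fixed -1. connect_sketch() and demo_emfile() are hypothetical names. This is not the Ceph code or an upstream patch, and the close(-1) call only simulates cleanup code that overwrites errno.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for a connect routine with a shared
 * "fail" label. Returns 0 on success or a negative errno on failure. */
static int connect_sketch(int family, int *out_sd)
{
    int r = 0;
    int sd;

    assert(out_sd != NULL);
    sd = socket(family, SOCK_STREAM, 0);
    if (sd < 0) {
        r = -errno;      /* capture now: later calls may overwrite errno */
        goto fail;
    }
    *out_sd = sd;
    return 0;

fail:
    /* Simulated cleanup (logging, sleeping, fault handling): close(-1)
     * fails with EBADF and clobbers errno, but r still holds the cause. */
    close(-1);
    return r;            /* e.g. -EMFILE, rather than a bare -1 */
}

/* Usage demo: with the soft open-file limit at 0, socket() must fail
 * with EMFILE. The original limit is restored before returning. */
static int demo_emfile(void)
{
    struct rlimit old, low;
    int sd = -1, r;

    if (getrlimit(RLIMIT_NOFILE, &old) != 0)
        return -errno;
    low = old;
    low.rlim_cur = 0;
    if (setrlimit(RLIMIT_NOFILE, &low) != 0)
        return -errno;
    r = connect_sketch(AF_UNIX, &sd);
    setrlimit(RLIMIT_NOFILE, &old);
    if (r == 0)
        close(sd);
    return r;
}
```

demo_emfile() is expected to return -EMFILE even though the simulated cleanup has already overwritten errno with EBADF. Returning negative errno values lets a caller such as rados_connect() pass the cause further up.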
  • In QEMU, the corresponding code is in block/rbd.c:
    r = rados_connect(s->cluster);
    if (r < 0) {
        error_setg(errp, "error connecting");
        goto failed_shutdown;
    }
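In this code, error_setg() records only the fixed text "error connecting", so the return code of rados_connect() is discarded. QEMU's error_setg_errno() takes a positive errno value and appends its strerror() text to the message. Calling error_setg_errno(errp, -r, "error connecting") would therefore name the cause once librados returns -EMFILE. The self-contained sketch below imitates that message layout. It does not use the real QEMU Error API: fake_rados_connect(), format_error_errno(), and open_image_sketch() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for rados_connect() after librados is fixed to
 * propagate the socket error as a negative errno. */
static int fake_rados_connect(void)
{
    return -EMFILE;
}

/* Hypothetical helper mimicking the message layout of QEMU's
 * error_setg_errno(): "<message>: <strerror(os_errno)>". */
static void format_error_errno(char *buf, size_t len, int os_errno,
                               const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && len > 0);
    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    if (n >= 0 && (size_t)n < len)
        snprintf(buf + n, len - n, ": %s", strerror(os_errno));
}

static int open_image_sketch(char *errbuf, size_t len)
{
    int r = fake_rados_connect();
    if (r < 0) {
        /* Corresponds to error_setg_errno(errp, -r, "error connecting"). */
        format_error_errno(errbuf, len, -r, "error connecting");
        return r;
    }
    return 0;
}
```

With glibc, the resulting message is "error connecting: Too many open files", which tells the administrator that the file descriptor limit was reached.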

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.