What Information is Required to Debug an NFS-Ganesha Client Issue?

Solution Verified - Updated

Environment

  • Red Hat Gluster Storage version 3.x
  • NFS-Ganesha

Issue

  • What data is needed if an application running on top of an NFS-Ganesha mounted file system hangs?
  • If an NFS-Ganesha client reports a timeout connecting to the NFS server, what information is required to debug this problem further?

Resolution

  • If the problem is reproducible, set up the environment first to capture the required information:

    1. On any Gluster server, enable the debug logs in the ganesha.conf file. Check the ganesha-log-config man page for instructions on which components can be traced and the debug levels that can be set. As an example, to further debug READDIR operations, add the following entry:

      LOG { Components { NFS_READDIR = FULL_DEBUG; } }

    to the file /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf and send a SIGHUP signal to the NFS-Ganesha process ID to apply the changes:

       kill -s SIGHUP <ganesha_process_id> 
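
The edit-and-reload step above can be scripted. The following is only a minimal sketch: the helper name ganesha_log_block is hypothetical, and the daemon process name ganesha.nfsd used in the commented example is an assumption about the deployment. Check the component and level names against ganesha-log-config before using it.

```shell
#!/bin/sh
# Hypothetical helper: print a LOG block for one component and one level.
#   $1 = component (for example NFS_READDIR)
#   $2 = level     (for example FULL_DEBUG)
ganesha_log_block() {
    printf 'LOG { Components { %s = %s; } }\n' "$1" "$2"
}

# Example use on the server. The lines are commented out so that sourcing
# this file changes nothing on the system:
#   conf=/var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf
#   cp -p "$conf" "$conf.bak"                    # keep a copy to restore later
#   ganesha_log_block NFS_READDIR FULL_DEBUG >> "$conf"
#   kill -s HUP "$(pidof ganesha.nfsd)"          # assumes the daemon is named ganesha.nfsd
```

Keeping a backup copy makes it straightforward to restore the original configuration and send a second SIGHUP once the test is finished.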
    
    2. Capture tcpdumps at the time the issue is observed, on both the client and the server side:
     2.1) On the client side, the command to use would be:
    
        tcpdump -i <interface> -w /tmp/$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S").pcap
    
     2.2) To capture a tcpdump from the NFS-Ganesha server, verify first which Gluster node is providing access to the NFS share for the client having issues. 
      
    * On the client side, use the mount or df -hT command and look for the entry pointing to the file system that needs to be traced (the sample line below is in df -hT format):
    
                10.0.2.1:/nfs_ganesha_test  nfs4   400G  123G  278G  31% /test
    
    * On the server side, check which host has been assigned the VIP found in the previous step (10.0.2.1 in this example). On that host, run the same tcpdump command as in step 2.1 to capture the traffic.
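
The VIP lookup in step 2.2 can be sketched as two small helpers. Both function names are hypothetical, and the sketch assumes that the client mounts the share by IPv4 address rather than by host name.

```shell
#!/bin/sh
# Hypothetical helper: print the server part of an NFS source such as
# "10.0.2.1:/nfs_ganesha_test" (everything before the last ":/").
nfs_server_of() {
    printf '%s\n' "${1%:/*}"
}

# Hypothetical helper: read `ip -o -4 addr show` output on stdin and
# succeed only if the given IPv4 address is configured on this host.
holds_address() {
    awk -v vip="$1" '{ split($4, a, "/"); if (a[1] == vip) found = 1 }
                     END { exit !found }'
}

# Example use (commented out so that sourcing this file runs nothing):
#   On the client:
#     vip=$(nfs_server_of "$(findmnt -n -o SOURCE /test)")
#   On each Gluster node, with $vip set to the address found on the client:
#     ip -o -4 addr show | holds_address "$vip" && echo "capture on $(hostname)"
```

The node that prints the message is the one currently serving the client, so that is where the server-side tcpdump should run.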
    
    3. On the client side, also enable the debug flags for the sunrpc and nfs services:

      rpcdebug -m nfs -s all
      rpcdebug -m rpc -s all

    Be aware that these options produce very verbose output and might occasionally slow down the system, so make sure to disable them once the testing is finished. To unset them, run:

      rpcdebug -m nfs -c all
      rpcdebug -m rpc -c all
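
Because the debug flags are easy to leave enabled by accident, the set and clear commands above can be paired in a wrapper. This is a sketch under assumptions: the function names and the RPCDEBUG override variable are hypothetical and exist only so that the wrapper can be dry-run without touching the kernel flags.

```shell
#!/bin/sh
# RPCDEBUG may be overridden for a dry run, for example RPCDEBUG=echo.
nfs_debug_on() {
    "${RPCDEBUG:-rpcdebug}" -m nfs -s all
    "${RPCDEBUG:-rpcdebug}" -m rpc -s all
}

nfs_debug_off() {
    "${RPCDEBUG:-rpcdebug}" -m nfs -c all
    "${RPCDEBUG:-rpcdebug}" -m rpc -c all
}

# Hypothetical wrapper: enable the flags, run the given command, then clear
# the flags again and return the exit status of the command.
# Note: "status" is a plain global variable, since POSIX sh has no locals.
with_nfs_debug() {
    nfs_debug_on || return 1
    "$@"
    status=$?
    nfs_debug_off
    return "$status"
}

# Example use (commented out): run a reproducer script with debugging on.
#   with_nfs_debug ./reproduce-hang.sh      # reproduce-hang.sh is a placeholder
```

The wrapper does not handle the case where the shell itself is killed while the command runs; in that case, clear the flags manually with the two rpcdebug -c commands.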
    
  • With the above debugging enabled, reproduce the problem.

    1. If an application is hanging while accessing an NFS file system, get an strace of the application process as follows:

      strace -ff -T -s500 -v -f -y -tt -o /tmp/strace-$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S") -p <application_pid>

    2. Find the NFS-client process PID and capture the output of at least three pstack commands while the issue is being observed. If the pstack utility is not installed, install the gdb package first.

      for i in {1..3}; do pstack <nfs-client-pid> > /tmp/pstack-$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S"); done

    To identify the correct process ID, check the output of the mount command to see whether the file system is mounted through NFS-Ganesha using NFSv3 or NFSv4.

    If v3, there should be an nfsiod process running:

      root      4117  0.0  0.0      0     0 ?        S     2018   0:00 [nfsiod]
    

    If v4, the process should be nfsv4.0-svc:

      root      4117  0.0  0.0      0     0 ?        S     2018   0:00 [nfsv4.0-svc] 
    
    3. Similarly, capture three pstacks of the nfs-ganesha.service process on the server identified in step 2.2 above:

      for i in {1..3}; do pstack <nfs-ganesha.service-pid> > /tmp/pstack-$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S"); done
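
The pstack loops above name each file by a timestamp with one-second resolution, so captures taken within the same second can overwrite each other. The sketch below is one possible variant, not part of the original procedure: the function name, the pause between captures, the "-N" file name suffix, and the PSTACK override variable are all assumptions added for illustration.

```shell
#!/bin/sh
# capture_pstacks PID [COUNT] [INTERVAL_SECONDS] [OUTPUT_DIR]
# Takes COUNT stack dumps of PID, pausing INTERVAL_SECONDS between them.
# Each file gets a "-N" suffix so that no capture overwrites another.
# PSTACK may be overridden for a dry run, for example PSTACK=echo.
capture_pstacks() {
    pid=$1; count=${2:-3}; interval=${3:-5}; outdir=${4:-/tmp}
    i=1
    while [ "$i" -le "$count" ]; do
        stamp=$(date +"%Y-%m-%d-%H-%M-%S")
        "${PSTACK:-pstack}" "$pid" > "$outdir/pstack-$(hostname)-$stamp-$i"
        # Pause between captures, but not after the last one.
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
        i=$((i + 1))
    done
}

# Example use (commented out): three dumps, five seconds apart, in /tmp.
#   capture_pstacks <nfs-ganesha.service-pid>
```

Spacing the dumps a few seconds apart also makes it easier to see whether a thread is stuck in the same frame across captures or is making progress.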

  • Once the testing is finished, upload the following information for analysis:

    • Sosreports of the NFS-Ganesha servers and the affected client.

    • A tarball containing the following files:

      a) Pcap files of the NFS-Ganesha server and the client, stored at: /tmp/$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S").pcap
      b) Strace output of the application stored in the client side at: /tmp/strace-$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S")
      c) Pstack files of the NFS-Ganesha server and client stored at: /tmp/pstack-$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S")
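
Collecting the files listed above into a tarball can be sketched as follows. The function name and the default archive name are hypothetical. The sketch assumes that every *.pcap, strace-* and pstack-* file in the source directory belongs to this test, so move unrelated captures out of the way first.

```shell
#!/bin/sh
# bundle_debug_files [SOURCE_DIR] [ARCHIVE]
# Packs the pcap, strace and pstack files found in SOURCE_DIR (default /tmp)
# into a gzip-compressed tarball. Pass ARCHIVE as an absolute path.
bundle_debug_files() {
    srcdir=${1:-/tmp}
    archive=${2:-$srcdir/nfs-ganesha-debug-$(hostname).tar.gz}
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$srcdir" || exit 1
        set --
        for f in *.pcap strace-* pstack-*; do
            # Skip patterns that matched nothing and anything that is not a file.
            [ -f "$f" ] && set -- "$@" "$f"
        done
        if [ "$#" -eq 0 ]; then
            echo "no capture files found in $srcdir" >&2
            exit 1
        fi
        tar -czf "$archive" "$@"
    )
}

# Example use (commented out): run once on the client and once on the server.
#   bundle_debug_files /tmp
```

Because strace -ff writes one output file per process with the PID appended to the name given to -o, the strace-* pattern picks up all of them.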


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.