What Information is Required to Debug an NFS-Ganesha Client Issue?
Environment
- Red Hat Gluster Storage version 3.x
- NFS-Ganesha
Issue
- What data is needed if an application running on top of an NFS-Ganesha mounted file system hangs?
- If an NFS-Ganesha client reports a timeout connecting to the NFS server, what information is required to debug this problem further?
Resolution
If the problem is reproducible, set up the environment first to capture the required information:

- On any Gluster server, enable the debug logs in the ganesha.conf file. Check the man page ganesha-log-config for instructions on which components can be traced and the debug levels that can be set. As an example, to further debug READDIR operations, add the following entry to /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf:

      LOG {
          Components {
              NFS_READDIR = FULL_DEBUG;
          }
      }

  Then send a SIGHUP signal to the NFS-Ganesha process ID to apply the changes:

      kill -s SIGHUP <ganesha_process_id>

- Capture tcpdumps at the time the issue is observed, from both the client and the server sides:

  2.1) On the client side, the command to use would be:

      tcpdump -i <interface> -w /tmp/$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S").pcap

  2.2) To capture a tcpdump from the NFS-Ganesha server, first verify which Gluster node is providing access to the NFS share for the client having issues:

      * Use the df or mount command on the client side and look for the entry pointing to the file system that needs to be traced:

            10.0.2.1:/nfs_ganesha_test nfs4 400G 123G 278G 31% /test

      * On the server side, check which host has been assigned the VIP from the previous step, and run the same command as in section 2.1 on that host.
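Locating the server for step 2.2 can be sketched as a small helper. The function name and the /proc/mounts-based lookup are assumptions made here for illustration; they are not commands from this article:

```shell
# Hypothetical helper: given a file in /proc/mounts format and a mount
# point, print the address of the NFS server (the VIP) the client uses.
# Field 1 of /proc/mounts is "server:/export"; field 2 is the mount point.
nfs_server_for_mount() {
    awk -v mnt="$2" '$2 == mnt { split($1, a, ":"); print a[1] }' "$1"
}

# Example: nfs_server_for_mount /proc/mounts /test
```

Running the example on the client from the sample above would print 10.0.2.1, the VIP to look for on the Gluster nodes.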
- On the client side, also enable the debug flags for the sunrpc and nfs services:

      rpcdebug -m nfs -s all
      rpcdebug -m rpc -s all

  Please be aware that these options produce very verbose output and may slow down the system, so make sure to disable them once the testing is finished. To unset them:

      rpcdebug -m nfs -c all
      rpcdebug -m rpc -c all
- With the above debugging enabled, reproduce the problem.
- If an application is hanging while accessing an NFS file system, get an strace of the application process as follows:

      strace -ff -T -s500 -v -f -y -tt -o /tmp/strace-$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S") -p <application_pid>
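The strace invocation above can be wrapped in a function for repeated use. The function name and the optional tracer parameter are assumptions made here so the command line can be exercised without attaching to a real process (pass "echo" instead of the default strace):

```shell
# Sketch: attach strace to an application PID with the options from the
# article, writing per-thread output files under /tmp. The second
# argument overrides the tracer command purely for illustration/testing.
trace_app() {
    local pid="$1" tracer="${2:-strace}"
    "$tracer" -ff -T -s500 -v -f -y -tt \
        -o "/tmp/strace-$(hostname)-$(date +%Y-%m-%d-%H-%M-%S)" -p "$pid"
}
```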
- Find the NFS client process PID and capture the output of at least three pstack commands while the issue is being observed. If the pstack utility is not installed, please install the gdb package first.

      for i in {1..3}; do pstack <nfs_client_pid> > /tmp/pstack-$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S")-$i; done

  To figure out which is the correct process ID, check the output of the mount command to verify whether the file system is mounted with NFS v3 or NFS v4:

  * If v3, there should be an nfsiod process running:

        root 4117 0.0 0.0 0 0 ? S 2018 0:00 [nfsiod]

  * If v4, the process should be nfsv4.0-svc:

        root 4117 0.0 0.0 0 0 ? S 2018 0:00 [nfsv4.0-svc]
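The version check above can be sketched as a helper that classifies a /proc/mounts line, so the right kernel thread ([nfsiod] for v3, [nfsv4.0-svc] for v4) can be searched for. The function name is an assumption for illustration:

```shell
# Hypothetical helper: report the NFS protocol version of one
# /proc/mounts line. Field 3 of /proc/mounts is the filesystem type
# ("nfs" for NFSv3 mounts, "nfs4" for NFSv4 mounts).
nfs_mount_version() {
    case "$(awk '{print $3}' <<< "$1")" in
        nfs4) echo v4 ;;
        nfs)  echo v3 ;;
        *)    echo unknown ;;
    esac
}
```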
- Similarly, capture three pstacks of the nfs-ganesha.service process on the server identified in step 2.2 above:

      for i in {1..3}; do pstack <nfs_ganesha_pid> > /tmp/pstack-$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S")-$i; done
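The three-sample loop can be sketched as a function. The index suffix keeps the file names unique even when all samples land within the same second; the function name and the tool parameter are assumptions made here so the loop can be exercised with a harmless command instead of pstack:

```shell
# Sketch: take three stack samples of a process, one file per sample.
# The second argument overrides the capture tool for illustration only.
capture_stacks() {
    local pid="$1" tool="${2:-pstack}" i out
    for i in 1 2 3; do
        out="/tmp/pstack-$(hostname)-$(date +%Y-%m-%d-%H-%M-%S)-$i"
        "$tool" "$pid" > "$out" || return 1
        echo "$out"
    done
}
```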
Once the testing is finished, upload the following information for analysis:
- Sosreports of the NFS-Ganesha servers and the affected client.
- A tarball containing the following files:
  a) Pcap files of the NFS-Ganesha server and the client, stored at: /tmp/$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S").pcap
  b) Strace output of the application, stored on the client side at: /tmp/strace-$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S")
  c) Pstack files of the NFS-Ganesha server and the client, stored at: /tmp/pstack-$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S")
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.