When listing a directory on an NFS share, the directory listing fails halfway through with 'ls: reading directory .: Too many levels of symbolic links'
Environment
- Red Hat Enterprise Linux 5, 6, or 7 (NFS server)
- NFS server exporting an ext3, ext4, or gfs2 filesystem
Issue
- When listing a directory on the NFS client, the directory listing will fail after listing some of the files in the directory with:
ls: reading directory .: Too many levels of symbolic links
- The following message is visible in /var/log/messages on the NFS client:
Oct 2 18:24:47 hostname kernel: NFS: directory server.example.com/nfs contains a readdir loop. Please contact your server vendor.
- Duplicate files begin appearing on NFS shares.
- Duplicate entries appear within the same directory.
Resolution
Red Hat Enterprise Linux 7
- An issue involving gfs2 exported filesystems may cause the log messages in the Issue section. This issue is resolved as described in the following article: How do you enable "location based readdir cookies" for gfs2 filesystems exported via NFS?
Red Hat Enterprise Linux 6
- If the NFS server is exporting an ext4 filesystem, the server-side NFS issue that causes the duplicate files is fixed by erratum RHSA-2012:0862 or later.
- If the NFS server is exporting an ext3 filesystem, the server-side NFS issue that causes the duplicate files is fixed by erratum RHSA-2014:1392 or later.
- The RHEL 6 NFS client prints "NFS: directory XXXXX contains a readdir loop." when it issues the same readdir request twice on a directory whose contents are changing.
- An issue involving gfs2 exported filesystems may cause the log messages in the Issue section. This issue is resolved as described in the following article: How do you enable "location based readdir cookies" for gfs2 filesystems exported via NFS?
Red Hat Enterprise Linux 5
- An ext3 or ext4 issue has been fixed in RHBA-2013:0006 (kernel-2.6.18-348.el5).
- There have been reports that the aforementioned fix reduces the frequency of the issue, however, the issue is still reproducible. In addition to upgrading the kernel, please consider also applying one of the workarounds listed below.
Workaround
- If the server is not running RHEL 6 or later (see the errata above for RHEL 6) and is exporting an ext3 filesystem, reformat it as ext4.
- Disable the dir_index filesystem feature on the filesystem that is being exported from the NFS server:
# tune2fs -O ^dir_index <device being exported>
- NOTE: To re-enable the dir_index feature, use the command:
# tune2fs -O dir_index <device being exported>
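Before and after toggling the feature, it can help to confirm whether dir_index is currently enabled by listing the filesystem features with tune2fs -l. The device path below is a placeholder, and the sample feature string is only an illustration of the grep logic:

```shell
# On the NFS server (replace /dev/sdb1 with the device actually exported):
#   tune2fs -l /dev/sdb1 | grep -i 'filesystem features'
# If "dir_index" appears in that feature list, the feature is still enabled.
# Offline illustration of the same check against a sample feature line:
features='has_journal ext_attr resize_inode dir_index filetype sparse_super'
echo "$features" | grep -qw dir_index && echo "dir_index enabled"
```

After clearing the flag, re-test the directory listing from the NFS client to confirm the symptom is gone.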
Root Cause
- When exporting an ext3, ext4, or gfs2 filesystem, hash collisions can cause the NFS server to send the same cookie for multiple files. This causes confusion when a readdir() call wants to pick up where the last one left off, and triggers the error.
- NOTE: The specific bug depends on the filesystem being exported. Thus it is possible you may receive this message if a different filesystem is exported, a newer kernel contains a bug, or a 3rd-party NFS server is being used.
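The pigeonhole effect behind those collisions can be sketched with ordinary shell tools. This is an illustrative model only, not the kernel's actual hash: it squeezes 32 hypothetical file names into a 16-value cookie space, so at least 16 names must end up sharing a cookie with another name.

```shell
#!/bin/sh
# count_collisions N BUCKETS: derive a tiny pseudo-cookie for N file names
# (cksum CRC modulo BUCKETS) and report how many names share a cookie.
count_collisions() {
    names=$1
    buckets=$2
    distinct=$(for i in $(seq 1 "$names"); do
        crc=$(printf 'file%s' "$i" | cksum | cut -d' ' -f1)
        echo $(( crc % buckets ))
    done | sort -un | wc -l)
    # Every name beyond the number of distinct cookies collided with another.
    echo $(( names - distinct ))
}
count_collisions 32 16
```

With a real 31- or 63-bit cookie space collisions are rare, but large directories make them possible, and that is exactly when the client sees the same cookie twice and reports a readdir loop.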
Diagnostic Steps
Steps to confirm NFS server bug - Duplicate files on NFS shares due to ext* hash collisions
- On the NFS client, change to the affected directory:
# cd /mnt/directory-from-readdir-loop-message
- List all of the files in the directory and count them:
# ls -l | wc -l
- List all of the files in the directory again, but only show duplicates:
# ls -l | uniq -d
If the bug occurs, then we should see duplicates in the last step. If the bug does not occur, then the last step should show no output, and the error message may be due to another problem.
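The uniq -d check works because ls emits entries in sorted order, so a duplicated entry produces two adjacent identical lines. A quick offline illustration with synthetic listing data (no NFS mount required):

```shell
#!/bin/sh
# uniq -d prints only lines that repeat consecutively:
printf 'alpha\nbeta\ngamma\n' | uniq -d          # healthy listing: no output
printf 'alpha\nbeta\nbeta\ngamma\n' | uniq -d    # duplicated entry: prints "beta"
```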
General diagnostic steps
- Wait for the issue to reoccur and capture the following data by reproducing the issue with the ls command:
- On the client, capture:
# tcpdump -n -s 0 -i <interface> -w /tmp/tcpdump_client.pcap host <IP of NFS server>
- On the server, capture:
# tcpdump -n -s 0 -i <interface> -w /tmp/tcpdump_server.pcap host <IP of NFS client>
- Once the above has started, start NFS debugging on the client and strace the ls command to reproduce the issue:
# rpcdebug -m nfs -s all
# rpcdebug -m rpc -s all
# strace -T -tt -f -v -q -s 4096 -o /tmp/strace.out <ls command and args>
# rpcdebug -m nfs -c all
# rpcdebug -m rpc -c all
- The debug logs will appear in /var/log/messages.
- The following are example outputs of the error captured with the commands above:
strace:
11734 18:08:27.298621 write(2, "reading directory /mnt/example/foo/bar", 77) = 77 <0.000008>
11734 18:08:27.298668 open("/usr/share/locale/en_US.ISO8859-1/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000008>
11734 18:08:27.298701 open("/usr/share/locale/en_US.iso88591/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000008>
11734 18:08:27.298732 open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000008>
11734 18:08:27.298764 open("/usr/share/locale/en.ISO8859-1/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000007>
11734 18:08:27.298795 open("/usr/share/locale/en.iso88591/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000008>
11734 18:08:27.298825 open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000008>
11734 18:08:27.298864 write(2, ": Too many levels of symbolic links", 35) = 35 <0.000008>
11734 18:08:27.298895 write(2, "\n", 1) = 1 <0.000007>
11734 18:08:27.298922 close(3) = 0 <0.000007>
- Debug logs:
Oct 2 18:24:47 hostname kernel: NFS: directory server.example.com/nfs contains a readdir loop. Please contact your server vendor. The file: foobar has duplicate cookie 1357702728
Notice how the debug logs provide the file that was encountered when the readdir() loop was hit.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.