NFSv4 doesn't handle large Kerberos tickets

Solution Unverified - Updated

Environment

  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7

Issue

On RHEL 7 and RHEL 6, when I set up NFSv4 with krb5 security, some users are denied access because their Kerberos tickets are large. Other users, whose tickets are smaller, have no problem accessing the files under the NFSv4 mount point.

Resolution

RHEL 7
Update to RHEL 7.2, which by default uses gssproxy instead of rpc.svcgssd. Note: earlier RHEL 7 releases still had some issues with large Kerberos tickets and NFS. These have been fixed both in the kernel (commit "svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failures") and in gssproxy (commit "Suppress exported_composite_name for the kernel").
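A minimal Python sketch for checking whether the NFS server's kernel has handed the server-side GSS upcall to gssproxy. It assumes a kernel that exposes /proc/net/rpc/use-gss-proxy, to which gssproxy writes 1 when it registers; the helper name gssproxy_active is illustrative, and the file is normally readable only by root.

```python
from pathlib import Path

# Assumption: kernels with gssproxy support expose this file, and gssproxy
# writes "1" to it when it takes over the server-side GSS upcall.
USE_GSS_PROXY = Path("/proc/net/rpc/use-gss-proxy")


def gssproxy_active(path: Path = USE_GSS_PROXY) -> bool:
    """Return True if the kernel reports that gssproxy handles the upcall."""
    try:
        value = path.read_text().strip()
    except FileNotFoundError:
        # No such file: gssproxy upcall support is not available here.
        return False
    # Anything other than "1" means gssproxy has not (yet) registered.
    return value == "1"


if __name__ == "__main__":
    try:
        if gssproxy_active():
            print("gssproxy handles the NFS server GSS upcall")
        else:
            print("gssproxy not in use (legacy upcall, or not yet registered)")
    except PermissionError:
        print(f"permission denied reading {USE_GSS_PROXY}; run as root")
```

If the check reports that gssproxy is not in use on RHEL 7.2, confirm that the gssproxy service is running before the NFS server starts.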

RHEL 6 and RHEL 5
Option 1: Reduce the number of groups the user is a member of, for example by consolidating groups.

Option 2: Configure the Active Directory KDC to omit the PAC data when it issues a service ticket for the NFS server:

  1. Go into Active Directory Users and Computers
  2. Select View -> Advanced Features
  3. Expand the Computers branch in the tree view and open the account corresponding to the NFS SERVER
  4. Select the Attribute Editor tab of the Properties dialog that appears
  5. Edit the userAccountControl property
  6. Add 33554432 (0x2000000) to the existing value and click OK
  7. Make sure that the Value column now shows NO_AUTH_DATA_REQUIRED (it may be necessary to expand the column to see it)
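Step 6 sets a single bit in the userAccountControl bitmask. The Python sketch below shows the arithmetic; the helper names are illustrative, and the starting value of 4096 (the WORKSTATION_TRUST_ACCOUNT flag commonly seen on computer accounts) is only an example, so read the real value from the Attribute Editor. It uses a bitwise OR rather than plain addition, so applying it to an account that already has the flag leaves the value unchanged.

```python
# NO_AUTH_DATA_REQUIRED flag in the Active Directory userAccountControl bitmask.
NO_AUTH_DATA_REQUIRED = 0x2000000  # == 33554432


def with_no_auth_data_required(uac: int) -> int:
    """Return uac with NO_AUTH_DATA_REQUIRED set (idempotent, unlike +)."""
    return uac | NO_AUTH_DATA_REQUIRED


def has_no_auth_data_required(uac: int) -> bool:
    """True if the NO_AUTH_DATA_REQUIRED bit is already set."""
    return bool(uac & NO_AUTH_DATA_REQUIRED)


if __name__ == "__main__":
    current = 4096  # example only: WORKSTATION_TRUST_ACCOUNT
    new = with_no_auth_data_required(current)
    print(current, "->", new, has_no_auth_data_required(new))
```

With the example value this prints `4096 -> 33558528 True`.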

These steps should only be performed on accounts that are associated with service principals, not on normal user accounts.

Once these steps have been performed, either reboot the NFS CLIENTS (the easier option), or unmount and remount the NFS filesystems on the clients and have the users run kinit again before they access the NFS filesystem, so that new service tickets without PAC data are obtained.

See http://support.microsoft.com/kb/832572 for more information about the NO_AUTH_DATA_REQUIRED userAccountControl flag.

Root Cause

  • The kernel NFS server communicates with the rpc.svcgssd daemon in userspace through procfs files (specifically /proc/net/rpc/auth.rpcsec.context/channel and /proc/net/rpc/auth.rpcsec.init/channel). The kernel allocates only a single page for the upcall and, to make matters worse, the binary token passed through the upcall is converted to ASCII (hex, two characters per byte). With a 4KB page, this leaves room for only about 2KB of token data, which is where the 2KB limitation comes from. gssproxy uses a new RPC-based upcall mechanism that is not subject to this limitation.
  • If the KDC is Microsoft Active Directory, it includes PAC (Privilege Attribute Certificate) data in the tickets, which increases their size in proportion to the number of AD groups the user belongs to. The more groups a user is in, the greater the chance that NFSv4/krb5 stops working for that user.
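A back-of-the-envelope Python model of the legacy upcall limit, under stated assumptions: a 4KB page, hex encoding at two characters per byte, and no allowance for the other fields written into the same page. The function names are hypothetical, and real limits are slightly lower than this model suggests.

```python
# Assumption: 4 KiB pages, as on x86_64; other architectures may differ.
PAGE_SIZE = 4096


def hex_encoded_len(token_len: int) -> int:
    """Each binary byte of the token becomes two ASCII hex characters."""
    return 2 * token_len


def fits_legacy_upcall(token_len: int, page_size: int = PAGE_SIZE) -> bool:
    """Approximate check only: ignores the other fields written to the page."""
    return hex_encoded_len(token_len) <= page_size


# Largest token that can fit in one page under this model.
MAX_TOKEN = PAGE_SIZE // 2

if __name__ == "__main__":
    print("max token:", MAX_TOKEN)  # 2048
    for size in (1500, 2608):
        print(size, "fits" if fits_legacy_upcall(size) else "too large")
```

The 2608-byte value is the argv->iov_len reported in the debug log under Diagnostic Steps.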

Diagnostic Steps

The /data directory is exported with the following /etc/exports entry:

 /data *(sec=krb5:krb5i:krb5p,rw,insecure,no_subtree_check,crossmnt)

and mounted on the client with the following /etc/fstab entry:

 prague:/data /nfs/server nfs4 noauto,sec=krb5,intr 0 0

The krb5 credential cache files are significantly larger for some users than for others, for example:

-rw-------. 1 user1 user1 2118 Jun 23 08:47 /tmp/krb5cc_1001_JR3kfuPGxc
-rw-------. 1 user2 user2 1702 Jun 23 08:51 /tmp/krb5cc_1002

user2 can access the mount, but user1 cannot.
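To compare credential cache sizes across users on a client, a small Python sketch like the following lists FILE: caches in a directory, largest first. It assumes caches use the /tmp/krb5cc_* naming shown above (it does not see KEYRING: or KCM: caches), and the function name ccache_sizes is hypothetical. File size is only a rough proxy for ticket size, because one cache can hold several tickets.

```python
from pathlib import Path


def ccache_sizes(directory: str = "/tmp") -> list[tuple[str, int]]:
    """Return (path, size in bytes) for krb5cc_* files, largest first."""
    caches = Path(directory).glob("krb5cc_*")
    sizes = [(str(p), p.stat().st_size) for p in caches if p.is_file()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size in ccache_sizes():
        print(f"{size:>8}  {path}")
```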

The error logged by rpc.gssd on the client is:

 WARNING: Failed to create krb5 context for user with uid 1001 for server demo.com

If verbose RPC debug logging is enabled on the NFS server (for example with: rpcdebug -m rpc -s all), you will see output similar to the following:

Jul 14 11:57:50 hostname kernel: svc: TCP complete record (2636 bytes)
Jul 14 11:57:50 hostname kernel: svc: got len=2636
Jul 14 11:57:50 hostname kernel: svc: svc_authenticate (6)
Jul 14 11:57:50 hostname kernel: RPC:       svcauth_gss: argv->iov_len = 2608
Jul 14 11:57:50 hostname kernel: RPC:       Want update, refage=120, age=0
Jul 14 11:57:50 hostname kernel: svc: svc_delete_xprt(ffff88013cb1c000)
Jul 14 11:57:50 hostname kernel: svc: svc_tcp_sock_detach(ffff88013cb1c000)
Jul 14 11:57:50 hostname kernel: svc: svc_sock_detach(ffff88013cb1c000)
Jul 14 11:57:50 hostname kernel: svc: svc_process dropit
Jul 14 11:57:50 hostname kernel: svc: xprt ffff88013cb1c000 dropped request

The key line is the svcauth_gss line, repeated below. If argv->iov_len is greater than 2048, the upcall fails.

Jul 14 11:57:50 hostname kernel: RPC:       svcauth_gss: argv->iov_len = 2608
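When the debug output is long, a short Python sketch can pick out the oversized requests. It assumes kernel messages are written to /var/log/messages (pass another path as the first argument otherwise) and uses the 2048-byte threshold stated above; the function name oversized_requests is hypothetical.

```python
import re
import sys

# Matches the kernel debug line that reports the GSS request length.
IOV_LEN_RE = re.compile(r"svcauth_gss: argv->iov_len = (\d+)")
LIMIT = 2048  # threshold for the legacy rpc.svcgssd upcall


def oversized_requests(lines, limit=LIMIT):
    """Yield (iov_len, line) for svcauth_gss lines whose length exceeds limit."""
    for line in lines:
        match = IOV_LEN_RE.search(line)
        if match and int(match.group(1)) > limit:
            yield int(match.group(1)), line.rstrip("\n")


if __name__ == "__main__":
    log = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        with open(log, errors="replace") as fh:
            for length, line in oversized_requests(fh):
                print(f"{length} bytes: {line}")
    except OSError as exc:
        print(f"cannot read {log}: {exc}")
```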

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.