Manila shares with Red Hat OpenStack 17.1 can be abruptly disconnected due to export information loss
Environment
- Deployments of Red Hat OpenStack Platform 17.1 GA, 17.1.1, and 17.1.2 that enable the OpenStack Shared File Systems service (Manila) with a CephFS-via-NFS storage backend. This bug also manifests during upgrades from Red Hat OpenStack Platform 16.2.
- Red Hat OpenStack Platform 16.2
- Red Hat OpenStack Platform 17.1
Issue
Bug #2255324 was identified in Red Hat OpenStack Platform 17.1 deployments: clients can lose access to mounted CephFS-via-NFS OpenStack Shared File Systems shares (Manila shares).
Resolution
Before running any update on the overcloud, including minor updates and fast forward upgrades from Red Hat OpenStack Platform 16.2, back up the export data held by the Ceph-NFS service:
Step 1: Identify the node where the ceph-nfs service ("ceph-nfs-pacemaker") is run by pacemaker:
# ssh tripleo-admin@<controller-0>
# sudo pcs status | awk '/ceph-nfs/ {print $5}'
Replace <controller-0> with the hostname or IP address of a controller node.
Step 2: Run the following on the controller node that contains the "ceph-nfs-pacemaker" service:
# podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data get ganesha-export-index export_index_backup.txt
You may inspect the data in the "export_index_backup.txt" file. If Manila shares have been created, the file contains one or more lines, each holding a RADOS object URL that points to export information. These individual export information objects remain on RADOS and are not affected by this bug.
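As a quick sanity check, the backup can be copied out of the container and its entries counted. The sketch below assumes the backup was written to the container's default working directory, as in Step 2. The function names and the destination path are illustrative and are not part of the documented procedure. Keeping a copy on the controller host may also be prudent, on the assumption that files written inside the container might not persist if the container is recreated.

```shell
# Hypothetical helper: copy the backup out of the container to the host.
# Assumes "podman exec" starts in the same working directory used in Step 2.
save_index_backup_to_host() {
    # $1: destination path on the controller host (illustrative)
    podman exec ceph-nfs-pacemaker cat export_index_backup.txt > "$1"
}

# Hypothetical helper: count the non-empty lines in a local copy of the index.
# Each non-empty line is expected to reference one export object.
count_export_entries() {
    # $1: path to a local copy of the export index
    grep -c '[^[:space:]]' "$1"
}
```

For example, `save_index_backup_to_host /home/tripleo-admin/export_index_backup.txt` followed by `count_export_entries /home/tripleo-admin/export_index_backup.txt` prints the number of entries. A count of 0 on a deployment that has Manila shares suggests the backup should be examined before proceeding.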
Step 3: Once the stack update procedure is complete, ensure that the ganesha-export-index is recreated:
# podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data put ganesha-export-index export_index_backup.txt
Step 4: Verify that the object exists, and its contents match:
# podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data get ganesha-export-index -
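Rather than comparing the output by eye, the restored object can be compared byte for byte with the backup. This is a minimal sketch that assumes both are first saved as files on the controller host; the helper name and the file names are illustrative.

```shell
# Hypothetical helper: succeed only if two local files are byte-identical.
index_matches_backup() {
    # $1: host copy of the backup; $2: host copy of the restored index object
    cmp -s "$1" "$2"
}

# Possible usage on the controller node (file names are illustrative):
#   podman exec ceph-nfs-pacemaker cat export_index_backup.txt > backup.txt
#   podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data \
#       get ganesha-export-index - > restored.txt
#   index_matches_backup backup.txt restored.txt && echo "index restored"
```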
Root Cause
Director’s ceph-nfs Ansible role erroneously overwrites an export metadata file that the Ceph-NFS (NFS-Ganesha) server needs for its recovery operations. This role is invoked by the “openstack overcloud deploy” command.
Diagnostic Steps
Client workloads that have OpenStack Manila CephFS-via-NFS shares mounted can experience an unexpected outage. The outage can be observed at any time after the overcloud has been updated. In cases where the NFS share is “hard” mounted, running client applications can seem to pause, hang, or crash with a timeout while waiting to reconnect to the NFS server.
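On a client, one way to probe for this symptom without hanging the shell is to query the mount point under a time limit. The sketch below is an assumption about how such a check could look, not a Red Hat supplied diagnostic; the function name, the 5-second default, and the example mount path are illustrative.

```shell
# Hypothetical helper: succeed if a mount point answers a file system status
# query within the time limit. SIGKILL is used as a conservative choice for
# a process that may be blocked on I/O.
nfs_mount_responsive() {
    # $1: mount point on the client; $2: timeout in seconds (default 5)
    timeout -s KILL "${2:-5}" stat -f "$1" > /dev/null 2>&1
}

# Possible usage (mount path is illustrative):
#   nfs_mount_responsive /mnt/manila-share || echo "share is not responding"
```

A failure only indicates that the query did not complete successfully in time; it does not by itself confirm that the export index was lost.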