Manila shares on Red Hat OpenStack Platform 17.1 can be abruptly disconnected due to export information loss

Solution Verified - Updated 17 May 2024

Environment

Red Hat OpenStack Platform 17.1 GA, 17.1.1, and 17.1.2 deployments that enable the OpenStack Shared File Systems service (Manila) with a CephFS-via-NFS storage back end. The bug also manifests during upgrades from Red Hat OpenStack Platform 16.2.
Red Hat OpenStack Platform 16.2
Red Hat OpenStack Platform 17.1

Issue

Bug #2255324 was identified in Red Hat OpenStack Platform 17.1 deployments: clients can lose access to mounted CephFS-via-NFS OpenStack Shared File Systems service (Manila) shares.

Resolution

Before running any update on the overcloud, for example a minor update or a fast forward upgrade from Red Hat OpenStack Platform 16.2, back up the export index that the Ceph-NFS service stores in RADOS:

Step 1: Identify the controller node on which Pacemaker runs the ceph-nfs service ("ceph-nfs-pacemaker"):

  # ssh tripleo-admin@<controller-0>
  # sudo pcs status | awk '/ceph-nfs/ {print $5}'

Replace <controller-0> with the IP address of the controller-0 node.
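The node lookup in Step 1 can also be scripted from any host that can SSH to controller-0 as tripleo-admin. This is a minimal sketch, not part of the documented procedure: the function names are hypothetical, and it assumes, exactly as the awk command above does, that the node name is the fifth whitespace-separated field of the ceph-nfs line in the pcs status output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for Step 1; not part of the documented procedure.

# Read `pcs status` output on stdin and print the node running ceph-nfs.
# Assumes (like the awk command in Step 1) that the node name is field 5
# of the first line mentioning ceph-nfs, for example:
#   * ceph-nfs  (systemd:ceph-nfs@pacemaker):  Started controller-0
# All input is consumed, so the upstream command never gets SIGPIPE.
nfs_node_from_pcs() {
  awk '!found && /ceph-nfs/ { print $5; found = 1 }'
}

# Query Pacemaker through controller-0 and print the ceph-nfs node.
# $1: IP address of controller-0 (assumes SSH access as tripleo-admin).
find_nfs_node() {
  local controller_ip="${1:?usage: find_nfs_node <controller-0-ip>}"
  local status
  # Fail if ssh or pcs fails instead of silently printing nothing.
  status=$(ssh "tripleo-admin@${controller_ip}" sudo pcs status) || return 1
  printf '%s\n' "$status" | nfs_node_from_pcs
}
```

After saving the functions to a file (the name is arbitrary), usage looks like:

  # source nfs_helpers.sh
  # find_nfs_node <controller-0-ip>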

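Step 2: On the controller node that runs the "ceph-nfs-pacemaker" service, save a copy of the ganesha-export-index object:

   # sudo podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data get ganesha-export-index - > export_index_backup.txt

Passing "-" as the output file makes rados write the object to standard output, so the shell redirect stores export_index_backup.txt on the controller host rather than inside the container, where it could be lost if the container is recreated during the update. You can inspect the data in this file. If Manila shares have been created, it contains one or more lines, each holding a RADOS object URL that points to export information. These individual export information objects exist in RADOS and are not affected by this bug.

Step 3: Once the stack update procedure is complete, recreate the ganesha-export-index object from the backup. Run this on the controller node that currently runs "ceph-nfs-pacemaker". Because the service can move to another controller during the update, repeat Step 1 first and, if the node has changed, copy export_index_backup.txt to it:

   # sudo podman exec -i ceph-nfs-pacemaker rados -n client.manila -p manila_data put ganesha-export-index - < export_index_backup.txt

The -i option keeps standard input open so that rados can read the backup from the redirected file.

Step 4: Verify that the object exists and that its contents match export_index_backup.txt:

   # sudo podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data get ganesha-export-index -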
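The comparison in Step 4 can be made explicit with diff. The sketch below is an illustration, not a supported tool: the function names are hypothetical, it assumes the backup was saved on the controller host (outside the container) as export_index_backup.txt, and it counts index entries by assuming each RADOS object URL contains the "rados://" scheme.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for Step 4; not part of the documented procedure.

# Count the lines in a saved export index that contain a RADOS URL.
# Assumes each export entry is a line containing "rados://".
# $1: path to a saved copy of ganesha-export-index.
count_export_urls() {
  local file="${1:?usage: count_export_urls <file>}"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  # grep -c prints 0 and exits 1 when nothing matches; that is not an error.
  grep -c 'rados://' "$file" || true
}

# Compare the live index (read on stdin) with the backup file.
# $1: path to the backup. Prints a unified diff and returns 1 on mismatch.
index_matches_backup() {
  local backup="${1:?usage: index_matches_backup <backup-file>}"
  if diff -u "$backup" -; then
    echo "ganesha-export-index matches $backup"
  else
    echo "ganesha-export-index differs from $backup" >&2
    return 1
  fi
}
```

On the controller that runs ceph-nfs-pacemaker, usage looks like:

  # count_export_urls export_index_backup.txt
  # sudo podman exec ceph-nfs-pacemaker rados -n client.manila -p manila_data get ganesha-export-index - | index_matches_backup export_index_backup.txt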

Root Cause

The director's ceph-nfs Ansible role erroneously overwrites an export metadata file that the Ceph-NFS (NFS-Ganesha) server needs for its recovery operations. This Ansible role is invoked by the “openstack overcloud deploy” command.

Diagnostic Steps

Client workloads that mount OpenStack Manila CephFS-via-NFS shares can experience an unexpected outage at any time after the overcloud has been updated. When an NFS share is “hard” mounted, running client applications can appear to pause or hang, or can crash with a timeout, while waiting to reconnect to the NFS server.
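To see which mounts on a client are “hard” mounted, and therefore likely to block rather than return errors while the NFS server is unreachable, the mount table can be filtered. A minimal sketch, assuming a Linux client whose /proc/mounts lists "hard" among the options of hard-mounted NFS shares; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper; not part of the documented procedure.

# Print "<mount point> <source>" for each nfs or nfs4 mount whose options
# include "hard". Reads /proc/mounts format (source, target, fstype,
# options, ...) from the file in $1, defaulting to /proc/mounts.
hard_nfs_mounts() {
  local mounts="${1:-/proc/mounts}"
  # Wrap the option list in commas so ",hard," matches a whole option only.
  awk '$3 ~ /^nfs4?$/ && ("," $4 ",") ~ /,hard,/ { print $2, $1 }' "$mounts"
}
```

Run it on the client without arguments to read the live mount table:

  # hard_nfs_mounts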

SBR

Product(s)

Red Hat OpenStack Platform

Category

Install

Tags

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.