Upgrading RHCS 5 hosts from RHEL 8 to RHEL 9 removes ceph-common package. Services fail to start.

Solution Verified - Updated

Environment

  • Red Hat Ceph Storage (RHCS) 5
  • Red Hat Ceph Storage (RHCS) 6
  • Red Hat Ceph Storage (RHCS) 7
  • Red Hat Enterprise Linux (RHEL) 8
  • Red Hat Enterprise Linux (RHEL) 9

Issue

  • Ceph services fail to start automatically after rebooting Ceph nodes that were upgraded from RHEL 8 to RHEL 9 with LEAPP.
  • After the upgrade the /etc/ceph directory is missing and keys need to be regenerated.

Resolution

To get the Ceph services to start after the upgrade, use one of the following options:

  • Wait for the manager to check the host. This Ceph orchestration task runs periodically (normally every 10 minutes) to look for new disk drives. If it does not find a /var/log/ceph/<fsid> directory, it re-creates the directory with the correct permissions.

  • Create the directory manually with the correct permissions. To find the <fsid>, run sudo ceph fsid on the management node.

      # sudo mkdir -p /var/log/ceph/<fsid>
      # sudo chmod 3770 /var/log/ceph
      # sudo chmod 0770 /var/log/ceph/<fsid>
      # sudo chown -R ceph:ceph /var/log/ceph
    

After the directory is in place, install the ceph-common package manually.
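
Before installing the package, it can help to confirm that the directory really is in place. The following is a minimal sketch, not part of the original solution: the function name check_ceph_log_dir and its optional second argument are hypothetical, and it assumes GNU stat is available on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the per-cluster log directory exists.
# Arguments: <fsid> [base directory, defaults to /var/log/ceph]
check_ceph_log_dir() {
    local fsid="$1"
    local base="${2:-/var/log/ceph}"
    local dir="${base}/${fsid}"

    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # Print octal mode, owner, and path for both directories so they can be
    # compared with the values used in the manual steps.
    stat -c '%a %U:%G %n' "$base" "$dir"
}
```

Call it with the fsid obtained on the management node, for example check_ceph_log_dir <fsid>. A "missing" line means the directory still has to be created.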

If the services still fail to start, you might need to reset the systemd failure counter first with systemctl reset-failed <service-name>, and then try to start the service again.
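
The reset-and-start sequence can be wrapped in a small function. This is a sketch only: reset_and_start is a hypothetical name, and it assumes the commands run as root (or through sudo).

```shell
#!/usr/bin/env bash
# Hypothetical helper: clear the failed state of a systemd unit, then start it.
# Argument: <service-name>
reset_and_start() {
    local unit="$1"
    # Start the unit only if the failed state was cleared successfully.
    systemctl reset-failed "$unit" && systemctl start "$unit"
}
```

To find the affected units, systemctl list-units --failed 'ceph*' lists failed units whose names start with ceph. On cephadm-managed hosts the unit names usually follow the pattern ceph-<fsid>@<daemon>.service, but verify the names on your own system before passing them to the function.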

Installing the ceph-common package alone does not help. It creates the /var/log/ceph directory, but the <fsid> directory inside it is still missing.

To avoid this issue, configure LEAPP before the upgrade so that it does not remove libunwind:

      # echo libunwind | sudo tee -a /etc/leapp/transaction/to_keep

NOTE: Even after libunwind is added to the to_keep file in the step above, the preupgrade report still states that libunwind will be removed during the upgrade. As long as the package is listed in the to_keep file, it is not actually removed.
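
Because tee -a appends a new line every time it runs, repeating the command leaves duplicate entries in the file. The sketch below adds the entry only when it is absent and reports what it did. The function name keep_package and its optional file argument are hypothetical, and writing to the default path requires root privileges.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a package to a LEAPP to_keep file only if absent.
# Arguments: <package> [file, defaults to /etc/leapp/transaction/to_keep]
keep_package() {
    local pkg="$1"
    local file="${2:-/etc/leapp/transaction/to_keep}"

    # -x matches the whole line, -F treats the package name as a fixed string.
    if grep -qxF -- "$pkg" "$file" 2>/dev/null; then
        echo "$pkg already listed in $file"
    else
        echo "$pkg" >> "$file"
        echo "$pkg added to $file"
    fi
}
```

Running keep_package libunwind before leapp preupgrade gives a quick confirmation that the entry is present, which is useful given that the preupgrade report does not reflect it.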

Root Cause

The problem occurs because the LEAPP upgrade process removes the libunwind package, which is slated for removal from RHEL 9. The ceph-common package depends on libunwind, so it is uninstalled as well. When the ceph-common package is removed, it removes the /var/log/ceph directory. When podman then tries to start the Ceph containers, it cannot mount /var/log/ceph/<fsid> into the container and fails with an error.
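
The dependency described above can be inspected on a RHEL 8 host before the upgrade. This is a minimal sketch: requires_match is a hypothetical helper name. It searches by pattern rather than by exact package name because the requirement may be recorded as a shared-library capability.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the requirements of an installed package that
# match a pattern. Returns non-zero when nothing matches.
# Arguments: <package> <pattern>
requires_match() {
    local pkg="$1"
    local pattern="$2"
    # rpm -q --requires lists every capability the package depends on.
    rpm -q --requires "$pkg" | grep -i -- "$pattern"
}
```

For example, requires_match ceph-common libunwind prints any libunwind-related requirement of the installed ceph-common package.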

LEAPP can also remove the ceph-common package when the Red Hat Ceph tools repository has not been enabled as a custom repository when running the LEAPP command. The documentation has been updated to reflect that you need to enable this repository.
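
Enabling a custom repository for a LEAPP run is done with the --enablerepo option. The wrapper below is a sketch only: leapp_with_repo is a hypothetical name, and the repository ID is left as an argument because the correct ID for the Ceph tools repository depends on your RHCS version and architecture.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a LEAPP phase with one extra repository enabled.
# Arguments: <phase: preupgrade or upgrade> <repository ID>
leapp_with_repo() {
    local phase="$1"
    local repo_id="$2"
    leapp "$phase" --enablerepo "$repo_id"
}
```

Use the same repository ID for both the preupgrade and the upgrade phase, and take the ID from the product documentation rather than guessing it.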

Artifacts

Product/Version   Related BZ/Jira     Errata       Fixed Version
RHCS/7            Bugzilla 2263195    Errata TBD   7.1z4 - 7.1.4
RHEL/8-9          Jira RHEL-34526     Errata TBD   TBD
