[ceph-ansible] RHCS 4.3 installation fails while executing the command "ceph mgr dump"

Solution Verified - Updated

Environment

  • Red Hat Ceph Storage 4.3.
  • podman-4.2.

Issue

  • The Ansible playbook fails after exhausting the retries of the task "wait for all mgr to be up" during RHCS 4.3 installation.
  • The Ansible task "wait for all mgr to be up" fails while capturing the output of ceph mgr dump.
  • RHCS 4.3 installation fails while executing the command ceph mgr dump.

Resolution

  • The Red Hat Engineering team is aware of this issue, which is tracked in Bug 2162781.
    • As a workaround, set the proper SELinux context on the mgr data directory.
      Once the context is set, ceph-mgr starts automatically on the corresponding node.
      Apply this workaround on all mgr nodes.
      For example:

      # chcon -R system_u:object_r:container_file_t:s0 /var/lib/ceph/mgr/ceph-$(hostname -s)
      
    • Then re-run the Ansible playbook to continue the deployment.
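
Because the workaround has to be repeated on every mgr node, it can be convenient to drive it from one admin host. The following is a minimal sketch, not part of ceph-ansible: it assumes root SSH access to each node and that the short form of the SSH host name matches the output of hostname -s on that node. The helper names mgr_dir_for and relabel_mgr_nodes are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: apply the chcon workaround on several mgr nodes.
# Assumes root SSH access; helper names are hypothetical.

# Print the mgr data directory for a host, using its short host name.
mgr_dir_for() {
    printf '/var/lib/ceph/mgr/ceph-%s\n' "${1%%.*}"
}

# Run the chcon workaround on every host passed as an argument.
relabel_mgr_nodes() {
    local host
    for host in "$@"; do
        # -n keeps ssh from consuming the caller's stdin.
        ssh -n "root@${host}" \
            chcon -R system_u:object_r:container_file_t:s0 "$(mgr_dir_for "$host")"
    done
}

# Example (hypothetical host names):
#   relabel_mgr_nodes mon1 mon2 mon3
```

The script only defines functions, so sourcing it changes nothing until relabel_mgr_nodes is called with the list of mgr hosts.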

Root Cause

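  • With the updated version of podman (podman-4.2), the mgr keyring file gets the wrong SELinux context, system_u:object_r:var_lib_t:s0, instead of system_u:object_r:container_file_t:s0. The ceph-mgr container therefore cannot access the keyring and fails to start.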

Diagnostic Steps

  • The Ansible playbook fails after exhausting the retries of the task wait for all mgr to be up.

  • In the ceph-ansible logs, TASK [ceph-mgr : wait for all mgr to be up] fails.

    • For example:

            2023-09-01 10:57:56,972 p=65792 u=admin n=ansible | TASK [ceph-mgr : wait for all mgr to be up] **************************************************************************************************************************************************
            2023-09-01 10:57:56,972 p=65792 u=admin n=ansible | Friday 01 September 2023  10:57:56 -0400 (0:00:00.021)       0:10:30.894 ******
            2023-09-01 11:00:44,366 p=65792 u=admin n=ansible | fatal: [mon3 -> mon1]: FAILED! => changed=false
              attempts: 30
              cmd:
              - podman
              - exec
              - ceph-mon-mon1
              - ceph
              - --cluster
              - ceph
              - mgr
              - dump
              - -f
              - json
              delta: '0:00:00.416016'
              end: '2023-09-01 11:00:44.351865'
              rc: 0
              start: '2023-09-01 11:00:43.935849'
              stderr: ''
              stderr_lines: <omitted>
              stdout: |2-
            
                {"epoch":1,"active_gid":0,"active_name":"","active_addrs":{"addrvec":[]},"active_addr":":/0","active_change":"0.000000","available":false,"standbys":[],"modules":["iostat","restful"],"available_modules":[],"services":{},"always_on_modules":{"nautilus":["balancer","crash","devicehealth","orchestrator_cli","progress","rbd_support","status","volumes"]}}
              stdout_lines: <omitted>
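
In the log above the command itself returns rc: 0, yet the task still fails after 30 attempts, and the dump reports "available":false. That suggests the retry condition depends on the content of the dump rather than on the exit code. The same check can be sketched by hand as follows. The helper name mgr_available is hypothetical, and the pattern match assumes the compact JSON layout shown in the log.

```shell
# Sketch: succeed only when a mgr dump (JSON on stdin) reports available=true.
# mgr_available is a hypothetical helper name.
mgr_available() {
    # The closing quote after "available" avoids matching "available_modules".
    grep -q '"available": *true'
}

# Example on a mon node, run as root:
#   podman exec ceph-mon-"$(hostname -s)" ceph --cluster ceph mgr dump -f json | mgr_available && echo "mgr is up"
```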
      
  • ceph -s shows mgr: no daemons active.

    • For example:

      # podman exec -it ceph-mon-$(hostname -s) ceph -s | grep mgr
          mgr: no daemons active
      
  • The mgr containers are not starting:

      # podman ps | grep mgr
      # 
    
  • On all nodes, the mgr services do not start and remain in the activating / auto-restart state.

    • For example:

      [root@mon2 ~]# systemctl --type=service | grep mgr
        ceph-mgr@mon2.service                                 loaded activating auto-restart Ceph Manager                                                      
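
To spot this state quickly on a node, the systemctl output can be filtered for ceph-mgr units that are stuck restarting. This is a minimal sketch that assumes the column layout shown above (unit, load, active, sub). The helper name stuck_mgr_units is hypothetical.

```shell
# Sketch: print ceph-mgr units that are in the "activating auto-restart" state.
# Reads `systemctl --type=service` output on stdin; helper name is hypothetical.
stuck_mgr_units() {
    awk '$1 ~ /^ceph-mgr@/ && $3 == "activating" && $4 == "auto-restart" { print $1 }'
}

# Example:
#   systemctl --type=service --no-legend | stuck_mgr_units
```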
      
  • Check the logs of the ceph-mgr service.

    • For example:

      [root@mon2 ~]# journalctl -u ceph-mgr@$(hostname -s).service --no-pager | tail -n 15
      Sep 01 13:01:19 mon2 systemd[1]: ceph-mgr@mon2.service: Main process exited, code=exited, status=1/FAILURE
      Sep 01 13:01:19 mon2 systemd[1]: ceph-mgr@mon2.service: Failed with result 'exit-code'.
      Sep 01 13:01:30 mon2 systemd[1]: ceph-mgr@mon2.service: Service RestartSec=10s expired, scheduling restart.
      Sep 01 13:01:30 mon2 systemd[1]: ceph-mgr@mon2.service: Scheduled restart job, restart counter is at 1.
      Sep 01 13:01:30 mon2 systemd[1]: Stopped Ceph Manager.
      Sep 01 13:01:30 mon2 systemd[1]: Starting Ceph Manager...
      Sep 01 13:01:30 mon2 podman[139628]: Error: no container with name or ID "ceph-mgr-mon2" found: no such container
      Sep 01 13:01:30 mon2 podman[139638]: Error: no container with name or ID "ceph-mgr-mon2" found: no such container
      Sep 01 13:01:30 mon2 podman[139648]: 
      Sep 01 13:01:30 mon2 podman[139648]: 93b38edb31a97c4aee28f67443d7eec6cbe870af21c5e18a11168f7cc60b2686
      Sep 01 13:01:30 mon2 systemd[1]: Started Ceph Manager.
      Sep 01 13:01:30 mon2 ceph-mgr-mon2[139658]: find: '/var/lib/ceph/mgr/ceph-mon2/keyring': Permission denied                   
      Sep 01 13:01:30 mon2 ceph-mgr-mon2[139658]: chown: cannot access '/var/lib/ceph/mgr/ceph-mon2/keyring': Permission denied
      Sep 01 13:01:30 mon2 systemd[1]: ceph-mgr@mon2.service: Main process exited, code=exited, status=1/FAILURE
      Sep 01 13:01:30 mon2 systemd[1]: ceph-mgr@mon2.service: Failed with result 'exit-code'.
      
    • The errors above show that the container gets Permission denied when it accesses the mgr keyring.
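
When the journal is long, the affected paths can be pulled out of it directly. This is a minimal sketch that assumes the message format shown above, with the path in single quotes followed by ": Permission denied". The helper name denied_paths is hypothetical.

```shell
# Sketch: list the distinct absolute paths reported as "Permission denied".
# Reads journal lines on stdin; helper name is hypothetical.
denied_paths() {
    sed -n "s|.*'\(/[^']*\)': Permission denied.*|\1|p" | sort -u
}

# Example:
#   journalctl -u ceph-mgr@"$(hostname -s)".service --no-pager | denied_paths
```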

  • Check the permissions and SELinux context of the mgr keyring and its directory.

    • For example:

      [root@mon2 ~]# ls -lZd /var/lib/ceph/mgr/
      drwxr-xr-x. 3 167 167 system_u:object_r:container_file_t:s0 23 Sep  1 10:52 /var/lib/ceph/mgr/
      
      [root@mon2 ~]# ls -lZ /var/lib/ceph/mgr/*
      total 4
      -rw-------. 1 167 167 system_u:object_r:var_lib_t:s0 135 Sep  1 10:57 keyring      <<---
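
The mislabeled entries can also be picked out of the listing automatically. This is a minimal sketch that assumes the ls -lZ column order shown above, with the SELinux context in the fifth field, and file names without spaces. The helper name mislabeled_entries is hypothetical.

```shell
# Sketch: print entries whose SELinux type is not container_file_t.
# Reads `ls -lZ` output on stdin; helper name is hypothetical.
mislabeled_entries() {
    # Lines such as "total 4" have no fifth field and are skipped.
    awk '$5 ~ /:/ && $5 !~ /:container_file_t:/ { print $NF, $5 }'
}

# Example:
#   ls -lZ /var/lib/ceph/mgr/ceph-"$(hostname -s)"/ | mislabeled_entries
```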
      
  • Apply the proper SELinux context to the mgr directory and keyring, then check whether the service starts automatically.

    • For example:

      [root@mon2 ~]# chcon  system_u:object_r:container_file_t:s0 -R /var/lib/ceph/mgr/ceph-mon2/
      
      [root@mon2 ~]# podman ps | grep mgr
      1f131508fa4f  registry.redhat.io/rhceph/rhceph-4-rhel8:4-57              28 seconds ago  Up 28 seconds ago              ceph-mgr-mon2
      
      [root@mon2 ~]# ls -lZ /var/lib/ceph/mgr/ceph-mon2/
      total 4
      -rw-------. 1 167 167 system_u:object_r:container_file_t:s0 135 Sep  1 10:57 keyring
       
      [root@mon2 ~]# systemctl --type=service | grep mgr
        ceph-mgr@mon2.service                                 loaded active running Ceph Manager                                                      
      

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.