[ceph-ansible] RHCS 4.3 installation fails while executing the command "ceph mgr dump"
Environment
- Red Hat Ceph Storage 4.3.
- podman-4.2.
Issue
- The Ansible playbook fails after retrying the task "wait for all mgr to be up" during RHCS 4.3 installation.
- The Ansible task "wait for all mgr to be up" fails while capturing the output of "ceph mgr dump".
- The RHCS 4.3 installation fails while executing the command "ceph mgr dump".
Resolution
- The Red Hat Engineering team is already aware of this issue, which is tracked in Bug 2162781.
- As a workaround, set the proper SELinux context for all mgr directories. After setting the context, ceph-mgr will start automatically on the corresponding node. Apply this workaround on all mgr nodes.
For example:
# chcon system_u:object_r:container_file_t:s0 -R /var/lib/ceph/mgr/ceph-$(hostname -s)
- Then re-run the Ansible playbook to continue the deployment.
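The workaround has to be applied on every mgr node. Below is a minimal dry-run sketch that builds the per-node chcon command; the node names mon1, mon2, and mon3 are hypothetical, so substitute your own inventory and remove the dry-run printing to actually execute the commands over SSH.

```shell
#!/bin/sh
# Dry-run sketch: build the chcon command for each mgr node.
# Node names below are hypothetical; substitute your own inventory.
cmds=""
for node in mon1 mon2 mon3; do
  cmds="${cmds}ssh root@${node} chcon system_u:object_r:container_file_t:s0 -R /var/lib/ceph/mgr/ceph-${node}
"
done
# Print the commands instead of executing them.
printf '%s' "$cmds"
```
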
Root Cause
- With the updated version of podman (podman-4.2), the wrong SELinux context (system_u:object_r:var_lib_t:s0) is applied to the mgr keyring file.
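The failure mode boils down to a mismatch between the label the keyring received (var_lib_t) and the label container processes are allowed to access (container_file_t). A minimal sketch of that comparison follows; on a live node the actual type would come from "stat -c %C" on the keyring rather than the hard-coded value used here for illustration.

```shell
#!/bin/sh
# Sketch: compare the keyring's SELinux type with the one containers need.
expected_type="container_file_t"
# On a real node you would capture the type with:
#   actual_type=$(stat -c %C /var/lib/ceph/mgr/ceph-$(hostname -s)/keyring | cut -d: -f3)
actual_type="var_lib_t"   # value observed in the diagnostic output of this article
if [ "$actual_type" != "$expected_type" ]; then
  label_state="mislabelled"
  echo "keyring is ${actual_type}; containers require ${expected_type}"
else
  label_state="ok"
fi
```
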
Diagnostic Steps
- The Ansible playbook fails after retrying the task "wait for all mgr to be up".
- From the ceph-ansible logs, TASK [ceph-mgr : wait for all mgr to be up] fails.
For example:
2023-09-01 10:57:56,972 p=65792 u=admin n=ansible | TASK [ceph-mgr : wait for all mgr to be up] **************************************************************************************************************************************************
2023-09-01 10:57:56,972 p=65792 u=admin n=ansible | Friday 01 September 2023 10:57:56 -0400 (0:00:00.021) 0:10:30.894 ******
2023-09-01 11:00:44,366 p=65792 u=admin n=ansible | fatal: [mon3 -> mon1]: FAILED! => changed=false
  attempts: 30
  cmd:
  - podman
  - exec
  - ceph-mon-mon1
  - ceph
  - --cluster
  - ceph
  - mgr
  - dump
  - -f
  - json
  delta: '0:00:00.416016'
  end: '2023-09-01 11:00:44.351865'
  rc: 0
  start: '2023-09-01 11:00:43.935849'
  stderr: ''
  stderr_lines: <omitted>
  stdout: |2-
    {"epoch":1,"active_gid":0,"active_name":"","active_addrs":{"addrvec":[]},"active_addr":":/0","active_change":"0.000000","available":false,"standbys":[],"modules":["iostat","restful"],"available_modules":[],"services":{},"always_on_modules":{"nautilus":["balancer","crash","devicehealth","orchestrator_cli","progress","rbd_support","status","volumes"]}}
  stdout_lines: <omitted>
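The task keeps retrying because the dump reports no active mgr ("available":false, empty "active_name"). The sketch below shows roughly the condition the wait loop evaluates, reusing the JSON captured in the failure; grep stands in for a real JSON parser, which is a simplification for illustration.

```shell
#!/bin/sh
# The mgr dump captured in the failure above, trimmed to the relevant fields.
dump='{"epoch":1,"active_gid":0,"active_name":"","available":false,"standbys":[]}'
# The task succeeds only once an active mgr reports itself available.
if echo "$dump" | grep -q '"available":true'; then
  mgr_state="up"
else
  mgr_state="waiting"
fi
echo "mgr state: ${mgr_state}"
```
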
- "ceph -s" shows "mgr: no daemons active".
For example:
# podman exec -it ceph-mon-$(hostname -s) ceph -s | grep mgr
    mgr: no daemons active
- The mgr containers are not running:
# podman ps | grep mgr
#
- On all nodes, the mgr services fail to start and show the activating (auto-restart) state.
For example:
[root@mon2 ~]# systemctl --type=service | grep mgr
ceph-mgr@mon2.service loaded activating auto-restart Ceph Manager
- Check the logs of the ceph-mgr service.
For example:
[root@mon2 ~]# journalctl -u ceph-mgr@$(hostname -s).service --no-pager | tail -n 15
Sep 01 13:01:19 mon2 systemd[1]: ceph-mgr@mon2.service: Main process exited, code=exited, status=1/FAILURE
Sep 01 13:01:19 mon2 systemd[1]: ceph-mgr@mon2.service: Failed with result 'exit-code'.
Sep 01 13:01:30 mon2 systemd[1]: ceph-mgr@mon2.service: Service RestartSec=10s expired, scheduling restart.
Sep 01 13:01:30 mon2 systemd[1]: ceph-mgr@mon2.service: Scheduled restart job, restart counter is at 1.
Sep 01 13:01:30 mon2 systemd[1]: Stopped Ceph Manager.
Sep 01 13:01:30 mon2 systemd[1]: Starting Ceph Manager...
Sep 01 13:01:30 mon2 podman[139628]: Error: no container with name or ID "ceph-mgr-mon2" found: no such container
Sep 01 13:01:30 mon2 podman[139638]: Error: no container with name or ID "ceph-mgr-mon2" found: no such container
Sep 01 13:01:30 mon2 podman[139648]:
Sep 01 13:01:30 mon2 podman[139648]: 93b38edb31a97c4aee28f67443d7eec6cbe870af21c5e18a11168f7cc60b2686
Sep 01 13:01:30 mon2 systemd[1]: Started Ceph Manager.
Sep 01 13:01:30 mon2 ceph-mgr-mon2[139658]: find: '/var/lib/ceph/mgr/ceph-mon2/keyring': Permission denied
Sep 01 13:01:30 mon2 ceph-mgr-mon2[139658]: chown: cannot access '/var/lib/ceph/mgr/ceph-mon2/keyring': Permission denied
Sep 01 13:01:30 mon2 systemd[1]: ceph-mgr@mon2.service: Main process exited, code=exited, status=1/FAILURE
Sep 01 13:01:30 mon2 systemd[1]: ceph-mgr@mon2.service: Failed with result 'exit-code'.
The above output indicates that a "Permission denied" error is reported while accessing the mgr keyring.
- Check the permissions of the mgr keyring and its directory.
For example:
[root@mon2 ~]# ls -lZd /var/lib/ceph/mgr/
drwxr-xr-x. 3 167 167 system_u:object_r:container_file_t:s0 23 Sep 1 10:52 /var/lib/ceph/mgr/
[root@mon2 ~]# ls -lZ /var/lib/ceph/mgr/*
total 4
-rw-------. 1 167 167 system_u:object_r:var_lib_t:s0 135 Sep 1 10:57 keyring <<---
- Apply the proper SELinux context to the mgr keyring and check whether the service starts automatically.
For example:
[root@mon2 ~]# chcon system_u:object_r:container_file_t:s0 -R /var/lib/ceph/mgr/ceph-mon2/
[root@mon2 ~]# podman ps | grep mgr
1f131508fa4f registry.redhat.io/rhceph/rhceph-4-rhel8:4-57 28 seconds ago Up 28 seconds ago ceph-mgr-mon2
[root@mon2 ~]# ls -lZ /var/lib/ceph/mgr/ceph-mon2/
total 4
-rw-------. 1 167 167 system_u:object_r:container_file_t:s0 135 Sep 1 10:57 keyring
[root@mon2 ~]# systemctl --type=service | grep mgr
ceph-mgr@mon2.service loaded active running Ceph Manager
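Before re-running the playbook, it is worth confirming the fix on every mgr node. The dry-run sketch below prints the per-node label and service checks; the node names are hypothetical placeholders for your inventory.

```shell
#!/bin/sh
# Dry-run sketch: print the label and service checks for each mgr node.
# Node names below are hypothetical; substitute your own inventory.
checks=""
for node in mon1 mon2 mon3; do
  checks="${checks}ssh root@${node} ls -lZ /var/lib/ceph/mgr/ceph-${node}/
ssh root@${node} systemctl is-active ceph-mgr@${node}.service
"
done
printf '%s' "$checks"
```
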
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.