MON recovery procedure for an RHCS containerized deployment when all three MONs are down.
Prerequisites

- All OSD daemon services should be stopped:
  # systemctl stop ceph-osd@<id>
- The Monitor (MON) nodes should have the ceph-mon package installed:
  # yum install ceph-mon
- The OSD nodes should have the ceph-osd package installed:
  # yum install ceph-osd
Procedure

Follow the steps sequentially.

Perform the steps on all the OSD nodes

- Mount the data partitions to a temporary location. Replace OSD_ID with a numeric, space-separated list of the Ceph OSD IDs on the OSD node:
  # for i in OSD_ID; do mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-$i; done
  Example:
  # for i in 0 3 6; do mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-$i; done
- Restore the SELinux context:
  # for i in OSD_ID; do restorecon /var/lib/ceph/osd/ceph-$i; done
  Example:
  # for i in 0 3 6; do restorecon /var/lib/ceph/osd/ceph-$i; done
- Change the owner and group to ceph:ceph:
  # for i in OSD_ID; do chown -R ceph:ceph /var/lib/ceph/osd/ceph-$i; done
  Example:
  # for i in 0 3 6; do chown -R ceph:ceph /var/lib/ceph/osd/ceph-$i; done
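The three loops above can be combined into a single pass per node. The following is a minimal sketch, not part of the official procedure; `run` and `prep_osd_dirs` are hypothetical helper names, and setting DRY_RUN=1 prints the commands instead of executing them so the list of OSD IDs can be verified before running as root:

```shell
# Hypothetical helper combining the three per-node steps above.
# Usage (as root on the OSD node):  prep_osd_dirs 0 3 6
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

prep_osd_dirs() {
    for i in "$@"; do
        dir=/var/lib/ceph/osd/ceph-$i
        run mount -t tmpfs tmpfs "$dir"   # temporary in-memory mount
        run restorecon "$dir"             # restore the SELinux context
        run chown -R ceph:ceph "$dir"     # daemons run as the ceph user
    done
}
```

For example, `DRY_RUN=1 prep_osd_dirs 0 3 6` prints the nine commands that would run, one per OSD directory and step.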
Perform the following steps on the respective OSD nodes, for each OSD, to mount the OSD logical volumes

- Mount the OSD devices:
  # ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/<vg_for_osd>/<lv-for-osd> --path /var/lib/ceph/osd/ceph-<osd_id>
  Example:
  # ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/vg0/lv0 --path /var/lib/ceph/osd/ceph-2
  The mapping between an OSD ID and its block device can be extracted by executing the following command:
  # ceph-volume lvm list
  For example:
  # ceph-volume lvm list

  ====== osd.2 =======                    <--- OSD with ID

    [block]       /dev/vg0/lv0            <--- corresponding block device

        block device              /dev/vg0/lv0
        block uuid                tMf1tp-C84F-sBDu-SwK8-p2Zc-xVy0-gc7riN
        cephx lockbox secret
        cluster fsid              a1b81e41-4abe-40c3-baa7-0f647e58d5b2
        cluster name              ceph
        crush device class        None
        encrypted                 0
        osd fsid                  0670359f-5633-4582-b1e4-49fc2d5c2f91
        osd id                    2
        osdspec affinity
        type                      block
        vdo                       0
        devices                   /dev/vdb
        devices                   /dev/vdd
- Create a link to the mount path as given below:
  # ln -snf /dev/<vg_for_osd>/<lv-for-osd> /var/lib/ceph/osd/ceph-<osd_id>/block
  Example:
  # ln -snf /dev/vg0/lv0 /var/lib/ceph/osd/ceph-2/block
- Change the user and group for the block link and the mount path:
  # chown -h ceph:ceph /var/lib/ceph/osd/ceph-<osd_id>/block
  # chown -h ceph:ceph /var/lib/ceph/osd/ceph-<osd_id>
  Example:
  # chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block
  # chown -h ceph:ceph /var/lib/ceph/osd/ceph-2
- Change the user and group for the dm device corresponding to the OSD device:
  # chown -R ceph:ceph /dev/dm-<name>
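An LVM logical volume such as /dev/vg0/lv0 is itself a symlink to its device-mapper node (/dev/dm-N), so the dm name for the last step can be found by resolving that link. A small sketch, where `dm_for_lv` is a hypothetical helper and the device paths are illustrative:

```shell
# Resolve an LV path (e.g. /dev/vg0/lv0) to the underlying
# device-mapper node (e.g. /dev/dm-3) so it can be chowned.
dm_for_lv() {
    readlink -f "$1"
}

# Usage on the OSD node (vg0/lv0 is illustrative):
#   chown -R ceph:ceph "$(dm_for_lv /dev/vg0/lv0)"
```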
Perform the steps on a MON node

- Generate the SSH key pair with the default file name and no passphrase:
  # ssh-keygen
- Copy the public key to all OSD nodes:
  # ssh-copy-id root@<FQDN for OSD node>
- Collect the cluster map from all OSD nodes.
  To collect the cluster map, copy the following script into a file and execute it:
  # vi recover.sh

  ## --------------------------------------------------------------------------
  ## NOTE: The directory names specified by 'ms', 'db', and 'db_slow' must end
  ## with a trailing / otherwise rsync will not operate properly.
  ## --------------------------------------------------------------------------
  ms=/tmp/monstore/
  db=/root/db/
  db_slow=/root/db.slow/

  mkdir -p $ms $db $db_slow

  ## --------------------------------------------------------------------------
  ## NOTE: Replace the contents inside double quotes for 'osd_nodes' below with
  ## the list of OSD nodes in the environment.
  ## --------------------------------------------------------------------------
  osd_nodes="osdnode1 osdnode2 osdnode3..."

  for osd_node in $osd_nodes; do
    echo "Operating on $osd_node"
    rsync -avz --delete $ms $osd_node:$ms
    rsync -avz --delete $db $osd_node:$db
    rsync -avz --delete $db_slow $osd_node:$db_slow

    ssh -t $osd_node <<EOF
    for osd in /var/lib/ceph/osd/ceph-*; do
      ceph-objectstore-tool --type bluestore --data-path \$osd --op update-mon-db --no-mon-config --mon-store-path $ms
      if [ -e \$osd/keyring ]; then
        cat \$osd/keyring >> $ms/keyring
        echo '  caps mgr = "allow profile osd"' >> $ms/keyring
        echo '  caps mon = "allow profile osd"' >> $ms/keyring
        echo '  caps osd = "allow *"' >> $ms/keyring
      else
        echo WARNING: \$osd on $osd_node does not have a local keyring.
      fi
    done
EOF

    rsync -avz --delete --remove-source-files $osd_node:$ms $ms
    rsync -avz --delete --remove-source-files $osd_node:$db $db
    rsync -avz --delete --remove-source-files $osd_node:$db_slow $db_slow
  done
  ## --------------------------------------------------------------------------
  ## End of script
  ## --------------------------------------------------------------------------
- Create a file with all of the keyrings:
  - MON keyring path:
    cat /var/lib/ceph/mon/ceph-<mon node>/keyring
  - Client keyring path from client nodes:
    cat /etc/ceph/ceph.client.admin.keyring
  - OSD keyring generated by the above script:
    /tmp/monstore/keyring
  - MGR keyring path from MGR nodes:
    cat /var/lib/ceph/mgr/ceph-<mgr node>/keyring
  - MDS keyring path from MDS nodes:
    cat /var/lib/ceph/mds/ceph-<mds node>/keyring
    Note: For this keyring, append the following caps if they do not exist:
      caps mds = "allow"
      caps mon = "allow profile mds"
      caps osd = "allow *"
  - RGW keyring path from RGW nodes:
    cat /var/lib/ceph/radosgw/ceph-<rgw node>/keyring
    Note: For this keyring, append the following caps if they do not exist:
      caps mon = "allow rw"
      caps osd = "allow *"
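Assembling the combined file is plain concatenation once the individual keyrings have been copied to the working node. A hedged sketch; `assemble_keyring` and the output path are hypothetical, and the caps notes above still have to be applied by hand:

```shell
# Hypothetical sketch: concatenate whichever keyring files exist into
# one combined file for the ceph-monstore-tool rebuild step.
assemble_keyring() {
    combined=$1; shift
    : > "$combined"                       # truncate/create the output file
    for f in "$@"; do
        if [ -e "$f" ]; then
            cat "$f" >> "$combined"
            echo >> "$combined"           # blank line between sections
        else
            echo "skipping missing keyring: $f" >&2
        fi
    done
}

# Usage (output path is illustrative; copy remote keyrings locally first):
#   assemble_keyring /root/keyring.all \
#       /tmp/monstore/keyring \
#       /etc/ceph/ceph.client.admin.keyring \
#       /var/lib/ceph/mon/ceph-*/keyring
```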
- Check for the monmap:
  # ceph-monstore-tool /tmp/monstore get monmap -- --out /tmp/monmap
  # monmaptool /tmp/monmap --print
  Notice the "No such file or directory" error message if the monmap is missing.
  For example:
  # monmaptool /tmp/monmap --print
  monmaptool: monmap file /tmp/monmap
  monmaptool: couldn't open /tmp/monmap: (2) No such file or directory
- Rebuild the MON map:
  # monmaptool --create --addv <mon-id> <mon-a-ip> --enable-all-features --clobber /root/monmap.mon-a --fsid <fsid>
  Note: The mon-id, mon-a-ip, and fsid details can be fetched from /etc/ceph/ceph.conf.
  For example:
  # cat /etc/ceph/ceph.conf
  [global]
  cluster network = 10.0.208.0/22
  fsid = 345ecf3f-1494-4b35-80cb-1df54355362b
  mon host = [v2:10.0.210.146:3300,v1:10.0.210.146:6789],[v2:10.0.209.3:3300,v1:10.0.209.3:6789],[v2:10.0.208.15:3300,v1:10.0.208.15:6789]
  mon initial members = ceph-bharath-1623839999591-node1-mon-mgr-installer,ceph-bharath-1623839999591-node2-mon,ceph-bharath-1623839999591-node3-mon-osd

  # monmaptool --create --addv ceph-bharath-1623839999591-node2-mon [v2:10.0.209.3:3300,v1:10.0.209.3:6789] --addv ceph-bharath-1623839999591-node1-mon-mgr-installer [v2:10.0.210.146:3300,v1:10.0.210.146:6789] --addv ceph-bharath-1623839999591-node3-mon-osd [v2:10.0.208.15:3300,v1:10.0.208.15:6789] --enable-all-features --clobber /root/monmap.mon-a --fsid 345ecf3f-1494-4b35-80cb-1df54355362b
- Check the generated monmap:
  # monmaptool /root/monmap.mon-a --print
- Rebuild the Monitor store from the collected map:
  # ceph-monstore-tool /tmp/monstore rebuild -- --keyring <path of the keyring file created above> --monmap /root/monmap.mon-a
  Note: Provide the path of the keyring file created above.
- Change the ownership of the monstore directory to ceph:
  # chown -R ceph:ceph /tmp/monstore
Perform the steps on all the MON nodes

- Back up the corrupted store:
  # mv /var/lib/ceph/mon/ceph-HOSTNAME/store.db /var/lib/ceph/mon/ceph-HOSTNAME/store.db.corrupted
- Replace the corrupted store.db:
  # scp -r /tmp/monstore/store.db <FQDN for MON node>:/var/lib/ceph/mon/ceph-<mon node>/
Perform the steps on all OSD nodes

- Unmount all the temporarily mounted OSDs on all nodes:
  # umount /var/lib/ceph/osd/ceph-*
- Start all OSDs:
  # systemctl start ceph-osd@OSD-ID
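When a node hosts several OSDs, the IDs can be derived from the /var/lib/ceph/osd/ceph-<id> directories rather than typed by hand. A sketch, not part of the official procedure; `start_local_osds` is a hypothetical helper that prints the systemctl commands so they can be reviewed before running:

```shell
# Hypothetical sketch: print a start command for every OSD whose data
# directory exists under the given root (default /var/lib/ceph/osd).
start_local_osds() {
    root=${1:-/var/lib/ceph/osd}
    for dir in "$root"/ceph-*; do
        [ -d "$dir" ] || continue
        id=${dir##*-}                    # /var/.../ceph-3 -> 3
        echo "systemctl start ceph-osd@$id"
    done
}

# On a real OSD node, review the output, then pipe it to a shell:
#   start_local_osds | sh
```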
Perform the step on all MON nodes

- Start all the MONs:
  # systemctl start ceph-mon@HOSTNAME