MON recovery procedure for an RHCS containerized deployment when all three MONs are down

Prerequisites

  1. All OSD daemon services must be stopped:

       # systemctl stop ceph-osd@OSD_ID
    
  2. Monitor (MON) nodes must have the ceph-mon package installed:

       # yum install ceph-mon
    
  3. OSD nodes must have the ceph-osd package installed:

       # yum install ceph-osd
    
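For convenience, the stop command from prerequisite 1 can be looped over all OSD IDs on a node. A dry-run sketch; the IDs 0, 3, and 6 are illustrative, and the echo prefix keeps it from actually stopping anything:

```shell
# Dry-run sketch: stop every OSD daemon on a node in one loop.
# IDs 0 3 6 are illustrative; drop 'echo' to actually stop the daemons.
for i in 0 3 6; do
    echo systemctl stop "ceph-osd@$i"
done
```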

Procedure

Please follow the steps sequentially.

Perform the steps on all the OSD nodes

  • Mount a temporary filesystem (tmpfs) on each OSD data directory

       # for i in OSD_ID; do mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-$i; done
    

    Replace OSD_ID with a numeric, space-separated list of Ceph OSD IDs on the OSD node.

    Example:

       # for i in 0 3 6; do mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-$i; done
    
  • Restore the SELinux context

       # for i in OSD_ID; do restorecon /var/lib/ceph/osd/ceph-$i; done
    

    Replace OSD_ID with a numeric, space-separated list of Ceph OSD IDs on the OSD node.

    Example:

       # for i in 0 3 6; do restorecon /var/lib/ceph/osd/ceph-$i; done
    
  • Change the owner and group to ceph:ceph:

       # for i in OSD_ID; do chown -R ceph:ceph /var/lib/ceph/osd/ceph-$i; done
    

    Replace OSD_ID with a numeric, space-separated list of Ceph OSD IDs on the OSD node.

    Example:

       # for i in 0 3 6; do chown -R ceph:ceph /var/lib/ceph/osd/ceph-$i; done
    
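The three loops above (mount, restorecon, chown) can be combined into a single pass per OSD ID. A dry-run sketch with the same illustrative IDs; drop the echo prefix to execute for real:

```shell
# Dry-run sketch combining the mount, restorecon, and chown steps.
# IDs 0 3 6 are illustrative; drop 'echo' to execute the commands.
for i in 0 3 6; do
    echo mount -t tmpfs tmpfs "/var/lib/ceph/osd/ceph-$i"
    echo restorecon "/var/lib/ceph/osd/ceph-$i"
    echo chown -R ceph:ceph "/var/lib/ceph/osd/ceph-$i"
done
```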

Perform the following steps on the respective OSD node for each OSD to mount the OSD logical volumes

  • Prime the OSD directory from the OSD device

       # ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/<vg_for_osd>/<lv-for-osd> --path /var/lib/ceph/osd/ceph-<osd_id>
    

    Example:

       # ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/vg0/lv0  --path /var/lib/ceph/osd/ceph-2
    

    The mapping between an OSD ID and its block device can be extracted by executing the # ceph-volume lvm list command.
    For example:

          # ceph-volume lvm list
          ====== osd.2 =======    <---OSD with ID
          
            [block]       /dev/vg0/lv0 <-- Corresponding block device
    
                block device              /dev/vg0/lv0
                block uuid                tMf1tp-C84F-sBDu-SwK8-p2Zc-xVy0-gc7riN
                cephx lockbox secret      
                cluster fsid              a1b81e41-4abe-40c3-baa7-0f647e58d5b2
                cluster name              ceph
                crush device class        None
                encrypted                 0
                osd fsid                  0670359f-5633-4582-b1e4-49fc2d5c2f91
                osd id                    2
                osdspec affinity          
                type                      block
                vdo                       0
                devices                   /dev/vdb
          
    
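The OSD-ID-to-block-device pairs can also be pulled out of that output programmatically. A sketch that parses the sample above with awk; the sample text is inlined here so the sketch is self-contained, whereas on an OSD node you would pipe the live ceph-volume lvm list output into awk instead:

```shell
# Sketch: extract "osd_id block_device" pairs from ceph-volume lvm list
# output. The inlined sample stands in for the live command output.
sample='====== osd.2 =======

  [block]       /dev/vg0/lv0'

echo "$sample" | awk '
    /^=+ osd\./ { split($2, a, "."); id = a[2] }   # remember the OSD id
    /\[block\]/ { print id, $2 }                   # pair it with its device
'
```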
  • Create a link from the OSD device to the mount path as given below

       # ln -snf /dev/<vg_for_osd>/<lv-for-osd> /var/lib/ceph/osd/ceph-<osd_id>/block
    

    Example:

       # ln -snf /dev/vg0/lv0 /var/lib/ceph/osd/ceph-2/block
    
  • Change the owner and group of the block link and the mount path

       # chown -h ceph:ceph /var/lib/ceph/osd/ceph-<osd_id>/block
       # chown -h ceph:ceph /var/lib/ceph/osd/ceph-<osd_id>
    

    Example:

       # chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block
       # chown -h ceph:ceph /var/lib/ceph/osd/ceph-2
    
  • Change the owner and group of the dm device corresponding to the OSD device

       # chown -R ceph:ceph /dev/dm-<name>
    
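The dm-&lt;name&gt; behind an OSD is the final target of its block symlink, so it can be resolved with readlink -f rather than located by hand. A sketch demonstrated on mock files in a temp directory so it runs outside the cluster; on a real OSD node the link would be /var/lib/ceph/osd/ceph-&lt;osd_id&gt;/block:

```shell
# Sketch: resolve the dm device behind an OSD's block symlink.
# Mock files stand in for /dev/dm-1 and the OSD block link; on a real node:
#   chown -R ceph:ceph "$(readlink -f /var/lib/ceph/osd/ceph-2/block)"
tmp=$(mktemp -d)
touch "$tmp/dm-1"                 # stand-in for /dev/dm-1
ln -s "$tmp/dm-1" "$tmp/block"    # stand-in for .../ceph-2/block
readlink -f "$tmp/block"          # prints the resolved dm device path
```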

Perform the steps on a MON node

  • Generate the SSH key pair with the default file name and no passphrase

       # ssh-keygen
    
  • Copy the public key to all OSD nodes

       # ssh-copy-id  root@<FQDN for OSD node>
    
  • Collect the cluster map from all OSD nodes
    To collect the cluster map, copy the following script into a file and execute it:

    vi recover.sh

          ## --------------------------------------------------------------------------
          ## NOTE: The directory names specified by 'ms', 'db', and 'db_slow' must end
          ## with a trailing / otherwise rsync will not operate properly.
          ## --------------------------------------------------------------------------
          ms=/tmp/monstore/
          db=/root/db/
          db_slow=/root/db.slow/
          
          mkdir -p $ms $db $db_slow
          
          ## --------------------------------------------------------------------------
          ## NOTE: Replace the contents inside double quotes for 'osd_nodes' below with
          ## the list of OSD nodes in the environment.
          ## --------------------------------------------------------------------------
          osd_nodes="osdnode1 osdnode2 osdnode3..."
          
          for osd_node in $osd_nodes; do
          echo "Operating on $osd_node"
          rsync -avz --delete $ms $osd_node:$ms
          rsync -avz --delete $db $osd_node:$db
          rsync -avz --delete $db_slow $osd_node:$db_slow
          
          ssh -t $osd_node <<EOF
          for osd in /var/lib/ceph/osd/ceph-*; do
              ceph-objectstore-tool --type bluestore --data-path \$osd --op update-mon-db --no-mon-config --mon-store-path $ms
              if [ -e \$osd/keyring ]; then
                  cat \$osd/keyring >> $ms/keyring
                  echo '    caps mgr = "allow profile osd"' >> $ms/keyring
                  echo '    caps mon = "allow profile osd"' >> $ms/keyring
                  echo '    caps osd = "allow *"' >> $ms/keyring
              else
                  echo WARNING: \$osd on $osd_node does not have a local keyring.
              fi
          done
          EOF
          
          rsync -avz --delete --remove-source-files $osd_node:$ms $ms
          rsync -avz --delete --remove-source-files $osd_node:$db $db
          rsync -avz --delete --remove-source-files $osd_node:$db_slow $db_slow
          done
          ## --------------------------------------------------------------------------
          ## End of script
          ## --------------------------------------------------------------------------
    
    
  • Create a single file containing all of the following keyrings

    • MON keyring path : cat /var/lib/ceph/mon/ceph-<mon node>/keyring
    • Client Keyring path from client nodes : cat /etc/ceph/ceph.client.admin.keyring
    • OSD Keyring generated by the above script: /tmp/monstore/keyring
    • MGR keyring path from mgr nodes : cat /var/lib/ceph/mgr/ceph-<mgr node>/keyring
    • MDS keyring path from mds nodes: cat /var/lib/ceph/mds/ceph-<mds node>/keyring
      Note: For this keyring, append the following caps if they do not exist:
      caps mds = "allow"
      caps mon = "allow profile mds"
      caps osd = "allow *"
    • RGW keyring path from rgw nodes : cat /var/lib/ceph/radosgw/ceph-<rgw node>/keyring
      Note: For this keyring, append the following caps if they do not exist:
      caps mon = "allow rw"
      caps osd = "allow *"
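The assembly of the combined keyring can be sketched on mock files so it runs outside the cluster; on the MON node, substitute the real paths listed above and append everything to /tmp/monstore/keyring (here represented by $tmp/keyring):

```shell
# Sketch: concatenate per-daemon keyrings into one file. Mock keyrings in a
# temp dir stand in for the real paths listed above; the target file already
# holds the OSD keys written by recover.sh, so the rest is appended to it.
tmp=$(mktemp -d)
printf '[osd.2]\n\tkey = AAA\n'        > "$tmp/keyring"        # from recover.sh
printf '[mon.]\n\tkey = BBB\n'         > "$tmp/mon.keyring"
printf '[client.admin]\n\tkey = CCC\n' > "$tmp/admin.keyring"

for k in "$tmp/mon.keyring" "$tmp/admin.keyring"; do
    cat "$k" >> "$tmp/keyring"
done
grep -c '^\[' "$tmp/keyring"   # three sections in the combined keyring
```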
  • Check for the monmap

       # ceph-monstore-tool /tmp/monstore get monmap -- --out /tmp/monmap
    
  • Print the extracted monmap

       # monmaptool /tmp/monmap --print
    

    A "No such file or directory" error message indicates that the monmap is missing.

    For Example:

          # monmaptool /tmp/monmap --print
    
          monmaptool: monmap file /tmp/monmap
          monmaptool: couldn't open /tmp/monmap: (2) No such file or directory
          #
    
  • Rebuild the MON map

       # monmaptool --create --addv <mon-id> <mon-ip> --addv <mon-id> <mon-ip> ... --enable-all-features --clobber /root/monmap.mon-a --fsid <fsid>
    

    Note: The MON ID, MON IP, and fsid details can be fetched from /etc/ceph/ceph.conf; pass one --addv pair per MON.

    For Example:

          # cat /etc/ceph/ceph.conf 
    
          [global]
          cluster network = 10.0.208.0/22
          fsid = 345ecf3f-1494-4b35-80cb-1df54355362b
          mon host = [v2:10.0.210.146:3300,v1:10.0.210.146:6789],[v2:10.0.209.3:3300,v1:10.0.209.3:6789],[v2:10.0.208.15:3300,v1:10.0.208.15:6789]
          mon initial members = ceph-bharath-1623839999591-node1-mon-mgr-installer,ceph-bharath-1623839999591-node2-mon,ceph-bharath-1623839999591-node3-mon-osd
    
       # monmaptool --create --addv ceph-bharath-1623839999591-node2-mon  [v2:10.0.209.3:3300,v1:10.0.209.3:6789]  --addv ceph-bharath-1623839999591-node1-mon-mgr-installer [v2:10.0.210.146:3300,v1:10.0.210.146:6789] --addv ceph-bharath-1623839999591-node3-mon-osd  [v2:10.0.208.15:3300,v1:10.0.208.15:6789] --enable-all-features  --clobber /root/monmap.mon-a  --fsid 345ecf3f-1494-4b35-80cb-1df54355362b
    
  • Check the generated monmap

       # monmaptool /root/monmap.mon-a --print
    
  • Rebuild the Monitor store from the collected map

       # ceph-monstore-tool /tmp/monstore rebuild -- --keyring <path of keyring file created above> --monmap /root/monmap.mon-a
    

    Note: Provide the path of the combined keyring file created above

  • Change the ownership of monstore directory to ceph

       # chown -R ceph:ceph /tmp/monstore
    

Perform the steps on all the MON nodes

  • Back up the corrupted store

       # mv /var/lib/ceph/mon/ceph-HOSTNAME/store.db /var/lib/ceph/mon/ceph-HOSTNAME/store.db.corrupted
    
  • Replace the corrupted store.db

       # scp -r /tmp/monstore/store.db <FQDN for MON node>:/var/lib/ceph/mon/ceph-<mon node>/
    
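Copying the rebuilt store to every MON node can be looped in one go. A dry-run sketch; the hostnames mon1, mon2, and mon3 are illustrative, and dropping the echo prefix performs the actual copies:

```shell
# Dry-run sketch: push the rebuilt store.db to each MON node.
# Hostnames mon1 mon2 mon3 are illustrative; drop 'echo' to copy for real.
for mon in mon1 mon2 mon3; do
    echo scp -r /tmp/monstore/store.db "root@$mon:/var/lib/ceph/mon/ceph-$mon/"
done
```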

Perform the steps on all OSD nodes

  • Unmount all the temporarily mounted OSD directories on all nodes

       # umount /var/lib/ceph/osd/ceph-*
    
  • Start all OSDs

       # systemctl start ceph-osd@OSD_ID
    

Perform the step on all MON nodes

  • Start the MON daemon

       # systemctl start ceph-mon@HOSTNAME
    