Replacing an OpenShift Container Storage node


Environment

  • Red Hat OpenShift Container Platform
    • 3.11
  • Red Hat OpenShift Container Storage
    • 3.11

Issue

  • How to replace an OpenShift Container Storage node.
  • How to replace an OCS host.

Resolution

To replace a node in a bare-metal, converged OCS/OCP deployment, follow the high-level steps below. A condensed command sketch of the heketi-side flow follows the list:

  • Step 1. Prepare the replacement node for OCP/OCS installation. Follow the steps in the documentation to prepare the host for addition to the cluster.

  • Step 2. Increase cluster size by adding the replacement node.

    • Follow the steps to add the replacement node to the OCP cluster.
    • Follow the steps to add the replacement node to OCS/Heketi.
      NOTE: Follow the above guideline up to and including step 1.1.2.2, “Using Heketi CLI”. Make sure to add the appropriate storage label to the node and that storage devices are attached to it.
  • Step 3. Evacuate and delete the node to be decommissioned from OCS.

    • Follow the guideline below to delete the disk devices from the old node.
      NOTE: This step may take a long time to complete, since it migrates all bricks from the old node to the replacement node.
    • Once the above process completes, delete the node from the cluster configuration.
  • Step 4. Delete and uninstall the old node from OCP.

    • Remove the storage labels from the old node.
    • Follow the steps to delete and uninstall the old node from OCP.
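
In condensed form, the heketi-side flow looks like the sketch below. This is an illustrative outline only: the heketi pod name, the HEKETI_ADMIN_KEY variable, and all IDs are taken from the example in the Diagnostic Steps and will differ in your environment.

# Shortcut for running heketi-cli inside the heketi pod (adjust pod name and credentials)
HEKETI="oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin"

# Step 2: register the new node and its devices with heketi
$HEKETI node add --zone=1 --cluster=<cluster_id> \
    --management-host-name=<new_node_fqdn> --storage-host-name=<new_node_ip>
$HEKETI device add --name=/dev/sdX --node=<new_node_id>   # repeat per device

# Step 3: evacuate and delete the old node's devices, then the node itself
$HEKETI device disable <old_device_id>                    # repeat per device
$HEKETI device remove <old_device_id>                     # migrates bricks; can take a long time
$HEKETI device delete <old_device_id>
$HEKETI node disable <old_node_id>
$HEKETI node remove <old_node_id>
$HEKETI node delete <old_node_id>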

Diagnostic Steps

  • Replace node node86.ocp.example.com with the freshly installed RHEL node node79.ocp.example.com.
  • Node to be replaced: node86.ocp.example.com (“old node”). Check the current information about the node:
[root@node80 ~]# oc get nodes
NAME                     STATUS    ROLES     AGE       VERSION
node80.ocp.example.com   Ready     master    68d       v1.11.0+d4cacc0
node81.ocp.example.com   Ready     infra     68d       v1.11.0+d4cacc0
node82.ocp.example.com   Ready     infra     68d       v1.11.0+d4cacc0
node83.ocp.example.com   Ready     compute   68d       v1.11.0+d4cacc0
node84.ocp.example.com   Ready     compute   68d       v1.11.0+d4cacc0
node85.ocp.example.com   Ready     compute   68d       v1.11.0+d4cacc0
node86.ocp.example.com   Ready     compute   68d       v1.11.0+d4cacc0
node87.ocp.example.com   Ready     compute   68d       v1.11.0+d4cacc0

[root@node80 ~]# oc project app-storage
Now using project "app-storage" on server "https://node80.ocp.example.com:443".

[root@node80 ~]# oc get pods -o wide
NAME                                          READY     STATUS    RESTARTS   AGE       IP           NODE                     NOMINATED NODE
glusterblock-storage-provisioner-dc-1-cfcxn   1/1       Running   0          63d       10.129.2.2   node84.ocp.example.com   <none>
glusterfs-storage-fz77p                       1/1       Running   4          68d       1.1.1.87     node87.ocp.example.com   <none>
glusterfs-storage-mk7kf                       1/1       Running   0          68d       1.1.1.86     node86.ocp.example.com   <none>
glusterfs-storage-qvmkt                       1/1       Running   4          68d       1.1.1.85     node85.ocp.example.com   <none>
glusterfs-storage-whwm6                       1/1       Running   1          68d       1.1.1.84     node84.ocp.example.com   <none>
heketi-storage-1-pm548                        1/1       Running   1          68d       10.129.0.5   node82.ocp.example.com   <none>

[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin topology info
. . . 
Node Id: f2f51fdaf3440c0f7e1aa5165deef042
    State: online
    Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
    Zone: 1
    Management Hostnames: node86.ocp.example.com
    Storage Hostnames: 1.1.1.86
    Devices:
        Id:23d930554b78c9d25dbafb838c236f68   Name:/dev/sdd            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99
            Bricks:
        Id:2a36fcad4ab642c5c534916609faff0f   Name:/dev/sde            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99
            Bricks:
        Id:eb21e3dcfabb76d2513b5d4843b6bb18   Name:/dev/sdc            State:online    Size (GiB):99      Used (GiB):56      Free (GiB):43
            Bricks:
                Id:7fa1d85e6e5685e6d39b3cfe2e6ee6c9   Size (GiB):4       Path: /var/lib/heketi/mounts/vg_eb21e3dcfabb76d2513b5d4843b6bb18/brick_7fa1d85e6e5685e6d39b3cfe2e6ee6c9/brick
                Id:a718ec9afe0696591c93056a209de4ee   Size (GiB):2       Path: /var/lib/heketi/mounts/vg_eb21e3dcfabb76d2513b5d4843b6bb18/brick_a718ec9afe0696591c93056a209de4ee/brick
                Id:bdb4367d41003d4e350bc678175dbad9   Size (GiB):50      Path: /var/lib/heketi/mounts/vg_eb21e3dcfabb76d2513b5d4843b6bb18/brick_bdb4367d41003d4e350bc678175dbad9/brick
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042
Node Id: f2f51fdaf3440c0f7e1aa5165deef042
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname: node86.ocp.example.com
Storage Hostname: 1.1.1.86
Devices:
Id:23d930554b78c9d25dbafb838c236f68   Name:/dev/sdd            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      Bricks:0
Id:2a36fcad4ab642c5c534916609faff0f   Name:/dev/sde            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      Bricks:0
Id:eb21e3dcfabb76d2513b5d4843b6bb18   Name:/dev/sdc            State:online    Size (GiB):99      Used (GiB):56      Free (GiB):43      Bricks:3
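  • The old node's heketi node ID is needed repeatedly below. If jq is available on the workstation, it can be looked up by hostname from heketi's JSON topology (an illustrative sketch, not part of the original transcript); it should print f2f51fdaf3440c0f7e1aa5165deef042, matching the output above:
oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin topology info --json \
    | jq -r '.clusters[].nodes[] | select(.hostnames.manage[0] == "node86.ocp.example.com") | .id'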

[root@node86 ~]# lsblk --nodeps
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda    8:0    0   32G  0 disk
sdb    8:16   0  100G  0 disk
sdc    8:32   0  100G  0 disk
sdd    8:48   0  100G  0 disk
sde    8:64   0  100G  0 disk
sr0   11:0    1 1024M  0 rom

[root@node86 ~]# cat /etc/sysconfig/docker-storage-setup
DEVS=/dev/sdb
VG=docker-vg
  • Replacement node: node79.ocp.example.com (“new node”)
    • IP: 1.1.1.79
    • /dev/sdc, /dev/sdd, /dev/sde for OCS use (100 GB each)
[root@node79 ~]# lsblk
NAME                 MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                    8:0    0   32G  0 disk
|-sda1                 8:1    0    1G  0 part /boot
`-sda2                 8:2    0   31G  0 part
  |-rhel_dhcp54-root 253:0    0   30G  0 lvm  /
  `-rhel_dhcp54-swap 253:1    0 1020M  0 lvm  [SWAP]
sdb                    8:16   0  100G  0 disk
sdc                    8:32   0  100G  0 disk
sdd                    8:48   0  100G  0 disk
sde                    8:64   0  100G  0 disk
sr0                   11:0    1 1024M  0 rom
sr1                   11:1    1  374K  0 rom
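  • heketi expects raw, unformatted devices. As a non-destructive sanity check (an added sketch, not in the original transcript), wipefs can list any leftover signatures on the intended OCS disks; the output should be empty:
for d in /dev/sdc /dev/sdd /dev/sde; do
    echo "== $d =="
    wipefs $d     # prints existing filesystem/RAID/LVM signatures; makes no changes
done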

Step 1. Prepare the Replacement Node for OCP/OCS Installation
NOTE: Not all steps are shown below!

[root@node79 ~]# hostnamectl set-hostname node79.ocp.example.com
[root@node79 ~]# hostnamectl
   Static hostname: node79.ocp.example.com
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 67d2d242ad0c4ea996fc85d37ab78463
           Boot ID: ba44d86f3d804ee490e391b689336255
    Virtualization: kvm
  Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
       CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
            Kernel: Linux 3.10.0-862.3.2.el7.x86_64
      Architecture: x86-64

[root@node79 ~]# subscription-manager register
Registering to: subscription.rhsm.redhat.com:443/subscription
Username: 
Password:
The system has been registered with ID: f887acab-a7cd-4175-bce7-0357c330b3ea
The registered system name is: node79.ocp.example.com 

[root@node79 ~]# subscription-manager attach --pool=<pool_id>
Successfully attached a subscription for: Employee SKU
1 local certificate has been deleted.

[root@node79 ~]# subscription-manager repos --disable \*
[root@node79 ~]# subscription-manager repos --enable rhel-7-server-extras-rpms --enable rhel-7-server-ansible-2.6-rpms --enable rhel-7-server-ose-3.11-rpms --enable rh-gluster-3-client-for-rhel-7-server-rpms
. . .
[root@node79 ~]# yum repolist
Failed to set locale, defaulting to C
Loaded plugins: product-id, search-disabled-repos, subscription-manager
repo id                                                                                          repo name                                                                                                     status
rh-gluster-3-client-for-rhel-7-server-rpms/7Server/x86_64                                        Red Hat Storage Native Client for RHEL 7 (RPMs)                                                                 228
rhel-7-server-ansible-2.6-rpms/x86_64                                                            Red Hat Ansible Engine 2.6 RPMs for Red Hat Enterprise Linux 7 Server                                            17
rhel-7-server-extras-rpms/x86_64                                                                 Red Hat Enterprise Linux 7 Server - Extras (RPMs)                                                              1019
rhel-7-server-ose-3.11-rpms/x86_64                                                               Red Hat OpenShift Container Platform 3.11 (RPMs)                                                                348
rhel-7-server-rpms/7Server/x86_64                                                                Red Hat Enterprise Linux 7 Server (RPMs)                                                                      23382
repolist: 24994

[root@node79 ~]# yum update -y
. . . 
[root@node79 ~]# systemctl reboot
…    
[root@node79 ~]# yum install wget git net-tools bind-utils yum-utils iptables-services bridge-utils bash-completion kexec-tools sos psacct
. . . 
[root@node79 ~]# yum install -y docker-1.13.1
. . . 
[root@node79 ~]# docker --version
Docker version 1.13.1, build 07f3374/1.13.1
  • Configure Docker storage:
    NOTE: The steps below depend on how you configured storage for docker!
[root@node79 ~]# cat <<EOF > /etc/sysconfig/docker-storage-setup
> DEVS=/dev/sdb
> VG=docker-vg
> EOF
[root@node79 ~]# cat /etc/sysconfig/docker-storage-setup
DEVS=/dev/sdb
VG=docker-vg
[root@node79 ~]# docker-storage-setup
INFO: Writing zeros to first 4MB of device /dev/sdb
4+0 records in
4+0 records out
4194304 bytes (4.2 MB) copied, 0.0539972 s, 77.7 MB/s
INFO: Device node /dev/sdb1 exists.
  Physical volume "/dev/sdb1" successfully created.
  Volume group "docker-vg" successfully created
  Rounding up size to full physical extent 104.00 MiB
  Thin pool volume with chunk size 512.00 KiB can address at most 126.50 TiB of data.
  Logical volume "docker-pool" created.
  Logical volume docker-vg/docker-pool changed.
[root@node79 ~]# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS="--storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker--vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true "
[root@node79 ~]# vgs
  VG          #PV #LV #SN Attr   VSize    VFree
  docker-vg     1   1   0 wz--n- <100.00g 60.00g
  rhel_dhcp54   1   2   0 wz--n-  <31.00g     0

[root@node79 ~]# systemctl enable docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
[root@node79 ~]# systemctl start docker
[root@node79 ~]# systemctl is-active docker
active
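
  • To confirm that Docker picked up the devicemapper thin pool configured above (an added check, not in the original transcript):
docker info 2>/dev/null | grep -i -A1 'storage driver'   # expect the devicemapper driver
lvs docker-vg                                            # the docker-pool logical volume should exist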

Step 2. Increase Cluster Size by Adding the Replacement Node

  • Create an inventory file that includes the replacement host (add host):
[root@node88 ansible]# diff inventory-file-311pu-1xcns_1m2i1a4s_v6.txt inventory-file-311pu-1xcns_1m2i1a4s_v6_tmp.txt
5a6
> new_nodes
115a117,120
>
> # New nodes to be added
> [new_nodes]
> node79.ocp.example.com openshift_node_group_name="node-config-compute"

[root@node88 ansible]# cat inventory-file-311pu-1xcns_1m2i1a4s_v6_tmp.txt
[OSEv3:children]
masters
etcd
nodes
glusterfs
new_nodes
. . .
# New nodes to be added
[new_nodes]
node79.ocp.example.com openshift_node_group_name="node-config-compute"

  • Check connectivity:
[root@node88 ansible]# ansible -i ./inventory-file-311pu-1xcns_1m2i1a4s_v6_tmp.txt new_nodes -b -m ping
node79.ocp.example.com | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
  • Scale up the OCP cluster:
[root@node88 ansible]# ansible-playbook -i /etc/ansible/inventory-file-311pu-1xcns_1m2i1a4s_v6_tmp.txt /usr/share/ansible/openshift-ansible/playbooks/openshift-node/scaleup.yml
. . . 
INSTALLER STATUS ****************************************************************************************************************************************************************************************************
Initialization              : Complete (0:03:50)
Node Bootstrap Preparation  : Complete (0:22:18)
Node Join                   : Complete (0:01:21)

[root@node80 ~]# oc get nodes
NAME                     STATUS    ROLES     AGE       VERSION
node79.ocp.example.com   Ready     compute   21m       v1.11.0+d4cacc0
node80.ocp.example.com   Ready     master    69d       v1.11.0+d4cacc0
node81.ocp.example.com   Ready     infra     69d       v1.11.0+d4cacc0
node82.ocp.example.com   Ready     infra     69d       v1.11.0+d4cacc0
node83.ocp.example.com   Ready     compute   69d       v1.11.0+d4cacc0
node84.ocp.example.com   Ready     compute   69d       v1.11.0+d4cacc0
node85.ocp.example.com   Ready     compute   69d       v1.11.0+d4cacc0
node86.ocp.example.com   Ready     compute   69d       v1.11.0+d4cacc0
node87.ocp.example.com   Ready     compute   69d       v1.11.0+d4cacc0

[root@node80 ~]# oc label node/node79.ocp.example.com logging-infra-fluentd=true
node/node79.ocp.example.com labeled
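  • Optionally confirm the node's labels (an added check, not in the original transcript):
oc get node node79.ocp.example.com --show-labels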
  • Update the inventory file to reflect the new OCP cluster:
[root@node88 ansible]# diff inventory-file-311pu-1xcns_1m2i1a4s_v7.txt inventory-file-311pu-1xcns_1m2i1a4s_v6_tmp.txt
110d109
< node79.ocp.example.com openshift_node_group_name="node-config-compute"
118d116
< node79.ocp.example.com glusterfs_devices='[ "/dev/sdc", "/dev/sdd", "/dev/sde" ]'
121a120
> node79.ocp.example.com openshift_node_group_name="node-config-compute"

[root@node88 ansible]# cat inventory-file-311pu-1xcns_1m2i1a4s_v7.txt
[OSEv3:children]
masters
etcd
nodes
glusterfs
new_nodes
. . .
[nodes]
node80.ocp.example.com openshift_node_group_name="node-config-master"
node81.ocp.example.com openshift_node_group_name="node-config-infra"
node82.ocp.example.com openshift_node_group_name="node-config-infra"
node83.ocp.example.com openshift_node_group_name="node-config-compute"
node84.ocp.example.com openshift_node_group_name="node-config-compute"
node85.ocp.example.com openshift_node_group_name="node-config-compute"
node86.ocp.example.com openshift_node_group_name="node-config-compute"
node87.ocp.example.com openshift_node_group_name="node-config-compute"
node79.ocp.example.com openshift_node_group_name="node-config-compute"

# OpenShift hosts with gluster pods
[glusterfs]
node84.ocp.example.com glusterfs_devices='[ "/dev/sdc" ]'
node85.ocp.example.com glusterfs_devices='[ "/dev/sdc" ]'
node86.ocp.example.com glusterfs_devices='[ "/dev/sdc" ]'
node87.ocp.example.com glusterfs_devices='[ "/dev/sdc" ]'
node79.ocp.example.com glusterfs_devices='[ "/dev/sdc", "/dev/sdd", "/dev/sde" ]'

# New nodes to be added
[new_nodes]
  • Open firewall ports for OCS:
[root@node79 sysconfig]# diff /etc/sysconfig/iptables /etc/sysconfig/iptables.bak
41,47d40
< -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 24007 -j ACCEPT
< -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 24008 -j ACCEPT
< -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2222 -j ACCEPT
< -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m multiport --dports 49152:49664 -j ACCEPT
< -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 24010 -j ACCEPT
< -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 3260 -j ACCEPT
< -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 111 -j ACCEPT
[root@node79 sysconfig]# systemctl restart iptables
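  • To verify that the gluster-related rules are active after the restart (an added check, not in the original transcript):
iptables -nL OS_FIREWALL_ALLOW | grep -E '24007|24008|2222|49152|24010|3260|111'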
  • Label the replacement node:
[root@node80 ~]# oc project app-storage
Now using project "app-storage" on server "https://node80.ocp.example.com:443".
[root@node80 ~]# oc get ds
NAME                DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
glusterfs-storage   4         4         4         4            4           glusterfs=storage-host   69d

[root@node80 ~]# oc label node node79.ocp.example.com glusterfs=storage-host
node/node79.ocp.example.com labeled
[root@node80 ~]# oc get pods -o wide
NAME                                          READY     STATUS              RESTARTS   AGE       IP           NODE                     NOMINATED NODE
glusterblock-storage-provisioner-dc-1-cfcxn   1/1       Running             0          63d       10.129.2.2   node84.ocp.example.com   <none>
glusterfs-storage-6s5wg                       0/1       ContainerCreating   0          6s        1.1.1.79     node79.ocp.example.com   <none>
glusterfs-storage-fz77p                       1/1       Running             4          69d       1.1.1.87     node87.ocp.example.com   <none>
glusterfs-storage-mk7kf                       1/1       Running             0          69d       1.1.1.86     node86.ocp.example.com   <none>
glusterfs-storage-qvmkt                       1/1       Running             4          69d       1.1.1.85     node85.ocp.example.com   <none>
glusterfs-storage-whwm6                       1/1       Running             1          69d       1.1.1.84     node84.ocp.example.com   <none>
heketi-storage-1-pm548                        1/1       Running             1          69d       10.129.0.5   node82.ocp.example.com   <none>
  • After a few minutes, the new glusterfs-storage pod should be healthy:
[root@node80 ~]# oc get pods
NAME                                          READY     STATUS    RESTARTS   AGE
glusterblock-storage-provisioner-dc-1-cfcxn   1/1       Running   0          63d
glusterfs-storage-6s5wg                       1/1       Running   0          1m
glusterfs-storage-fz77p                       1/1       Running   4          69d
glusterfs-storage-mk7kf                       1/1       Running   0          69d
glusterfs-storage-qvmkt                       1/1       Running   4          69d
glusterfs-storage-whwm6                       1/1       Running   1          69d
heketi-storage-1-pm548                        1/1       Running   1          69d
  • Add the Replacement Node to OCS/Heketi:
    NOTE: Get the old node's heketi node ID, e.g. from the output of heketi-cli topology info:
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042
Node Id: f2f51fdaf3440c0f7e1aa5165deef042
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname: node86.ocp.example.com
Storage Hostname: 1.1.1.86
Devices:
Id:23d930554b78c9d25dbafb838c236f68   Name:/dev/sdd            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      Bricks:0
Id:2a36fcad4ab642c5c534916609faff0f   Name:/dev/sde            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      Bricks:0
Id:eb21e3dcfabb76d2513b5d4843b6bb18   Name:/dev/sdc            State:online    Size (GiB):99      Used (GiB):56      Free (GiB):43      Bricks:3
  • Add the replacement node to the OCS cluster with the heketi CLI:
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node add --zone=1 --cluster=ffa6d8659829fa3492e5c2bb321f71b8 --management-host-name=node79.ocp.example.com --storage-host-name=1.1.1.79
Node information:
Id: 68d2fc78df9cbad7f0c44080924802ec
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname node79.ocp.example.com
Storage Hostname 1.1.1.79

[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node list
Id:483ef5223e013b52b78a161e340183df    Cluster:ffa6d8659829fa3492e5c2bb321f71b8
Id:68d2fc78df9cbad7f0c44080924802ec    Cluster:ffa6d8659829fa3492e5c2bb321f71b8
Id:971bee9f2d3c7e4e875350d9996ff69b    Cluster:ffa6d8659829fa3492e5c2bb321f71b8
Id:a32f3112cc659819433895d0d1a40e67    Cluster:ffa6d8659829fa3492e5c2bb321f71b8
Id:f2f51fdaf3440c0f7e1aa5165deef042    Cluster:ffa6d8659829fa3492e5c2bb321f71b8

[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info 68d2fc78df9cbad7f0c44080924802ec
Node Id: 68d2fc78df9cbad7f0c44080924802ec
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname: node79.ocp.example.com
Storage Hostname: 1.1.1.79
Devices:
  • Add disk devices on replacement node to heketi/OCS:
[root@node80 ~]# for i in /dev/sdc /dev/sdd /dev/sde; do echo $i; oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin device add --name=${i} --node=68d2fc78df9cbad7f0c44080924802ec; done
/dev/sdc
Device added successfully
/dev/sdd
Device added successfully
/dev/sde
Device added successfully

[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info 68d2fc78df9cbad7f0c44080924802ec
Node Id: 68d2fc78df9cbad7f0c44080924802ec
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname: node79.ocp.example.com
Storage Hostname: 1.1.1.79
Devices:
Id:5215babf19e78cb2f30bd7f649b426fa   Name:/dev/sdc            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      Bricks:0
Id:a0ee466d2b83a82f0ba662ee836f49bb   Name:/dev/sdd            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      Bricks:0
Id:adb582446ebbfa28fe97045f80e7d80f   Name:/dev/sde            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      Bricks:0
  • Device info on old node:
[root@node80 ~]# for i in $(oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042 | awk -F: '/^Id/ {print $2}' | awk '{print $1}'); do echo $i; oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin device info $i; done
23d930554b78c9d25dbafb838c236f68
Device Id: 23d930554b78c9d25dbafb838c236f68
Name: /dev/sdd
State: online
Size (GiB): 99
Used (GiB): 0
Free (GiB): 99
Bricks:
2a36fcad4ab642c5c534916609faff0f
Device Id: 2a36fcad4ab642c5c534916609faff0f
Name: /dev/sde
State: online
Size (GiB): 99
Used (GiB): 0
Free (GiB): 99
Bricks:
eb21e3dcfabb76d2513b5d4843b6bb18
Device Id: eb21e3dcfabb76d2513b5d4843b6bb18
Name: /dev/sdc
State: online
Size (GiB): 99
Used (GiB): 56
Free (GiB): 43
Bricks:
Id:7fa1d85e6e5685e6d39b3cfe2e6ee6c9   Size (GiB):4       Path: /var/lib/heketi/mounts/vg_eb21e3dcfabb76d2513b5d4843b6bb18/brick_7fa1d85e6e5685e6d39b3cfe2e6ee6c9/brick
Id:a718ec9afe0696591c93056a209de4ee   Size (GiB):2       Path: /var/lib/heketi/mounts/vg_eb21e3dcfabb76d2513b5d4843b6bb18/brick_a718ec9afe0696591c93056a209de4ee/brick
Id:bdb4367d41003d4e350bc678175dbad9   Size (GiB):50      Path: /var/lib/heketi/mounts/vg_eb21e3dcfabb76d2513b5d4843b6bb18/brick_bdb4367d41003d4e350bc678175dbad9/brick
  • Remove disk devices from the old node in heketi:
    NOTE: This step can take a long time, as bricks are evacuated to other nodes.
[root@node80 ~]# for i in $(oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042 | awk -F: '/^Id/ {print $2}' | awk '{print $1}'); do echo $i; oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin device disable $i; done
23d930554b78c9d25dbafb838c236f68
Device 23d930554b78c9d25dbafb838c236f68 is now offline
2a36fcad4ab642c5c534916609faff0f
Device 2a36fcad4ab642c5c534916609faff0f is now offline
eb21e3dcfabb76d2513b5d4843b6bb18
Device eb21e3dcfabb76d2513b5d4843b6bb18 is now offline

[root@node80 ~]# for i in $(oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042 | awk -F: '/^Id/ {print $2}' | awk '{print $1}'); do echo $i; oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin device remove $i; done
23d930554b78c9d25dbafb838c236f68
Device 23d930554b78c9d25dbafb838c236f68 is now removed
2a36fcad4ab642c5c534916609faff0f
Device 2a36fcad4ab642c5c534916609faff0f is now removed
eb21e3dcfabb76d2513b5d4843b6bb18
Device eb21e3dcfabb76d2513b5d4843b6bb18 is now removed
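  • The device remove commands block until brick migration finishes. While they run, progress can be watched from a second terminal with a loop like the following sketch (it exits once no bricks remain on the old node):
while oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin \
        node info f2f51fdaf3440c0f7e1aa5165deef042 | grep -q 'Bricks:[1-9]'; do
    date        # timestamp each check, then wait a minute before polling again
    sleep 60
done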
  • Verify that bricks have been evacuated from the old node:
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042
Node Id: f2f51fdaf3440c0f7e1aa5165deef042
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname: node86.ocp.example.com
Storage Hostname: 1.1.1.86
Devices:
Id:23d930554b78c9d25dbafb838c236f68   Name:/dev/sdd            State:failed    Size (GiB):99      Used (GiB):0       Free (GiB):99      Bricks:0
Id:2a36fcad4ab642c5c534916609faff0f   Name:/dev/sde            State:failed    Size (GiB):99      Used (GiB):0       Free (GiB):99      Bricks:0
Id:eb21e3dcfabb76d2513b5d4843b6bb18   Name:/dev/sdc            State:failed    Size (GiB):99      Used (GiB):0       Free (GiB):99      Bricks:0
  • Delete disk devices from the old node in heketi:
[root@node80 ~]# for i in $(oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042 | awk -F: '/^Id/ {print $2}' | awk '{print $1}'); do echo $i; oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin device delete $i; done
23d930554b78c9d25dbafb838c236f68
Device 23d930554b78c9d25dbafb838c236f68 deleted
2a36fcad4ab642c5c534916609faff0f
Device 2a36fcad4ab642c5c534916609faff0f deleted
eb21e3dcfabb76d2513b5d4843b6bb18
Device eb21e3dcfabb76d2513b5d4843b6bb18 deleted

[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042
Node Id: f2f51fdaf3440c0f7e1aa5165deef042
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname: node86.ocp.example.com
Storage Hostname: 1.1.1.86
Devices:
  • Check that the replacement node now has bricks:
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info 68d2fc78df9cbad7f0c44080924802ec
Node Id: 68d2fc78df9cbad7f0c44080924802ec
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname: node79.ocp.example.com
Storage Hostname: 1.1.1.79
Devices:
Id:5215babf19e78cb2f30bd7f649b426fa   Name:/dev/sdc            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      Bricks:0
Id:a0ee466d2b83a82f0ba662ee836f49bb   Name:/dev/sdd            State:online    Size (GiB):99      Used (GiB):54      Free (GiB):45      Bricks:2
Id:adb582446ebbfa28fe97045f80e7d80f   Name:/dev/sde            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      Bricks:0
  • Delete the old node from heketi:
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node disable f2f51fdaf3440c0f7e1aa5165deef042
Node f2f51fdaf3440c0f7e1aa5165deef042 is now offline
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node remove f2f51fdaf3440c0f7e1aa5165deef042
Node f2f51fdaf3440c0f7e1aa5165deef042 is now removed
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node delete f2f51fdaf3440c0f7e1aa5165deef042
Node f2f51fdaf3440c0f7e1aa5165deef042 deleted
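  • The node list should now show four entries, with f2f51fdaf3440c0f7e1aa5165deef042 gone (same command as used above; an added verification step):
oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node list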

Step 3. Delete and Uninstall the Old Node from the OCP Cluster

  • Remove the storage label from the old node in OCP:
[root@node80 ~]# oc label node node86.ocp.example.com glusterfs-
node/node86.ocp.example.com labeled

[root@node80 ~]# oc get pods -o wide
NAME                                          READY     STATUS        RESTARTS   AGE       IP           NODE                     NOMINATED NODE
glusterblock-storage-provisioner-dc-1-cfcxn   1/1       Running       0          63d       10.129.2.2   node84.ocp.example.com   <none>
glusterfs-storage-4vfr4                       1/1       Running       0          2m        1.1.1.79     node79.ocp.example.com   <none>
glusterfs-storage-fz77p                       1/1       Running       4          69d       1.1.1.87     node87.ocp.example.com   <none>
glusterfs-storage-mk7kf                       1/1       Terminating   0          69d       1.1.1.86     node86.ocp.example.com   <none>
glusterfs-storage-qvmkt                       1/1       Running       4          69d       1.1.1.85     node85.ocp.example.com   <none>
glusterfs-storage-whwm6                       1/1       Running       1          69d       1.1.1.84     node84.ocp.example.com   <none>
heketi-storage-1-pm548                        1/1       Running       1          69d       10.129.0.5   node82.ocp.example.com   <none>
  • After a few minutes, there are no more glusterfs pods on the old node:
[root@node80 ~]# oc get pods -o wide
NAME                                          READY     STATUS    RESTARTS   AGE       IP           NODE                     NOMINATED NODE
glusterblock-storage-provisioner-dc-1-cfcxn   1/1       Running   0          64d       10.129.2.2   node84.ocp.example.com   <none>
glusterfs-storage-4vfr4                       1/1       Running   0          1h        1.1.1.79     node79.ocp.example.com   <none>
glusterfs-storage-fz77p                       1/1       Running   4          69d       1.1.1.87     node87.ocp.example.com   <none>
glusterfs-storage-qvmkt                       1/1       Running   4          69d       1.1.1.85     node85.ocp.example.com   <none>
glusterfs-storage-whwm6                       1/1       Running   1          69d       1.1.1.84     node84.ocp.example.com   <none>
heketi-storage-1-pm548                        1/1       Running   1          69d       10.129.0.5   node82.ocp.example.com   <none>
  • Delete the old node’s object from the OCP cluster:
[root@node80 ~]# oc get pods -o wide --all-namespaces | grep node86
openshift-logging      logging-fluentd-ltlsz   1/1       Running   0          14d       10.131.2.2   node86.ocp.example.com   <none>
openshift-monitoring   node-exporter-t6vrf     2/2       Running   0          14d       1.1.1.86     node86.ocp.example.com   <none>
openshift-node         sync-bmktg              1/1       Running   1          69d       1.1.1.86     node86.ocp.example.com   <none>
openshift-sdn          ovs-9z8c6               1/1       Running   1          69d       1.1.1.86     node86.ocp.example.com   <none>
openshift-sdn          sdn-qbggs               1/1       Running   1          69d       1.1.1.86     node86.ocp.example.com   <none>

[root@node80 ~]# oc adm drain node86.ocp.example.com
node/node86.ocp.example.com cordoned
error: unable to drain node "node86.ocp.example.com", aborting command...

There are pending nodes to be drained:
 node86.ocp.example.com
error: DaemonSet-managed pods (use --ignore-daemonsets to ignore): logging-fluentd-ltlsz, node-exporter-t6vrf, sync-bmktg, ovs-9z8c6, sdn-qbggs
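
  • The drain aborts because only DaemonSet-managed pods remain on the node. Either rerun the drain with the flag suggested in the error message, e.g. oc adm drain node86.ocp.example.com --ignore-daemonsets --delete-local-data (an added note, not in the original transcript), or, as here, proceed directly to deleting the node object: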

[root@node80 ~]# oc delete node node86.ocp.example.com
node "node86.ocp.example.com" deleted

[root@node80 ~]# oc get nodes
NAME                     STATUS    ROLES     AGE       VERSION
node79.ocp.example.com   Ready     compute   3h        v1.11.0+d4cacc0
node80.ocp.example.com   Ready     master    69d       v1.11.0+d4cacc0
node81.ocp.example.com   Ready     infra     69d       v1.11.0+d4cacc0
node82.ocp.example.com   Ready     infra     69d       v1.11.0+d4cacc0
node83.ocp.example.com   Ready     compute   69d       v1.11.0+d4cacc0
node84.ocp.example.com   Ready     compute   69d       v1.11.0+d4cacc0
node85.ocp.example.com   Ready     compute   69d       v1.11.0+d4cacc0
node87.ocp.example.com   Ready     compute   69d       v1.11.0+d4cacc0

Step 4. Uninstall the Old Node from the OCP Cluster

  • Create a dedicated inventory file for node deletion:
[root@node88 ~]# cat /etc/ansible/inventory-file-del_node.txt
[OSEv3:children]
nodes

[OSEv3:vars]
debug_level=2
ansible_user=root

[nodes]
node86.ocp.example.com openshift_node_group_name="node-config-compute"
  • Run the uninstall playbook:
[root@node88 ~]# ansible-playbook -i /etc/ansible/inventory-file-del_node.txt /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml
. . .

PLAY RECAP **********************************************************************************************************************************************************************************************************
node86.ocp.example.com : ok=37   changed=11   unreachable=0    failed=0
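
  • As a final check (not part of the original transcript), inspect the trusted storage pool from one of the remaining gluster pods; node86.ocp.example.com should no longer be listed as a peer:
oc rsh glusterfs-storage-4vfr4 gluster peer status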