Replacing an OpenShift Container Storage node
Environment
- Red Hat OpenShift Container Platform (OCP) 3.11
- Red Hat OpenShift Container Storage (OCS) 3.11
Issue
- How to replace an OpenShift Container Storage node.
- How to replace an OCS host.
Resolution
To replace a node in a bare-metal, converged OCS/OCP deployment, follow the high-level steps below:
- Step 1. Prepare the replacement node for OCP/OCS installation.
  - Follow the steps mentioned in the docs to prepare the host to add to the cluster.
- Step 2. Increase the cluster size by adding the replacement node.
  - Follow the steps to add the replacement node to the OCP cluster.
  - Follow the steps to add the replacement node to OCS/Heketi.
    NOTE: Follow the above guideline up to and including step 1.1.2.2, "Using Heketi CLI". Make sure to add the appropriate storage label to the node, and make sure storage devices are attached to it.
- Step 3. Evacuate and delete the node to be decommissioned from OCS.
  - Follow the guideline below to delete the disk devices from the old node.
    NOTE: This step may take a long time to complete, since all bricks are moved from the old node to the replacement node.
  - Once the above process completes, delete the node from the cluster configuration.
- Step 4. Delete and uninstall the old node from OCP.
  - Remove the storage labels from the old node.
  - Follow the steps to delete and uninstall the old node from OCP.
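The OCS evacuation in Step 3 is a fixed sequence of heketi-cli state transitions: per device, disable, then remove, then delete; then the same three transitions for the node itself. This dry-run sketch stubs the CLI so it only prints the commands it would run; the node and device IDs are placeholders, not values from a real cluster:

```shell
# Dry-run sketch of the Step 3 sequence. heketi_cli is a stub that prints
# instead of executing; OLD_NODE_ID and DEVICE_IDS are placeholders.
heketi_cli() { echo "would run: heketi-cli $*"; }

OLD_NODE_ID="<old-node-id>"
DEVICE_IDS="<dev-id-1> <dev-id-2> <dev-id-3>"

for dev in $DEVICE_IDS; do heketi_cli device disable "$dev"; done  # stop new allocations
for dev in $DEVICE_IDS; do heketi_cli device remove  "$dev"; done  # evacuate bricks (slow)
for dev in $DEVICE_IDS; do heketi_cli device delete  "$dev"; done  # drop from topology

heketi_cli node disable "$OLD_NODE_ID"
heketi_cli node remove  "$OLD_NODE_ID"
heketi_cli node delete  "$OLD_NODE_ID"
```

In a real run, verify that all bricks have left the old node between the remove and delete loops, as shown in the Diagnostic Steps.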
Diagnostic Steps
- Replace node node86.ocp.example.com ("old node") with a freshly installed RHEL node, node79.ocp.example.com ("new node").
- Check the current information about the old node:
[root@node80 ~]# oc get nodes
NAME STATUS ROLES AGE VERSION
node80.ocp.example.com Ready master 68d v1.11.0+d4cacc0
node81.ocp.example.com Ready infra 68d v1.11.0+d4cacc0
node82.ocp.example.com Ready infra 68d v1.11.0+d4cacc0
node83.ocp.example.com Ready compute 68d v1.11.0+d4cacc0
node84.ocp.example.com Ready compute 68d v1.11.0+d4cacc0
node85.ocp.example.com Ready compute 68d v1.11.0+d4cacc0
node86.ocp.example.com Ready compute 68d v1.11.0+d4cacc0
node87.ocp.example.com Ready compute 68d v1.11.0+d4cacc0
[root@node80 ~]# oc project app-storage
Now using project "app-storage" on server "https://node80.ocp.example.com:443".
[root@node80 ~]# oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
glusterblock-storage-provisioner-dc-1-cfcxn 1/1 Running 0 63d 10.129.2.2 node84.ocp.example.com <none>
glusterfs-storage-fz77p 1/1 Running 4 68d 1.1.1.87 node87.ocp.example.com <none>
glusterfs-storage-mk7kf 1/1 Running 0 68d 1.1.1.86 node86.ocp.example.com <none>
glusterfs-storage-qvmkt 1/1 Running 4 68d 1.1.1.85 node85.ocp.example.com <none>
glusterfs-storage-whwm6 1/1 Running 1 68d 1.1.1.84 node84.ocp.example.com <none>
heketi-storage-1-pm548 1/1 Running 1 68d 10.129.0.5 node82.ocp.example.com <none>
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin topology info
. . .
Node Id: f2f51fdaf3440c0f7e1aa5165deef042
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostnames: node86.ocp.example.com
Storage Hostnames: 1.1.1.86
Devices:
Id:23d930554b78c9d25dbafb838c236f68 Name:/dev/sdd State:online Size (GiB):99 Used (GiB):0 Free (GiB):99
Bricks:
Id:2a36fcad4ab642c5c534916609faff0f Name:/dev/sde State:online Size (GiB):99 Used (GiB):0 Free (GiB):99
Bricks:
Id:eb21e3dcfabb76d2513b5d4843b6bb18 Name:/dev/sdc State:online Size (GiB):99 Used (GiB):56 Free (GiB):43
Bricks:
Id:7fa1d85e6e5685e6d39b3cfe2e6ee6c9 Size (GiB):4 Path: /var/lib/heketi/mounts/vg_eb21e3dcfabb76d2513b5d4843b6bb18/brick_7fa1d85e6e5685e6d39b3cfe2e6ee6c9/brick
Id:a718ec9afe0696591c93056a209de4ee Size (GiB):2 Path: /var/lib/heketi/mounts/vg_eb21e3dcfabb76d2513b5d4843b6bb18/brick_a718ec9afe0696591c93056a209de4ee/brick
Id:bdb4367d41003d4e350bc678175dbad9 Size (GiB):50 Path: /var/lib/heketi/mounts/vg_eb21e3dcfabb76d2513b5d4843b6bb18/brick_bdb4367d41003d4e350bc678175dbad9/brick
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042
Node Id: f2f51fdaf3440c0f7e1aa5165deef042
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname: node86.ocp.example.com
Storage Hostname: 1.1.1.86
Devices:
Id:23d930554b78c9d25dbafb838c236f68 Name:/dev/sdd State:online Size (GiB):99 Used (GiB):0 Free (GiB):99 Bricks:0
Id:2a36fcad4ab642c5c534916609faff0f Name:/dev/sde State:online Size (GiB):99 Used (GiB):0 Free (GiB):99 Bricks:0
Id:eb21e3dcfabb76d2513b5d4843b6bb18 Name:/dev/sdc State:online Size (GiB):99 Used (GiB):56 Free (GiB):43 Bricks:3
[root@node86 ~]# lsblk --nodeps
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 32G 0 disk
sdb 8:16 0 100G 0 disk
sdc 8:32 0 100G 0 disk
sdd 8:48 0 100G 0 disk
sde 8:64 0 100G 0 disk
sr0 11:0 1 1024M 0 rom
[root@node86 ~]# cat /etc/sysconfig/docker-storage-setup
DEVS=/dev/sdb
VG=docker-vg
- Replacement node: node79.ocp.example.com (“new node”)
- IP: 1.1.1.79
- /dev/sdc, /dev/sdd, /dev/sde for OCS use (100 GB each)
[root@node79 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 32G 0 disk
|-sda1 8:1 0 1G 0 part /boot
`-sda2 8:2 0 31G 0 part
|-rhel_dhcp54-root 253:0 0 30G 0 lvm /
`-rhel_dhcp54-swap 253:1 0 1020M 0 lvm [SWAP]
sdb 8:16 0 100G 0 disk
sdc 8:32 0 100G 0 disk
sdd 8:48 0 100G 0 disk
sde 8:64 0 100G 0 disk
sr0 11:0 1 1024M 0 rom
sr1 11:1 1 374K 0 rom
Step 1. Prepare the Replacement Node for OCP/OCS Installation
NOTE: Not all steps are shown below!
[root@node79 ~]# hostnamectl set-hostname node79.ocp.example.com
[root@node79 ~]# hostnamectl
Static hostname: node79.ocp.example.com
Icon name: computer-vm
Chassis: vm
Machine ID: 67d2d242ad0c4ea996fc85d37ab78463
Boot ID: ba44d86f3d804ee490e391b689336255
Virtualization: kvm
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
Kernel: Linux 3.10.0-862.3.2.el7.x86_64
Architecture: x86-64
[root@node79 ~]# subscription-manager register
Registering to: subscription.rhsm.redhat.com:443/subscription
Username:
Password:
The system has been registered with ID: f887acab-a7cd-4175-bce7-0357c330b3ea
The registered system name is: node79.ocp.example.com
[root@node79 ~]# subscription-manager attach --pool=<pool_id>
Successfully attached a subscription for: Employee SKU
1 local certificate has been deleted.
[root@node79 ~]# subscription-manager repos --disable \*
[root@node79 ~]# subscription-manager repos --enable rhel-7-server-extras-rpms --enable rhel-7-server-ansible-2.6-rpms --enable rhel-7-server-ose-3.11-rpms --enable rh-gluster-3-client-for-rhel-7-server-rpms
. . .
[root@node79 ~]# yum repolist
Failed to set locale, defaulting to C
Loaded plugins: product-id, search-disabled-repos, subscription-manager
repo id repo name status
rh-gluster-3-client-for-rhel-7-server-rpms/7Server/x86_64 Red Hat Storage Native Client for RHEL 7 (RPMs) 228
rhel-7-server-ansible-2.6-rpms/x86_64 Red Hat Ansible Engine 2.6 RPMs for Red Hat Enterprise Linux 7 Server 17
rhel-7-server-extras-rpms/x86_64 Red Hat Enterprise Linux 7 Server - Extras (RPMs) 1019
rhel-7-server-ose-3.11-rpms/x86_64 Red Hat OpenShift Container Platform 3.11 (RPMs) 348
rhel-7-server-rpms/7Server/x86_64 Red Hat Enterprise Linux 7 Server (RPMs) 23382
repolist: 24994
[root@node79 ~]# yum update -y
. . .
[root@node79 ~]# systemctl reboot
…
[root@node79 ~]# yum install wget git net-tools bind-utils yum-utils iptables-services bridge-utils bash-completion kexec-tools sos psacct
. . .
[root@node79 ~]# yum install -y docker-1.13.1
. . .
[root@node79 ~]# docker --version
Docker version 1.13.1, build 07f3374/1.13.1
- Configure Docker storage:
NOTE: The steps below depend on how you configured storage for docker!
[root@node79 ~]# cat <<EOF > /etc/sysconfig/docker-storage-setup
> DEVS=/dev/sdb
> VG=docker-vg
> EOF
[root@node79 ~]# cat /etc/sysconfig/docker-storage-setup
DEVS=/dev/sdb
VG=docker-vg
[root@node79 ~]# docker-storage-setup
INFO: Writing zeros to first 4MB of device /dev/sdb
4+0 records in
4+0 records out
4194304 bytes (4.2 MB) copied, 0.0539972 s, 77.7 MB/s
INFO: Device node /dev/sdb1 exists.
Physical volume "/dev/sdb1" successfully created.
Volume group "docker-vg" successfully created
Rounding up size to full physical extent 104.00 MiB
Thin pool volume with chunk size 512.00 KiB can address at most 126.50 TiB of data.
Logical volume "docker-pool" created.
Logical volume docker-vg/docker-pool changed.
[root@node79 ~]# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS="--storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/docker--vg-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true "
[root@node79 ~]# vgs
VG #PV #LV #SN Attr VSize VFree
docker-vg 1 1 0 wz--n- <100.00g 60.00g
rhel_dhcp54 1 2 0 wz--n- <31.00g 0
[root@node79 ~]# systemctl enable docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
[root@node79 ~]# systemctl start docker
[root@node79 ~]# systemctl is-active docker
active
Step 2. Increase Cluster Size by Adding the Replacement Node
- Create an inventory file that adds the replacement host:
[root@node88 ansible]# diff inventory-file-311pu-1xcns_1m2i1a4s_v6.txt inventory-file-311pu-1xcns_1m2i1a4s_v6_tmp.txt
5a6
> new_nodes
115a117,120
>
> # New nodes to be added
> [new_nodes]
> node79.ocp.example.com openshift_node_group_name="node-config-compute"
[root@node88 ansible]# cat inventory-file-311pu-1xcns_1m2i1a4s_v6_tmp.txt
[OSEv3:children]
masters
etcd
nodes
glusterfs
new_nodes
. . .
# New nodes to be added
[new_nodes]
node79.ocp.example.com openshift_node_group_name="node-config-compute"
- Check connectivity:
[root@node88 ansible]# ansible -i ./inventory-file-311pu-1xcns_1m2i1a4s_v6_tmp.txt new_nodes -b -m ping
node79.ocp.example.com | SUCCESS => {
"changed": false,
"ping": "pong"
}
- Scale up the OCP cluster:
[root@node88 ansible]# ansible-playbook -i /etc/ansible/inventory-file-311pu-1xcns_1m2i1a4s_v6_tmp.txt /usr/share/ansible/openshift-ansible/playbooks/openshift-node/scaleup.yml
. . .
INSTALLER STATUS ****************************************************************************************************************************************************************************************************
Initialization : Complete (0:03:50)
Node Bootstrap Preparation : Complete (0:22:18)
Node Join : Complete (0:01:21)
[root@node80 ~]# oc get nodes
NAME STATUS ROLES AGE VERSION
node79.ocp.example.com Ready compute 21m v1.11.0+d4cacc0
node80.ocp.example.com Ready master 69d v1.11.0+d4cacc0
node81.ocp.example.com Ready infra 69d v1.11.0+d4cacc0
node82.ocp.example.com Ready infra 69d v1.11.0+d4cacc0
node83.ocp.example.com Ready compute 69d v1.11.0+d4cacc0
node84.ocp.example.com Ready compute 69d v1.11.0+d4cacc0
node85.ocp.example.com Ready compute 69d v1.11.0+d4cacc0
node86.ocp.example.com Ready compute 69d v1.11.0+d4cacc0
node87.ocp.example.com Ready compute 69d v1.11.0+d4cacc0
[root@node80 ~]# oc label node/node79.ocp.example.com logging-infra-fluentd=true
node/node79.ocp.example.com labeled
- Update the inventory file to reflect the new OCP cluster:
[root@node88 ansible]# diff inventory-file-311pu-1xcns_1m2i1a4s_v7.txt inventory-file-311pu-1xcns_1m2i1a4s_v6_tmp.txt
110d109
< node79.ocp.example.com openshift_node_group_name="node-config-compute"
118d116
< node79.ocp.example.com glusterfs_devices='[ "/dev/sdc", "/dev/sdd", "/dev/sde" ]'
121a120
> node79.ocp.example.com openshift_node_group_name="node-config-compute"
[root@node88 ansible]# cat inventory-file-311pu-1xcns_1m2i1a4s_v7.txt
[OSEv3:children]
masters
etcd
nodes
glusterfs
new_nodes
. . .
[nodes]
node80.ocp.example.com openshift_node_group_name="node-config-master"
node81.ocp.example.com openshift_node_group_name="node-config-infra"
node82.ocp.example.com openshift_node_group_name="node-config-infra"
node83.ocp.example.com openshift_node_group_name="node-config-compute"
node84.ocp.example.com openshift_node_group_name="node-config-compute"
node85.ocp.example.com openshift_node_group_name="node-config-compute"
node86.ocp.example.com openshift_node_group_name="node-config-compute"
node87.ocp.example.com openshift_node_group_name="node-config-compute"
node79.ocp.example.com openshift_node_group_name="node-config-compute"
# OpenShift hosts with gluster pods
[glusterfs]
node84.ocp.example.com glusterfs_devices='[ "/dev/sdc" ]'
node85.ocp.example.com glusterfs_devices='[ "/dev/sdc" ]'
node86.ocp.example.com glusterfs_devices='[ "/dev/sdc" ]'
node87.ocp.example.com glusterfs_devices='[ "/dev/sdc" ]'
node79.ocp.example.com glusterfs_devices='[ "/dev/sdc", "/dev/sdd", "/dev/sde" ]'
# New nodes to be added
[new_nodes]
- Open firewall ports for OCS:
[root@node79 sysconfig]# diff /etc/sysconfig/iptables /etc/sysconfig/iptables.bak
41,47d40
< -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 24007 -j ACCEPT
< -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 24008 -j ACCEPT
< -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2222 -j ACCEPT
< -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m multiport --dports 49152:49664 -j ACCEPT
< -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 24010 -j ACCEPT
< -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 3260 -j ACCEPT
< -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 111 -j ACCEPT
[root@node79 sysconfig]# systemctl restart iptables
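The rules in the diff above can be generated instead of typed by hand. This helper is a local convenience (not part of the official docs); it prints the seven OCS rules so they can be appended to /etc/sysconfig/iptables and reviewed before restarting the service:

```shell
# Print the OCS firewall rules shown in the diff above, for the
# OS_FIREWALL_ALLOW chain in /etc/sysconfig/iptables.
gen_ocs_rules() {
    for port in 24007 24008 2222 24010 3260 111; do
        echo "-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport $port -j ACCEPT"
    done
    # glusterfs brick port range
    echo "-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m multiport --dports 49152:49664 -j ACCEPT"
}

# Usage (review the file before restarting the service):
#   gen_ocs_rules >> /etc/sysconfig/iptables
#   systemctl restart iptables
```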
- Label the replacement node:
[root@node80 ~]# oc project app-storage
Now using project "app-storage" on server "https://node80.ocp.example.com:443".
[root@node80 ~]# oc get ds
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
glusterfs-storage 4 4 4 4 4 glusterfs=storage-host 69d
[root@node80 ~]# oc label node node79.ocp.example.com glusterfs=storage-host
node/node79.ocp.example.com labeled
[root@node80 ~]# oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
glusterblock-storage-provisioner-dc-1-cfcxn 1/1 Running 0 63d 10.129.2.2 node84.ocp.example.com <none>
glusterfs-storage-6s5wg 0/1 ContainerCreating 0 6s 1.1.1.79 node79.ocp.example.com <none>
glusterfs-storage-fz77p 1/1 Running 4 69d 1.1.1.87 node87.ocp.example.com <none>
glusterfs-storage-mk7kf 1/1 Running 0 69d 1.1.1.86 node86.ocp.example.com <none>
glusterfs-storage-qvmkt 1/1 Running 4 69d 1.1.1.85 node85.ocp.example.com <none>
glusterfs-storage-whwm6 1/1 Running 1 69d 1.1.1.84 node84.ocp.example.com <none>
heketi-storage-1-pm548 1/1 Running 1 69d 10.129.0.5 node82.ocp.example.com <none>
- After a few minutes, the new glusterfs-storage pod should be healthy:
[root@node80 ~]# oc get pods
NAME READY STATUS RESTARTS AGE
glusterblock-storage-provisioner-dc-1-cfcxn 1/1 Running 0 63d
glusterfs-storage-6s5wg 1/1 Running 0 1m
glusterfs-storage-fz77p 1/1 Running 4 69d
glusterfs-storage-mk7kf 1/1 Running 0 69d
glusterfs-storage-qvmkt 1/1 Running 4 69d
glusterfs-storage-whwm6 1/1 Running 1 69d
heketi-storage-1-pm548 1/1 Running 1 69d
- Add the Replacement Node to OCS/Heketi:
NOTE: Get the heketi node ID of the old node, e.g. from the output of # heketi-cli topology info:
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042
Node Id: f2f51fdaf3440c0f7e1aa5165deef042
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname: node86.ocp.example.com
Storage Hostname: 1.1.1.86
Devices:
Id:23d930554b78c9d25dbafb838c236f68 Name:/dev/sdd State:online Size (GiB):99 Used (GiB):0 Free (GiB):99 Bricks:0
Id:2a36fcad4ab642c5c534916609faff0f Name:/dev/sde State:online Size (GiB):99 Used (GiB):0 Free (GiB):99 Bricks:0
Id:eb21e3dcfabb76d2513b5d4843b6bb18 Name:/dev/sdc State:online Size (GiB):99 Used (GiB):56 Free (GiB):43 Bricks:3
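Every heketi-cli invocation in this article is issued through oc rsh into the heketi pod. A small wrapper function keeps the remaining commands short; the pod name below is the one from this environment (a placeholder for yours), and $HEKETI_ADMIN_KEY is assumed to be set already:

```shell
# Wrapper around the repeated "oc rsh <heketi-pod> heketi-cli ..." prefix.
# HEKETI_POD and HEKETI_ADMIN_KEY must match your environment.
HEKETI_POD="${HEKETI_POD:-heketi-storage-1-pm548}"

heketi() {
    oc rsh "$HEKETI_POD" heketi-cli --secret "$HEKETI_ADMIN_KEY" --user admin "$@"
}

# Example usage:
#   heketi node list
#   heketi node info <node-id>
```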
- Add the replacement node to the OCS cluster with the heketi CLI:
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node add --zone=1 --cluster=ffa6d8659829fa3492e5c2bb321f71b8 --management-host-name=node79.ocp.example.com --storage-host-name=1.1.1.79
Node information:
Id: 68d2fc78df9cbad7f0c44080924802ec
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname node79.ocp.example.com
Storage Hostname 1.1.1.79
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node list
Id:483ef5223e013b52b78a161e340183df Cluster:ffa6d8659829fa3492e5c2bb321f71b8
Id:68d2fc78df9cbad7f0c44080924802ec Cluster:ffa6d8659829fa3492e5c2bb321f71b8
Id:971bee9f2d3c7e4e875350d9996ff69b Cluster:ffa6d8659829fa3492e5c2bb321f71b8
Id:a32f3112cc659819433895d0d1a40e67 Cluster:ffa6d8659829fa3492e5c2bb321f71b8
Id:f2f51fdaf3440c0f7e1aa5165deef042 Cluster:ffa6d8659829fa3492e5c2bb321f71b8
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info 68d2fc78df9cbad7f0c44080924802ec
Node Id: 68d2fc78df9cbad7f0c44080924802ec
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname: node79.ocp.example.com
Storage Hostname: 1.1.1.79
Devices:
- Add disk devices on replacement node to heketi/OCS:
[root@node80 ~]# for i in /dev/sdc /dev/sdd /dev/sde; do echo $i; oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin device add --name=${i} --node=68d2fc78df9cbad7f0c44080924802ec; done
/dev/sdc
Device added successfully
/dev/sdd
Device added successfully
/dev/sde
Device added successfully
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info 68d2fc78df9cbad7f0c44080924802ec
Node Id: 68d2fc78df9cbad7f0c44080924802ec
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname: node79.ocp.example.com
Storage Hostname: 1.1.1.79
Devices:
Id:5215babf19e78cb2f30bd7f649b426fa Name:/dev/sdc State:online Size (GiB):99 Used (GiB):0 Free (GiB):99 Bricks:0
Id:a0ee466d2b83a82f0ba662ee836f49bb Name:/dev/sdd State:online Size (GiB):99 Used (GiB):0 Free (GiB):99 Bricks:0
Id:adb582446ebbfa28fe97045f80e7d80f Name:/dev/sde State:online Size (GiB):99 Used (GiB):0 Free (GiB):99 Bricks:0
- Show the device info on the old node:
[root@node80 ~]# for i in $(oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042 | awk -F: '/^Id/ {print $2}' | awk '{print $1}'); do echo $i; oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin device info $i; done
23d930554b78c9d25dbafb838c236f68
Device Id: 23d930554b78c9d25dbafb838c236f68
Name: /dev/sdd
State: online
Size (GiB): 99
Used (GiB): 0
Free (GiB): 99
Bricks:
2a36fcad4ab642c5c534916609faff0f
Device Id: 2a36fcad4ab642c5c534916609faff0f
Name: /dev/sde
State: online
Size (GiB): 99
Used (GiB): 0
Free (GiB): 99
Bricks:
eb21e3dcfabb76d2513b5d4843b6bb18
Device Id: eb21e3dcfabb76d2513b5d4843b6bb18
Name: /dev/sdc
State: online
Size (GiB): 99
Used (GiB): 56
Free (GiB): 43
Bricks:
Id:7fa1d85e6e5685e6d39b3cfe2e6ee6c9 Size (GiB):4 Path: /var/lib/heketi/mounts/vg_eb21e3dcfabb76d2513b5d4843b6bb18/brick_7fa1d85e6e5685e6d39b3cfe2e6ee6c9/brick
Id:a718ec9afe0696591c93056a209de4ee Size (GiB):2 Path: /var/lib/heketi/mounts/vg_eb21e3dcfabb76d2513b5d4843b6bb18/brick_a718ec9afe0696591c93056a209de4ee/brick
Id:bdb4367d41003d4e350bc678175dbad9 Size (GiB):50 Path: /var/lib/heketi/mounts/vg_eb21e3dcfabb76d2513b5d4843b6bb18/brick_bdb4367d41003d4e350bc678175dbad9/brick
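The for-loops in this section extract the device IDs from heketi-cli node info output with a two-stage awk pipeline. It can be checked offline against a sample of that output; the sample lines below are abbreviated from the listings in this article:

```shell
# Sample "Devices:" lines as printed by `heketi-cli node info`.
sample='Id:23d930554b78c9d25dbafb838c236f68 Name:/dev/sdd State:online
Id:2a36fcad4ab642c5c534916609faff0f Name:/dev/sde State:online
Id:eb21e3dcfabb76d2513b5d4843b6bb18 Name:/dev/sdc State:online'

# With -F: the second field is "<device-id> Name"; the second awk keeps
# only the first whitespace-separated word, i.e. the device ID itself.
echo "$sample" | awk -F: '/^Id/ {print $2}' | awk '{print $1}'
# prints the three device IDs, one per line
```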
- Remove disk devices from the old node in heketi:
NOTE: This step can take a long time, as bricks will be evacuated to other nodes.
[root@node80 ~]# for i in $(oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042 | awk -F: '/^Id/ {print $2}' | awk '{print $1}'); do echo $i; oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin device disable $i; done
23d930554b78c9d25dbafb838c236f68
Device 23d930554b78c9d25dbafb838c236f68 is now offline
2a36fcad4ab642c5c534916609faff0f
Device 2a36fcad4ab642c5c534916609faff0f is now offline
eb21e3dcfabb76d2513b5d4843b6bb18
Device eb21e3dcfabb76d2513b5d4843b6bb18 is now offline
[root@node80 ~]# for i in $(oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042 | awk -F: '/^Id/ {print $2}' | awk '{print $1}'); do echo $i; oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin device remove $i; done
23d930554b78c9d25dbafb838c236f68
Device 23d930554b78c9d25dbafb838c236f68 is now removed
2a36fcad4ab642c5c534916609faff0f
Device 2a36fcad4ab642c5c534916609faff0f is now removed
eb21e3dcfabb76d2513b5d4843b6bb18
Device eb21e3dcfabb76d2513b5d4843b6bb18 is now removed
- Verify that bricks have been evacuated from old node:
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042
Node Id: f2f51fdaf3440c0f7e1aa5165deef042
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname: node86.ocp.example.com
Storage Hostname: 1.1.1.86
Devices:
Id:23d930554b78c9d25dbafb838c236f68 Name:/dev/sdd State:failed Size (GiB):99 Used (GiB):0 Free (GiB):99 Bricks:0
Id:2a36fcad4ab642c5c534916609faff0f Name:/dev/sde State:failed Size (GiB):99 Used (GiB):0 Free (GiB):99 Bricks:0
Id:eb21e3dcfabb76d2513b5d4843b6bb18 Name:/dev/sdc State:failed Size (GiB):99 Used (GiB):0 Free (GiB):99 Bricks:0
- Delete disk devices from the old node in heketi:
[root@node80 ~]# for i in $(oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042 | awk -F: '/^Id/ {print $2}' | awk '{print $1}'); do echo $i; oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin device delete $i; done
23d930554b78c9d25dbafb838c236f68
Device 23d930554b78c9d25dbafb838c236f68 deleted
2a36fcad4ab642c5c534916609faff0f
Device 2a36fcad4ab642c5c534916609faff0f deleted
eb21e3dcfabb76d2513b5d4843b6bb18
Device eb21e3dcfabb76d2513b5d4843b6bb18 deleted
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info f2f51fdaf3440c0f7e1aa5165deef042
Node Id: f2f51fdaf3440c0f7e1aa5165deef042
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname: node86.ocp.example.com
Storage Hostname: 1.1.1.86
Devices:
- Check that the replacement node now has bricks:
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node info 68d2fc78df9cbad7f0c44080924802ec
Node Id: 68d2fc78df9cbad7f0c44080924802ec
State: online
Cluster Id: ffa6d8659829fa3492e5c2bb321f71b8
Zone: 1
Management Hostname: node79.ocp.example.com
Storage Hostname: 1.1.1.79
Devices:
Id:5215babf19e78cb2f30bd7f649b426fa Name:/dev/sdc State:online Size (GiB):99 Used (GiB):0 Free (GiB):99 Bricks:0
Id:a0ee466d2b83a82f0ba662ee836f49bb Name:/dev/sdd State:online Size (GiB):99 Used (GiB):54 Free (GiB):45 Bricks:2
Id:adb582446ebbfa28fe97045f80e7d80f Name:/dev/sde State:online Size (GiB):99 Used (GiB):0 Free (GiB):99 Bricks:0
- Delete the old node from heketi:
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node disable f2f51fdaf3440c0f7e1aa5165deef042
Node f2f51fdaf3440c0f7e1aa5165deef042 is now offline
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node remove f2f51fdaf3440c0f7e1aa5165deef042
Node f2f51fdaf3440c0f7e1aa5165deef042 is now removed
[root@node80 ~]# oc rsh heketi-storage-1-pm548 heketi-cli --secret $HEKETI_ADMIN_KEY --user admin node delete f2f51fdaf3440c0f7e1aa5165deef042
Node f2f51fdaf3440c0f7e1aa5165deef042 deleted
Step 3. Delete and Uninstall the Old Node from the OCP Cluster
- Remove the storage label from the old node in OCP:
[root@node80 ~]# oc label node node86.ocp.example.com glusterfs-
node/node86.ocp.example.com labeled
[root@node80 ~]# oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
glusterblock-storage-provisioner-dc-1-cfcxn 1/1 Running 0 63d 10.129.2.2 node84.ocp.example.com <none>
glusterfs-storage-4vfr4 1/1 Running 0 2m 1.1.1.79 node79.ocp.example.com <none>
glusterfs-storage-fz77p 1/1 Running 4 69d 1.1.1.87 node87.ocp.example.com <none>
glusterfs-storage-mk7kf 1/1 Terminating 0 69d 1.1.1.86 node86.ocp.example.com <none>
glusterfs-storage-qvmkt 1/1 Running 4 69d 1.1.1.85 node85.ocp.example.com <none>
glusterfs-storage-whwm6 1/1 Running 1 69d 1.1.1.84 node84.ocp.example.com <none>
heketi-storage-1-pm548 1/1 Running 1 69d 10.129.0.5 node82.ocp.example.com <none>
- After a few minutes, there are no more glusterfs pods on the old node:
[root@node80 ~]# oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
glusterblock-storage-provisioner-dc-1-cfcxn 1/1 Running 0 64d 10.129.2.2 node84.ocp.example.com <none>
glusterfs-storage-4vfr4 1/1 Running 0 1h 1.1.1.79 node79.ocp.example.com <none>
glusterfs-storage-fz77p 1/1 Running 4 69d 1.1.1.87 node87.ocp.example.com <none>
glusterfs-storage-qvmkt 1/1 Running 4 69d 1.1.1.85 node85.ocp.example.com <none>
glusterfs-storage-whwm6 1/1 Running 1 69d 1.1.1.84 node84.ocp.example.com <none>
heketi-storage-1-pm548 1/1 Running 1 69d 10.129.0.5 node82.ocp.example.com <none>
- Delete the old node's object from the OCP cluster:
[root@node80 ~]# oc get pods -o wide --all-namespaces | grep node86
openshift-logging logging-fluentd-ltlsz 1/1 Running 0 14d 10.131.2.2 node86.ocp.example.com <none>
openshift-monitoring node-exporter-t6vrf 2/2 Running 0 14d 1.1.1.86 node86.ocp.example.com <none>
openshift-node sync-bmktg 1/1 Running 1 69d 1.1.1.86 node86.ocp.example.com <none>
openshift-sdn ovs-9z8c6 1/1 Running 1 69d 1.1.1.86 node86.ocp.example.com <none>
openshift-sdn sdn-qbggs 1/1 Running 1 69d 1.1.1.86 node86.ocp.example.com <none>
[root@node80 ~]# oc adm drain node86.ocp.example.com
node/node86.ocp.example.com cordoned
error: unable to drain node "node86.ocp.example.com", aborting command...
There are pending nodes to be drained:
node86.ocp.example.com
error: DaemonSet-managed pods (use --ignore-daemonsets to ignore): logging-fluentd-ltlsz, node-exporter-t6vrf, sync-bmktg, ovs-9z8c6, sdn-qbggs
- The drain is blocked by DaemonSet-managed pods; since the node is being decommissioned anyway, delete it directly:
[root@node80 ~]# oc delete node node86.ocp.example.com
node "node86.ocp.example.com" deleted
[root@node80 ~]# oc get nodes
NAME STATUS ROLES AGE VERSION
node79.ocp.example.com Ready compute 3h v1.11.0+d4cacc0
node80.ocp.example.com Ready master 69d v1.11.0+d4cacc0
node81.ocp.example.com Ready infra 69d v1.11.0+d4cacc0
node82.ocp.example.com Ready infra 69d v1.11.0+d4cacc0
node83.ocp.example.com Ready compute 69d v1.11.0+d4cacc0
node84.ocp.example.com Ready compute 69d v1.11.0+d4cacc0
node85.ocp.example.com Ready compute 69d v1.11.0+d4cacc0
node87.ocp.example.com Ready compute 69d v1.11.0+d4cacc0
Step 4. Uninstall the Old Node from the OCP Cluster
- Create a dedicated inventory file for node deletion:
[root@node88 ~]# cat /etc/ansible/inventory-file-del_node.txt
[OSEv3:children]
nodes
[OSEv3:vars]
debug_level=2
ansible_user=root
[nodes]
node86.ocp.example.com openshift_node_group_name="node-config-compute"
- Run the uninstall playbook:
[root@node88 ~]# ansible-playbook -i /etc/ansible/inventory-file-del_node.txt /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml
. . .
PLAY RECAP **********************************************************************************************************************************************************************************************************
node86.ocp.example.com : ok=37 changed=11 unreachable=0 failed=0
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.