How to configure `fence_kubevirt` in a Red Hat High Availability cluster with pacemaker?
Environment
- Red Hat Enterprise Linux 8.4 or later and RHEL 9.0 or later
- Pacemaker Cluster with VMs on OpenShift Virtualization
Issue
- How do I configure `fence_kubevirt` in a Red Hat High Availability cluster with pacemaker?
Resolution
Assume the following about the cluster architecture:
- Pacemaker node names are `node1` and `node2`.
- VM names of the cluster nodes are `node1-vm` and `node2-vm`.
- Service account user used by the `fence_kubevirt` agent: `fence-sa`
- Namespace under which the VMs are running: `cluster-development`
- The firewall port for connections from the VMs (cluster nodes) to the Kube API server is open (the default port is 6443).
Steps to configure fence_kubevirt:
- Create a service account in the OpenShift platform that will be used by the fence resource. This allows `fence_kubevirt` to access the API directly. The service account's API token and registry credentials do not expire.
  - For OCP version 4.15: refer to the document for creating the service account.
  - For OCP version 4.16 and later: with the change introduced in OCP 4.16, token secrets are no longer generated automatically for service accounts (see the release note "Legacy service account API token secrets are no longer generated for each service account"), so a legacy service account token secret must be created manually.
For OCP version 4.15 or earlier

1.1. To create the service account from the GUI:
  - Log in to the OpenShift GUI console and under `User Management` select `ServiceAccounts`.
  - Select `Create ServiceAccount` at the top right.
  - In the YAML file, update the `name` and `namespace`, then click `Create`. For the current example, set `name` to `fence-sa` and `namespace` to `cluster-development`.

1.2. Assign the `admin` role to the new service account:
  - Under `User Management` select `RoleBindings`, then select `Create binding`.
  - Select `Namespace role binding` and populate the necessary details. For the example setup: the `Namespace` value is `cluster-development`; `Role name` is `admin`; `Subject` is `ServiceAccount`.
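The same two steps can also be performed from the command line; a minimal sketch, assuming the `oc` client is logged in with privileges sufficient to create service accounts and role bindings:

```shell
# Create the service account used by the fence agent (example names
# from this article: fence-sa in the cluster-development namespace).
oc create serviceaccount fence-sa -n cluster-development

# Grant the namespace-scoped admin role to that service account.
oc policy add-role-to-user admin -z fence-sa -n cluster-development
```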
For OCP version 4.16 or later

1.1. To create the service account:
  - Log in to the OpenShift GUI console and under `User Management` select `ServiceAccounts`.
  - Select `Create ServiceAccount` at the top right.
  - In the YAML file, update the `name` and `namespace`, then click `Create`. For the current example, set `name` to `fence-sa` and `namespace` to `cluster-development`.

1.2. To create the secret:
  - Follow the steps detailed in Creating a legacy service account token secret.

1.3. Assign the `admin` role to the new service account:
  - Under `User Management` select `RoleBindings`, then select `Create binding`.
  - Select `Namespace role binding` and populate the necessary details. For the example setup: the `Namespace` value is `cluster-development`; `Role name` is `admin`; `Subject` is `ServiceAccount`; `Subject namespace` is `cluster-development`; `Subject name` is `fence-sa`.
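For reference, a legacy service account token secret can also be created from the command line; a sketch using the standard Kubernetes secret type, where the secret name `fence-secret` is an assumption matching the example used later in this article:

```shell
# Hypothetical CLI sketch: create a legacy API token secret bound to the
# fence-sa service account. Kubernetes populates the token and ca.crt
# fields automatically once the secret is created.
oc apply -n cluster-development -f - <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: fence-secret
  annotations:
    kubernetes.io/service-account.name: fence-sa
type: kubernetes.io/service-account-token
EOF
```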
- On all the cluster nodes, create the file `/root/.kube/config` with the following contents:
# vi /root/.kube/config
apiVersion: v1
clusters:
- cluster:
    # insecure-skip-tls-verify: true
    certificate-authority: /root/<certificate>.pem
    server: https://<shift-server>:<port>
  name: <shift-server>:<port>
contexts:
- context:
    cluster: <shift-server>:<port>
    user: <user@example.com>/<shift-server>:<port>
  name: <user-namespace>/<shift-server>:<port>/<user@example.com>
current-context: <user-namespace>/<shift-server>:<port>/<user@example.com>
kind: Config
preferences: {}
users:
- name: <user@example.com>/<shift-server>:<port>
  user:
    token: <XXXXXXXXXXXXXXXXX-XXXXXXXXXXXXXXXXXXXXXXXXX>
- To populate the values for each of the parameters, refer to the details below:
3.1. For `certificate-authority`:
  - Log in to the OpenShift GUI console and browse to `Workloads` -> `Secrets` -> select the namespace (example `cluster-development`):
    - For OCP v4.15 or earlier: select `<service-account>-token-*` (example `fence-sa-token-*`) -> under `Data`, copy `ca.crt`.
    - For OCP v4.16 or later: select the newly created secret (example `fence-secret`) -> under `Data`, copy `ca.crt`.
  - Paste the copied `ca.crt` data on both cluster nodes into `/root/rootCA.pem`.
3.2. Log in to the GUI console with the `kubeadmin` account. At the top right, click on the user `kubeadmin` and in the drop-down select `Copy Login Command`. A new window opens; select `Display Token`.

3.3. For `server`:
  - From the details listed after selecting `Display Token`, copy the value of the `--server` field.
3.4. For the value of `name` under `clusters` and the value of `cluster` under `context`:
  - From the details listed after selecting `Display Token`, copy the value of the `--server` field, excluding the `https://` prefix.
3.5. For the value of `user` under `context`:
  - This field contains the service account user configured for fencing, i.e. `fence-sa`.
3.6. For the value of `token`:
  - Log in to the OpenShift GUI console and browse to `Workloads` -> `Secrets` -> select the namespace (example `cluster-development`):
    - For OCP v4.15 or earlier: select `<service-account>-token-*` (example `fence-sa-token-*`) -> under `Data`, copy the value of `token`.
    - For OCP v4.16 or later: select the newly created secret (example `fence-secret`) -> under `Data`, copy the value of `token`.
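As an alternative to copying these values from the GUI, the certificate and token can be extracted from the secret with the `oc` client; a sketch, assuming the example secret name `fence-secret` used above:

```shell
# Decode the CA certificate from the secret straight into the file the
# kubeconfig points at (the dot in ca.crt must be escaped in jsonpath).
oc get secret fence-secret -n cluster-development \
  -o jsonpath='{.data.ca\.crt}' | base64 -d > /root/rootCA.pem

# Print the decoded API token for the kubeconfig's token: field.
oc get secret fence-secret -n cluster-development \
  -o jsonpath='{.data.token}' | base64 -d
```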
- A sample populated file:
# cat /root/.kube/config
apiVersion: v1
clusters:
- cluster:
    # insecure-skip-tls-verify: true
    certificate-authority: /root/rootCA.pem
    server: https://api.development.example.com:6443
  name: api.development.example.com:6443
contexts:
- context:
    cluster: api.development.example.com:6443
    user: fence-sa/api.development.example.com:6443
  name: cluster-development/api.development.example.com:6443/fence-sa
current-context: cluster-development/api.development.example.com:6443/fence-sa
kind: Config
preferences: {}
users:
- name: fence-sa/api.development.example.com:6443
  user:
    token: Txkf5-JhtbpRtH8ER15gbh_UjRgfeVP
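Before testing the fence agent itself, the kubeconfig can be sanity-checked with a standard client; a sketch, assuming `oc` (or `kubectl`) is available on the cluster nodes:

```shell
# If the kubeconfig, CA file, and token are valid, this lists the
# VirtualMachineInstances visible to fence-sa in the example namespace.
oc --kubeconfig=/root/.kube/config get vmi -n cluster-development
```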
- Manually verify that the fence agent is able to communicate with the fence device:
# fence_kubevirt --namespace <user-namespace> -o list
# fence_kubevirt --namespace <user-namespace> -o status -n <vm-name>
Example:
# fence_kubevirt --namespace cluster-development -o list
node1-vm
node2-vm
# fence_kubevirt --namespace cluster-development -o status -n node1-vm
Status: ON
- Create the fence resource:
# pcs stonith create <fence-resource-name> fence_kubevirt kubeconfig=/path/to/kube-config-file namespace=<namespace> pcmk_host_map="<pacemaker_nodename1>:<node1-vm>;<pacemaker_nodename2>:<node2-vm>"
Example:
# pcs stonith create cluster_fence fence_kubevirt kubeconfig=/root/.kube/config namespace=cluster-development pcmk_host_map="node1:node1-vm;node2:node2-vm"
- Verify the fence resource status and fence configuration:
# pcs stonith status
# pcs stonith config
- Test the fence action to ensure that the VMs are rebooted.
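A minimal sketch of such a test, fencing `node2` from `node1` (node names from the example setup); expect the `node2-vm` virtual machine to be rebooted through the fence agent:

```shell
# Run from node1: requests a fence (reboot) of node2 via the configured
# stonith resource. Verify afterwards with `pcs status`.
pcs stonith fence node2
```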
Additional notes and recommendations
- If you have a two-node cluster, consider setting the `pcmk_delay_max` parameter to prevent fence race scenarios.
- If the status check takes longer than the default timeout, increase `pcmk_monitor_timeout` to give the cluster extra time to complete the operation.
- For more information on the correct format for `pcmk_host_map`, refer to the following solution: What format should I use to specify node mappings to stonith devices in pcmk_host_list and pcmk_host_map in a RHEL High Availability pacemaker cluster?
- If during the fence test the fence operation is marked as `failed` with the node remaining in the `Powered OFF` state, try increasing the `pcmk_reboot_timeout` and `retry_on` parameters:
# pcs stonith update <stonith-ID> pcmk_reboot_timeout=240 retry_on=4
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.