How to configure `fence_kubevirt` in a Red Hat High Availability cluster with pacemaker?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux 8.4 or later and RHEL 9.0 or later
  • Pacemaker Cluster with VMs on OpenShift Virtualization

Issue

  • How do I configure fence_kubevirt in a Red Hat High Availability cluster with pacemaker?

Resolution

Assume the following about the cluster architecture:
  • Pacemaker node names are node1 and node2.
  • VM names of the cluster nodes are node1-vm and node2-vm.
  • Service account used by the fence_kubevirt agent: fence-sa
  • Namespace in which the VMs are running: cluster-development
  • The firewall allows connections from the VMs (cluster nodes) to the Kube API server (default port 6443).
Steps to configure fence_kubevirt:
  1. Create a service account on the OpenShift platform for the fence resource to use. This allows the fence_kubevirt agent to access the API directly. The service account's API token and registry credentials do not expire:

    For OCP version 4.15 or earlier
    1.1. To create the service account from the GUI:

    1. Log in to the OpenShift GUI console and, under User Management, select ServiceAccounts.
    2. Select Create ServiceAccount at the top right.
    3. In the YAML file, update the name and namespace, then click Create. For this example, set name to fence-sa and namespace to cluster-development.

    1.2. Assign the admin role to the new service account:

    1. Under User Management, select RoleBindings, then select Create binding.
    2. Select Namespace role binding and fill in the details. For the example setup: Namespace is cluster-development; Role name is admin; Subject is ServiceAccount.

    For OCP version 4.16 or later
    1.1. To create the service account:

    1. Log in to the OpenShift GUI console and, under User Management, select ServiceAccounts.
    2. Select Create ServiceAccount at the top right.
    3. In the YAML file, update the name and namespace, then click Create. For this example, set name to fence-sa and namespace to cluster-development.

    1.2. To create the Secret:

    1. Follow the steps detailed in Creating a legacy service account token secret

    1.3. Assign the admin role to the new service account:

    1. Under User Management, select RoleBindings, then select Create binding.
    2. Select Namespace role binding and fill in the details. For the example setup: Namespace is cluster-development; Role name is admin; Subject is ServiceAccount; Subject namespace is cluster-development; Subject name is fence-sa.
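
    The same objects can also be created from manifests instead of the GUI. A sketch using the example names from this article; the RoleBinding name fence-sa-admin is illustrative, and the Secret (with the example name fence-secret) is only needed on OCP 4.16 or later, where legacy token secrets are no longer created automatically:

    ```yaml
    # Sketch: ServiceAccount, admin RoleBinding, and (OCP 4.16+) a legacy
    # token Secret. Apply with: oc apply -f <file>.
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: fence-sa
      namespace: cluster-development
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: fence-sa-admin        # illustrative name
      namespace: cluster-development
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: admin
    subjects:
    - kind: ServiceAccount
      name: fence-sa
      namespace: cluster-development
    ---
    # OCP 4.16 or later only: request a long-lived token for the service account.
    apiVersion: v1
    kind: Secret
    metadata:
      name: fence-secret
      namespace: cluster-development
      annotations:
        kubernetes.io/service-account.name: fence-sa
    type: kubernetes.io/service-account-token
    ```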
  2. On all cluster nodes, create the file /root/.kube/config with the following contents:

# vi /root/.kube/config
apiVersion: v1
clusters:
- cluster:
    # insecure-skip-tls-verify: true
    certificate-authority: /root/<certificate>.pem
    server: https://<shift-server>:<port>
  name: <shift-server>:<port>
contexts:
- context:
    cluster: <shift-server>:<port>
    user: <user@example.com>/<shift-server>:<port>
  name: <user-namespace>/<shift-server>:<port>/<user@example.com>
current-context: <user-namespace>/<shift-server>:<port>/<user@example.com>
kind: Config
preferences: {}
users:
- name: <user@example.com>/<shift-server>:<port>
  user:
    token: <XXXXXXXXXXXXXXXXX-XXXXXXXXXXXXXXXXXXXXXXXXX>
  3. To populate the values for each parameter, refer to the following details:

    3.1. For certificate-authority:

    1. Log in to the OpenShift GUI console and browse to Workloads -> Secrets -> select the namespace (example: cluster-development):
     1.1. For OCP v4.15 or earlier: select <service-account>-token-* (example fence-sa-token-*) -> under Data, copy ca.crt.
     1.2. For OCP v4.16 or later: select the newly created secret (example fence-secret) -> under Data, copy ca.crt.

    2. Paste the copied ca.crt data into /root/rootCA.pem on both cluster nodes.

    3.2. Log in to the GUI console with the kubeadmin account. At the top right, click the user kubeadmin and select Copy Login Command from the drop-down. A new window opens; select Display Token.

    3.3. For server:

    1. From the details listed after selecting Display Token, copy the value of the --server field.

    3.4. For the value of name under clusters and the value of cluster under context:

    1. From the details listed after selecting Display Token, copy the value of the --server field, omitting the https:// prefix.

    3.5. For the value of user under context:

    1. This field contains the service account configured for fencing, i.e. fence-sa.

    3.6. For the value of token:

    1. Log in to the OpenShift GUI console and browse to Workloads -> Secrets -> select the namespace (example: cluster-development):
     1.1. For OCP v4.15 or earlier: select <service-account>-token-* (example fence-sa-token-*) -> under Data, copy the value for token.
     1.2. For OCP v4.16 or later: select the newly created secret (example fence-secret) -> under Data, copy the value for token.
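
    The GUI copy-paste in steps 3.1 and 3.6 can also be done from the command line. A sketch, assuming the oc client is logged in with sufficient privileges; the secret name fence-secret matches the 4.16+ example above (on OCP 4.15 or earlier, substitute the auto-generated fence-sa-token-* secret name):

    ```shell
    # Sketch: extract the CA certificate and token from the service-account secret.
    NAMESPACE=cluster-development
    SECRET=fence-secret    # OCP 4.15 or earlier: use the fence-sa-token-* secret

    # Secret data is base64-encoded, so decode it before writing to disk.
    oc get secret "$SECRET" -n "$NAMESPACE" -o jsonpath='{.data.ca\.crt}' | base64 -d > /root/rootCA.pem
    oc get secret "$SECRET" -n "$NAMESPACE" -o jsonpath='{.data.token}'   | base64 -d
    ```

    The second command prints the token to paste into the kubeconfig file.
    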
    
  4. A sample /root/.kube/config with dummy entries:

# cat /root/.kube/config
apiVersion: v1
clusters:
- cluster:
    # insecure-skip-tls-verify: true
    certificate-authority: /root/rootCA.pem
    server: https://api.development.example.com:6443
  name: api.development.example.com:6443
contexts:
- context:
    cluster: api.development.example.com:6443
    user: fence-sa/api.development.example.com:6443
  name: cluster-development/api.development.example.com:6443/fence-sa
current-context: cluster-development/api.development.example.com:6443/fence-sa
kind: Config
preferences: {}
users:
- name: fence-sa/api.development.example.com:6443
  user:
    token: Txkf5-JhtbpRtH8ER15gbh_UjRgfeVP
  5. Manually verify that the fence agent can communicate with the fence device:
# fence_kubevirt --namespace <user-namespace> -o list
# fence_kubevirt --namespace <user-namespace> -o status -n <vm-name>

Example:
# fence_kubevirt --namespace cluster-development -o list
node1-vm
node2-vm

# fence_kubevirt --namespace cluster-development -o status -n node1-vm
Status: ON
  6. Create the fence resource:
# pcs stonith create <fence-resource-name> fence_kubevirt kubeconfig=/path/to/kube-config-file namespace=<namespace> pcmk_host_map="<pacemaker_nodename1>:<node1-vm>;<pacemaker_nodename2>:<node2-vm>"

Example:
# pcs stonith create cluster_fence fence_kubevirt kubeconfig=/root/.kube/config namespace=cluster-development pcmk_host_map="node1:node1-vm;node2:node2-vm"
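
The pcmk_host_map value maps each Pacemaker node name to its VM name, with entries separated by semicolons. A minimal shell sketch of assembling that string from node:VM pairs (the names are the example values from this article):

```shell
# Sketch: build the pcmk_host_map value from "pacemaker-node:vm-name" pairs.
# Adjust the pairs for your cluster.
MAP=""
for pair in node1:node1-vm node2:node2-vm; do
  MAP="${MAP:+${MAP};}${pair}"   # join entries with semicolons
done
echo "$MAP"   # node1:node1-vm;node2:node2-vm
```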
  7. Verify the fence resource status and the fence configuration:
# pcs stonith status
# pcs stonith config
  8. Test the fence action to ensure that the VMs get rebooted:
# pcs stonith fence <pacemaker-node-name>

Example:
# pcs stonith fence node2

Additional notes and recommendations
  • If fence actions time out before a VM completes its reboot, the reboot timeout and the number of power-on retries can be tuned on the stonith resource:
# pcs stonith update <stonith-ID> pcmk_reboot_timeout=240 retry_on=4

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.