Use OpenShift Data Foundation Disaster Recovery to Protect Virtual Machines

Overview

The choice between OpenShift Data Foundation Metro-DR and OpenShift Data Foundation Regional-DR depends on several factors, including the network latency between data centers and your Recovery Point Objective (RPO). Protected applications may be defined using GitOps methods or by labeling the included Kubernetes resources. Regardless of these choices, there are some best practices to follow, some recommendations that apply specifically to GitOps applications, and some current limitations that apply to Regional-DR.
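
The replication behavior for either solution is configured through a DRPolicy. The following is a minimal sketch with placeholder cluster names: the schedulingInterval determines how often volumes are replicated with Regional-DR (and therefore the achievable RPO), while Metro-DR replicates synchronously and does not use a scheduling interval.

drpolicy.yaml

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPolicy
metadata:
  name: odf-regional-drpolicy
spec:
  # Placeholder names of the two managed clusters participating in DR
  drClusters:
    - primary-cluster
    - secondary-cluster
  # Asynchronous replication interval (Regional-DR); omitted for Metro-DR
  schedulingInterval: 5m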

About ACM Applications

Red Hat Advanced Cluster Management enables management of multiple OpenShift clusters including deployed applications and is required for both OpenShift Data Foundation Disaster Recovery solutions. There are two main categories of ACM applications: managed applications and discovered applications.

ACM Managed Applications

Managed Applications are defined using GitOps techniques. The ACM documentation describes two types of managed applications: Subscription and ApplicationSet. Both of these work by using an external Git repository containing YAML files which fully define the application.

When including a Virtual Machine in an ACM managed application, use a PersistentVolumeClaim paired with a VolumeImportSource to prepare a virtual machine disk. Volume Populators were recently introduced in OpenShift Virtualization, and more information is available in the Containerized Data Importer (CDI) documentation.

Red Hat provides RHEL containerdisks that can be imported directly into OpenShift Virtualization. Search the software catalog to see which versions are available. Use a specific version of the image rather than the latest tag in order to get consistent results. The KubeVirt community maintains containerdisks for other operating systems in a Quay repository.

Note that pullMethod: node is specified in the VolumeImportSource CR to take advantage of the OpenShift pull secret, which is required to pull container images from the Red Hat registry.

To apply a DR policy to an application, its resources should share a common label. In the example below, the VirtualMachine, PersistentVolumeClaim, and VolumeImportSource all carry the label drapp: dr-vm. If the Virtual Machine depends on other resources (such as DataVolumes, Services, Routes, ConfigMaps, or Secrets), make sure those resources are labeled as well.
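
A DRPlacementControl then selects the application's volumes by that common label. The following is a minimal sketch; the DRPolicy and Placement names are placeholders for resources you would create separately.

drpc.yaml

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPlacementControl
metadata:
  name: dr-vm-drpc
  namespace: default
spec:
  drPolicyRef:
    name: odf-regional-drpolicy   # placeholder DRPolicy name
  placementRef:
    kind: Placement
    name: dr-vm-placement         # placeholder Placement for the application
  preferredCluster: primary-cluster
  pvcSelector:
    matchLabels:
      drapp: dr-vm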

The following example shows how to define a virtual machine that follows these recommendations:

vm.yaml

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: dr-vm
  namespace: default
  labels:
    drapp: dr-vm
spec:
  instancetype:
    name: u1.medium
  preference:
    name: rhel.9
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
            - disk:
                bus: virtio
              name: dr-vm-disk
            - disk:
                bus: virtio
              name: cloudinitdisk
      volumes:
        - persistentVolumeClaim:
            claimName: dr-vm-disk
          name: dr-vm-disk
        - cloudInitConfigDrive:
            userData: |
              #cloud-config
              user: cloud-user
              password: 0qmg-iy6s-w1dw
              chpasswd:
                expire: false
          name: cloudinitdisk

pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dr-vm-disk
  namespace: default
  labels:
    drapp: dr-vm
spec:
  dataSourceRef:
    apiGroup: cdi.kubevirt.io
    kind: VolumeImportSource
    name: rhel9-source
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
  storageClassName: ocs-external-storagecluster-ceph-rbd
  volumeMode: Block

source.yaml

apiVersion: cdi.kubevirt.io/v1beta1
kind: VolumeImportSource
metadata:
  name: rhel9-source
  labels:
    drapp: dr-vm
spec:
  source:
    registry:
      url: "docker://registry.redhat.io/rhel9/rhel-guest-image:9.4-1175"
      pullMethod: node

ACM Discovered Applications

You can also protect Virtual Machines that were not created using GitOps (for example, VMs imported using the Red Hat Migration Toolkit for Virtualization or VMs created using the OpenShift Console). In this case, labels are used to identify the resources belonging to the application. After the Virtual Machine has been created, apply a common label to the following resources associated with the VM: VirtualMachine, DataVolume, PersistentVolumeClaim, Service, Route, Secret, and ConfigMap. Do not label VirtualMachineInstances or Pods, since these are created and managed automatically by OpenShift Virtualization.

The following example shows a Virtual Machine that is discoverable as an application using the label drapp: dr-vm:

vm.yaml

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-rhel-datavolume-block
  namespace: acm-discovered-vm
  labels:
    kubevirt.io/vm: vm-rhel-datavolume
    drapp: dr-vm
spec:
  dataVolumeTemplates:
    - metadata:
        labels:
          drapp: dr-vm
        name: rhel-dv
      spec:
        source:
          registry:
            url: 'docker://registry.redhat.io/rhel9/rhel-guest-image:9.4-1175'
        storage:
          resources:
            requests:
              storage: 20Gi
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-rhel-datavolume
    spec:
      architecture: amd64
      domain:
        devices:
          disks:
            - disk:
                bus: virtio
              name: datavolumedisk1
        machine:
          type: pc-q35-rhel9.4.0
        resources:
          requests:
            memory: 4Gi
      terminationGracePeriodSeconds: 0
      volumes:
        - dataVolume:
            name: rhel-dv
          name: datavolumedisk1
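
If the VM has accompanying resources, label them the same way. For example, a Service exposing SSH to this VM might look like the following; the Service name here is illustrative.

service.yaml

apiVersion: v1
kind: Service
metadata:
  name: dr-vm-ssh   # illustrative name
  namespace: acm-discovered-vm
  labels:
    drapp: dr-vm    # same label so the Service is protected with the app
spec:
  selector:
    kubevirt.io/vm: vm-rhel-datavolume
  ports:
    - name: ssh
      port: 22
      protocol: TCP
      targetPort: 22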

Since ACM does not provision discovered applications, an extra step is required to complete failover and relocate actions. As described in more detail in the OpenShift Data Foundation documentation, the VirtualMachine and its related resources must be manually cleaned up on the evacuated cluster. The DRPlacementControl (DRPC) object will have a status of WaitOnUserToCleanUp when this step is required. Remove the VirtualMachine and all accompanying objects that carry the common label used to discover the application, including PersistentVolumeClaims. When the DRPC is waiting for cleanup, it is safe to delete these resources because the application has already been replicated to the target cluster.

Regional-DR Limitations

Due to differences in how volume replication works, Regional-DR currently has one important limitation: volumes created from a CSI Snapshot or a CSI Clone cannot be part of a protected application. These volumes depend on a parent volume that will not be replicated. To work around this limitation, follow the recommendation above and create disks by importing data rather than cloning. Alternatively, if you want to clone from a PV, you can use a DRPolicy that flattens the PVs before replication. This limitation will be removed in a future release.

Regional-DR uses RBD snapshots to replicate data. Starting with ODF 4.19, VolumeGroupReplication has been implemented for RBD-backed VMs, making it possible to consistently DR-protect multi-volume VMs.

This limitation does not apply to Metro-DR.

Conclusion

Enabling disaster recovery for your OpenShift application takes some advanced planning. With just a few additional considerations you can also protect virtual machines when using OpenShift Data Foundation Disaster Recovery.

