Limited Live Migration from OpenShift SDN to OVN-Kubernetes

Solution Verified - Updated

Environment

  • OpenShift Container Platform
    • 4.16
    • 4.15.31+

Issue

  • Red Hat OpenShift Container Platform 4.16 will be the final version that supports OpenShiftSDN (reference). Red Hat OpenShift Container Platform requires new installations, as of the 4.15 release, to use OVN-Kubernetes (reference). Existing clusters using OpenShiftSDN can continue to be upgraded until 4.16, but must migrate before upgrading to Red Hat OpenShift Container Platform 4.17. The current documentation allows for the OpenShiftSDN to OVN-Kubernetes migration to take place, but requires a cluster-wide outage.

Resolution

For Red Hat OpenShift Service on AWS classic architecture (ROSA Classic) or OpenShift Dedicated (OSD) clusters, the cluster version must be 4.16.43 or above and you must follow the official documentation:
  • ROSA Classic: Migrating from OpenShift SDN network plugin to OVN-Kubernetes network plugin
  • OSD: Migrating from OpenShift SDN network plugin to OVN-Kubernetes network plugin

---

Diagnostic Steps

Impact on features during a limited live migration

EgressIP

  • EgressIP functionality is disabled during a limited live migration
  • EgressIP supports the automatic migration of objects which will become functional again after the migration is completed
  • Look for usage of EgressIP in your cluster by checking for egressIPs or egressCIDRs on your NetNamespace and HostSubnet objects:
    Automatic EgressIP Range Assignment:
oc patch netnamespace project1 --type=merge -p '{"egressIPs": ["192.168.1.100"]}'
oc patch hostsubnet node1 --type=merge -p '{"egressCIDRs": ["192.168.1.0/24"]}'
  • or Manual EgressIP Assignment
oc patch netnamespace project1 --type=merge -p '{"egressIPs": ["192.168.1.100","192.168.1.101"]}'
oc patch hostsubnet node1 --type=merge -p '{"egressIPs": ["192.168.1.100", "192.168.1.101", "192.168.1.102"]}'

You can check for EgressIP assignments using the following command:

oc get netnamespace | awk 'NR>1 && $3 != ""'

EgressNetworkPolicy (in OVN-Kubernetes, referred to as EgressFirewall)

  • EgressNetworkPolicy will remain functional during the limited live migration
  • EgressNetworkPolicy supports the automatic migration of objects and will remain functional once the migration is completed

You can look for EgressNetworkPolicy (in OVN-Kubernetes, referred to as EgressFirewall) usage in your cluster by running the following command:

oc get egressnetworkpolicies.network.openshift.io -A

Egress Router

  • EgressRouter is supported by both OpenShift-SDN and OVN-Kubernetes, but the configuration and API are different
  • OVN-Kubernetes only supports redirect mode, whereas OpenShift-SDN additionally supports HTTP and DNS proxy modes
  • The EgressRouter functionality will not work during the limited live migration, is not automatically migrated, and must be reconfigured once the migration is complete
  • Look for usage of the EgressRouter pod in your cluster:
apiVersion: v1
kind: Pod
metadata:
  name: egress-1
  labels:
    name: egress-1
  annotations:
    pod.network.openshift.io/assign-macvlan: "true" 
spec:
  initContainers:
  - name: egress-router
    image: registry.redhat.io/openshift4/ose-egress-router

You can check all pods across the cluster for egress router usage using the following command:

oc get pods --all-namespaces -o json | jq '.items[] | select(.metadata.annotations."pod.network.openshift.io/assign-macvlan" == "true") | {name: .metadata.name, namespace: .metadata.namespace}'
  • Or you can look for the EgressRouter destination ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: egress-routes

Multicast

  • Multicast functionality is disabled during a limited live migration
  • Multicast supports the automatic migration of objects which will become functional again after the migration is completed

You can look for usage of Multicast in your cluster by checking for the multicast annotation using the following command:

$ oc get netnamespace -o json | jq -r '.items[] | select(.metadata.annotations."netnamespace.network.openshift.io/multicast-enabled" == "true") | .metadata.name'

Multitenant Isolation Mode

  • This feature is only supported by OpenShift-SDN; it stops working when the migration begins and is not restored after the migration completes
  • Look for Multitenant Isolation Mode by checking the Network custom resource for the mode: Multitenant configuration:
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    type: OpenShiftSDN
    openshiftSDNConfig:
      mode: Multitenant
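As a quick check, the isolation mode can also be read directly from the Network operator CR; a sketch using jq (an empty or "NetworkPolicy" result means Multitenant mode is not in use):

```shell
# Print the configured OpenShift SDN isolation mode; "Multitenant" means
# this cluster uses a feature that OVN-Kubernetes does not support.
oc get network.operator.openshift.io cluster -o json \
  | jq -r '.spec.defaultNetwork.openshiftSDNConfig.mode'
```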

100.64.0.0/16 and 100.88.0.0/16 Address Range Usage

  • If your cluster or surrounding network uses any part of the 100.64.0.0/16 or 100.88.0.0/16 address range, you must choose another unused IP range by specifying the v4InternalSubnet and/or internalTransitSwitchSubnet fields under the spec.defaultNetwork.ovnKubernetesConfig object definition. OVN-Kubernetes uses the 100.64.0.0/16 and 100.88.0.0/16 IP ranges internally by default, and they cannot overlap with any other network in use. You can reference the steps to complete these changes in the Patching OVN-Kubernetes address ranges section of our documentation.

  • You can check your cluster configuration for this network overlap by looking at the clusterNetwork and the serviceNetwork by executing the following command:

oc get network/cluster -o json | jq '.spec | .clusterNetwork[].cidr,.serviceNetwork'
  • It is also recommended to check with your datacenter networking team to verify that these ranges are not used anywhere else on external networks within your OpenShift infrastructure.
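If an overlap exists, the internal ranges can be moved before the migration; a sketch (the replacement subnet 100.68.0.0/16 is an example value only, and the exact field names should be confirmed against the Patching OVN-Kubernetes address ranges documentation for your version):

```shell
# Example only: move the OVN-Kubernetes internal join subnet away from
# 100.64.0.0/16. The replacement range must be unused in your environment.
oc patch network.operator.openshift.io cluster --type=merge \
  -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"v4InternalSubnet":"100.68.0.0/16"}}}}'
```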

Maximum Transmission Unit - MTU

  • The cluster MTU is the MTU value for pod interfaces. It is always less than your hardware MTU to account for the cluster network overlay overhead. The overhead is 100 bytes for OVN-Kubernetes and 50 bytes for OpenShift SDN.

  • During the limited live migration, both OVN-Kubernetes and OpenShift SDN run in parallel. OVN-Kubernetes manages the cluster network of some nodes, while OpenShift SDN manages the cluster network of others. To ensure that cross-CNI traffic remains functional, the Cluster Network Operator updates the routable MTU to ensure that both CNIs share the same overlay MTU. As a result, after the migration has completed, the cluster MTU is 50 bytes less.

  • As a result, during the limited live migration the required MTU values are calculated automatically, whether you are using the default install MTU value or a custom MTU value.

  • If you use the default MTU, and you want to change to a custom MTU, you must perform the MTU change before the SDN migration, following our documented procedure Changing the MTU for the cluster network.
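To see the value involved, the current cluster (pod) MTU is reported in the network status; a small sketch:

```shell
# Show the current cluster (pod) MTU; after the migration completes this
# value will be 50 bytes lower than it was under OpenShift SDN.
oc get network.config.openshift.io cluster -o json \
  | jq '.status.clusterNetworkMTU'
```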

NodeNetworkConfigurationPolicy

The OpenShiftSDN CNI plugin supports a NodeNetworkConfigurationPolicy (NNCP) custom resource (CR) to configure the primary interface on a node. The OVN-Kubernetes network plugin does not have this capability.

To check if you are using a NNCP, run the following command:

oc get nncp

Secondary Pod Interfaces Created through the Multus CNI Plugin

In most cases, the Limited Live Migration is independent of the secondary interfaces of pods created by the Multus CNI plugin. However, if these secondary interfaces were set up on the default network interface card (NIC) of the host, using network types such as MACVLAN, IPVLAN, SR-IOV, or bridge interfaces with the default NIC as the control node, OVN-Kubernetes might encounter issues. You should remove such configurations before proceeding with the Limited Live Migration.

To check if you are using any NetworkAttachmentDefinitions, run the following command:

oc get network-attachment-definitions -A
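To narrow the list down to the interface types called out above, a sketch that filters each NetworkAttachmentDefinition config for those CNI types (the pattern match is illustrative; review the matching configs manually to see whether they actually attach to the default NIC):

```shell
# List NADs whose CNI config mentions macvlan, ipvlan, sriov, or bridge.
oc get network-attachment-definitions -A -o json \
  | jq -r '.items[]
      | select(.spec.config | test("macvlan|ipvlan|sriov|bridge"))
      | "\(.metadata.namespace)/\(.metadata.name)"'
```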

Multiple Network Interface Cards on a Node

When there are multiple NICs inside the host, and the default route is not on the interface that has the Kubernetes NodeIP, you must use the offline migration.

Static Routes and Routing Policies

If the cluster depends on static routes or routing policies via the host network so that pods can reach specific external destinations, you must set routingViaHost to true and ipForwarding to Global on the gatewayConfig key in the network.operator.openshift.io CR before you start the Limited Live Migration. Additional details on patching the network.operator.openshift.io CR can be found in Step 5 of the Checking cluster resources before initiating the limited live migration section of the documentation.
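The gatewayConfig change described above can be applied with a merge patch; a sketch (verify it against the documented procedure before running it on your cluster):

```shell
# Route pod egress via the host network stack and enable global IP
# forwarding so existing static routes and routing policies keep working.
oc patch network.operator.openshift.io cluster --type=merge \
  -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":true,"ipForwarding":"Global"}}}}}'
```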

Unmanaged DaemonSets

All DaemonSet objects in the openshift-sdn namespace that are not managed by the Cluster Network Operator (CNO) must be removed before initiating the Limited Live Migration. These unmanaged daemon sets can cause the migration status to remain incomplete.

To check if you are running any additional DaemonSets in the openshift-sdn namespace beyond sdn and sdn-controller, run the following command:

oc get daemonset -n openshift-sdn
NAME             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
sdn              6         6         6       6            6           kubernetes.io/os=linux            425d
sdn-controller   3         3         3       3            3           node-role.kubernetes.io/master=   425d
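To print only the unexpected DaemonSets (anything other than the CNO-managed sdn and sdn-controller), a sketch:

```shell
# Any name printed here is an unmanaged DaemonSet that should be removed
# (or relocated) before starting the migration.
oc get daemonset -n openshift-sdn -o json \
  | jq -r '.items[].metadata.name | select(. != "sdn" and . != "sdn-controller")'
```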

GitOps Configuration Management

If you are using a GitOps methodology to manage cluster configurations, specifically cluster CRs such as network.operator.openshift.io, you need to disable the synchronization and management of these resources so they can be modified and managed by the Limited Live Migration, and to ensure the changes are not reverted.

kube-proxy Customization

The OVNKubernetes CNI does not support custom kube-proxy configurations. If you are customizing kube-proxy through the kubeProxyConfig key in the network.operator.openshift.io CR, you need to edit the CR and remove those options. You can perform this change even after the migration to OVN-Kubernetes is completed.

oc get networks.operator.openshift.io cluster -o yaml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  annotations:
  name: cluster
spec:
  clusterNetwork:
  - cidr: 100.124.0.0/18
    hostPrefix: 24
  defaultNetwork:
    openshiftSDNConfig:
      enableUnidling: true
      mode: NetworkPolicy
      mtu: 8946
      vxlanPort: 4789
    ovnKubernetesConfig:
      gatewayConfig:
        routingViaHost: false
      genevePort: 6081
      mtu: 8946
    type: OpenShiftSDN
  kubeProxyConfig:              <---------- remove
    bindAddress: 0.0.0.0      <---------- remove
  logLevel: Normal
  managementState: Managed
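The removal can be done with oc edit as shown above, or with a JSON patch; a sketch, assuming kubeProxyConfig exists at the top level of spec as in the example:

```shell
# Remove the unsupported kubeProxyConfig stanza from the Network operator CR.
oc patch networks.operator.openshift.io cluster --type=json \
  -p '[{"op":"remove","path":"/spec/kubeProxyConfig"}]'
```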

Monitoring the stages of the migration

You can follow the progress of the migration using the following statuses set on the network.config CustomResource (CR).

oc get network.config.openshift.io cluster -o yaml
  • Deploy the target Container Network Interface (CNI) plugin

    • The Cluster Network Operator (CNO) will deploy both CNI plugins in migration mode. The original CNI is still used and the target CNI will do nothing
    • The network.config status .status.conditions NetworkTypeMigrationTargetCNIAvailable becomes TRUE when this completes
  • Apply a routable Maximum Transmission Unit (MTU)

    • To migrate to OVN-Kubernetes, we need to apply a routable MTU to the cluster that is 50 bytes less than the original cluster MTU
    • The Machine Config Operator (MCO) will trigger a node reboot to apply this change into each node
    • The network.config status .status.conditions NetworkTypeMigrationMTUReady becomes TRUE when this completes
  • Transition the active CNI plugin from OpenShiftSDN to OVN-Kubernetes

    • The CNO will update the networkType field of the network.operator CustomResource (CR) to OVNKubernetes
    • The MCO will apply a new MachineConfig to each node to enable the ovs-configuration.service
    • After rebooting, the bridge br-ex device is created and OVN-Kubernetes will be the active CNI on the node
    • The network.config status .status.conditions NetworkTypeMigrationTargetCNIInUse becomes TRUE when this completes for all of the nodes
  • Remove the original OpenShiftSDN CNI objects

    • OpenShiftSDN is no longer in use and the CNO will remove all of the OpenShift SDN objects from the cluster
    • The network.config status .status.conditions NetworkTypeMigrationOriginalCNIPurged becomes TRUE when this completes
  • The migration is complete

    • The network.config status .status.conditions NetworkTypeMigrationInProgress transitions to FALSE with the reason NetworkTypeMigrationCompleted
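The per-stage conditions above can be summarized in one line each with a sketch like:

```shell
# Print each NetworkTypeMigration* condition as TYPE=STATUS for a quick
# view of which migration stage the cluster has reached.
oc get network.config.openshift.io cluster -o json \
  | jq -r '.status.conditions[]
      | select(.type | startswith("NetworkTypeMigration"))
      | "\(.type)=\(.status)"'
```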

Potential bugs to be aware of

  • Built-in join subnet "100.64.0.0/16" overlaps cluster subnet "100.64.0.0/15" even though internalJoinSubnet is configured
    This issue was resolved in Red Hat OpenShift Container Platform 4.16.15 via RHSA-2024:7174. Modifications to the join subnet were not considered when the CNO calculated overlap conflicts.
  • Network policy does not work properly during SDN live migration
    This issue was resolved in Red Hat OpenShift Container Platform 4.16.24 via RHSA-2024:10147. It is recommended to update to Red Hat OpenShift Container Platform 4.16.24 or later to prevent issues during the Limited Live Migration.
  • Allow-from-host-network network policies do not work during live migration
    This issue was resolved in Red Hat OpenShift Container Platform 4.16.24 via RHSA-2024:10147. It is recommended to update to Red Hat OpenShift Container Platform 4.16.24 or later to prevent issues during the Limited Live Migration.
  • SDN to OVN-K live migration runs the MTU migration phase more than once and fails
    Under certain circumstances when MachineConfigPools are paused, the limited live migration runs the MTU migration phase correctly and then, while running the second MCO rollout to enable the target CNI, tries to run the MTU migration phase again. This issue was resolved in Red Hat OpenShift Container Platform 4.16.28 via RHBA-2024:11502. It is recommended to update to Red Hat OpenShift Container Platform 4.16.28 or later to prevent issues during the Limited Live Migration.
  • SDN pods consume too much RAM, causing the OVN Limited Live Migration to fail when testing with 500 worker nodes
    Clusters of 250 nodes and below were tested and migrated successfully. In clusters larger than 250 nodes, the SDN pods can consume too much memory, leading to instability and causing the Limited Live Migration to fail. This issue was resolved in Red Hat OpenShift Container Platform 4.16.36 via RHSA-2025:1707. If you are migrating clusters with 250+ nodes, you must update to Red Hat OpenShift Container Platform 4.16.36 or later to prevent this issue during the Limited Live Migration.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.