Rebuild all OpenShift-OVN-Kubernetes databases

Index

  1. Introduction
  2. Environment
  3. Rebuild OVN on RHOCP v4.6
  4. Rebuild OVN on RHOCP v4.7
  5. Rebuild OVN on RHOCP v4.8 - v4.13
  6. Rebuild OVN on RHOCP v4.14 and later

Introduction

In this article, we will go through the different procedures for performing a complete rebuild of OVN-Kubernetes when issues like the following occur:

  • OVN Master Split-brains
  • Failure to spawn pods due to OVN issues
  • A myriad of other OVN issues, such as complete inconsistencies in the northbound and southbound databases.

WARNING: This procedure causes a cluster-wide network interruption and carries some risk. Perform it only if instructed to do so or when facing OVN-Kubernetes issues such as the ones listed above.

NOTE: The methods differ slightly between RHOCP v4.6 and later releases. One of the main reasons is that, starting with RHOCP v4.7, OVS no longer runs in the ovs-node pods but as a systemd service on the node. Be sure to follow the right method and not to omit any step.
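
For example, on RHOCP v4.7 and later you can confirm that OVS runs as a systemd service directly on the node (a quick check, assuming oc debug access; <node-name> is a placeholder):

    $ oc debug node/<node-name> -- chroot /host systemctl is-active ovs-vswitchd ovsdb-server
    active
    active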

RECOMMENDATION: It is recommended to perform these procedures as a cluster-admin user that authenticates via client certificates, such as system:admin (not to be confused with the kubeadmin user, which authenticates via oauth). During this procedure, the state of ovn-kubernetes can cause OpenShift pods (in particular, pods in the openshift-apiserver, openshift-oauth-apiserver, openshift-ingress and openshift-authentication namespaces) to fail health checks, so a user that authenticates via oauth may be unable to reconnect to the cluster to complete the procedure. Users authenticated via client certificates, like system:admin, do not go through oauth and therefore avoid this issue. For more information on the system:admin user, please see About the OpenShift 4 kubeconfig file for system:admin.
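
For example, pointing oc at the installer-generated kubeconfig (the path below is only an example) and checking the identity confirms that the session does not go through oauth:

    $ export KUBECONFIG=/path/to/install-dir/auth/kubeconfig
    $ oc whoami
    system:admin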

NOTE: See also the related KCS about webhooks preventing pod scheduling, which is frequently observed during OVN database rebuilds. It may become relevant if you encounter pods failing to reschedule while working through the steps below.

Environment

  • Red Hat OpenShift Container Platform (RHOCP)

    • 4
  • OVN-Kubernetes

Rebuild OVN on RHOCP v4.6

  • Confirm that a new revision of the kube-apiserver is not in the process of being rolled out:

    $ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
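
    Output similar to the following (the exact revision number and node count will vary) indicates that no new revision is being rolled out:

    AllNodesAtLatestRevision
    3 nodes are at revision 7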
    
  • Remove the northbound and southbound databases and delete the ovnkube-master pods one-at-a-time:

    $ for OVNKUBEMASTER in $(oc -n openshift-ovn-kubernetes get pods -l app=ovnkube-master -o custom-columns=NAME:.metadata.name --no-headers); \
      do echo "Deleting databases for pod $OVNKUBEMASTER" ; \
      oc -n openshift-ovn-kubernetes rsh -Tc northd $OVNKUBEMASTER rm -f /etc/openvswitch/ovnnb_db.db /etc/openvswitch/ovnsb_db.db; \
      done
    
    $ for OVNKUBEMASTER in $(oc -n openshift-ovn-kubernetes get pods -l app=ovnkube-master -o custom-columns=NAME:.metadata.name --no-headers); \
      do echo "Deleting pod $OVNKUBEMASTER" ; \
      oc -n openshift-ovn-kubernetes delete pod --wait=false $OVNKUBEMASTER ; sleep 6; \
      done
    
  • Validate the health by confirming that there is a single northbound and a single southbound leader. This can take a few minutes to recover. If there are multiple leaders for either the NB or the SB database, that may be a split-brain and the first step must be performed again:

    for OVNMASTER in $(oc -n openshift-ovn-kubernetes get pods -l app=ovnkube-master -o custom-columns=NAME:.metadata.name --no-headers); \
         do echo "········································" ; \
         echo "· OVNKube Master: $OVNMASTER ·" ; \
         echo "········································" ; \
         echo 'North' `oc -n openshift-ovn-kubernetes rsh -Tc northd $OVNMASTER ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound | grep ^Role` ; \
         echo 'South' `oc -n openshift-ovn-kubernetes rsh -Tc northd $OVNMASTER ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound | grep ^Role`; \
         echo "····················"; \
         done
    
  • The output of the command above should look like this:

    ········································
    · OVNKube Master: ovnkube-master-xxx
    ········································
    North Role: leader
    South Role: leader
    ····················
    ········································
    · OVNKube Master: ovnkube-master-yyy 
    ········································
    North Role: follower
    South Role: follower
    ····················
    ········································
    · OVNKube Master: ovnkube-master-zzz
    ········································
    North Role: follower
    South Role: follower
    ····················
    
  • Delete ovs-node and ovnkube-node pods on masters and validate health:

    $ for OVSMASTER in $(oc get nodes -l node-role.kubernetes.io/master= -o custom-columns=NAME:.metadata.name --no-headers); \
      do echo "Deleting pod on node $OVSMASTER" ; \
      oc -n openshift-ovn-kubernetes delete --wait=false $(oc -n openshift-ovn-kubernetes get pods -l app=ovs-node --field-selector spec.nodeName=$OVSMASTER -o name) ; sleep 3; \
      done
    
    $ for OVNNODEMASTER in $(oc get nodes -l node-role.kubernetes.io/master= -o custom-columns=NAME:.metadata.name --no-headers); \
      do echo "Deleting pod on node $OVNNODEMASTER" ; \
      oc -n openshift-ovn-kubernetes delete --wait=false $(oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node  --field-selector spec.nodeName=$OVNNODEMASTER -o name) ; sleep 3; \
      done
    
    $ oc -n openshift-ovn-kubernetes get pods -o wide
    
  • Delete ovs-node and ovnkube-node pods on non-master nodes and validate health:

    $ for OVSWORKER in $(oc get nodes -l '!node-role.kubernetes.io/master' -o custom-columns=NAME:.metadata.name --no-headers); \
      do echo "Deleting pod on node $OVSWORKER" ; \
      oc -n openshift-ovn-kubernetes delete --wait=false $(oc -n openshift-ovn-kubernetes get pods -l app=ovs-node --field-selector spec.nodeName=$OVSWORKER -o name) ; sleep 3; \
      done
    
    $ for OVNKUBENODE in $(oc get nodes -l '!node-role.kubernetes.io/master' -o custom-columns=NAME:.metadata.name --no-headers); \
      do echo "Deleting pod on node $OVNKUBENODE" ; \
      oc -n openshift-ovn-kubernetes delete --wait=false $(oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node  --field-selector spec.nodeName=$OVNKUBENODE -o name) ; sleep 3; \
      done
    
    $ oc -n openshift-ovn-kubernetes get pods -o wide 
    

Rebuild OVN on RHOCP v4.7

  • Confirm that a new revision of the kube-apiserver is not in the process of being rolled out:

    $ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
    
  • (Optional) If using oc debug pre-pulling the debug image will make this whole process faster:

    $ for NODE in $(oc get nodes -o name --no-headers); \
      do echo "Debug pod on node $NODE" ; \
      oc debug $NODE -- chroot /host /bin/bash -c 'echo hello' ; sleep 2; \
      done
    
  • Remove the northbound and southbound databases and delete the ovnkube-master pods one-at-a-time:

      $ for OVNKUBEMASTER in $(oc -n openshift-ovn-kubernetes get pods -l app=ovnkube-master -o custom-columns=NAME:.metadata.name --no-headers); \
        do echo "Deleting databases for pod $OVNKUBEMASTER" ; \
        oc -n openshift-ovn-kubernetes rsh -Tc northd $OVNKUBEMASTER rm -f /etc/openvswitch/ovnnb_db.db /etc/openvswitch/ovnsb_db.db; \
        done
    
      $ for OVNKUBEMASTER in $(oc -n openshift-ovn-kubernetes get pods -l app=ovnkube-master -o custom-columns=NAME:.metadata.name --no-headers); \
        do echo "Deleting pod $OVNKUBEMASTER" ; \
        oc -n openshift-ovn-kubernetes delete pod --wait=false $OVNKUBEMASTER ; sleep 6; \
        done
    
  • Validate the health by confirming that there is a single northbound and a single southbound leader. This can take a few minutes to recover. If there are multiple leaders for either the NB or the SB database, that may be a split-brain and the first step must be performed again:

    for OVNMASTER in $(oc -n openshift-ovn-kubernetes get pods -l app=ovnkube-master -o custom-columns=NAME:.metadata.name --no-headers); \
         do echo "········································" ; \
         echo "· OVNKube Master: $OVNMASTER ·" ; \
         echo "········································" ; \
         echo 'North' `oc -n openshift-ovn-kubernetes rsh -Tc northd $OVNMASTER ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound | grep ^Role` ; \
         echo 'South' `oc -n openshift-ovn-kubernetes rsh -Tc northd $OVNMASTER ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound | grep ^Role`; \
         echo "····················"; \
         done
    
  • The output of the command above should look like this:

    ········································
    · OVNKube Master: ovnkube-master-xxx·
    ········································
    North Role: leader
    South Role: leader
    ····················
    ········································
    · OVNKube Master: ovnkube-master-yyy
    ········································
    North Role: follower
    South Role: follower
    ····················
    ········································
    · OVNKube Master: ovnkube-master-zzz
    ········································
    North Role: follower
    South Role: follower
    ····················
    
  • Restart OVS services on masters. This can be done via oc debug node/${NODE} or via SSH:

    • If done via oc debug node/${NODE}, run this:
    $ for MASTER in $(oc get nodes -l node-role.kubernetes.io/master= -o name --no-headers); \
      do echo "Restarting OVS services on $MASTER" ; \
      oc debug $MASTER -- chroot /host /bin/bash -c  'systemctl restart ovs-vswitchd ovsdb-server' ; sleep 3; \
      done
    
    • If done via ssh, run this instead:
    $ for MASTER in $(oc get nodes -l node-role.kubernetes.io/master= -o custom-columns=NAME:.metadata.name --no-headers); \
      do echo "Restarting OVS services on $MASTER" ; \
      ssh core@$MASTER 'sudo systemctl restart ovs-vswitchd ovsdb-server' ; sleep 3; \
      done
    
  • Delete ovnkube-node pods on masters and validate health:

    $ for OVNNODEMASTER in $(oc get nodes -l node-role.kubernetes.io/master= -o custom-columns=NAME:.metadata.name --no-headers); \
      do echo "Deleting OVNKube-Node on master $OVNNODEMASTER" ; \
      oc -n openshift-ovn-kubernetes delete --wait=false $(oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node --field-selector spec.nodeName=$OVNNODEMASTER -o name) ; sleep 4; \
      done
    
    $ oc -n openshift-ovn-kubernetes get pods -o wide
    
  • Restart OVS services on non-master nodes. This can be done via oc debug node/${NODE} or via SSH:

    • If done via oc debug node/${NODE}, run this:
    $ for OVNKUBENODE in $(oc get nodes -l '!node-role.kubernetes.io/master' -o name --no-headers); \
      do echo "Restarting OVS services on node $OVNKUBENODE" ; \
      oc debug $OVNKUBENODE -- chroot /host /bin/bash -c 'systemctl restart ovs-vswitchd ovsdb-server' ; sleep 2; \
      done
    
    • If done via ssh, run this instead:
    $ for OVNKUBENODE in $(oc get nodes -l '!node-role.kubernetes.io/master'  -o custom-columns=NAME:.metadata.name --no-headers); \
      do echo "Restarting OVS services on node $OVNKUBENODE" ; \
      ssh core@$OVNKUBENODE 'sudo systemctl restart ovs-vswitchd ovsdb-server' ; sleep 2; \
      done
    
  • Delete ovnkube-node pods on non-master nodes and validate health:

    $ for OVNKUBENODE in $(oc get nodes -l '!node-role.kubernetes.io/master' -o custom-columns=NAME:.metadata.name --no-headers); \
      do echo "Deleting OVNKube-Node on node $OVNKUBENODE" ; \
      oc -n openshift-ovn-kubernetes delete --wait=false $(oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node --field-selector spec.nodeName=$OVNKUBENODE -o name) ; sleep 4; \
      done
    
    $ oc -n openshift-ovn-kubernetes get pods -o wide
    

Rebuild OVN on RHOCP v4.8 - v4.13

  • Confirm that a new revision of the kube-apiserver is not in the process of being rolled out.

    $ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
    
  • (Optional) If using oc debug pre-pulling the debug image will make this whole process faster.

    $ for NODE in $(oc get nodes -o name --no-headers); \
    do echo "Debug pod on node $NODE" ; \
    oc debug $NODE -- chroot /host /bin/bash -c 'echo hello' ; sleep 2; \
    done
    
  • The databases are mounted in the northd container under the /etc/openvswitch directory and in the nbdb and sbdb containers under the /etc/ovn directory; both are hostPath mounts of /var/lib/ovn/etc on the masters. On these newer versions, delete the database files directly on the masters (via SSH or oc debug) to ensure they are recreated the same way on all masters.
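
  • (Optional) To confirm the current database files before deleting them (a quick check, assuming the default hostPath layout described above; <master-node> is a placeholder), list the directory on one of the masters:

    $ oc debug node/<master-node> -- chroot /host ls -l /var/lib/ovn/etc/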

  • Remove the northbound and southbound databases. This can be done either by using oc debug node/${NODE} or by using SSH:

    • If done using oc debug node/${NODE}, run these commands

      $ for MASTER in $(oc get nodes -l node-role.kubernetes.io/master= -o name --no-headers); \
        do echo "Deleting databases on master $MASTER" ; \
        oc debug $MASTER -- chroot /host /bin/bash -c  'rm -f /var/lib/ovn/etc/*.db' ; sleep 3; \
        done
      
    • If done via SSH, run these commands instead

      $ for MASTER in $(oc get nodes -l node-role.kubernetes.io/master= -o custom-columns=NAME:.metadata.name --no-headers); \
        do echo "Deleting databases on master $MASTER" ; \
        ssh core@$MASTER 'sudo rm -f /var/lib/ovn/etc/*.db' ; sleep 3; \
        done
      
  • Then delete the pods to force the restart:

    $ oc -n openshift-ovn-kubernetes delete pod -l=app=ovnkube-master
    
  • Validate the health by confirming that there is a single northbound and a single southbound leader. This can take a few minutes to recover. If there are multiple leaders for either the NB or the SB database, that may be a split-brain and the first step must be performed again:

    for OVNMASTER in $(oc -n openshift-ovn-kubernetes get pods -l app=ovnkube-master -o custom-columns=NAME:.metadata.name --no-headers); \
         do echo "········································" ; \
         echo "· OVNKube Master: $OVNMASTER ·" ; \
         echo "········································" ; \
         echo 'North' `oc -n openshift-ovn-kubernetes rsh -Tc northd $OVNMASTER ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound | grep Role` ; \
         echo 'South' `oc -n openshift-ovn-kubernetes rsh -Tc northd $OVNMASTER ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound | grep Role`; \
         echo "····················"; \
         done
    

    The output of the command above should look like this:

    ········································
    · OVNKube Master: ovnkube-master-xxx
    ········································
    North Role: leader
    South Role: leader
    ····················
    ········································
    · OVNKube Master: ovnkube-master-yyy
    ········································
    North Role: follower
    South Role: follower
    ····················
    ········································
    · OVNKube Master: ovnkube-master-zzz
    ········································
    North Role: follower
    South Role: follower
    ····················
    
  • Optional: use network-tools to see the database leaders. See https://github.com/openshift/network-tools/blob/master/docs/user.md#examples

        $ oc adm must-gather --image=quay.io/openshift/origin-network-tools:latest -- network-tools ovn-get leaders
        $ network-tools ovn-get leaders
    
  • Restart OVS services on masters. It can be done either via oc debug node/${NODE} or via SSH:

    • If done via oc debug node/${NODE}, run this step

      $ for MASTER in $(oc get nodes -l node-role.kubernetes.io/master= -o name --no-headers); \
       do echo "Restarting OVS services on node $MASTER" ; \
       oc debug $MASTER -- chroot /host /bin/bash -c  'systemctl restart ovs-vswitchd ovsdb-server' ; sleep 2; \
       done
      
    • If done via SSH, run this step instead

      $ for MASTER in $(oc get nodes -l node-role.kubernetes.io/master= -o custom-columns=NAME:.metadata.name --no-headers); \
       do echo "Restarting OVS services on node $MASTER" ; \
       ssh core@$MASTER 'sudo systemctl restart ovs-vswitchd ovsdb-server' ; sleep 2; \
       done
        
  • Delete ovnkube-node on masters and validate health:

    $ for OVNNODEMASTER in $(oc get nodes -l node-role.kubernetes.io/master= -o custom-columns=NAME:.metadata.name --no-headers); \
      do echo "Deleting OVN-Kube Node on master $OVNNODEMASTER" ; \
      oc -n openshift-ovn-kubernetes delete --wait=false $(oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node --field-selector spec.nodeName=$OVNNODEMASTER -o name) ; sleep 4; \
      done
    
    $ oc -n openshift-ovn-kubernetes get pods -o wide
    
  • Restart OVS services on non-master nodes. It can be done either via oc debug node/${NODE} or via SSH:

    • If done via oc debug node/${NODE}, run this step

      $ for NODE in $(oc get nodes -l '!node-role.kubernetes.io/master' -o name --no-headers); \
       do echo "Restarting OVS services on node $NODE" ; \
       oc debug $NODE -- chroot /host /bin/bash -c 'systemctl restart ovs-vswitchd ovsdb-server' ; sleep 2; \
       done
      
    • If done via SSH, run this step instead

      $ for NODE in $(oc get nodes -l '!node-role.kubernetes.io/master' -o custom-columns=NAME:.metadata.name --no-headers); \
       do echo "Restarting OVS services on node $NODE" ; \
       ssh core@$NODE 'sudo systemctl restart ovs-vswitchd ovsdb-server' ; sleep 2; \
       done
        
  • Delete ovnkube-node on non-master nodes and validate health:

     $ for OVNKUBENODE in $(oc get nodes -l '!node-role.kubernetes.io/master' -o custom-columns=NAME:.metadata.name --no-headers); \
       do echo "Deleting OVN-Kube Node on node $OVNKUBENODE" ; \
       oc -n openshift-ovn-kubernetes delete --wait=false $(oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node --field-selector spec.nodeName=$OVNKUBENODE -o name) ; sleep 3; \
       done
    
     $ oc -n openshift-ovn-kubernetes get pods -o wide 
    

Rebuild OVN on RHOCP v4.14 and later

Starting with RHOCP 4.14, OVN-Kubernetes runs its databases per node (the interconnect architecture), so the rebuild is done node by node. The procedure below explains how to rebuild the OVN databases on a single node. If you need to rebuild the OVN databases of more than one node, repeat the procedure for each node, as shown in the example below.
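
For example, you can export the target node name once (the name below is a placeholder; use a node name from oc get nodes) and reuse the ${NODE} variable in every command of the procedure:

    $ NODE=worker-0.example.com
    $ oc get node ${NODE}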

Steps are:

  • Confirm that a new revision of the kube-apiserver is not in the process of being rolled out.

    $ oc get kubeapiserver -o=jsonpath='{range .items[0].status.conditions[?(@.type=="NodeInstallerProgressing")]}{.reason}{"\n"}{.message}{"\n"}'
    
  • The databases are stored on the host under the /var/lib/ovn-ic/etc path, which is intentionally different from the one used in previous versions. This path is mounted as a hostPath volume inside the ovnkube-controller, nbdb, ovn-northd, sbdb and ovn-controller containers.
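
  • (Optional) To confirm the current database files before removing them (a quick check, assuming the default layout described above), list the directory on the node:

    $ oc debug node/${NODE} -- chroot /host ls -l /var/lib/ovn-ic/etc/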

  • Remove the northbound and southbound databases from the node. This can be done either by using oc debug node/${NODE} or by using SSH:

    • If done using oc debug node/${NODE}, run these commands
    $ oc debug node/${NODE} -- chroot /host /bin/bash -c  'rm -f /var/lib/ovn-ic/etc/ovn*.db'
    
    • If done via SSH, run these commands instead
    $ ssh core@${NODE} 'sudo rm -f /var/lib/ovn-ic/etc/ovn*.db'
    
  • Restart OVS services on the node. It can be done either via oc debug node/${NODE} or via SSH:

    • If done using oc debug node/${NODE}, run these commands
    $ oc debug node/${NODE} -- chroot /host /bin/bash -c  'systemctl restart ovs-vswitchd ovsdb-server'
    
    • If done via SSH, run these commands instead
    $ ssh core@${NODE} 'sudo systemctl restart ovs-vswitchd ovsdb-server'
    
    • Optional: In telco or high-performance clusters where a valid performanceProfile CR exists, cpuset-configure.service also needs to be restarted; otherwise, the OVS dynamic CPU pinning feature will stop working after ovs-vswitchd is restarted:
    $ ssh core@${NODE} 'sudo systemctl restart cpuset-configure.service'
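
    • If using oc debug instead of SSH, an equivalent restart of the same service would be:
    $ oc debug node/${NODE} -- chroot /host /bin/bash -c 'systemctl restart cpuset-configure.service'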
    
  • Delete the ovnkube-node pod on the node (it contains the ovnkube-controller container) to force the restart:

    $ oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-node --field-selector=spec.nodeName=${NODE}
    
  • Watch the ovnkube-node pod on the node to check its health:

    $ oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-node --field-selector=spec.nodeName=${NODE} -w
    

    Wait until all the containers are ready, which can take several minutes.
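
    When the node has fully recovered, the output should look similar to this (the READY container count depends on the exact release):

    NAME                 READY   STATUS    RESTARTS   AGE
    ovnkube-node-xxxxx   8/8     Running   0          3m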
