Recovering clusters that automatically updated to NFD v4.18 despite running an older OCP version

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP) 4

  • Node Feature Discovery (NFD) Operator

Issue

If a 4.18 catalog is mistakenly published for an older OpenShift version, and the cluster has automatic updates enabled for OLM-installed operators, the Node Feature Discovery (NFD) Operator is upgraded to version 4.18. That version might not be compatible with the cluster. The catalogs have been corrected, but the NFD Operator still needs to be downgraded.
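To confirm whether a cluster is affected, compare the operator version with the cluster version. The following Python sketch makes three assumptions. First, you read the cluster version with `oc get clusterversion version -o jsonpath='{.status.desired.version}'`. Second, you read the operator version from the VERSION column of `oc get csv -n openshift-nfd`. Third, both strings start with major.minor. The helper names are hypothetical and are not part of any Red Hat tool.

```python
import re


def major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from strings such as '4.16.12' or '4.18.0-202502101302'.

    Assumes the version starts with '<major>.<minor>', optionally prefixed by 'v'.
    """
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def operator_ahead_of_cluster(cluster_version: str, operator_version: str) -> bool:
    """True when the operator's major.minor is newer than the cluster's major.minor."""
    return major_minor(operator_version) > major_minor(cluster_version)


# Hypothetical example values: a 4.16 cluster running a 4.18 NFD Operator.
print(operator_ahead_of_cluster("4.16.12", "4.18.0-202502101302"))  # prints True
```

The sketch compares only major.minor, because a patch-level difference within the same minor release is not the mismatch this article describes.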

This article covers the specific remediation for the NFD Operator. For the general issue affecting multiple operators, refer to "Red Hat Operator has version higher than the cluster version".

Resolution

To downgrade the NFD Operator, perform the following steps:

  1. Save the existing NodeFeatureDiscovery CR, located in the openshift-nfd namespace, to a .yaml file.
  2. Delete the NodeFeatureDiscovery CR using the oc utility.
  3. Check whether any NodeFeatureRule objects exist in the cluster. If any exist, save them to a .yaml file.
  4. Delete the NodeFeatureRule objects, if any exist.
  5. Uninstall the NFD Operator using OLM.
  6. Verify that no NodeFeature instances remain in the cluster. If any remain, delete them using the oc utility.
  7. Remove all NFD-related CRDs, because OLM might not remove them when it uninstalls the operator. The CRDs are: nodefeaturediscovery, nodefeature, and nodefeaturerule.
  8. Re-install the NFD Operator using OLM, making sure that the correct version is installed.
  9. Re-apply the NodeFeatureDiscovery CR and the NodeFeatureRule objects that you saved in steps 1 and 3.
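Objects saved with `oc get` in steps 1 and 3 contain server-populated fields such as status, metadata.resourceVersion, metadata.uid, and managedFields. Once the objects and CRDs are deleted, these fields may cause re-applying the objects in step 9 to be rejected. The Python sketch below strips them.

It assumes you export with `-o json` instead of YAML, for example:

- `oc get nodefeaturediscovery -n openshift-nfd -o json > nfd-cr.json`
- `oc get nodefeaturerule -A -o json > nfd-rules.json`

JSON is used because Python's standard library reads it without extra packages, and `oc apply -f` accepts JSON as well. The script name sanitize_nfd_backup.py and the file names are hypothetical.

```python
import copy
import json
import sys

# Fields the API server sets on live objects. Re-creating an object with
# metadata.resourceVersion set may be rejected, so these are removed.
SERVER_SET_METADATA = (
    "resourceVersion",
    "uid",
    "creationTimestamp",
    "generation",
    "managedFields",
    "selfLink",
)


def sanitize_object(obj: dict) -> dict:
    """Return a copy of one object with status and server-set metadata removed."""
    clean = copy.deepcopy(obj)  # leave the original backup untouched
    clean.pop("status", None)
    metadata = clean.get("metadata", {})
    for field in SERVER_SET_METADATA:
        metadata.pop(field, None)
    return clean


def sanitize_document(document: dict) -> dict:
    """Handle both a single object and the kind: List that `oc get -o json` returns."""
    if document.get("kind") == "List":
        return {
            "apiVersion": "v1",
            "kind": "List",
            "items": [sanitize_object(item) for item in document.get("items", [])],
        }
    return sanitize_object(document)


def main(src: str, dst: str) -> None:
    """Read a saved backup from src and write the cleaned copy to dst."""
    with open(src, encoding="utf-8") as infile:
        document = json.load(infile)
    with open(dst, "w", encoding="utf-8") as outfile:
        json.dump(sanitize_document(document), outfile, indent=2)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: sanitize_nfd_backup.py BACKUP.json CLEAN.json", file=sys.stderr)
    else:
        main(sys.argv[1], sys.argv[2])
```

For example, run `python3 sanitize_nfd_backup.py nfd-cr.json nfd-cr-clean.json`. After the operator is reinstalled, run `oc apply -f nfd-cr-clean.json`. Review the cleaned file first: this sketch leaves fields such as finalizers and ownerReferences untouched.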
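For step 7, list the CRDs with `oc get crd -o name` and pick out the NFD ones. This Python sketch matches on the plural resource name before the first dot, so it does not depend on the API group suffix. The plurals nodefeaturediscoveries, nodefeatures, and nodefeaturerules are assumed from the CRD names listed in step 7. The sketch only builds `oc delete crd` command lines for you to review; it does not run them.

```python
import shlex

# Plural forms of the CRDs named in step 7 (assumed from the kinds listed there).
NFD_CRD_PLURALS = {"nodefeaturediscoveries", "nodefeatures", "nodefeaturerules"}


def select_nfd_crds(oc_get_crd_output: str) -> list[str]:
    """Pick NFD CRD names out of `oc get crd -o name` (or plain `oc get crd`) output."""
    selected = []
    for line in oc_get_crd_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # `-o name` prints "customresourcedefinition.apiextensions.k8s.io/<crd>";
        # keep only the part after the slash. Table output has the CRD name first.
        crd = fields[0].split("/", 1)[-1]
        plural = crd.split(".", 1)[0]
        if plural in NFD_CRD_PLURALS:
            selected.append(crd)
    return selected


def delete_commands(crds: list[str]) -> list[str]:
    """Build one `oc delete crd` command line per CRD, for manual review."""
    return [shlex.join(["oc", "delete", "crd", crd]) for crd in crds]
```

For example, save the output of `oc get crd -o name` to a file and pass its contents to select_nfd_crds. Then print delete_commands(...) and run the commands yourself. Printing instead of executing keeps a human in the loop, because deleting a CRD also deletes every remaining object of that type.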
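For steps 5 and 8: OLM uninstalls an operator when you delete its Subscription and ClusterServiceVersion (CSV), and it re-installs the operator from a new Subscription. The Python sketch below generates a Subscription that pins the starting CSV and sets installPlanApproval to Manual. With that setting, OLM does not apply a later update, such as an incorrect 4.18 one, without your approval.

The channel ("stable"), package ("nfd"), and catalog source ("redhat-operators" in openshift-marketplace) follow the Subscription example in the OpenShift NFD installation documentation. Confirm them with `oc get packagemanifest nfd -n openshift-marketplace -o yaml`. The CSV name you pass in is a placeholder that you must take from that output. The script name nfd_subscription.py is hypothetical.

```python
import json
import sys


def nfd_subscription(starting_csv: str, channel: str = "stable",
                     namespace: str = "openshift-nfd") -> dict:
    """Build an OLM Subscription that installs a specific NFD CSV with manual approval."""
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": "nfd", "namespace": namespace},
        "spec": {
            "channel": channel,
            "name": "nfd",  # package name in the catalog
            "source": "redhat-operators",
            "sourceNamespace": "openshift-marketplace",
            # Manual approval stops OLM from applying later updates on its own.
            "installPlanApproval": "Manual",
            "startingCSV": starting_csv,
        },
    }


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: nfd_subscription.py STARTING_CSV", file=sys.stderr)
    else:
        print(json.dumps(nfd_subscription(sys.argv[1]), indent=2))
```

For example, run `python3 nfd_subscription.py <csv-name> > nfd-sub.json` and then `oc apply -f nfd-sub.json`. With Manual approval, OLM creates an InstallPlan that waits for approval. List it with `oc get installplan -n openshift-nfd`, then approve it with `oc patch installplan <name> -n openshift-nfd --type merge -p '{"spec":{"approved":true}}'`.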

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.