Ignition fails adding new nodes to UPI cluster after upgrading to OCP 4.6+

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4

Issue

  • Adding new nodes to an OpenShift cluster that has been upgraded to 4.6 fails with the following Ignition error:

https://github.com/coreos/ignition/releases

Displaying logs from failed units: ignition-fetch-offline.service
-- Logs begin at Thu 2020-12-10 20:52:42 UTC, end at Thu 2020-12-10 20:52:45 UTC. --
Dec 10 20:52:44 ignition[649]: no config URL provided
Dec 10 20:52:44 ignition[649]: reading system config file "/usr/lib/ignition/user.ign"
Dec 10 20:52:44 systemd[1]: ignition-fetch-offline.service: Main process exited, code=exited, status=1/FAILURE
Dec 10 20:52:44 ignition[649]: parsing config with SHA512: 85f42e0875f36f9c73b858ecfdaa21fe33df0b6141c302c2b6a17d8fc322346d98cabff5f2b902e92c92325d27b51cb086df6f59e283f0586cccad313df08326
Dec 10 20:52:44 systemd[1]: ignition-fetch-offline.service: Failed with result 'exit-code'.
Dec 10 20:52:44 ignition[649]: failed to fetch config: unsupported config version
Dec 10 20:52:44 systemd[1]: Failed to start Ignition (fetch-offline).
Dec 10 20:52:44 ignition[649]: failed to acquire config: unsupported config version
Dec 10 20:52:44 ignition[649]: Ignition failed: unsupported config version
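The log shows the new boot media rejecting a spec v2 config. A quick local check of the config's version confirms which spec a saved pointer config uses (a minimal sketch; the file name worker.ign is an assumption, and jq is assumed to be installed — RHCOS 4.6+ boot media only accepts spec v3 configs):

```shell
# Illustrative pointer config; in practice this is your saved worker.ign.
cat > worker.ign <<'EOF'
{"ignition":{"version":"2.2.0"}}
EOF
# Print the Ignition spec version of the config:
jq -r '.ignition.version' worker.ign   # a "2.x" version here explains the failure above
```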

Resolution

Starting with OpenShift 4.17, it is possible to add compute nodes to on-premise clusters by generating an ISO image with the oc CLI, regardless of the cluster installation method, for specific platforms. Refer to adding worker nodes to an on-premise cluster for additional information and the list of supported platforms (which may vary depending on the version of the OpenShift cluster). For customizations, refer to the cluster configuration reference.

NOTES

  • The following instructions are only applicable if the cluster was originally installed with a version earlier than 4.6.
  • If you have modified your “pointer” Ignition config (worker.ign), you may not be able to cleanly migrate your config from Ignition spec v2 to Ignition spec v3.
  • Clusters installed via any Installer-Provisioned Infrastructure (IPI) method are not affected by this issue, provided the same RHCOS image or AWS AMI version that was used during the cluster installation is used to scale up or add a new node. Nodes in these clusters are managed via a MachineSet, and the process of adding a new node is handled automatically by scaling up an existing MachineSet. For more information, see the instructions for manually scaling a MachineSet at https://docs.openshift.com/container-platform/4.6/machine_management/manually-scaling-machineset.html.
  • Clusters installed via User-Provisioned Infrastructure (UPI) with MachineSets configured as in KCS 5307621: How to create a MachineSet for VMware in OCP 4 UPI installations are also not affected by this issue, for the same reason as the previous bullet.

AWS IPI Workflow when changing the AMI with new MachineSets

  • Download the openshift-install program for the specific cluster version from the following site:
    https://mirror.openshift.com/pub/openshift-v4/clients/ocp

  • Execute the following command to retrieve the AMI information for the new MachineSets from the RHCOS stream metadata.

    $ ./openshift-install coreos print-stream-json | jq .architectures.x86_64.images.aws
    
  • Take a backup of the worker-user-data secret and modify it to use Ignition spec version 3.x and the other values described in the Example Ignition config file converted from spec version 2.2.0 to 3.1.0 section of this article.

    $ oc -n openshift-machine-api get secrets worker-user-data -o yaml > new-machineset-user-data.yaml
    
  • Create the new secret for the new MachineSets.

    $ oc create -f new-machineset-user-data.yaml
    
  • Create the new MachineSets, point the userDataSecret to the new user-data secret, and then test the functionality.

        spec:
          metadata: {}
          providerSpec:
            value:
              ami:
                id: ami-0b0b4e794axxxxxx
        [...]
              userDataSecret:
                name: new-machineset-user-data <<<<<<<<<<<< 
    
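Before creating the new secret, the converted userData can be sanity-checked locally. A minimal sketch, assuming the decoded data was saved as userData.ign and jq is installed (the heredoc content below is illustrative, not from your cluster):

```shell
# Illustrative converted userData; the real content comes from your cluster's
# worker-user-data secret after editing it as described above.
cat > userData.ign <<'EOF'
{"ignition":{"config":{"merge":[{"source":"https://api-int.mycluster.example.com:22623/config/worker"}]},"version":"3.1.0"}}
EOF
# A valid spec v3 pointer config must use "merge" (not "append") and a 3.x version.
jq -e '.ignition.version | startswith("3.")' userData.ign
jq -e '.ignition.config | has("merge") and (has("append") | not)' userData.ign
```

Both jq invocations exit non-zero (via -e) if the check fails, so they can gate the `oc create` step in a script.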

Bare Metal UPI Workflow

  1. Retrieve the version of the boot media that matches the version your cluster was upgraded to.
  2. Make sure you have the Ignition configuration file that was used originally to install your cluster.
  3. Modify your Ignition configuration file to be spec v3 compatible.
  4. Configure your provisioning system (BMC, PXE, etc.) to serve the newer version of the boot media and the updated Ignition config file.
  5. Boot a new system and confirm that the version of RHCOS installed matches the version of the new boot media used.
  6. Follow the rest of the instructions for adding compute machines to the cluster for your version.
    e.g. https://docs.openshift.com/container-platform/4.6/machine_management/user_infra/adding-bare-metal-compute-user-infra.html
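For PXE-based provisioning, the RHCOS live installer is pointed at the updated Ignition config via kernel arguments. A sketch of assembling the APPEND line — the URLs, host address, and install device are placeholders, not values from this article:

```shell
# Placeholder provisioning-host URLs; adjust for your environment.
IGN_URL="http://192.0.2.10:8080/worker.ign"
ROOTFS_URL="http://192.0.2.10:8080/rhcos-live-rootfs.x86_64.img"
# Kernel arguments understood by the RHCOS 4.6+ live PXE image:
KARGS="coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=${IGN_URL} coreos.live.rootfs_url=${ROOTFS_URL}"
# Save for pasting into the PXE APPEND line:
echo "${KARGS}" > pxe-kernel-args.txt
cat pxe-kernel-args.txt
```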

vSphere UPI Workflow

  1. Retrieve the version of the RHCOS OVA image that matches the version your cluster was originally installed with.
  2. Make sure you have the Ignition configuration file that was used originally to install your cluster.
  3. Modify your Ignition configuration file to be spec v3 compatible.
  4. In your vSphere cluster, configure a new template using the OVA image (the version of the OVA must match the installation version) and the updated Ignition configuration file as described in the installing-vsphere documentation.
  5. Follow the rest of the instructions for adding compute machines to the cluster for your version.
    e.g. https://docs.openshift.com/container-platform/4.6/machine_management/user_infra/adding-vsphere-compute-user-infra.html
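On vSphere, the Ignition config is typically passed to the clone as a base64-encoded guestinfo property. A sketch of preparing that value (the govc command and VM name shown in the comments are assumptions for illustration; the heredoc stands in for your converted worker.ign):

```shell
# Illustrative converted config; use your real worker.ign in practice.
cat > worker.ign <<'EOF'
{"ignition":{"version":"3.1.0"}}
EOF
# base64-encode the config for the guestinfo.ignition.config.data property:
CONFIG_B64=$(base64 -w0 worker.ign)
# The encoded value is then set on the cloned VM, e.g. with govc:
#   govc vm.change -vm new-worker-0 \
#     -e "guestinfo.ignition.config.data=${CONFIG_B64}" \
#     -e "guestinfo.ignition.config.data.encoding=base64"
echo "${CONFIG_B64}"
```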

Additional info and examples

Example Ignition config file converted from spec version 2.2.0 to 3.1.0

{
  "ignition": {
    "config": {
      "append": [
        {
          "source": "https://api-int.mycluster.example.com:22623/config/worker",
          "verification": {}
        }
      ]
    },
    "security": {
      "tls": {
        "certificateAuthorities": [
          {
            "source": "data:text/plain;charset=utf-8;base64,LS0tLS1CR....",
            "verification": {}
          }
        ]
      }
    },
    "timeouts": {},
    "version": "2.2.0"
  },
  "networkd": {},
  "passwd": {},
  "storage": {},
  "systemd": {}
}
{
  "ignition": {
    "config": {
      "merge": [
        {
          "source": "https://api-int.mycluster.example.com:22623/config/worker",
          "verification": {}
        }
      ]
    },
    "security": {
      "tls": {
        "certificateAuthorities": [
          {
            "source": "data:text/plain;charset=utf-8;base64,LS0tLS1CR....",
            "verification": {}
          }
        ]
      }
    },
    "timeouts": {},
    "version": "3.1.0"
  },
  "networkd": {},
  "passwd": {},
  "storage": {},
  "systemd": {}
}
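For the simple pointer config shown above, the v2-to-v3 conversion can be scripted with jq: rename config.append to config.merge and bump the spec version. A sketch — richer configs (storage, systemd, networkd sections) need a full manual review against the spec v3 migration notes:

```shell
# Start from the v2 pointer config (abbreviated CA data, as in the example above).
cat > worker-v2.ign <<'EOF'
{"ignition":{"config":{"append":[{"source":"https://api-int.mycluster.example.com:22623/config/worker","verification":{}}]},"security":{"tls":{"certificateAuthorities":[{"source":"data:text/plain;charset=utf-8;base64,LS0tLS1CR....","verification":{}}]}},"timeouts":{},"version":"2.2.0"},"networkd":{},"passwd":{},"storage":{},"systemd":{}}
EOF
# Rename config.append to config.merge and update the spec version:
jq '.ignition.config.merge = .ignition.config.append
    | del(.ignition.config.append)
    | .ignition.version = "3.1.0"' worker-v2.ign > worker-v3.ign
jq -r '.ignition.version' worker-v3.ign
```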

The worker Ignition config can be recreated using the following steps:

  • Export the api-int hostname to a variable (replace clusterDomain with your cluster's base domain):

    $ export CLUSTERDOMAINAPI=api-int.clusterDomain:22623
    
  • Run the following script:

      #!/bin/sh
      mkdir -p /tmp/scaleup; cd /tmp/scaleup 
      cat <<EOF >>worker.ign
      {"ignition":{"config":{"merge":[{"source":"https://CLUSTERDOMAINAPI/config/worker"}]},"security":{"tls":{"certificateAuthorities":[{"source":"data:text/plain;charset=utf-8;base64,CERTinBASE64"}]}},"version":"3.1.0"}}
      EOF
      sed -i "s%CLUSTERDOMAINAPI%$CLUSTERDOMAINAPI%" worker.ign 
      echo "q" | openssl s_client -connect $CLUSTERDOMAINAPI -showcerts | awk '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/' | base64 --wrap=0 | tee ./api-int.base64
      sed --regexp-extended --in-place='' "s%CERTinBASE64%$(cat ./api-int.base64)%" worker.ign
    
  • Alternatively, just execute the following command:

      $ oc get -n openshift-machine-api secret worker-user-data -o jsonpath='{.data.userData}'|base64 -d > worker.ign
    

    Note: if the cluster was originally installed with a release earlier than 4.6, the file still needs to be edited: change the version to 3.1.0 and replace "append" with "merge".
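Those two edits can be sketched with sed, assuming the compact single-line layout that the oc command above produces (verify the result by hand before using it):

```shell
# Illustrative v2 userData, as extracted from an older cluster's secret.
cat > worker.ign <<'EOF'
{"ignition":{"config":{"append":[{"source":"https://api-int.mycluster.example.com:22623/config/worker"}]},"version":"2.2.0"}}
EOF
# Replace "append" with "merge" and bump the spec version in place:
sed -i 's/"append"/"merge"/g; s/"version":"2\.2\.0"/"version":"3.1.0"/' worker.ign
jq -r '.ignition.version' worker.ign
```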

Root Cause

Red Hat OpenShift Container Platform 4.6 introduces a new version of the Ignition specification (v3) that is incompatible with the previous version (v2). Currently, the documentation for adding new nodes to a cluster instructs users to use the version of the boot media that was used to originally install their cluster. For example, if a cluster was originally installed at version 4.3 and was upgraded to 4.5, users would need the 4.3 version of boot media to add additional nodes to the cluster. This process of adding additional nodes to the cluster is complicated by the switch to Ignition spec v3, as the 4.6+ RHCOS images do not support Ignition spec v2. UPI users that have installed on bare metal or vSphere and have upgraded to 4.6 will run into this incompatibility issue when using the updated boot media to deploy new nodes.

For IPI users, updating the boot media is currently not possible. Users that scale up their nodes via a MachineSet using the Machine API Operator will have nodes booted using the original version of the boot media. However, the Machine Config Operator is able to perform the necessary translation from Ignition spec v3 to Ignition spec v2.

Category

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.