Ignition fails adding new nodes to UPI cluster after upgrading to OCP 4.6+
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4
Issue
- Adding new nodes to an OpenShift cluster that has been upgraded to 4.6 fails with the following Ignition error:
```
Displaying logs from failed units: ignition-fetch-offline.service
-- Logs begin at Thu 2020-12-10 20:52:42 UTC, end at Thu 2020-12-10 20:52:45 UTC. --
Dec 10 20:52:44 ignition[649]: no config URL provided
Dec 10 20:52:44 ignition[649]: reading system config file "/usr/lib/ignition/user.ign"
Dec 10 20:52:44 systemd[1]: ignition-fetch-offline.service: Main process exited, code=exited, status=1/FAILURE
Dec 10 20:52:44 ignition[649]: parsing config with SHA512: 85f42e0875f36f9c73b858ecfdaa21fe33df0b6141c302c2b6a17d8fc322346d98cabff5f2b902e92c92325d27b51cb086df6f59e283f0586cccad313df08326
Dec 10 20:52:44 systemd[1]: ignition-fetch-offline.service: Failed with result 'exit-code'.
Dec 10 20:52:44 ignition[649]: failed to fetch config: unsupported config version
Dec 10 20:52:44 systemd[1]: Failed to start Ignition (fetch-offline).
Dec 10 20:52:44 ignition[649]: failed to acquire config: unsupported config version
Dec 10 20:52:44 ignition[649]: Ignition failed: unsupported config version
```
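The "unsupported config version" message means the node's RHCOS image received a config written for an Ignition spec it cannot parse. As a quick sanity check before booting a node (a sketch; `sample.ign` stands in for your real pointer config), you can inspect the spec version a config declares:

```shell
# Stand-in for your real worker.ign; point grep at the actual file instead.
printf '%s' '{"ignition":{"version":"2.2.0"}}' > sample.ign

# Show the declared spec version; a 4.6+ RHCOS image needs a 3.x config.
grep -o '"version":[ ]*"[0-9.]*"' sample.ign   # prints "version":"2.2.0"
```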
Resolution
Starting with OpenShift 4.17, you can add compute nodes to on-premise clusters by generating an ISO image with the oc CLI, regardless of the method used to install the cluster, on specific platforms. Refer to adding worker nodes to an on-premise cluster for additional information and the list of supported platforms (which may vary depending on the version of the OpenShift cluster). For customizations, refer to the cluster configuration reference.
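On 4.17+ clusters this flow is driven by `oc adm node-image create`, which takes a nodes-config.yaml describing the hosts to add. The fragment below is illustrative only; the exact field names and options should be verified against the documentation for your cluster version:

```yaml
# nodes-config.yaml -- illustrative input for `oc adm node-image create`;
# hostnames, device paths, and MAC addresses are placeholders.
hosts:
  - hostname: extra-worker-0
    rootDeviceHints:
      deviceName: /dev/sda
    interfaces:
      - name: eno1
        macAddress: "52:54:00:aa:bb:cc"
```

Running `oc adm node-image create nodes-config.yaml` against the cluster then produces a bootable ISO that already embeds a spec v3 Ignition config, so no manual config conversion is needed on that path.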
NOTES
- The following instructions only apply if the cluster was originally installed at a version prior to 4.6.
- If you have modified your "pointer" Ignition config (worker.ign), you may not be able to cleanly migrate your config from Ignition spec v2 to Ignition spec v3.
- Clusters installed via any Installer-Provisioned Infrastructure (IPI) method are not affected by this issue, provided the same RHCOS or Amazon AMI version that was used during the cluster installation is used to scale up or add a new node. Nodes in these clusters are managed via a MachineSet, and adding a new node is handled automatically by scaling up an existing MachineSet. For more information, see the instructions for manually scaling a MachineSet at https://docs.openshift.com/container-platform/4.6/machine_management/manually-scaling-machineset.html.
- Clusters installed via User-Provisioned Infrastructure (UPI) with MachineSets configured as in KCS 5307621 (How to create a MachineSet for VMware in OCP 4 UPI installations) are also unaffected, for the same reason as the previous bullet.
AWS IPI Workflow if changing the AMI with a new MachineSet
1. Download the openshift-install program for the specific cluster version from https://mirror.openshift.com/pub/openshift-v4/clients/ocp.
2. Retrieve the AMI information for the new MachineSet from the RHCOS metadata:
   ```
   $ ./openshift-install coreos print-stream-json | jq .architectures.x86_64.images.aws
   ```
3. Take a backup of the worker-user-data secret and modify it to use Ignition spec version 3.x and the other values described in the "Example Ignition config file converted from spec version 2.2.0 to 3.1.0" section of this article:
   ```
   $ oc -n openshift-machine-api get secrets worker-user-data -o yaml > new-machineset-user-data.yaml
   ```
4. Create the new secret for the new MachineSet:
   ```
   $ oc create -f new-machineset-user-data.yaml
   ```
5. Create the new MachineSet, point its userDataSecret to the new user-data secret, and then test the functionality:
   ```yaml
   spec:
     metadata: {}
     providerSpec:
       value:
         ami:
           id: ami-0b0b4e794axxxxxx
         [...]
         userDataSecret:
           name: new-machineset-user-data   # <<<<<<<<<<<<
   ```
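The worker-user-data conversion described above can be sketched as follows. In a real cluster the input would be decoded from the existing secret (that oc command is shown as a comment); here a minimal spec 2.2.0 pointer config stands in for it:

```shell
# In a real cluster, decode the current user data first:
#   oc -n openshift-machine-api get secret worker-user-data \
#     -o jsonpath='{.data.userData}' | base64 -d > userData.ign
# A minimal spec-2.2.0 pointer config stands in for that output here.
printf '%s' '{"ignition":{"config":{"append":[{"source":"https://api-int.mycluster.example.com:22623/config/worker"}]},"version":"2.2.0"}}' > userData.ign

# For a simple, unmodified pointer config, renaming the directive and
# bumping the spec version is the whole conversion.
sed -i 's/"append"/"merge"/; s/"2\.2\.0"/"3.1.0"/' userData.ign

# Re-encode the result for the data.userData field of the new secret.
base64 -w0 userData.ign > userData.b64
```

This only covers the simple case; a pointer config that was customized may need the fuller migration discussed in the NOTES above.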
Bare Metal UPI Workflow
- Retrieve the version of the boot media that matches the version your cluster was upgraded to.
- Make sure you have the Ignition configuration file that was used originally to install your cluster.
- Modify your Ignition configuration file to be spec v3 compatible.
  - At a minimum, this means changing the value of the version field from "2.2.0" to "3.1.0" and changing the append directive to merge. See the example provided at the end of the Resolution section.
  - For more information on the Ignition specification v3, see https://coreos.github.io/ignition/configuration-v3_1/
- Configure your provisioning system (BMC, PXE, etc) to serve up the newer versions of the boot media and the updated Ignition config file.
- Boot a new system and confirm that the version of RHCOS installed matches the version of the new boot media used.
- Follow the rest of the instructions for adding compute machines to the cluster for your version.
e.g. https://docs.openshift.com/container-platform/4.6/machine_management/user_infra/adding-bare-metal-compute-user-infra.html
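For PXE-based provisioning, the updated boot media and converted Ignition config end up referenced from the boot menu entry. The fragment below is an illustrative sketch only: the HTTP server, filenames, and install device are placeholders, and the coreos.inst.* kernel arguments should be checked against the bare metal installation documentation for your version:

```text
DEFAULT worker
LABEL worker
    KERNEL http://<http_server>/rhcos-live-kernel-x86_64
    APPEND initrd=http://<http_server>/rhcos-live-initramfs.x86_64.img coreos.live.rootfs_url=http://<http_server>/rhcos-live-rootfs.x86_64.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://<http_server>/worker.ign
```

The key point is that both the live images and worker.ign must be the post-upgrade artifacts: new boot media paired with the spec v3 config.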
vSphere UPI Workflow
- Retrieve the version of the RHCOS OVA image that matches the version your cluster was upgraded to.
- Make sure you have the Ignition configuration file that was used originally to install your cluster.
- Modify your Ignition configuration file to be spec v3 compatible.
  - At a minimum, this means changing the value of the version field from "2.2.0" to "3.1.0" and changing the append directive to merge. See the example provided at the end of the Resolution section.
  - For more information on the Ignition specification v3, see https://coreos.github.io/ignition/configuration-v3_1/
- In your vSphere cluster, configure a new template using the OVA image (the OVA version must match the version the cluster was upgraded to) and the updated Ignition configuration file, as described in the installing-vsphere documentation.
- Follow the rest of the instructions for adding compute machines to the cluster for your version.
e.g. https://docs.openshift.com/container-platform/4.6/machine_management/user_infra/adding-vsphere-compute-user-infra.html
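On vSphere, the converted Ignition config is passed to the VM through advanced configuration parameters (guestinfo properties); the property names below are the ones described in the RHCOS vSphere documentation, shown here as an illustrative fragment:

```text
# VM advanced configuration parameters ("Extra Config") on the cloned worker VM:
guestinfo.ignition.config.data          = <output of: base64 -w0 worker.ign>
guestinfo.ignition.config.data.encoding = base64
```

These can be set in the vSphere UI under Edit Settings, or non-interactively with a tool such as the govc CLI (an assumption, not required by this procedure), e.g. `govc vm.change -vm worker-3 -e "guestinfo.ignition.config.data=$(base64 -w0 worker.ign)" -e guestinfo.ignition.config.data.encoding=base64`.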
Additional info and examples
Example Ignition config file converted from spec version 2.2.0 to 3.1.0
Before (spec 2.2.0):
```json
{
  "ignition": {
    "config": {
      "append": [
        {
          "source": "https://api-int.mycluster.example.com:22623/config/worker",
          "verification": {}
        }
      ]
    },
    "security": {
      "tls": {
        "certificateAuthorities": [
          {
            "source": "data:text/plain;charset=utf-8;base64,LS0tLS1CR....",
            "verification": {}
          }
        ]
      }
    },
    "timeouts": {},
    "version": "2.2.0"
  },
  "networkd": {},
  "passwd": {},
  "storage": {},
  "systemd": {}
}
```
After (spec 3.1.0):
```json
{
  "ignition": {
    "config": {
      "merge": [
        {
          "source": "https://api-int.mycluster.example.com:22623/config/worker",
          "verification": {}
        }
      ]
    },
    "security": {
      "tls": {
        "certificateAuthorities": [
          {
            "source": "data:text/plain;charset=utf-8;base64,LS0tLS1CR....",
            "verification": {}
          }
        ]
      }
    },
    "timeouts": {},
    "version": "3.1.0"
  },
  "networkd": {},
  "passwd": {},
  "storage": {},
  "systemd": {}
}
```
The worker Ignition config can be recreated using the following steps:
1. Export the api-internal hostname to a variable:
   ```
   $ export CLUSTERDOMAINAPI=api-int.clusterDomain:22623
   ```
2. Run the following script:
   ```
   #!/bin/sh
   mkdir -p /tmp/scaleup; cd /tmp/scaleup
   cat <<EOF > worker.ign
   {"ignition":{"config":{"merge":[{"source":"https://CLUSTERDOMAINAPI/config/worker"}]},"security":{"tls":{"certificateAuthorities":[{"source":"data:text/plain;charset=utf-8;base64,CERTinBASE64"}]}},"version":"3.1.0"}}
   EOF
   sed -i "s%CLUSTERDOMAINAPI%$CLUSTERDOMAINAPI%" worker.ign
   echo "q" | openssl s_client -connect $CLUSTERDOMAINAPI -showcerts | awk '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/' | base64 --wrap=0 | tee ./api-int.base64
   sed --regexp-extended --in-place='' "s%CERTinBASE64%$(cat ./api-int.base64)%" worker.ign
   ```
3. Alternatively, extract the existing user data directly:
   ```
   $ oc get -n openshift-machine-api secret worker-user-data -o jsonpath='{.data.userData}' | base64 -d > worker.ign
   ```
   Note: if the cluster was installed at a release earlier than 4.6, the resulting file still needs to be edited by changing the version and replacing "append" with "merge".
Root Cause
Red Hat OpenShift Container Platform 4.6 introduces a new version of the Ignition specification (v3) that is incompatible with the previous version (v2). Currently, the documentation for adding new nodes to a cluster instructs users to use the version of the boot media that was used to originally install their cluster. For example, if a cluster was originally installed at version 4.3 and was upgraded to 4.5, users would need the 4.3 version of boot media to add additional nodes to the cluster. This process of adding additional nodes to the cluster is complicated by the switch to Ignition spec v3, as the 4.6+ RHCOS images do not support Ignition spec v2. UPI users that have installed on bare metal or vSphere and have upgraded to 4.6 will run into this incompatibility issue when using the updated boot media to deploy new nodes.
For IPI users, updating the boot media is currently not possible. Users that scale up their nodes via a MachineSet using the Machine API Operator will have nodes booted using the original version of the boot media. However, the Machine Config Operator is able to perform the necessary translation from Ignition spec v3 to Ignition spec v2.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.