Disable the machine-config-operator autoreboot after a change in OCP 4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4.x

Issue

  • Prevent OpenShift Container Platform 4.x from automatically applying MachineConfig changes that require reboots.
  • The automatic reboot starts as soon as the Administrator changes the configuration. If the reboot is unwanted at that moment, the Administrator can pause it.
  • Is it possible to pause a MachineConfigPool in order to apply multiple MachineConfigs at one time?

Resolution

Configuration changes to OpenShift hosts, such as creating a new machine config to modify the NTP configuration, or modifying the registry mirror (imagecontentsourcepolicy), user-ca-bundle (configmap), pull-secret (secret), registries, SSH keys, or kubelet configuration, can be paused to prevent rolling reboots by setting the spec.paused field in the machineconfigpool to true.

The paused field specifies whether changes to this machine config pool should be stopped. This includes generating a new desiredMachineConfig and updating machines.

Pause the autoreboot via modifying machineconfigpool

WARNING: Pausing the MCP pauses ALL config changes. This prevents several changes required for the correct operation of the cluster, including RHCOS updates, kernel arguments, file changes, proxy changes, crio/kubelet configuration, and node/kubelet certificate rotations, from reaching any of the nodes. This can lead to problems if other components assume that those config changes are already in place when they are not, because the MCP is paused. In addition, it can cause failures in multiple oc commands, including but not limited to oc debug, oc logs, oc exec, and oc attach. In such cases, it is the user's responsibility to handle the situation and unpause accordingly.

For additional information, refer to The Consequences Of Pausing MachineConfig Pools In OpenShift's Machine Config Operator (MCO).

To pause an automatic reboot that is unwanted at that moment, the Administrator sets the spec.paused field to true.

For master

# oc patch --type=merge --patch='{"spec":{"paused":true}}' machineconfigpool/master

For worker

# oc patch --type=merge --patch='{"spec":{"paused":true}}' machineconfigpool/worker

Checking the paused status

To find out whether a machine config pool is paused or unpaused, use oc get machineconfigpool (or oc get mcp) and check whether the spec.paused field is true or false.

For master

# oc get machineconfigpool/master --template='{{.spec.paused}}'

For worker

# oc get machineconfigpool/worker --template='{{.spec.paused}}'
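For scripting, the same check can be wrapped in a small helper. The following is a minimal sketch, not part of the original solution: the function name is_mcp_paused is hypothetical, and it assumes that an unset spec.paused field (which jsonpath prints as an empty string) should be treated as unpaused.

```shell
# Hypothetical helper: exit status 0 if the pool is paused, 1 if it is not,
# 2 if the oc query itself failed.
is_mcp_paused() {
    local pool="$1"
    local paused

    # jsonpath prints "true" when paused and an empty string when the
    # field is unset; an unset field is treated as "not paused" here.
    paused="$(oc get "machineconfigpool/${pool}" -o jsonpath='{.spec.paused}')" || return 2
    [ "$paused" = "true" ]
}
```

It could be used as: is_mcp_paused worker && echo "worker pool is paused"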

How to know when machine config pool must be unpaused

MCO usually expects to be able to apply host-level changes to the nodes, which in turn requires them to be rebooted. When spec.paused is true, the changes are paused, i.e. not applied but only queued. If there are queued changes, it is recommended to schedule a maintenance window for a reboot as early as possible and set spec.paused to false, so that the changes queued since the last reboot take effect. Failing to do this may result in various issues: the longer changes remain unapplied, the greater the risk. Once the changes are applied, the MCP can be paused again.

The best way to check for pending changes is to run oc get mcp -o wide. With OCP 4.6, if there are pending changes, the UPDATED column is false and the UPDATING column is true, even when no update is actually in progress because of the pause. With OCP 4.7 and later, if there are pending changes, the UPDATED column is false and the UPDATING column remains false as well, so the following check is useful.

Another way to check is to see whether the rendered config in the MCP spec (the desired one) differs from the one in the status (the current one). That can be checked like this:

$ oc get mcp -o custom-columns=NAME:metadata.name,DESIRED:spec.configuration.name,CURRENT:status.configuration.name
NAME    DESIRED                                           CURRENT
master  rendered-master-a373c027745838e7a7cd341a2c522d60  rendered-master-a373c027745838e7a7cd341a2c522d60
worker  rendered-worker-17ca378c0ac4c57866cf65d0ee92d549  rendered-worker-535cfd38878432294695598224d899e3

In this example, the desired config for the worker MCP differs from the current one, i.e. there are changes pending for the worker MCP, while the master's current config equals the desired one, i.e. there are no pending changes.
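The desired-versus-current comparison above can also be automated. The sketch below is an illustration only: the function name mcp_pending_changes is hypothetical, and it assumes the standard --no-headers flag of oc get so that awk sees only data rows.

```shell
# Hypothetical helper: print the name of every pool whose desired rendered
# config (spec) differs from its current rendered config (status).
mcp_pending_changes() {
    oc get mcp --no-headers \
        -o custom-columns=NAME:metadata.name,DESIRED:spec.configuration.name,CURRENT:status.configuration.name |
        awk '$2 != $3 { print $1 }'   # column 2 = desired, column 3 = current
}
```

Empty output means that no pool has pending changes. With the example data shown above, the only pool reported would be worker.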

Unpause the autoreboot via modifying machineconfigpool

For master

# oc patch --type=merge --patch='{"spec":{"paused":false}}' machineconfigpool/master

For worker

# oc patch --type=merge --patch='{"spec":{"paused":false}}' machineconfigpool/worker
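The pause and unpause commands can be combined to apply several MachineConfigs in one rollout, as asked in the Issue section. The following is a minimal sketch under stated assumptions: the function name apply_machineconfigs_batched and the manifest file names in the usage line are hypothetical, and the helper deliberately leaves the pool paused if any oc apply fails, so that a partial set of changes is not rolled out.

```shell
# Hypothetical helper: pause a pool, apply several MachineConfig manifests,
# then unpause so that the queued changes are rolled out together.
# Usage: apply_machineconfigs_batched POOL MANIFEST...
apply_machineconfigs_batched() {
    local pool="$1"
    shift
    local manifest

    # Pause first so that no single manifest triggers its own rollout.
    oc patch --type=merge --patch='{"spec":{"paused":true}}' \
        "machineconfigpool/${pool}" || return 1

    for manifest in "$@"; do
        if ! oc apply -f "$manifest"; then
            # Assumption: on failure the pool stays paused for inspection.
            echo "apply failed for ${manifest}; ${pool} is still paused" >&2
            return 1
        fi
    done

    # Unpause: the MCO can now apply all queued changes.
    oc patch --type=merge --patch='{"spec":{"paused":false}}' \
        "machineconfigpool/${pool}"
}
```

It could be invoked as: apply_machineconfigs_batched worker 99-worker-chrony.yaml 99-worker-ssh.yaml. Progress can then be followed with oc get mcp.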

Root Cause

OpenShift 4 is an operator-focused platform, and the Machine Config Operator extends that to the operating system itself, managing updates and configuration changes to essentially everything between the kernel and kubelet.

To repeat for emphasis, this operator manages updates to systemd, cri-o/kubelet, kernel, NetworkManager, etc. It also offers a new MachineConfig CRD that can write configuration files onto the host.

For details, please refer to the Machine Config Operator Documentation and to Understanding the Machine Config Operator.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.