OpenShift on OpenStack with compute Availability Zones: Missing rootVolume availability zone
Environment
- OpenShift on OpenStack IPI
- The initial cluster deployment was performed with a version earlier than 4.14
- Masters or workers deployed with explicitly given Availability Zones (via zones) in install-config.yaml
- Masters or workers deployed with Root Volumes (via rootVolume) without explicitly given Availability Zones (via rootVolume.zones) in install-config.yaml
- Cinder availability zone(s) don't match Nova availability zone(s)
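For illustration, the problematic pattern looks like the following install-config.yaml fragment (the zone, flavor, and volume type names are hypothetical): compute zones are given, but the rootVolume section has no zones of its own.

```yaml
controlPlane:
  name: master
  replicas: 3
  platform:
    openstack:
      type: m1.xlarge        # hypothetical flavor
      zones:                 # Nova compute zones are explicitly given...
      - az0
      - az1
      - az2
      rootVolume:            # ...but rootVolume.zones is missing, so Cinder
        size: 30             # falls back to its default availability zone
        type: fast-0
```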
Issue
- When the Machine Controller creates a new machine, the root volume availability zone is taken from the compute availability zone, which doesn't exist in Cinder. The machine is never created.
Resolution
As part of the bug fix, the Installer in 4.14+ validates that root volume availability zones are provided whenever compute availability zones are set in install-config.yaml, and stops the deployment if they are not.
For clusters that were deployed before 4.14 with the IPI method, where masters or workers span multiple availability zones and use root volumes without availability zones, you need to manually update the Machine resources so that they reflect the actual state of the instances.
Note: editing the Machine resources will not trigger a rollout of the Control plane instances, because in-place edits of the Machine resources are not acted upon by any OpenShift operator. However, this in-place edit is necessary in order for the cluster-control-plane-machine-set-operator to correctly generate a ControlPlaneMachineSet for your cluster.
To do that, edit the ProviderSpec of every machine matching the environment above, and set the rootVolume.availabilityZone property of spec.providerSpec to the actual availability zone of the volume (example with master-0):
oc edit machine/<cluster_id>-master-0 -n openshift-machine-api
<make edits>
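Alternatively, the same change can be applied non-interactively with a merge patch. The sketch below assumes the actual Cinder zone is "nova" (verify it first in your OpenStack cloud); the file name is hypothetical.

```yaml
# rootvolume-az-patch.yaml -- hypothetical file name; apply with:
#   oc patch machine/<cluster_id>-master-0 -n openshift-machine-api \
#     --type merge --patch-file rootvolume-az-patch.yaml
spec:
  providerSpec:
    value:
      rootVolume:
        availabilityZone: nova   # replace with the machine's actual volume zone
```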
Here is an example of output:
providerSpec:
  value:
    apiVersion: machine.openshift.io/v1alpha1
    availabilityZone: az0
    cloudName: openstack
    cloudsSecret:
      name: openstack-cloud-credentials
      namespace: openshift-machine-api
    flavor: m1.xlarge
    image: rhcos-4.14
    kind: OpenstackProviderSpec
    metadata:
      creationTimestamp: null
    networks:
    - filter: {}
      subnets:
      - filter:
          name: refarch-lv7q9-nodes
          tags: openshiftClusterID=refarch-lv7q9
    rootVolume:
      availabilityZone: nova # <--- Add the zone name here
      diskSize: 30
      sourceUUID: rhcos-4.12
      volumeType: fast-0
    securityGroups:
    - filter: {}
      name: refarch-lv7q9-master
    serverGroupName: refarch-lv7q9-master
    serverMetadata:
      Name: refarch-lv7q9-master
      openshiftClusterID: refarch-lv7q9
    tags:
    - openshiftClusterID=refarch-lv7q9
    trunk: true
    userDataSecret:
      name: master-user-data
If you edited or recreated your Machine resources after installation, adapt these steps to your situation: in your OpenStack cloud, find the availability zone of the root volume for each machine and set it in the rootVolume.availabilityZone property of the corresponding Machine.
Note on control plane machines: once all machines have a root volume availability zone, your control plane is ready to be managed by the Cluster Control Plane Machine Set operator, and a ControlPlaneMachineSet (CPMS) will be created.
It is up to you to review the generated CPMS and set its state to Active when ready:
oc describe controlplanemachineset.machine.openshift.io/cluster --namespace openshift-machine-api
oc edit controlplanemachineset.machine.openshift.io/cluster --namespace openshift-machine-api
Root Cause
If the masters or workers are configured with Availability Zones (AZs) and root volumes without AZs, the installer (via Terraform) creates the volumes with no AZ, and Cinder picks its default one (e.g. "nova" if no other AZ exists in Cinder).
The ProviderSpec of each Machine is then created with an empty rootVolume.availabilityZone.
For example: given an install-config.yaml with three zones in the controlPlane machine pool and a rootVolume without zones, the Installer creates all the volumes in the default Cinder availability zone, and at the same time generates each Machine resource without rootVolume.availabilityZone.
This anomaly was reported as OCPBUGS-15997.
Diagnostic Steps
To check whether the masters or workers are missing the root volume availability zone, run:
oc get -n openshift-machine-api machine -o json | jq -r '.items[] | select(.spec.providerSpec.value.availabilityZone and (.spec.providerSpec.value.rootVolume.availabilityZone | not)) | .metadata.name'
It will list the machines which need to be edited.
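If you want to sanity-check the jq filter itself before running it against a live cluster, you can apply it to a saved copy of the machine list. The sample below uses hypothetical machine names: one machine has a compute AZ but no root volume AZ, the other is fully populated.

```shell
# Hypothetical sample of `oc get machine -o json` output.
cat > machines.json <<'EOF'
{"items":[
 {"metadata":{"name":"demo-master-0"},
  "spec":{"providerSpec":{"value":{"availabilityZone":"az0","rootVolume":{"diskSize":30}}}}},
 {"metadata":{"name":"demo-master-1"},
  "spec":{"providerSpec":{"value":{"availabilityZone":"az1","rootVolume":{"availabilityZone":"nova","diskSize":30}}}}}
]}
EOF
# Same filter as above, run against the saved file: only demo-master-0, which
# has a compute AZ but no rootVolume.availabilityZone, is listed.
jq -r '.items[]
  | select(.spec.providerSpec.value.availabilityZone
           and (.spec.providerSpec.value.rootVolume.availabilityZone | not))
  | .metadata.name' machines.json
```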
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.