Resolving overlapping UID ranges in OpenShift namespaces after migration
When a Namespace is created in OpenShift, it is assigned a unique user ID (UID) range, a supplemental group (GID) range, and unique SELinux MCS labels. This information is stored in the metadata.annotations field of the Namespace. Every time a new Namespace is created, OpenShift assigns it a new range from its available pool of UIDs and updates the metadata.annotations field to reflect the assigned values.
However, if the Namespace resource already has those annotations set, OpenShift does not re-assign new values for the Namespace. It instead assumes that the existing values are valid and moves on.
This can be an issue for customers using the OpenShift Migration Toolkit for Containers (MTC), as it creates the potential for IDs to overlap with another namespace's range. In most cases, the colliding ranges do not pose a security risk, because other OpenShift configurations (applied by default) provide sufficient workload isolation. In some cases, however, the duplicate ranges may be a problem for users, primarily due to compliance requirements in many industry standards. This article demonstrates how you can resolve the overlapping ranges.
How MTC and UID Ranges Collide
The Migration Toolkit for Containers (MTC) migrates the contents of Namespaces (including the Namespace resource itself) and the PersistentVolume data from the source cluster to a target cluster. MTC deliberately does not mutate the object definitions (YAML) of the Namespace resources. Therefore, the definitions of migrated Namespaces in the target cluster are the same as those of their source counterparts. Similarly, MTC also preserves the ownership and permissions of the persistent data. This ensures that, post migration, applications still have access to the data and files they need to run and function.
MTC intentionally preserves these annotations (as stated above). If it did not preserve them, OpenShift would assign new UID ranges to the restored Namespaces, and the migrated data would be inaccessible to the workloads restored in the target cluster, since they would start with a different UID than on the source.
Thus, it is possible for the ranges of the restored Namespaces to collide with those of existing Namespaces, as they were not assigned by the target OpenShift cluster but were instead set by the MTC tooling.
- Note: In most cases, the colliding ranges do not pose a security risk, as other safeguards and configurations in OpenShift provide sufficient workload isolation. However, in some cases, the duplicate ranges may be a problem for users for various reasons, e.g., compliance requirements.
Identifying UID range collisions
The following commands can be used to identify possible range collisions on the OpenShift cluster:
oc get namespaces -o custom-columns="UID":".metadata.annotations.openshift\.io/sa\.scc\.uid-range" | sort | uniq -cd
oc get namespaces -o custom-columns="GID":".metadata.annotations.openshift\.io/sa\.scc\.supplemental-groups" | sort | uniq -cd
oc get namespaces -o custom-columns="MCS":".metadata.annotations.openshift\.io/sa\.scc\.mcs" | sort | uniq -cd
If possible collisions are found, the commands print each duplicated range value along with the number of namespaces that share it.
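To make the detection mechanics concrete, here is a self-contained illustration of how `sort | uniq -cd` surfaces a duplicate from the column output produced by the commands above (the range values are made up):

```shell
# Four sample UID-range values, as the commands above would print them;
# two namespaces share the same range.
printf '%s\n' \
  '1000650000/10000' \
  '1000660000/10000' \
  '1000680000/10000' \
  '1000680000/10000' \
  | sort | uniq -cd
# Prints the duplicated value with its count, e.g. "2 1000680000/10000"
```

Once a duplicated value is known, the affected namespaces can be identified by adding a NAME column (.metadata.name) to the oc get namespaces command and filtering for that value.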
Fixing UID ranges after migration
MTC intentionally preserves the UID ranges, and it cannot automatically change the UID ranges of namespaces in the target cluster due to data access and data integrity concerns. If the colliding ranges become a problem, or are merely something you wish to correct, you must update them manually after a migration. In some cases, this also means updating the file ownership of persistent volume data.
The following is a step-by-step guide on manually updating the UID ranges of namespaces:
Step 1: Updating the UID ranges on Namespace resources
The first step is to make OpenShift assign new UID ranges from its pool. This can be done by simply deleting the existing annotations on the Namespaces. Once they are deleted, OpenShift automatically re-assigns new ranges that do not collide with existing ones. The following annotations need to be deleted: openshift.io/sa.scc.mcs, openshift.io/sa.scc.supplemental-groups, and openshift.io/sa.scc.uid-range.
- Recommendation: Keep the original values backed up so that the workloads can be rolled back in case of failures in further steps.
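As an illustration of what to back up, assume the Namespace definition has first been saved locally (for example with `oc get namespace <namespace> -o yaml > ns-backup.yaml`); the three range annotations can then be pulled out of the backup for safekeeping. The file names and values in this sketch are made up:

```shell
# Sample of a saved Namespace definition containing the three
# range annotations (illustrative values).
cat > ns-backup.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: example-app
  annotations:
    openshift.io/sa.scc.mcs: s0:c26,c5
    openshift.io/sa.scc.supplemental-groups: 1000680000/10000
    openshift.io/sa.scc.uid-range: 1000680000/10000
EOF

# Keep a copy of just the three annotations for a possible rollback.
grep 'openshift.io/sa.scc' ns-backup.yaml > scc-annotations-backup.txt
cat scc-annotations-backup.txt
```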
oc annotate namespace <namespace> openshift.io/sa.scc.mcs-
oc annotate namespace <namespace> openshift.io/sa.scc.supplemental-groups-
oc annotate namespace <namespace> openshift.io/sa.scc.uid-range-
Step 2: Quiescing apps
After the annotations are updated, the Namespaces have new, non-overlapping UID ranges assigned to them. However, the workloads are still running with their old UIDs; the new UIDs take effect only after the workloads are restarted. In this step, the workloads are simply quiesced.
Before quiescing, the current replica counts of the workload resources need to be stored. These values will be needed when un-quiescing the workloads.
- Pro Tip: Simply store the current replica count as an annotation on the workload itself.
oc annotate deploymentconfig <deploymentconfig> preQuiesceReplicas=$(oc get deploymentconfig <deploymentconfig> -o jsonpath={.spec.replicas})
Once the replica count is saved, the replicas of workload resources will be set to 0:
oc scale deploymentconfig <deploymentconfig> --replicas=0
Step 3: Updating ownership of PersistentVolume data
File ownership of the data will need to be updated manually.
- Note: This step may or may not be required depending on the underlying storage used by the PVCs.
- When is this step not required?
  - If the storage is block storage with a filesystem overlaid on it, Kubernetes automatically chowns the files on the volumes when the workloads are un-quiesced in Step 4. AWS EBS and GCE Persistent Disks are examples of block storage for which Kubernetes automatically updates the ownership of the data based on the UID of the running workloads. One way to determine whether the automatic chown is supported is to look at the fsType attribute of the underlying PersistentVolume object. If the fsType field is set, Kubernetes will automatically chown the volume.
  - If the storage is shared storage and an additional list of supplemental groups can be provided to the workloads, the workloads can simply access the files using one of the GIDs provided in the supplementalGroups section of the Pod's security context.
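To make the fsType check concrete: the underlying PersistentVolume definition (retrievable with `oc get pv <pv-name> -o yaml`) can be inspected for an fsType field. Here is a local illustration against a made-up PV definition:

```shell
# Sample PV definition as it might be saved from a cluster
# (illustrative values).
cat > pv-sample.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0-volume
spec:
  awsElasticBlockStore:
    fsType: ext4
    volumeID: vol-0a1b2c3d4e5f67890
EOF

# If an fsType is set, Kubernetes will chown the volume automatically
# when the workloads are restarted, and the manual update can be skipped.
grep 'fsType:' pv-sample.yaml
```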
This manual update is done using a dummy Pod launched in the namespace with all the PVCs attached to it. The Pod can simply run chown recursively on all the paths where the PVCs are mounted. It needs to run in privileged mode. If the number of PVCs in the Namespace is large, the Pod can attach the PVCs in batches. Additionally, multiple Pods can run simultaneously in the same namespace to reduce the total time. The correct UID value to use can be read from the annotation of the Namespace.
For instance, consider a Namespace with an assigned UID range of 1000680000/10000 that contains two PVCs, pvc-0 and pvc-1. The migrated data in the backing PVs is owned by a user belonging to a different UID range. A dummy Pod can then mount these PVCs and run chown with the correct UID:
apiVersion: v1
kind: Pod
metadata:
  name: chown-files
  namespace: <namespace>
spec:
  containers:
  - name: owner-modifier
    image: rhel:latest
    command:
    - /bin/bash
    - -c
    - /usr/bin/chown -R 1000680000 /tmp/
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /tmp/pvc-0/
      name: volume-0
    - mountPath: /tmp/pvc-1/
      name: volume-1
  volumes:
  - name: volume-0
    persistentVolumeClaim:
      claimName: pvc-0
  - name: volume-1
    persistentVolumeClaim:
      claimName: pvc-1
The above Pod mounts all the PVCs at distinct locations under the /tmp directory and runs chown recursively on everything under /tmp to fix the ownership of the data. Additionally, to update the MCS labels, chcon -R -l <labels> can be added to the Pod command.
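The UID baked into the chown command above is simply the start of the namespace's uid-range annotation, which has the form <start-uid>/<block-size>. A small sketch of deriving it (the range value is the one from the example):

```shell
# uid-range annotation value, as read from the Namespace
# (e.g. via oc get namespace <namespace> -o yaml).
uid_range='1000680000/10000'

# The UID to chown with is the start of the range, i.e. everything
# before the "/".
chown_uid="${uid_range%%/*}"
echo "$chown_uid"
# → 1000680000
```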
Step 4: Bringing workloads back online
Upon successful execution of Step 3, or a determination that it is not needed, the persistent data is owned by the correct UID belonging to the Namespace's range. All that remains is to restore the original replica counts saved in Step 2.
- Note: If you used the Pro Tip in Step 2, you can simply use the command below to achieve this.
oc scale deploymentconfig <deploymentconfig> --replicas=$(oc get deploymentconfig <deploymentconfig> -o jsonpath='{.metadata.annotations.preQuiesceReplicas}')