Avoiding long SELinux relabeling times by using the SELinux Mount (Developer Preview) feature in OpenShift 4.17


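Pods can take a very long time to start when a mounted volume contains a large number of files, because by default every file and directory on the volume is relabeled recursively before the Pod's containers start. You can avoid this delay while keeping SELinux confinement in place by using the SELinux Mount feature, available as a Developer Preview in Red Hat OpenShift Container Platform 4.17. With this feature, the volume is mounted with the SELinux "context=" mount option, so the label is applied to the whole volume at mount time instead of file by file.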

  • Note: The SELinux Mount feature is a Developer Preview feature, meaning it is not supported for production use cases. If you need a production-ready, tested solution or other options, see KCS 6221251.
  • Warning: Enabling the SELinux Mount feature in its Developer Preview state requires setting the CustomNoUpgrade feature set. This CANNOT BE UNDONE and PREVENTS UPGRADES to future minor versions.
  1. Ensure that the CSI driver you are using announces SELinuxMount support, which it does by setting spec.seLinuxMount: true in its CSIDriver object. Mounting with SELinux options requires support from the CSI driver. The following CSI drivers shipped as part of OpenShift support mounting with SELinux options: ODF, AWS EBS, Azure Disk, GCP PD, IBM VPC Block, Cinder, and vSphere. For third-party drivers, contact your storage vendor.
  2. Enable the SELinuxMount feature gate in the cluster. The following command sets featureSet: CustomNoUpgrade and enables the SELinuxMount feature gate:
$ oc patch featuregate cluster --type=merge -p '{"spec":{"featureSet":"CustomNoUpgrade", "customNoUpgrade":{"enabled":["SELinuxMount"]}}}'
  • Enabling the CustomNoUpgrade feature set on your cluster cannot be undone and prevents minor version updates. You should not enable this feature set on production clusters.
  • As a result of the command, all nodes in the cluster will be drained and restarted with the new feature gate. This may take some time.
  3. Verify that the feature gate is enabled on all nodes. After all nodes are drained and restarted, every kubelet should log SELinuxMount:true in its "feature gates" journal entry:
$ oc adm node-logs <node name> | grep "feature gates"
...
Sep 09 15:59:56.738891 ip-10-0-29-242 kubenswrapper[2460]: I0909 15:59:56.733630    2460 feature_gate.go:255] feature gates: {map[CloudDualStackNodeIPs:true DisableKubeletCloudCredentialProviders:true  DynamicResourceAllocation:false EventedPLEG:false KMSv1:true MaxUnavailableStatefulSet:false NodeSwap:false ProcMountType:false RouteExternalCertificate:false SELinuxMount:true ServiceAccountTokenNodeBinding:false TranslateStreamCloseWebsocketRequests:false UserNamespacesSupport:false ValidatingAdmissionPolicy:true]}
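The CSI driver check and the per-node verification above can be scripted. The following is a minimal Bash sketch, assuming you are logged in with oc as a cluster administrator. The function names list_csi_selinuxmount, selinuxmount_in_log, and check_all_nodes are hypothetical helpers, not oc subcommands. The script reads the spec.seLinuxMount field of each CSIDriver object and checks the most recent "feature gates" line in each node's kubelet journal.

```shell
#!/usr/bin/env bash
# Sketch: pre-check CSI drivers and verify the SELinuxMount feature gate.
# Assumes oc is logged in as a cluster administrator.
# All function names below are hypothetical helpers, not oc subcommands.

# Print each CSIDriver with its spec.seLinuxMount value.
# Drivers that announce support show "true"; "<none>" means the field is unset.
list_csi_selinuxmount() {
  oc get csidriver \
    -o custom-columns=NAME:.metadata.name,SELINUXMOUNT:.spec.seLinuxMount
}

# Read kubelet journal lines on stdin. Succeed only if the most recent
# "feature gates" line contains SELinuxMount:true.
selinuxmount_in_log() {
  local last
  last=$(grep 'feature gates' | tail -n 1)
  [[ $last == *'SELinuxMount:true'* ]]
}

# Check the kubelet journal on every node and report the result per node.
# Returns non-zero if any node has not logged SELinuxMount:true.
check_all_nodes() {
  local node rc=0
  for node in $(oc get nodes -o jsonpath='{.items[*].metadata.name}'); do
    if oc adm node-logs "$node" -u kubelet | selinuxmount_in_log; then
      echo "$node: SELinuxMount enabled"
    else
      echo "$node: SELinuxMount NOT enabled" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Source the file, run list_csi_selinuxmount before enabling the feature gate, and run check_all_nodes once the nodes have restarted. Only the latest "feature gates" line is checked, so an older entry from before the restart does not produce a false positive.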

After the feature gate is enabled, every volume that satisfies all of the following conditions has its SELinux label applied in constant time when it is mounted for a Pod:

  1. The Pod has an SELinux label assigned. Check spec.securityContext.seLinuxOptions and spec.containers[*].securityContext.seLinuxOptions of the Pod. OpenShift automatically assigns an SELinux label to all Pods that are admitted under these SCCs: anyuid, hostaccess, hostmount-anyuid, hostnetwork, hostnetwork-v2, machine-api-termination-handler, nonroot, nonroot-v2, restricted, and restricted-v2.
  2. The CSI driver responsible for the volume supports the SELinuxMount feature. See step 1 of the procedure above for the list of CSI drivers that support it.

If any of these conditions is not met, OpenShift recursively relabels all files and directories on the whole volume when it starts the Pod.
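To check the first condition for a running Pod, you can look at the SCC that admitted it (recorded in the openshift.io/scc annotation) and at the seLinuxOptions level in its spec. The following is a minimal Bash sketch, assuming oc access to the Pod's namespace. LABELING_SCCS, scc_assigns_label, and show_pod_selinux are hypothetical names, and the SCC list is copied from this article rather than queried from the cluster.

```shell
#!/usr/bin/env bash
# Sketch: check whether a Pod meets the "SELinux label assigned" condition.
# Assumes oc access to the Pod's namespace. Function and variable names are
# hypothetical; the SCC list is copied from this article.

# SCCs that, per this article, automatically assign an SELinux label.
LABELING_SCCS=(anyuid hostaccess hostmount-anyuid hostnetwork hostnetwork-v2
  machine-api-termination-handler nonroot nonroot-v2 restricted restricted-v2)

# Succeed if the given SCC name is in LABELING_SCCS.
scc_assigns_label() {
  local scc
  for scc in "${LABELING_SCCS[@]}"; do
    [[ $1 == "$scc" ]] && return 0
  done
  return 1
}

# Print the SCC that admitted the Pod and the SELinux levels set in its spec.
# Usage: show_pod_selinux <namespace> <pod>
show_pod_selinux() {
  local ns=$1 pod=$2 scc level
  scc=$(oc get pod "$pod" -n "$ns" \
    -o jsonpath='{.metadata.annotations.openshift\.io/scc}')
  if scc_assigns_label "$scc"; then
    echo "SCC: $scc (assigns an SELinux label automatically)"
  else
    echo "SCC: ${scc:-<none>} (check seLinuxOptions below)"
  fi
  level=$(oc get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.securityContext.seLinuxOptions.level}')
  echo "Pod level: ${level:-<unset>}"
  # One line per container: <container name>: <level, empty if unset>
  oc get pod "$pod" -n "$ns" -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.securityContext.seLinuxOptions.level}{"\n"}{end}'
}
```

If the SCC is not in the list and no level is set at either the Pod or container level, the volume falls back to recursive relabeling.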

Compatibility considerations

With this feature enabled, all Pods that run at the same time and use the same PersistentVolume must have the same SELinux label, because the volume is mounted with a single SELinux context. This happens automatically for Pods that run in the same namespace and get their SELinux label from their SCC, because the SCC derives the label from the namespace.

For privileged Pods, and for Pods that run in different namespaces, the author of the Pod (or of its Deployment or StatefulSet) is responsible for ensuring that all Pods have the same SELinux label.
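Pods that get their label from an SCC use the MCS level stored in their namespace's openshift.io/sa.scc.mcs annotation, so comparing that annotation shows whether Pods in two namespaces can share a volume. Where the levels differ, one option is to set the same level explicitly in the Pod template. The following is a minimal Bash sketch; levels_match, ns_mcs_level, compare_namespaces, and pin_deployment_level are hypothetical names, and pin_deployment_level only works if the SCC that admits the Pods allows the requested level.

```shell
#!/usr/bin/env bash
# Sketch: compare the SELinux (MCS) levels that SCCs assign in two namespaces,
# and pin an explicit level on a Deployment. Function names are hypothetical.

# Succeed if two non-empty SELinux levels are identical strings, e.g. "s0:c26,c5".
levels_match() {
  [[ -n $1 && $1 == "$2" ]]
}

# Print the level that label-assigning SCCs use for Pods in a namespace,
# taken from the openshift.io/sa.scc.mcs namespace annotation.
ns_mcs_level() {
  oc get namespace "$1" \
    -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.mcs}'
}

# Print both namespace levels; succeed only if they are the same.
# Usage: compare_namespaces <namespace-a> <namespace-b>
compare_namespaces() {
  local a b
  a=$(ns_mcs_level "$1")
  b=$(ns_mcs_level "$2")
  echo "$1: $a"
  echo "$2: $b"
  levels_match "$a" "$b"
}

# Set an explicit Pod-level SELinux level on a Deployment's Pod template.
# The SCC that admits the Pods must allow this level, otherwise the Pods
# are rejected at admission.
# Usage: pin_deployment_level <namespace> <deployment> <level>
pin_deployment_level() {
  local ns=$1 deploy=$2 level=$3
  oc patch deployment "$deploy" -n "$ns" --type=merge -p \
    "$(printf '{"spec":{"template":{"spec":{"securityContext":{"seLinuxOptions":{"level":"%s"}}}}}}' "$level")"
}
```

For example, compare_namespaces app-a app-b (hypothetical namespace names) prints both levels and returns non-zero when Pods from the two namespaces would conflict on a shared volume.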

If a newly created Pod uses a different SELinux label than an already running Pod that uses the same volume, the new Pod does not start and the following event is reported:

Unable to attach or mount volumes: unmounted volumes=[vol], unattached volumes=[], failed to process volumes=[vol]: conflicting SELinux labels of volume pvc-e4ee375a-028d-4898-a605-7b05a07f4de6: "system_u:object_r:container_file_t:s0:c0,c1" and "system_u:object_r:container_file_t:s0:c150,c431"
  • Note: The SELinux labels and their MLS values, as well as the PV name, in your event will differ from the example above. In the example, s0:c0,c1 is the level of the Pod(s) already using the volume, and s0:c150,c431 is the level of the Pod that cannot start.
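To pick the volume name and the two labels out of such an event, the message can be parsed with sed. The following is a minimal Bash sketch, assuming the message has exactly the format shown above; parse_selinux_conflict is a hypothetical helper, and the oc command in its comment uses placeholder names.

```shell
#!/usr/bin/env bash
# Sketch: extract the volume name and both SELinux labels from a
# "conflicting SELinux labels" event message read on stdin.
# parse_selinux_conflict is a hypothetical helper; it assumes the message
# format shown in this article.
# Example (placeholders in <>):
#   oc get events -n <namespace> --field-selector involvedObject.name=<pod> \
#     | grep 'conflicting SELinux labels' | parse_selinux_conflict
parse_selinux_conflict() {
  local msg fields vol in_use rejected
  msg=$(cat)
  # Capture: volume name, label already in use, label of the rejected Pod.
  fields=$(sed -n 's/.*conflicting SELinux labels of volume \([^:]*\): "\([^"]*\)" and "\([^"]*\)".*/\1 \2 \3/p' <<<"$msg")
  read -r vol in_use rejected <<<"$fields"
  [[ -n $vol ]] || return 1
  # ${label#*:*:*:} drops user:role:type and keeps the MLS level, e.g. s0:c0,c1.
  printf 'volume:   %s\nin use:   %s (level %s)\nrejected: %s (level %s)\n' \
    "$vol" "$in_use" "${in_use#*:*:*:}" "$rejected" "${rejected#*:*:*:}"
}
```

The first label is the one already in use on the node, so it is usually the level the new Pod has to match.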

Troubleshooting

Metrics

To see how many Pods were unable to start, check the kubelet metric volume_manager_selinux_volume_context_mismatch_errors_total. It counts the attempts to start a Pod that failed because other Pods use the same volume on the same node with a different SELinux label.

  • Note: The kubelet retries starting the Pod periodically, so this metric can grow even when no new Pods are created.
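One way to read this metric without a monitoring dashboard is to scrape each node's kubelet through the API server's node proxy and sum the counter across any label sets it carries. The following is a minimal Bash sketch, assuming cluster-admin access and the Prometheus text exposition format; METRIC, selinux_mismatch_total, and selinux_mismatch_per_node are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: read the SELinux mismatch counter from each node's kubelet.
# Assumes cluster-admin access (node proxy) and Prometheus text format.
# Function and variable names are hypothetical.

METRIC=volume_manager_selinux_volume_context_mismatch_errors_total

# Sum all samples of $METRIC read on stdin, across any label sets.
# Prints 0 when the metric is absent.
selinux_mismatch_total() {
  awk -v name="$METRIC" '
    $1 == name || index($1, name "{") == 1 { sum += $NF }
    END { print sum + 0 }'
}

# Print "<node> <count>" for every node in the cluster.
selinux_mismatch_per_node() {
  local node
  for node in $(oc get nodes -o jsonpath='{.items[*].metadata.name}'); do
    echo "$node $(oc get --raw "/api/v1/nodes/$node/proxy/metrics" | selinux_mismatch_total)"
  done
}
```

Because the kubelet keeps retrying, a count that rises between two readings is more informative than a single value.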

Additional metrics, especially ones that help identify which Pods are involved, may be available in a future release of OpenShift.
