Metrics Cannot Attach/Mount the Intended Storage

Solution Verified - Updated

Environment

  • Red Hat OpenShift Enterprise
    • 3.0
    • 3.1
    • 3.2
  • Red Hat OpenShift Container Platform
    • 3.3
    • 3.4
    • 3.5
    • 3.6
    • 3.7
    • 3.9

Issue

  • When starting Metrics back up after work on the cluster, I get an error about storage:
<DATE_TIME>   <DATE_TIME>   2         hawkular-cassandra-1-7voky   Pod                                                 Warning   FailedMount        {kubelet test.example.com}   Unable to mount volumes for pod "hawkular-cassandra-1-7voky_openshift-infra(96765ee3-8c28-11e6-916f-005056a6d599)": timeout expired waiting for volumes to attach/mount for pod "hawkular-cassandra-1-7voky"/"openshift-infra". list of unattached/unmounted volumes=[cassandra-data]
<DATE_TIME>   <DATE_TIME>   2         hawkular-cassandra-1-7voky   Pod                                                 Warning   FailedSync         {kubelet test.example.com}   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "hawkular-cassandra-1-7voky"/"openshift-infra". list of unattached/unmounted volumes=[cassandra-data]
<DATE_TIME>   <DATE_TIME>   29        heapster-lszxv               Pod                     spec.containers{heapster}   Warning   Unhealthy          {kubelet test.example.com}   Readiness probe failed: The heapster process is not yet started, it is waiting for the Hawkular Metrics to start.
  • After updating my cluster, metrics won't start back up
  • Events indicate Cassandra can't mount the volume
  • Cassandra failing with error about permissions:
ERROR 20:01:35 Has no permission to create directory /cassandra_data/data

Resolution

  • Assuming the PV/PVC information is set up correctly and you have not created a PVC for the deployer, the issue is most likely related to permissions.
  • To check, run ls -la against the storage path to see the current owner and permission levels.
    • Note the current values before proceeding, so that you can revert to them if the steps below do not help.
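One way to capture the current state before changing anything is to write it to a file. The following is a minimal sketch and not part of the original procedure: snapshot_perms is a hypothetical helper name, the output file location is an assumption, and stat -c assumes GNU coreutils.

```shell
# Hypothetical helper: record numeric owner, group, and mode for every
# file and directory under a storage path, so the values can be restored later.
# Assumes GNU stat (the -c option); $1 = storage path, $2 = output file.
snapshot_perms() {
    find "$1" -exec stat -c '%u:%g %a %n' {} + > "$2"
}

# Example (both paths are placeholders):
#   snapshot_perms /<storage_path>/ /root/storage-perms.before
```

Each line of the output has the form uid:gid mode path. SELinux labels are not captured by this sketch; ls -laZ shows them.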
The steps for working on this are as follows:

Setting Ownership

  • You will want to set the ownership of the files to match the fsGroup or other gid/uid found in the pod definition's 'securityContext' section.
    • You can collect this information by looking at the pod's yaml:
 # oc get pod cassandra-pod -o yaml
  • Find the securityContext section which will look something like:
  securityContext:
    fsGroup: 1000000000
    seLinuxOptions:
      level: s0:c1,c0
  • And then set the ownership while at the location of the new storage:
 # chcon -R system_u:object_r:svirt_sandbox_file_t:s0:c1,c0 /<storage_path>/
 # chown -R 1000000000:1000000000 /<storage_path>/
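
The two values used by chcon and chown can be pulled out of the pod YAML rather than copied by eye. This is a minimal sketch, assuming the YAML is laid out as in the excerpt above (unquoted values); pod_fsgroup and pod_selinux_level are hypothetical helper names.

```shell
# Hypothetical helpers: read pod YAML on stdin and print the first
# fsGroup / SELinux level value found. Assumes unquoted scalar values.
pod_fsgroup() {
    awk '$1 == "fsGroup:" { print $2; exit }'
}

pod_selinux_level() {
    awk '$1 == "level:" { print $2; exit }'
}

# Example (pod name and storage path are placeholders):
#   gid=$(oc get pod cassandra-pod -o yaml | pod_fsgroup)
#   level=$(oc get pod cassandra-pod -o yaml | pod_selinux_level)
#   chcon -R "system_u:object_r:svirt_sandbox_file_t:${level}" /<storage_path>/
#   chown -R "${gid}:${gid}" /<storage_path>/
```

Because only the first match is printed, a container-level securityContext that appears earlier in the YAML could be picked up instead of the pod-level one, so compare the result against the full output before applying it.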

If the securityContext doesn't mention a gid/uid

  • This can happen if the pod is running in the anyuid SCC, or any time the SCC a pod is using includes "runAsUser: type: RunAsAny".
  • When a pod is created, an Admission Controller sets its SCC based on either the Service Account or the Replication Controller.
  • To check on the SCCs, we need to run the following command, which will show us if any of them have been modified from the default settings:
oadm policy reconcile-sccs
  • If that command returns any SCCs, we should check specifically to see what groups might have been granted access to the SCC.

    • Have any Service Accounts or users been added? Has runAsUser, seLinuxContext, supplementalGroups or fsGroup been modified?
  • It's important to understand that a pod that can set its own uid/gid is a privileged pod. This should be very rare, but you should check for it in the 'securityContext' section.

  • How can you tell if the requested uid is within the SCC assigned range? Check the annotations on the project running the pod:

oc get project <name> -o yaml | grep scc

openshift.io/sa.scc.mcs: s0:c5,c0
openshift.io/sa.scc.supplemental-groups: 1000020000/10000
openshift.io/sa.scc.uid-range: 1000020000/10000
  • If running in the restricted SCC, the admin should expect the assigned uid/gid and the SELinux range and category to match these annotations. If they don't, this could mean the pod is getting the value from the Docker image itself. This value can be seen by running docker inspect <IMAGE> on the Node running the image. You could check the Dockerfile for the image as well, but in some cases the value won't be set in the last layer and is inherited from the base image. It's typically the sign of a poorly built image if it has to run with a specific uid/gid. See our guidelines for more information.
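
The range check against the openshift.io/sa.scc.uid-range annotation is simple arithmetic. Assuming the start/size form shown above, a uid is in range when start <= uid < start + size. A minimal sketch; uid_in_range is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit 0) if a uid falls inside an
# annotation value of the form <start>/<size>, e.g. 1000020000/10000.
# $1 = uid to test, $2 = annotation value.
# The body runs in a subshell so the variables do not leak to the caller.
uid_in_range() (
    start=${2%/*}   # text before the slash
    size=${2#*/}    # text after the slash
    [ "$1" -ge "$start" ] && [ "$1" -lt $((start + size)) ]
)

# Example:
#   uid_in_range 1000020005 1000020000/10000 && echo "in range"
```

With the annotation values listed above, the fsGroup 1000000000 from the earlier securityContext excerpt would fall outside the range, which is the kind of mismatch this check is meant to surface.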

Setting the Permissions

  • Set all directories to have permissions 755 and the files to have permissions 644
 # cd /cassandra_data/
 # find . -type d -exec chmod 0755 {} \;
 # find . -type f -exec chmod 0644 {} \;
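
After running the commands above, it can be useful to confirm that nothing was missed. The following is a minimal sketch that lists any directory not at 0755 and any regular file not at 0644; find_bad_perms is a hypothetical helper name, and empty output means every entry matches.

```shell
# Hypothetical helper: print every directory whose mode is not exactly 0755
# and every regular file whose mode is not exactly 0644 under $1.
find_bad_perms() {
    find "$1" \( -type d ! -perm 0755 \) -o \( -type f ! -perm 0644 \)
}

# Example:
#   find_bad_perms /cassandra_data/
```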

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.