Why are OpenShift Container Storage nodes not discoverable in the wizard for a Local Storage deployment?

Solution Unverified - Updated

Environment

  • OpenShift Container Storage(OCS) v4.6
  • OpenShift Container Platform(OCP) v4.6

Issue

  • OpenShift Container Storage nodes are not discoverable in the wizard for a Local Storage deployment.
  • The diskmaker-discovery and diskmaker-manager daemonset pods from the Local Storage operator are not able to start on the storage nodes.

Resolution

  • A fix for this issue is available in OCP 4.6.12 and later. If the issue is observed on an older OCP version, follow the workaround below.

  • The workaround is to remove the taint, complete the OCS cluster creation, and then add the toleration and the taint back. Follow the steps given below:

    Step 1: Untaint OCS nodes

       # oc adm taint nodes --all node.ocs.openshift.io/storage-
    

    Step 2: Rediscover the nodes and complete the OCS cluster creation
    * Follow the relevant deployment documentation to complete your cluster creation.

    Step 3: Add required toleration

    • Get the localvolumediscoveries

         # oc get localvolumediscoveries.local.storage.openshift.io -n openshift-local-storage 
         Example Output:
         NAME                    AGE
         auto-discover-devices   175m 
      
    • Edit the localvolumediscoveries spec to add the toleration

               # oc edit localvolumediscoveries.local.storage.openshift.io  auto-discover-devices -n openshift-local-storage
             
              Add/update the toleration under spec section
      
              -----------------------Snippet--------------------
              tolerations:
              - effect: NoSchedule
                key: node.ocs.openshift.io/storage
                operator: Equal
                value: "true"
      
    • Get the localvolumesets

               # oc get localvolumesets.local.storage.openshift.io -n openshift-local-storage
      
              Example output
              NAME         STORAGECLASS   PROVISIONED   AGE
              localblock   localblock     8             175m
      
    • Edit the localvolumesets spec to add the toleration

               # oc edit localvolumesets.local.storage.openshift.io localblock -n openshift-local-storage
      
              Add/update the toleration under spec section
      
              -----------------------Snippet--------------------
              tolerations:
              - effect: NoSchedule
                key: node.ocs.openshift.io/storage
                operator: Equal
                value: "true"
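
    The two edits in Step 3 can also be applied without an interactive editor. The following is a minimal sketch, assuming the resource names shown in the example outputs above (auto-discover-devices and localblock); the helper name patch_tolerations is hypothetical. A merge patch is used because these are custom resources, and it replaces the whole spec.tolerations list, so any other tolerations already present would need to be included in the patch.

```shell
# Toleration matching the taint node.ocs.openshift.io/storage="true":NoSchedule,
# expressed as a JSON merge patch against the spec section.
# Caution: a merge patch replaces the entire spec.tolerations list.
TOLERATION_PATCH='{"spec":{"tolerations":[{"effect":"NoSchedule","key":"node.ocs.openshift.io/storage","operator":"Equal","value":"true"}]}}'

# patch_tolerations RESOURCE NAME  (hypothetical helper, not part of oc)
# Applies the toleration to one custom resource in openshift-local-storage.
patch_tolerations() {
  oc patch "$1" "$2" -n openshift-local-storage --type merge -p "$TOLERATION_PATCH"
}
```

    Example usage, with names taken from the example outputs in this article (adjust them if your cluster differs):

         # patch_tolerations localvolumediscoveries.local.storage.openshift.io auto-discover-devices
         # patch_tolerations localvolumesets.local.storage.openshift.io localblock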
      

    Step 4: Add back the taint to all the relevant OCS nodes

       # oc adm taint node <node name> node.ocs.openshift.io/storage="true":NoSchedule
       Note:  All the OCS nodes should be tainted.
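
    When there are many OCS nodes, Step 4 can be applied by label selector instead of node by node. This is a sketch only: it assumes the OCS nodes carry the label cluster.ocs.openshift.io/openshift-storage, which this article does not state, and the function names are hypothetical. Check the selector against your own nodes before using it.

```shell
# Assumption: OCS nodes are labeled cluster.ocs.openshift.io/openshift-storage
# (with an empty value). Verify this on your cluster first.
OCS_NODE_SELECTOR='cluster.ocs.openshift.io/openshift-storage='
OCS_TAINT='node.ocs.openshift.io/storage=true:NoSchedule'

# Add the taint back to every node matched by the selector.
retaint_ocs_nodes() {
  oc adm taint nodes -l "$OCS_NODE_SELECTOR" "$OCS_TAINT"
}

# List the matched nodes with the keys of their current taints,
# to confirm that all OCS nodes are tainted.
show_node_taints() {
  oc get nodes -l "$OCS_NODE_SELECTOR" \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

    Running retaint_ocs_nodes followed by show_node_taints should list node.ocs.openshift.io/storage for each OCS node.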
    

Root Cause

  • This is a known issue, listed in the known issues for OCS 4.6.
  • Red Hat OpenShift Container Storage v4.6 for Local Storage based deployments can now be deployed using the user interface on OpenShift Container Platform v4.6. During storage cluster creation, nodes are not discoverable if the Red Hat OpenShift Container Storage nodes have the taint node.ocs.openshift.io/storage="true":NoSchedule. This happens because the localvolumeset and localvolumediscovery custom resources do not have the required toleration, so their daemonset pods cannot be scheduled on the tainted nodes.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.