How to migrate an Image Registry Operator PV in OpenShift 4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • Internal Image Registry using PV

Issue

  • Move image registry PV storage location.
  • The image registry PV is partially corrupted.
  • The image registry PV is running out of space and it is not possible to expand it.
  • Need to migrate data in image-registry PV in OpenShift 4.
  • How to migrate image-registry data.

Resolution

Follow the steps below in the order given:

  1. Check which PVC is used by the registry:

    $ oc get config.imageregistry/cluster -o jsonpath='{.spec.storage.pvc.claim}' | xargs oc get pvc -n openshift-image-registry
    NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    image-registry-storage   Bound    pvc-ff18ea8f-0bc7-4b57-9374-df52691a52d2   100Gi      RWO            gp2-cust       6h18m
    
  2. Change the reclaim policy of the current registry PV to Retain, so that the old data is kept if the old PVC is deleted:

    $ oc patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
    persistentvolume/pvc-ff18ea8f-0bc7-4b57-9374-df52691a52d2 patched
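
The `<pv-name>` placeholder is the value shown in the VOLUME column in step 1. As a minimal sketch, the name can also be looked up from the claim, and the reclaim policy can be read back after the patch. The helper names `registry_pv_name` and `pv_reclaim_policy` are hypothetical and not part of `oc`; they only wrap standard `oc get -o jsonpath` queries.

```shell
# Hypothetical helpers (names are assumptions, not part of the oc CLI).
NS=openshift-image-registry

# Print the name of the PV bound to the given PVC.
registry_pv_name() {
  oc get pvc "$1" -n "$NS" -o jsonpath='{.spec.volumeName}'
}

# Print the reclaim policy currently set on the given PV.
pv_reclaim_policy() {
  oc get pv "$1" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
}

# Usage (run against the cluster; the claim name is the one from step 1):
#   PV=$(registry_pv_name image-registry-storage)
#   pv_reclaim_policy "$PV"
```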
    
  3. Create the new PVC for the registry with a minimum size of 100Gi.
    If the image registry is running with multiple replicas, the PV must be created with the ReadWriteMany (RWX) access mode. For more information, refer to the documentation for persistent volumes.
    New PVC example:

    $ cat <<EOF > new_reg_pv.yaml
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: test-registry
      namespace: openshift-image-registry
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
      storageClassName: gp2-cust
    EOF
    

    If the new PVC uses an ODF StorageClass and SELinux relabeling issues occur, refer to the workaround to skip SELinux relabeling issues in OpenShift Data Foundation before creating the PVC, so that the PVC uses a StorageClass with the workaround applied.

    $ oc apply -f new_reg_pv.yaml -n openshift-image-registry
    
  4. Check that the new PVC is correctly bound inside the openshift-image-registry namespace:

    $ oc get pvc <pvc-name> -n openshift-image-registry
    NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    test-registry   Bound    pvc-096d8c11-846b-4a60-9724-8b9a142ea598   100Gi      RWO            gp2-cust       61m
    
  5. Edit the config.imageregistry resource so that spec.storage.pvc.claim points to the new PVC, and set the replicas to zero:

    $ oc patch config.imageregistry cluster --type merge -p '{"spec":{"storage":{"pvc":{"claim":"<new-pvc-name>"}}}}'
    $ oc patch config.imageregistry cluster --type merge -p '{"spec":{"replicas":0}}'
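
If the claim name is kept in a shell variable, note that variables are not expanded inside single quotes, so the JSON has to be built before it is passed to `oc patch`. The following is a minimal sketch of one way to do that, and of recording the current replica count so that it can be restored in step 12. The helper names `claim_patch` and `current_replicas` and the variable `ORIGINAL_REPLICAS` are hypothetical.

```shell
# Hypothetical helper: build the merge patch for a given claim name.
claim_patch() {
  printf '{"spec":{"storage":{"pvc":{"claim":"%s"}}}}' "$1"
}

# Hypothetical helper: print the replica count currently configured,
# so that it can be restored later.
current_replicas() {
  oc get config.imageregistry cluster -o jsonpath='{.spec.replicas}'
}

# Usage (run against the cluster, before setting replicas to 0):
#   ORIGINAL_REPLICAS=$(current_replicas)
#   oc patch config.imageregistry cluster --type merge -p "$(claim_patch test-registry)"
```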
    
  6. Create a new deployment called sleep with the rhel-tools image:

    $ oc create deployment sleep --image=registry.access.redhat.com/rhel7/rhel-tools -n openshift-image-registry -- tail -f /dev/null
    
  7. Mount both the old and the new PVC in the temporary sleep pod:

    $ OLD_PVC=<Old PVC name goes here>
    $ NEW_PVC=<New PVC name goes here>
    $ oc set volume deployment/sleep --add -t pvc --name=old-claim --claim-name=$OLD_PVC --mount-path=/old-claim -n openshift-image-registry
    $ oc set volume deployment/sleep --add -t pvc --name=new-claim --claim-name=$NEW_PVC --mount-path=/new-claim -n openshift-image-registry
    
  8. Wait for the sleep pod to be recreated.
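
The next step refers to the pod as `$SLEEP_POD`. As a sketch, the wait and the pod name lookup could be scripted as follows. It assumes the pods carry the `app=sleep` label that `oc create deployment sleep` sets; the helper name `sleep_pod_name` is hypothetical. If an old pod is still terminating, the first item returned may not be the new pod, so it is worth confirming with `oc get pods`.

```shell
NS=openshift-image-registry

# Hypothetical helper: block until the sleep deployment has finished rolling
# out, then print the name of the first pod labelled app=sleep.
sleep_pod_name() {
  oc rollout status deployment/sleep -n "$NS" >/dev/null || return 1
  oc get pods -n "$NS" -l app=sleep \
    -o jsonpath='{.items[0].metadata.name}'
}

# Usage (run against the cluster):
#   SLEEP_POD=$(sleep_pod_name)
```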

  9. Connect to the sleep pod (replace $SLEEP_POD with the name of the pod created by the sleep deployment) and copy the data between the two volumes. Errors related to the lost+found directory can be ignored:

    $ oc rsh -n openshift-image-registry $SLEEP_POD
    $ rsync -avxHAX --progress /old-claim/* /new-claim
    sending incremental file list
    rsync: failed to set times on "/new-claim/lost+found": Operation not permitted (1)
    docker/
    docker/registry/
    docker/registry/v2/
    docker/registry/v2/blobs/
    docker/registry/v2/blobs/sha256/
    docker/registry/v2/blobs/sha256/12/
    docker/registry/v2/blobs/sha256/12/1264065f6ae851d6a33d7be03ffde100356592e385b9b72f65f91b5d9b944b92/
    docker/registry/v2/blobs/sha256/12/1264065f6ae851d6a33d7be03ffde100356592e385b9b72f65f91b5d9b944b92/data
              4,366 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=41/57)
    .....
    sent 140,041,780 bytes  received 549 bytes  56,016,931.60 bytes/sec
    total size is 140,004,057  speedup is 1.00
    rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1179) [sender=3.1.2]
    $ exit
    
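  10. From inside the sleep pod (reconnect with oc rsh if the session from step 9 was closed), check that both directories have approximately the same size: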

    $ du -hs /old-claim/ && du -hs /new-claim/
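
Sizes reported by `du` can differ slightly between volumes, for example when the filesystems use different block sizes. As a complementary check, the number of regular files in both trees can be compared. This is a minimal sketch to run inside the sleep pod; the helper names `count_files` and `same_file_count` are hypothetical, and it assumes `find`, `wc` and `tr` are available in the image.

```shell
# Count regular files below a directory, staying on one filesystem.
count_files() {
  find "$1" -xdev -type f | wc -l | tr -d ' '
}

# Succeed only if both trees hold the same number of regular files.
same_file_count() {
  [ "$(count_files "$1")" -eq "$(count_files "$2")" ]
}

# Usage inside the sleep pod:
#   same_file_count /old-claim /new-claim && echo "file counts match"
```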
    
  11. Scale the sleep deployment to 0 to release the two PVCs:

    $ oc scale deployment -n openshift-image-registry sleep --replicas 0
    deployment.apps/sleep scaled
    
  12. Set the config.imageregistry replicas back to the original number so that the registry picks up the new PVC:

a. If the migration is from RWO to RWO or from RWX to RWX, only the replicas need to be restored (change `[REQUIRED_REPLICAS]` accordingly):

```
$ oc patch config.imageregistry cluster --type merge -p '{"spec":{"replicas":[REQUIRED_REPLICAS]}}'
```

b. If the migration is from RWO to RWX or from RWX to RWO, the rolloutStrategy must also be changed to match the new PV:

- For `RWX`, set it to `RollingUpdate`: `rolloutStrategy: RollingUpdate`

- For `RWO`, set it to `Recreate`: `rolloutStrategy: Recreate`

For example, when moving to an RWO PV, the command is as follows (change `[REQUIRED_REPLICAS]` accordingly, and use `RollingUpdate` instead of `Recreate` when moving to an RWX PV):

```
$ oc patch config.imageregistry.operator.openshift.io/cluster --type=merge -p '{"spec":{"rolloutStrategy":"Recreate","replicas":[REQUIRED_REPLICAS]}}'
```
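
The choice between the two strategies can be sketched as a small helper that maps the access mode of the new PVC to the matching rolloutStrategy and builds the merge patch. The helper names `rollout_strategy_for` and `replicas_patch` are hypothetical; only the mapping itself (RWX to RollingUpdate, RWO to Recreate) comes from the step above.

```shell
# Hypothetical helper: map the access mode of the NEW PVC to the rollout
# strategy (RWX -> RollingUpdate, RWO -> Recreate).
rollout_strategy_for() {
  case "$1" in
    RWX|ReadWriteMany) echo RollingUpdate ;;
    RWO|ReadWriteOnce) echo Recreate ;;
    *) echo "unsupported access mode: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: build the merge patch for step 12b.
# $1 = access mode of the new PVC, $2 = number of replicas.
replicas_patch() {
  strategy=$(rollout_strategy_for "$1") || return 1
  printf '{"spec":{"rolloutStrategy":"%s","replicas":%s}}' "$strategy" "$2"
}

# Usage (run against the cluster):
#   oc patch config.imageregistry.operator.openshift.io/cluster --type=merge -p "$(replicas_patch RWO 1)"
```
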
  13. Delete the temporary sleep deployment:

    $ oc delete deployment -n openshift-image-registry sleep
    deployment.apps "sleep" deleted

  14. Check that the registry is up and running:

    $ oc get po -n openshift-image-registry | grep image-registry
    cluster-image-registry-operator-64f5467494-l57bm   1/1     Running     0          46h
    image-registry-7f9df5548f-jkncx                    1/1     Running     0          21h
    $ oc get co image-registry
    NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
    image-registry   4.7.19    True        False         False      21h

  15. If the old image registry PVC is no longer needed, it can be safely removed. Because the reclaim policy of the old PV was set to Retain in step 2, the PV itself is kept and has to be deleted separately if it is no longer needed:

    $ oc delete pvc -n openshift-image-registry $OLD_PVC
    persistentvolumeclaim "image-registry-storage" deleted

Diagnostic Steps

  1. Check the disk usage of the internal Image Registry:

    $ oc rsh -n openshift-image-registry image-registry-XXXXXX-YYYY 
    $ df -h /registry
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/xvdbm       98G   98G   77M 100% /registry
    
  2. If after the regular pruning maintenance the usage is still high, then check if the volume can be expanded.

  3. Check the StorageClass (SC) of the registry volume:

    $ oc get config.imageregistry/cluster -o jsonpath='{.spec.storage.pvc.claim}' | xargs oc get pvc -n openshift-image-registry
    NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    test-registry   Bound    pvc-096d8c11-846b-4a60-9724-8b9a142ea598   100Gi      RWO            gp2-cust       8h
    
  4. Check whether the StorageClass allows volume expansion for the registry volume:

    $ oc get sc <volume_storage_class> -oyaml | grep "allowVolumeExpansion: true" || echo "You cannot expand your volume"
    You cannot expand your volume
    

    If expanding the volume is not allowed, follow the Resolution section to create a new PVC and move the images.
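
As an alternative to searching the YAML output with grep, the `allowVolumeExpansion` field can be read directly with a jsonpath query. This is a minimal sketch; the helper name `sc_allows_expansion` is hypothetical, and an unset field is treated as "not allowed".

```shell
# Hypothetical helper: succeed only if the given StorageClass has
# allowVolumeExpansion set to true.
sc_allows_expansion() {
  [ "$(oc get sc "$1" -o jsonpath='{.allowVolumeExpansion}')" = "true" ]
}

# Usage (run against the cluster; gp2-cust is the class from the example output):
#   sc_allows_expansion gp2-cust || echo "You cannot expand your volume"
```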

