Recovering from a NooBaa PostgreSQL upgrade failure in OpenShift Data Foundation 4.15+


Issues in 4.15+

In ODF 4.15+, NooBaa always starts with PostgreSQL 15. If the PostgreSQL upgrade failed, the noobaa-db-pg-0 pod starts with the old PostgreSQL 12 version instead.

Identifying the issue

  • The storagecluster CR is stuck in a Progressing state:
$ oc get storagecluster -n openshift-storage -o yaml | grep phase

    phase: Progressing
  • The noobaa CR indicates that the noobaa-db upgrade has failed:
$ oc get noobaa/noobaa -n openshift-storage -oyaml | grep " postgresUpdatePhase"  -A 7

  postgresUpdatePhase: Failed
  readme: "\n\tERROR: NooBaa operator cannot reconcile this system spec.\n\n\tCheck
    out the system status.phase, status.conditions, and events with:\n\n\t\tkubectl
    -n openshift-storage describe noobaa\n\t\tkubectl -n openshift-storage get noobaa
    -o yaml\n\t\tkubectl -n openshift-storage get events --sort-by=metadata.creationTimestamp\n\n\tIn
    order to retry, edit the system spec and the operator is watching and will be
    notified.\n\n\tNooBaa Core Version:     5.15.15-26423ce\n\tNooBaa Operator Version:
    5.16.11\n"
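The noobaa CR check above can be scripted. The following is a minimal sketch and not part of the original procedure: the function name pg_upgrade_failed is hypothetical, and it only looks for the "postgresUpdatePhase: Failed" line in YAML passed on standard input.

```shell
# Hypothetical helper: succeeds (exit 0) when the NooBaa CR YAML read from
# standard input reports a failed PostgreSQL upgrade.
pg_upgrade_failed() {
  grep -Eq '^[[:space:]]*postgresUpdatePhase:[[:space:]]*Failed[[:space:]]*$'
}

# Usage against a live cluster (needs a logged-in oc session):
#   oc get noobaa/noobaa -n openshift-storage -o yaml | pg_upgrade_failed \
#     && echo "PostgreSQL upgrade failed; manual migration is needed"
```

Reading from standard input keeps the helper independent of the cluster, so it can also be run against a saved copy of the CR.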

Upgrading to PostgreSQL 15

Before Upgrading

Manual data migration

The process of manually upgrading to PostgreSQL 15 requires creating a DB dump in PostgreSQL 12 and restoring it into PostgreSQL 15.

Step 1: Stop all NooBaa pods except the DB

Scale down the operator deployment, noobaa-endpoint deployment, and noobaa-core sts:

$ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas 0
$ oc -n openshift-storage scale statefulsets.apps noobaa-core --replicas 0
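Before taking the dump, it can help to confirm that the scaled-down pods are really gone. The sketch below is an illustration only: the function name only_db_pod_left is hypothetical, and it assumes pod names follow the workload names used in the commands above (noobaa-operator-, noobaa-endpoint-, noobaa-core-).

```shell
# Hypothetical helper: reads pod names (one per line, as printed by
# "oc get pods -o name") from standard input and prints every operator,
# endpoint, or core pod that is still listed.
# Exits 0 only when none of them is left.
only_db_pod_left() {
  leftover=$(sed 's|^pod/||' | grep -E '^noobaa-(operator|endpoint|core)-' || true)
  if [ -n "$leftover" ]; then
    printf '%s\n' "$leftover"
    return 1
  fi
  return 0
}

# Usage (needs a logged-in oc session):
#   oc -n openshift-storage get pods -o name | only_db_pod_left
```

Pods that are still terminating remain in the listing, so a non-zero exit may simply mean that the check should be repeated a few seconds later.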

Step 2: Take a PostgreSQL 12 DB dump

NOTE: This writes the dump to /var/lib/pgsql/data/dump.sql inside the noobaa-db-pg-0 pod. It is copied to the local machine in Step 4.

$ oc -n openshift-storage rsh noobaa-db-pg-0
$ pg_dumpall -U postgres > /var/lib/pgsql/data/dump.sql

Step 3: When the PG dump completes, validate the dump.sql file's integrity to ensure the dump was successful by running the following command, then exit the pod.

$ tail -n100 /var/lib/pgsql/data/dump.sql
-- PostgreSQL database dump complete <--------------- SHOULD SEE THIS
--
-- PostgreSQL database cluster dump complete

$ exit
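The visual check in Step 3 can also be done with a small function. This is a sketch under stated assumptions: the name dump_is_complete is hypothetical, and the only thing it tests is the presence of the "PostgreSQL database cluster dump complete" marker shown above in the last 100 lines of the file.

```shell
# Hypothetical helper: succeeds when the last 100 lines of the dump file
# given as $1 contain the completion marker written at the end of the dump.
dump_is_complete() {
  tail -n 100 "$1" | grep -Fq -- '-- PostgreSQL database cluster dump complete'
}

# Usage, inside the pod or on a copy of the file:
#   dump_is_complete /var/lib/pgsql/data/dump.sql && echo "dump looks complete"
```

A missing marker usually points to an interrupted dump, in which case the dump should be taken again before continuing.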

Step 4: The following command can be executed to copy the db dump outside the noobaa-db-pg-0 pod (keep in a safe location). Change "directory-name" to the desired local storage location.

$ oc cp -n openshift-storage noobaa-db-pg-0:/var/lib/pgsql/data/dump.sql /"directory-name"/dump.sql

Step 5: Rename the userdata directory

Before starting PostgreSQL 15, the userdata directory that contains the PostgreSQL 12 data must be moved out of the way. To keep the server from recreating the directory, do this while PostgreSQL 12 is not running: patch the DB statefulset so that the container runs an idle loop instead of the server.

$ oc -n openshift-storage patch statefulset noobaa-db-pg --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value":["/bin/sh","-c","while true; do sleep 30; done"]}]'

After noobaa-db-pg-0 restarts, rsh into it (delete the pod if it does not restart automatically)

$ oc -n openshift-storage rsh noobaa-db-pg-0

Rename the userdata directory and exit from the pod:

$ mv /var/lib/pgsql/data/userdata/ /var/lib/pgsql/data/userdata-12
$ exit

Step 6: Start PostgreSQL 15 and restore the DB dump

First, get the desired image of PostgreSQL 15 from the noobaa CR:

$ oc -n openshift-storage get noobaa noobaa -o jsonpath='{.spec.dbImage}'

Edit the statefulset and set this image on every container: the main container and, if one exists, the init container. It is fine if there is no init container.

$ oc -n openshift-storage edit sts noobaa-db-pg
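After the edit, the images on the statefulset can be compared with the desired image. The following sketch is illustrative only: the function name all_images_match is hypothetical, and it expects the list of current images on standard input.

```shell
# Hypothetical helper: $1 is the desired image; standard input carries the
# images currently set on the statefulset, separated by whitespace.
# Prints every image that differs and exits non-zero if one is found,
# or if no image was read at all.
all_images_match() {
  desired=$1
  seen=0
  status=0
  for image in $(cat); do
    seen=$((seen + 1))
    if [ "$image" != "$desired" ]; then
      printf 'mismatch: %s\n' "$image"
      status=1
    fi
  done
  if [ "$seen" -eq 0 ]; then
    status=1
  fi
  return "$status"
}

# Usage (needs a logged-in oc session; the second jsonpath expression
# prints nothing when the statefulset has no init container):
#   want=$(oc -n openshift-storage get noobaa noobaa -o jsonpath='{.spec.dbImage}')
#   oc -n openshift-storage get sts noobaa-db-pg \
#     -o jsonpath='{.spec.template.spec.containers[*].image} {.spec.template.spec.initContainers[*].image}' \
#     | all_images_match "$want"
```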

Remove the loop command from the DB pod:

$ oc -n openshift-storage patch statefulset noobaa-db-pg --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/command", "value":[]}]'

Wait for the pod to restart. The DB pod should restart with PostgreSQL-15 without the data.

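Open a remote shell (rsh) into the DB pod again and restore the DB dump using psql:
NOTE: The return value of echo $? should be 0 to indicate a successful restore.
NOTE: The restore reads dump.sql from inside the pod, so make sure the file written in Step 2 is present at the path used in the command. If it is missing, copy it back into the pod from the local copy taken in Step 4.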

$ oc -n openshift-storage rsh noobaa-db-pg-0
$ psql -U postgres < $HOME/data/dump.sql
$ echo $?
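Unless ON_ERROR_STOP is set, psql keeps going after a failed statement and can still exit with 0, so scanning the restore messages is a useful extra check. The sketch below is not part of the original procedure. It assumes the psql error output was saved to a file (for example by appending 2> /tmp/restore.log to the psql command, a hypothetical path), and it treats "already exists" errors as harmless, which is an assumption that should be reviewed for each restore. The function name restore_log_is_clean is hypothetical.

```shell
# Hypothetical helper: prints ERROR lines from the saved psql log given
# as $1, skipping "already exists" messages (ASSUMPTION: those are benign
# here, for example a role that the new PostgreSQL 15 instance already has).
# Exits 0 only when no other ERROR line is found.
restore_log_is_clean() {
  if grep 'ERROR:' "$1" | grep -v 'already exists'; then
    return 1
  fi
  return 0
}

# Usage inside the pod, after a restore run with "2> /tmp/restore.log":
#   restore_log_is_clean /tmp/restore.log && echo "no unexpected errors"
```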

Step 7: Restart NooBaa pods

Scale up the operator deployment, noobaa-endpoint deployment, and noobaa-core sts:

$ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas 1
$ oc -n openshift-storage scale statefulsets noobaa-core --replicas 1

All noobaa pods should now start successfully and run using PostgreSQL-15.

Step 8: If the restore was successful, annotate the noobaa CR with manual_upgrade_completed=true to indicate that the Postgres upgrade is completed.

$ oc -n openshift-storage annotate noobaa noobaa manual_upgrade_completed=true
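To confirm that NooBaa returns to a Ready phase, the phase can be polled. This is a minimal sketch: the function name wait_for_ready, the number of attempts, and the delay are all hypothetical choices, and the command whose output is checked is passed in by the caller.

```shell
# Hypothetical helper: runs the given command repeatedly until it prints
# "Ready" or the attempts run out.
#   $1 = number of attempts, $2 = seconds between attempts, rest = command
wait_for_ready() {
  attempts=$1
  delay=$2
  shift 2
  phase=
  while [ "$attempts" -gt 0 ]; do
    phase=$("$@" 2>/dev/null) || phase=
    if [ "$phase" = "Ready" ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    if [ "$attempts" -gt 0 ]; then
      sleep "$delay"
    fi
  done
  printf 'last phase seen: %s\n' "${phase:-unknown}" >&2
  return 1
}

# Usage (needs a logged-in oc session): poll every 10 seconds, 30 times.
#   wait_for_ready 30 10 oc -n openshift-storage get noobaa noobaa \
#     -o jsonpath='{.status.phase}'
```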

NOTE: Occasionally, even when all pods are running, NooBaa may remain in a Connecting phase or fail to reach a Ready state. If this happens after completing the steps above, follow the steps in section 12.1, "Restoring the Multicloud Object Gateway", of the product documentation. One final restart of the pods will bring NooBaa back to a Ready phase.
