NooBaa Troubleshooting Guide Multicloud Object Gateway (MCG) - OpenShift Data Foundation
Environment
- Red Hat OpenShift Container Platform (RHOCP) v4.x
- Red Hat OpenShift Data Foundation (RHODF) v4.x
Issue
Because NooBaa, the Multicloud Object Gateway (MCG) feature, is an entire stack, troubleshooting may be necessary when MCG object storage appears to be impacted or degraded.
NOTE: The architectural changes made to the NooBaa database in ODF v4.19 should resolve many of the issues observed in ODF v4.18 and below; however, this solution covers the common issues regarding NooBaa troubleshooting. The most common issue usually involves the NooBaa DB and/or its connection to the NooBaa stack.
This solution covers non-destructive, non-aggressive troubleshooting methods for resolving common issues identified with MCG.
Resolution
NOTE: In ODF v4.18 and below, there is only one noobaa-db pod, noobaa-db-pg-0. However, in ODF v4.19+, there are two noobaa-db pods (noobaa-db-pg-cluster-x) that represent the primary and secondary databases. Run the following command to determine each pod's role (primary/secondary):
$ oc get cluster -n openshift-storage
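For scripted checks, the primary instance name can be extracted from that table. The following is a minimal sketch, assuming the PRIMARY column is printed last in the `oc get cluster` output; confirm this against the output in your own environment. The function name `primary_from_cluster_table` is hypothetical.

```shell
# Sketch: pick the primary instance name out of `oc get cluster` table output.
# Assumption: PRIMARY is the last column of that table; verify on your cluster.
# primary_from_cluster_table is a hypothetical helper name.
primary_from_cluster_table() {
  # Skip the header row, then print the last whitespace-separated field.
  awk 'NR > 1 && NF > 0 { print $NF }'
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get cluster -n openshift-storage | primary_from_cluster_table
```

Reading the last field rather than a fixed column number keeps the helper working even if the status text contains spaces.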
Non-Destructive/Less Aggressive and Safe for Execution:
- It is important to implement a backup procedure for the database. However, according to PostgreSQL’s own documentation, it’s not optimal to collect a pgsql database dump (backup) unless you stop IO to the database by scaling the NooBaa services down (core, endpoint, and operator). Taking such a backup periodically is recommended to reduce the risk of complete data loss.
- A common way to diagnose NooBaa issues is with the following:
a. Checking the status with the following commands:
$ oc get noobaa -n openshift-storage noobaa -o yaml
$ oc get backingstore -n openshift-storage <backingstore-name> -o yaml
b. Checking the modeCode using the What are the different 'modeCode' and 'phase' in NooBaa's BackingStore article.
c. Running the following commands while in the openshift-storage namespace with the NooBaa CLI installed by following section 2.2. Accessing the Multicloud Object Gateway from the MCG command-line interface of the Product Documentation.
$ oc project openshift-storage
$ noobaa status
$ noobaa diagnostics analyze resources
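The phase check from step a can also be scripted. The sketch below assumes the phase is read from `.status.phase` of the NooBaa and BackingStore resources and treats anything other than Ready as unhealthy. The function name `phase_ok` is hypothetical and is not part of `oc` or the NooBaa CLI.

```shell
# Sketch: report whether a resource phase is healthy.
# phase_ok is a hypothetical helper, not part of oc or the NooBaa CLI.
phase_ok() {
  # $1 = label for the resource, $2 = phase string taken from .status.phase
  if [ "$2" = "Ready" ]; then
    echo "$1: Ready"
  else
    echo "$1: NOT ready (phase='${2:-unknown}')"
    return 1
  fi
}

# Usage against a live cluster (requires a logged-in oc session):
#   phase_ok noobaa "$(oc get noobaa noobaa -n openshift-storage \
#     -o jsonpath='{.status.phase}')"
#   phase_ok backingstore "$(oc get backingstore <backingstore-name> \
#     -n openshift-storage -o jsonpath='{.status.phase}')"
```

The non-zero return code lets the helper be chained in a larger health-check script.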
- Assuming there is an issue with NooBaa and the two status commands in step a above report anything other than phase Ready, a good first troubleshooting step is to follow section 12.1. Restoring the Multicloud Object Gateway of the product documentation. Once complete, create a test Object Bucket Claim (OBC) in the OCP Console UI (Storage -> Object Storage) to validate that NooBaa is functional. The OBC should reach the Bound phase.
- If restarts are observed on the NooBaa pods, OOMKills occur on the noobaa-endpoint pods, and/or “504 Gateway Timeout” errors appear in the application logs, NooBaa will likely need to be tuned using the Performance tuning guide for Multicloud Object Gateway (NooBaa) article.
- One of the most common issues with the NooBaa database is abnormal capacity growth on the PVC, which can be validated further with the How to Check the Size/Consumption of the PostgreSQL Database in the db-noobaa-db-pg-0 PVC solution. If the growth is confirmed to be abnormal or larger than desired, follow these steps in order.
a. First, validate that the database is configured with the correct collation locale. The “Collate” column should show C:
ODF v4.18 and below:
$ oc rsh -n openshift-storage noobaa-db-pg-0
$ psql -U postgres -c "\l+" nbcore;
$ exit
ODF v4.19+:
$ oc rsh -n openshift-storage svc/noobaa-db-pg-cluster-r
$ psql -U postgres -c "\l+" nbcore;
$ exit
If it is other than C, review the Change the Multi-Cloud Object Gateway Database's Collation Locale to C solution. NOTE: This resolution is more in-depth and requires a backup, recreation, and restore of the database.
b. If step (a) confirms that the collation locale is C, implement the first two solutions in step 1 of the Performing Maintenance on a Growing PostgreSQL Database solution. Observe the database for some time. There should be a dramatic decrease in size. If not, proceed to the additional steps.
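The collation check in step (a) can be run without an interactive shell. The following is a sketch only: it assumes the collation is read from the `datcollate` column of the `pg_database` catalog for `nbcore`, rather than from the `\l+` listing, and the function name `collation_is_c` is hypothetical.

```shell
# Sketch: decide whether a database collation value is the expected "C".
# collation_is_c is a hypothetical helper name.
collation_is_c() {
  # $1 = collation string, for example the datcollate value for nbcore
  if [ "$1" = "C" ]; then
    echo "collation OK: C"
  else
    echo "collation is '${1:-empty}', expected C"
    return 1
  fi
}

# Usage on ODF v4.18 and below (requires a logged-in oc session):
#   coll=$(oc exec -n openshift-storage noobaa-db-pg-0 -- \
#     psql -U postgres -At -c \
#     "SELECT datcollate FROM pg_database WHERE datname = 'nbcore'")
#   collation_is_c "$coll"
```

The `-At` flags make psql print only the bare value, which keeps the comparison simple.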
- If the ODF Operator version is v4.15, v4.16, v4.17, or v4.18, it is important to validate that you are not affected by a known issue that occurred during the ODF v4.14 -> ODF v4.15 upgrade. This issue happens when ODF upgrades but the PostgreSQL version inside the noobaa-db-pg-0 pod does not. To validate, describe the noobaa-db-pg-0 pod. The output should show a postgresql-15 image, NOT a postgresql-12 image.
$ oc describe pod -n openshift-storage noobaa-db-pg-0 | grep -i image
Good Image Output:
Image: registry.redhat.io/rhel9/postgresql-15@sha256:xxxxx
Image ID: registry.redhat.io/rhel9/postgresql-15@sha256:xxxxx
Bad Image Output, see Solution for Fix:
Image: registry.redhat.io/rhel8/postgresql-12@sha256:xxxxx
Image ID: registry.redhat.io/rhel8/postgresql-12@sha256:xxxxx
If a postgresql-12 image appears in the output, the resolution is covered in the Recovery NooBaa's PostgreSQL upgrade failure in OpenShift Data Foundation 4.15+ solution.
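The image comparison can be automated with a small filter over the `grep -i image` output. This is a minimal sketch; the function name `classify_pg_image` and its ok/stale/unknown labels are hypothetical, chosen only for illustration.

```shell
# Sketch: classify the PostgreSQL image reported for the noobaa-db pod.
# classify_pg_image and its output labels are hypothetical.
classify_pg_image() {
  # Read the "Image:" / "Image ID:" lines from stdin.
  lines=$(cat)
  case "$lines" in
    *postgresql-12*) echo "stale"; return 1 ;;   # upgrade did not complete
    *postgresql-15*) echo "ok" ;;                # expected image
    *)               echo "unknown"; return 2 ;; # neither version found
  esac
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc describe pod -n openshift-storage noobaa-db-pg-0 | grep -i image \
#     | classify_pg_image
```

The postgresql-12 pattern is tested first so that any leftover version 12 reference is flagged.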
- If a NooBaa issue is still occurring, the last step in this Troubleshooting Guide is to submit the NooBaa logs to the case. Most of the time, they are captured in the ODF must-gather; otherwise, collect them with the following:
$ oc logs -n openshift-storage noobaa-<pod-name> > noobaa-<pod-name>.out
However, the PostgreSQL logs are not native to the `noobaa-db-pg-
The Solution below should be "Private" and referenced by Red Hat Support ONLY. The procedures may be aggressive, possibly destructive, and will be used as procedural guidance after receiving acknowledgement from the Engineering team or after the user has accepted the risk.