Standalone NooBaa Multicloud Object Gateway (MCG) Performance Tuning Guide (without StorageCluster CR) - Quay

Solution Verified - Updated

Environment

Red Hat Quay (RHQ) v3.x
Red Hat OpenShift Container Platform (RHOCP) v4.x
Red Hat OpenShift Data Foundation (RHODF) v4.x

Issue

NOTE:

This is a legacy deployment that was configured using section 3.2.2.1.1, Create A Standalone Object Gateway, of the Quay documentation.

  This configuration is no longer supported and will fail the ODF v4.19 upgrade (it will not transition to the High Availability (HA) NooBaa DB). Migrate to a StorageCluster-managed MCG using the Migrating from a Quay Standalone Multicloud Object Gateway (MCG) to an ODF StorageCluster ODF MCG - NooBaa solution, then follow the Performance tuning guide for Multicloud Object Gateway solution to tune MCG (NooBaa) properly.

Issue:

The Performance tuning guide for Multicloud Object Gateway assists in tuning NooBaa's Multicloud Object Gateway resources to users' specific needs.

However, a standalone MCG deployment using NooBaa, deployed in accordance with (for example) the Quay product documentation, may not include the storagecluster resource that the Performance tuning guide for Multicloud Object Gateway relies on in its workflow for increasing NooBaa resources. This solution covers that case.

Instead of patching or editing the (unavailable) storagecluster resource, the NooBaa CR can be edited or patched directly, as outlined in this solution.

Resolution

Considerations

  • Large files
    In that case, the metadata-to-data ratio is low. Increasing the resources for the endpoints (memory and CPU) and the number of endpoints would be the first thing to do. In the case of Namespace buckets, increasing memory alone would be sufficient. The CPU is important mainly for data buckets, where the endpoints use the CPU for encryption and deduplication.

  • Small objects
    In that case, the metadata-to-data ratio is high. For data buckets, this means high involvement of the core and DB. Increasing those pods' resources would be the first step. Endpoint memory would probably not be pressured if the core and DB respond quickly. If they don't respond quickly enough, back pressure builds and the endpoints will eventually be under pressure as well. In this case, increase both core and DB resources, with more emphasis on the DB itself.

  • A high number of configuration entities, such as a large number of buckets and accounts
    This also points to the DB and core, with more emphasis on the core.
    When using namespace buckets, increasing the endpoints' memory and the DB's memory and CPU would be the first step.

    As mentioned above, the main variables that would impact the performance of Multicloud Object Gateway (MCG), ordered by impact:

  1. MCG database resources - You need to increase CPU and memory per the workload characteristics.
  2. MCG auto-scale min/max size - This improves the response to peaks, but there is a delay before it kicks in, so it is important to set both the minimum and maximum size.
  3. MCG Core resources - You need to increase CPU and memory per the workload characteristics.
  4. Make sure you connect to the NooBaa endpoint using its service address "https://s3.openshift-storage.svc" or "http://s3.openshift-storage.svc", since this connects directly to the NooBaa endpoints.
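To illustrate item 4, here is a minimal sketch of pointing an S3 client at the in-cluster service address. The aws CLI and the noobaa-admin secret are assumptions (any S3-compatible client works the same way), and the commented commands require access to the cluster:

```shell
# In-cluster service address from the text above:
S3_ENDPOINT="https://s3.openshift-storage.svc"
echo "endpoint: $S3_ENDPOINT"

# Credentials (the operator stores them in the noobaa-admin secret;
# requires cluster access, so shown commented out):
# export AWS_ACCESS_KEY_ID=$(oc -n openshift-storage get secret noobaa-admin \
#     -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
# export AWS_SECRET_ACCESS_KEY=$(oc -n openshift-storage get secret noobaa-admin \
#     -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)

# List buckets directly through the NooBaa endpoints:
# aws --endpoint-url "$S3_ENDPOINT" s3 ls
```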
  • You can adjust the auto-scaling with a command like this:
oc patch -n openshift-storage noobaa/noobaa \
    --type merge \
    --patch '{"spec": {"endpoints": {"minCount": 3,"maxCount": 10}}}'

This sets the NooBaa endpoint Horizontal Pod Autoscaler to deploy at least 3 Pods and scale up to 10 Pods when needed. The default is to deploy at least 1 Pod, with the ability to scale to a maximum of 2 Pods.
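A practical sketch: keep the patch in a variable so it can be syntax-checked locally before applying, then verify the result on the cluster. The HPA name noobaa-endpoint and the noobaa-s3=noobaa pod label are the operator defaults here; verify both with oc get if your deployment differs:

```shell
# The autoscaling patch, syntax-checked locally (python3 assumed available):
PATCH='{"spec": {"endpoints": {"minCount": 3, "maxCount": 10}}}'
echo "$PATCH" | python3 -m json.tool > /dev/null && echo "patch is valid JSON"

# Cluster-side apply and verification (requires cluster access):
# oc patch -n openshift-storage noobaa/noobaa --type merge --patch "$PATCH"
# oc -n openshift-storage get hpa noobaa-endpoint
# oc -n openshift-storage get pods -l noobaa-s3=noobaa
```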

  • Tuning MCG core and database resources can be done via the NooBaa CR.
apiVersion: noobaa.io/v1alpha1
kind: NooBaa
metadata:
  creationTimestamp: "2023-12-22T14:59:39Z"
  finalizers:
  - noobaa.io/graceful_finalizer
  generation: 3
  labels:
    app: noobaa
  name: noobaa
  namespace: openshift-storage
  resourceVersion: "1157797"
  uid: <omitted>
spec:
  cleanupPolicy: {}
  coreResources:
    limits:
      cpu: "3"      <-----
      memory: 4Gi   <-----
    requests:
      cpu: "3"      <-----
      memory: 4Gi   <-----
  dbResources:
    limits:
      cpu: "3"      <-----
      memory: 4Gi   <-----
    requests:
      cpu: "3"      <-----
      memory: 4Gi   <-----
  dbType: postgres
  endpoints:
    resources:
      limits:
        cpu: "3"      <-----
        memory: 4Gi   <-----
      requests:
        cpu: "3"      <-----
        memory: 4Gi   <-----
  loadBalancerSourceSubnets: {}
  security:
    kms: {}

You can apply the above values by executing this command:

oc patch -n openshift-storage noobaa/noobaa \
    --type merge \
    --patch '{"spec": {"coreResources": {"limits": {"cpu": "3","memory": "4Gi"},"requests": {"cpu": "3","memory": "4Gi"}},"dbResources": {"limits": {"cpu": "3","memory": "4Gi"},"requests": {"cpu": "3","memory": "4Gi"}},"endpoints": {"resources": {"limits": {"cpu": "3","memory": "4Gi"},"requests": {"cpu": "3","memory": "4Gi"}}}}}'
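Because the patch sets requests equal to limits, the rescheduled pods should land in the Guaranteed QoS class (provided every container in the pod carries matching requests and limits), which shields them from eviction under node memory pressure. A hedged verification sketch; the pod names below are the usual ODF 4.x operator defaults, so confirm them with oc get pods first:

```shell
# Defaults assumed: core pod "noobaa-core-0", DB pod "noobaa-db-pg-0";
# verify with "oc get pods -n openshift-storage" before relying on them.
PODS="noobaa-core-0 noobaa-db-pg-0"
echo "pods to check: $PODS"

# Show QoS class and live resource stanzas (requires cluster access):
# for POD in $PODS; do
#     oc -n openshift-storage get pod "$POD" \
#         -o jsonpath='{.status.qosClass}{"\t"}{.spec.containers[*].resources}{"\n"}'
# done
```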
  • When using a PV pool backing store, you may not get the expected performance due to low default values. To change them, open the OpenShift console -> OpenShift Data Foundation -> Backing Store -> select the relevant backing store and click on YAML.

    Look for spec -> pvPool and update the requests with CPU and memory. Add a new limits property with cpu and memory as well. Example:

spec:
  pvPool:
    numVolumes: 3       <---- can only be changed to a max of 20
    resources:
      limits:
        cpu: "1"        <----
        memory: 4Gi     <----
      requests:
        cpu: "1"        <----
        memory: 4Gi     <----
        storage: 50Gi   <---- cannot change, set upon creation
    secret: {}
    storageClass: gp3-csi
  type: pv-pool

You can apply the above values by executing this command:

oc patch -n openshift-storage backingstore/<backingstore-name> \
    --type merge \
    --patch '{"spec": {"pvPool": {"resources": {"limits": {"cpu": "1","memory": "4Gi"},"requests": {"cpu": "1","memory": "4Gi"}}}}}'
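The same local sanity check works for the pv-pool patch; the backing store name is a placeholder, so list yours with oc get backingstore -n openshift-storage before applying:

```shell
# Validate the pv-pool patch locally before touching the cluster
# (python3 assumed available):
PATCH='{"spec": {"pvPool": {"resources": {"limits": {"cpu": "1","memory": "4Gi"},"requests": {"cpu": "1","memory": "4Gi"}}}}}'
echo "$PATCH" | python3 -m json.tool > /dev/null && echo "patch is valid JSON"

# Apply to your backing store (requires cluster access):
# oc patch -n openshift-storage backingstore/<backingstore-name> \
#     --type merge --patch "$PATCH"
```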

Diagnostic Steps

The log output below indicates that noobaa-core and noobaa-db need a CPU resource increase:

117839 d6f   02/01 19:24:22 181694 retryable error [44038], retry = 0, sleep for 1 second(s)
117839 b12   02/01 19:24:23 ###### CURL error, CURLcode= 28, Timeout was reached
<omitted>CCloudFSwithPaging::ListDirectoryInternalPageWise(712) - GetFileList failed for[quay-enterprise-<omitted>2/01 19:06:00 181694 504 Gateway Timeout. <--------

The pod status below indicates that the noobaa-endpoint HPA needs to be scaled out (more Pods) to prevent OOMKills:

apiVersion: v1
kind: Pod
<omitted-for-space>
  labels:
    app: noobaa
    noobaa-s3: noobaa
<omitted-for-space>
    name: noobaa-endpoint-6c854fffd6
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-02-07T20:33:54Z"
    status: "True"
    type: Initialized
<omitted-for-space>
  containerStatuses:
  - containerID: cri-o://7ec25d9075ccd3c4a17db08ca82d6d1e8847b340c723ffdbb2cf3f776d593729
    image: registry.redhat.io/ocs4/mcg-core-rhel8@sha256:3083c2de68baa0fabf4a0e92742762af756e22206af4647331e0889ac1d6c90c
    imageID: registry.redhat.io/ocs4/mcg-core-rhel8@sha256:3083c2de68baa0fabf4a0e92742762af756e22206af4647331e0889ac1d6c90c
    lastState: {}
    name: endpoint
<omitted-for-space>
        exitCode: 137 <----------------------------- OOMKill
        reason: Error
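Exit code 137 decodes as 128 plus signal 9 (SIGKILL), which is what the kernel OOM killer delivers. A quick confirmation of the arithmetic, followed by a hedged pod scan using the noobaa-s3=noobaa label shown in the pod above:

```shell
# 137 = 128 + 9 -> the container was SIGKILLed (here, by the OOM killer):
SIGNAL=$((137 - 128))
echo "terminating signal: $SIGNAL (SIGKILL)"

# Scan endpoint pods for OOM-killed containers (requires cluster access):
# oc -n openshift-storage get pods -l noobaa-s3=noobaa \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
```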

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.