ODF Noobaa bucket Error: used space exceeded the total capacity - noobaa_bucket_capacity

Solution Verified - Updated

Environment

Red Hat OpenShift Container Platform (RHOCP) v4.x
Red Hat OpenShift Data Foundation (RHODF) v4.14+

Issue

  • After configuring a NooBaa bucket class with the placement attribute set to Spread and then creating an Object Bucket for that claim, the pv-pool pod logs show the following error:

    [Agent/20] [ERROR] CONSOLE:: RPC._on_request: ERROR srv block_store_api.write_block reqid 61425@n2n://652e4f4d83fdd000262749eb(guhh9ar.xtuj) connid *(9h83ykcdgf.9) Error: used space exceeded the total capacity of 107269324800 bytes
    
  • The NooBaa endpoint also shows the following errors:

    INVALID_SCHEMA_DB usagereports ERRORS
    INVALID_SCHEMA_DB objectstats ERRORS ... must have required property 's3_usage_info'
    RpcError: used space exceeded the total capacity of 107269324800 bytes
    
  • It's not clear how to correctly configure a Lifecycle Policy for a Noobaa bucket and set a different retention for its objects.

  • The difference between these two metrics is not clear: NooBaa_bucket_capacity and NooBaa_bucket_used_bytes.

Resolution

NOTE 1. "Lifecycle bucket configuration in MCG" is fully supported only since ODF 4.14; see Supported configurations for Red Hat OpenShift Data Foundation 4.X.

NOTE 2. The example below shows how to access a NooBaa bucket and upload some data, then configure a lifecycle policy on that bucket, and verify that the data is deleted from the bucket when it expires.

  • OBC: "obc-on120gb" on Namespace "test"
  • OB: "obc-openshift-storage-obc-on120gb"
  • bucket name: "obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9"
  • Bucket Class: "bc-for-pvc120"
  • Backing Store: "bs-120gb" of type pv-pool -- based on a PVC of 120GB
  1. From "oc get ob obc-openshift-storage-obc-on120gb -o yaml" we get the bucket name:
  name: obc-openshift-storage-obc-on120gb
  resourceVersion: "48479253"
  uid: 39597c56-b5a3-45be-bdd2-b4e9680386dd
spec:
  additionalState:
    account: obc-account.obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9.65b8c48f@noobaa.io
    bucketclass: bc-for-pvc120
    bucketclassgeneration: "1"
  claimRef: {}
  endpoint:
    additionalConfig:
      bucketclass: bc-for-pvc120
    bucketHost: s3.openshift-storage.svc
    bucketName: obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9 <<<--- HERE
    bucketPort: 443
    region: ""
    subRegion: ""
  reclaimPolicy: Delete
  storageClassName: openshift-storage.noobaa.io
status:
  phase: Bound
% oc get bucketclass
NAME                          PLACEMENT                                                         NAMESPACEPOLICY   QUOTA   PHASE   AGE
bc-for-pvc120                 {"tiers":[{"backingStores":["bs-120gb"],"placement":"Spread"}]}                             Ready   40s
% oc get backingstore
NAME                           TYPE            PHASE   AGE
bs-120gb                       pv-pool         Ready   38d
  2. Copy a file of 498 MB to the bucket; we want to see how and when it will be deleted by the lifecycle policy (the "s3api-120" alias used here is defined in a later step):
% date; s3api-120 cp file498mb.tgz s3://obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9
Tue Jan 30 11:12:07 CET 2024

Note 1. At this point, the PVC of the BackingStore reports Used 492.8 MiB, which probably comes from "df -h" on the "pv-pool" pod "bs-120gb-noobaa-pod-56bfe744":

/dev/vde 118G 493M 118G 1% /noobaa_storage

Note 2. The GUI reports, under Capacity breakdown → Multicloud Object Gateway:
Projects: 497.7 MiB used
Savings: 22.07 MiB (4.4%)

Note 3. The GUI reports, under Observe → Metrics:

  • NooBaa_bucket_class_capacity_usage reports a value in bytes: 521863053 at 11:15:27. This is the amount of data in all the buckets of a given bucket class (in our case there is only one bucket, so this value matches the next one).
  • NooBaa_bucket_used_bytes reports a value in bytes: 521863053 at 11:15:24. This is the amount of data in the bucket.
  • NooBaa_bucket_capacity reports an integer value between 0 and 100 (here: 0). This is the bucket capacity usage in %, i.e. ("amount of data in the bucket" / "size of the bucket") * 100.
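To illustrate the difference between the metrics, here is a quick sketch of why NooBaa_bucket_capacity shows 0 in this example. The numerator is the NooBaa_bucket_used_bytes value above; the 120 GiB bucket size is an assumption based on the backing PVC:

```shell
# NooBaa_bucket_capacity ~= (data in bucket / bucket size) * 100, as an integer
used=521863053                        # NooBaa_bucket_used_bytes (from above)
size=$((120 * 1024 * 1024 * 1024))    # assumed bucket size: the 120 GiB backing PVC
awk -v u="$used" -v s="$size" 'BEGIN { printf "%.1f%%\n", u / s * 100 }'
# well under 1%, so the integer-valued metric reports 0
```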
  3. Create a bucket lifecycle file "lifecycle.json" with this content:
{
    "Rules": [
        {
            "Expiration": {
                "Days": 1
            },
            "ID": "data-expire-withoutprefix",
            "Filter": {
                "Prefix": ""
            },
            "Status": "Enabled"
        }
    ]
}
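Before applying it, the file's JSON syntax can be sanity-checked locally. A minimal sketch (assumes python3 is available on the workstation; the heredoc just recreates the same lifecycle.json so the snippet is self-contained):

```shell
# Recreate lifecycle.json (same content as above) and validate its syntax;
# python3 -m json.tool exits non-zero on malformed JSON.
cat > lifecycle.json <<'EOF'
{
    "Rules": [
        {
            "Expiration": { "Days": 1 },
            "ID": "data-expire-withoutprefix",
            "Filter": { "Prefix": "" },
            "Status": "Enabled"
        }
    ]
}
EOF
python3 -m json.tool lifecycle.json > /dev/null && echo 'lifecycle.json: valid JSON'
```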
  • Create a new alias (this is the "s3api-120" alias used in the commands above):

    alias s3api-120='AWS_ACCESS_KEY_ID=xxxaaaabbbb AWS_SECRET_ACCESS_KEY=xxxxccccdddd aws s3api --no-verify-ssl --endpoint-url https://s3-openshift-storage.apps.myclustername'
    

    NOTE. The above keys can be extracted from the secret with the same name as the OBC:

      % oc get secret -n test
      NAME                       TYPE                      DATA   AGE
      obc-on120gb                Opaque                    2      38d
    
      # oc extract secret/obc-on120gb   <<--- this creates two files with the keys
      AWS_ACCESS_KEY_ID
      AWS_SECRET_ACCESS_KEY
    
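As an alternative sketch to `oc extract`, the keys can also be read with jsonpath and decoded directly, since Secret `.data` fields are base64-encoded (the value decoded below is a made-up placeholder, not a real key):

```shell
# Read one key from the OBC secret (run against the cluster):
#   oc get secret obc-on120gb -n test -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d
# Secret .data fields are base64-encoded; decoding works like this
# (placeholder value, not a real key):
echo 'eHh4YWFhYWJiYmI=' | base64 -d; echo
```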
  • At around 12:12 CET, apply the lifecycle configuration to the bucket:

    % s3api-120 put-bucket-lifecycle-configuration --bucket obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9 --lifecycle-configuration file://lifecycle.json
    
  • Check that it was applied:

    % s3api-120 get-bucket-lifecycle-configuration --bucket obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9                                             
    {
        "Rules": [
            {
                "Expiration": {
                    "Days": 1
                },
                "ID": "data-expire-withoutprefix",
                "Filter": {
                    "Prefix": ""
                },
                "Status": "Enabled"
            }
        ]
    }
    
  4. The data was deleted about 30 hours later, but not before:
  % date; s3api-120 ls s3://obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9 
  Wed Jan 31 14:34:57 CET 2024

  2024-01-30 11:13:27  521863053 file498mb.tgz
  • At "Wed Jan 31 17:32:40 CET" the file was gone:

    % date; s3api-120 ls s3://obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9
    Wed Jan 31 17:32:40 CET 2024
    
  • The noobaa-core logs show the moment when the data was deleted/reclaimed:

    Jan-31 16:25:15.042 [BGWorkers/34] [L0] core.server.bg_services.agent_blocks_reclaimer:: AGENT_BLOCKS_RECLAIMER: BEGIN
    Jan-31 16:25:15.316 [BGWorkers/34] [L0] core.server.bg_services.objects_reclaimer:: object_reclaimer: starting batch work on objects:  file498mb.tgz
    Jan-31 16:25:15.350 [BGWorkers/34] [L0] core.server.object_services.md_store:: find_object_parts_unreferenced_chunk_ids: chunk_ids 165 referenced_chunks_ids 0 unreferenced_chunks_ids 165
    Jan-31 16:25:15.394 [BGWorkers/34] [L0] core.server.object_services.map_deleter:: delete_blocks_from_node: node 65b8c0a676cf740026c5c0da n2n://65b8c0a676cf740026c5c0db block_ids 165
    Jan-31 16:25:15.711 [BGWorkers/34] [L0] core.server.object_services.map_deleter:: delete_blocks_from_node: node 65b8c0a676cf740026c5c0da n2n://65b8c0a676cf740026c5c0db succeeded_block_ids 165
    Jan-31 16:25:15.726 [BGWorkers/34] [L0] core.server.object_services.md_store:: update_object_by_id: 65b8cb7919f07e000d4c4f63 { '$set': { reclaimed: 2024-01-31T16:25:15.725Z } }
    Jan-31 16:25:15.834 [BGWorkers/34] [L0] core.server.bg_services.objects_reclaimer:: no objects in "unreclaimed" state. nothing to do
    
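The roughly 30-hour gap between upload and deletion is consistent with Expiration.Days marking the object eligible one day after its creation time, plus the wait until the next pass of the background reclaimer. A rough timeline sketch using GNU date (the worker schedule itself is an MCG internal detail and is only an assumption here):

```shell
# Rough timeline sketch (GNU date, all times UTC).
created=$(date -u -d '2024-01-30 10:13:27' +%s)   # object timestamp 11:13:27 CET
eligible=$((created + 1 * 86400))                 # Expiration.Days = 1
date -u -d "@$eligible" '+object eligible for expiration from: %Y-%m-%d %H:%M:%S UTC'
# The reclaimer log entries above (Jan-31 16:25) fall after this point.
```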

Root Cause

Diagnostic Steps

  • You can check whether the lifecycle was configured properly by looking directly into the NooBaa DB:

      % oc rsh noobaa-db-pg-0
    Defaulted container "db" out of: db, initialize-database (init)
    sh-5.1$ psql -d nbcore
    psql (15.6)
    Type "help" for help.
    
      nbcore=# SELECT data->>'_id', data->>'name', data->>'lifecycle_configuration_rules' FROM buckets WHERE data ? 'lifecycle_configuration_rules';
    
             ?column?         |                     ?column?                     |                                                                  ?column?                                                                   
    --------------------------+--------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------
     65b8c48f76cf740026c5c0e1 | obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9 | [{"id": "data-expire-withoutprefix", "filter": {"prefix": ""}, "status": "Enabled", "last_sync": 1712712972497, "expiration": {"days": 1}}]
    (1 row)
    
      nbcore=# 
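Note that the last_sync field in the row above is a Unix epoch in milliseconds; for readability it can be converted with GNU date:

```shell
# Convert the last_sync epoch-milliseconds value from the buckets row above
# into a human-readable UTC timestamp (GNU date).
last_sync_ms=1712712972497
date -u -d "@$((last_sync_ms / 1000))" '+%Y-%m-%d %H:%M:%S UTC'
```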
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.