ODF Noobaa bucket Error: used space exceeded the total capacity - noobaa_bucket_capacity
Environment
Red Hat OpenShift Container Platform (RHOCP) v4.x
Red Hat OpenShift Data Foundation (RHODF) v4.14+
Issue
- Configuring a NooBaa bucket class with the placement attribute set to Spread and then creating an Object Bucket for that claim results in the pv-pool pod logs showing the following error:
[Agent/20] [ERROR] CONSOLE:: RPC._on_request: ERROR srv block_store_api.write_block reqid 61425@n2n://652e4f4d83fdd000262749eb(guhh9ar.xtuj) connid *(9h83ykcdgf.9) Error: used space exceeded the total capacity of 107269324800 bytes
- The NooBaa endpoint also shows the following errors:
INVALID_SCHEMA_DB usagereports ERRORS
INVALID_SCHEMA_DB objectstats ERRORS ... must have required property 's3_usage_info'
RpcError: used space exceeded the total capacity of 107269324800 bytes
- It is not clear how to correctly configure a Lifecycle Policy for a NooBaa bucket and set a different retention for its objects.
- It is not clear what the difference is between these two metrics: NooBaa_bucket_capacity and NooBaa_bucket_used_bytes.
Resolution
NOTE1. "Lifecycle bucket configuration in MCG" is fully supported only since ODF 4.14; see Supported configurations for Red Hat OpenShift Data Foundation 4.X.
- There is no resolution for the errors mentioned above, but configuring a Lifecycle policy should help to free up space on the NooBaa bucket; see: Configure Lifecycle Policies on Noobaa/RGW buckets
NOTE2. The example below shows how to access a NooBaa bucket and upload some data, then configure a lifecycle policy on that bucket, and verify how data is deleted from the bucket when it expires.
- OBC: "obc-on120gb" on Namespace "test"
- OB: "obc-openshift-storage-obc-on120gb"
- bucket name: "obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9"
- Bucket Class: "bc-for-pvc120"
- Backing Store: "bs-120gb" of type pv-pool -- based on a PVC of 120GB
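The OBC above can also be created declaratively. Below is a minimal sketch of an ObjectBucketClaim manifest reusing the names from this example (the manifest itself is an assumption reconstructed from those names; in a live cluster it would be piped to `oc apply -f -`):

```shell
# Hypothetical OBC manifest matching the names used in this example.
# In a live cluster: printf '%s\n' "$manifest" | oc apply -f -
manifest=$(cat <<'EOF'
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: obc-on120gb
  namespace: test
spec:
  generateBucketName: obc-on120gb
  storageClassName: openshift-storage.noobaa.io
  additionalConfig:
    bucketclass: bc-for-pvc120
EOF
)
printf '%s\n' "$manifest"
```

The `generateBucketName` field explains the random suffix seen in the real bucket name: the provisioner appends a UUID to it.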
- From "oc get ob obc-openshift-storage-obc-on120gb -o yaml" we get the bucket name:
name: obc-openshift-storage-obc-on120gb
resourceVersion: "48479253"
uid: 39597c56-b5a3-45be-bdd2-b4e9680386dd
spec:
additionalState:
account: obc-account.obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9.65b8c48f@noobaa.io
bucketclass: bc-for-pvc120
bucketclassgeneration: "1"
claimRef: {}
endpoint:
additionalConfig:
bucketclass: bc-for-pvc120
bucketHost: s3.openshift-storage.svc
bucketName: obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9 <<<--- HERE
bucketPort: 443
region: ""
subRegion: ""
reclaimPolicy: Delete
storageClassName: openshift-storage.noobaa.io
status:
phase: Bound
% oc get bucketclass
NAME PLACEMENT NAMESPACEPOLICY QUOTA PHASE AGE
bc-for-pvc120 {"tiers":[{"backingStores":["bs-120gb"],"placement":"Spread"}]} Ready 40s
% oc get backingstore
NAME TYPE PHASE AGE
bs-120gb pv-pool Ready 38d
- Then copy a file of 498MB to the bucket; we want to see how and when it will be deleted by the Lifecycle policy:
% date; s3api-120 cp file498mb.tgz s3://obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9
Tue Jan 30 11:12:07 CET 2024
Note1. At this moment, the PVC of the BackingStore reports Used: 492.8 MiB, which probably comes from "df -h" on the "pv-pool" pod "bs-120gb-noobaa-pod-56bfe744":
/dev/vde 118G 493M 118G 1% /noobaa_storage
Note2. The GUI reports under Capacity breakdown (Multicloud Object Gateway):
Projects 497.7 MiB used
Savings 22.07 MiB (4.4%)
Note3. GUI reports under Observe -- Metrics:
NooBaa_bucket_class_capacity_usage reports a value in bytes: 521863053 at 11:15:27 == amount of data in all the buckets of a given bucket class (in our case there is only one bucket, so this value matches the next one)
NooBaa_bucket_used_bytes reports a value in bytes: 521863053 at 11:15:24 == amount of data in the bucket
NooBaa_bucket_capacity reports an integer value between 0 and 100: 0 == bucket capacity usage in % == ("amount of data in the bucket" / "size of the bucket") * 100
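The relationship between the metrics can be reproduced with simple shell arithmetic, using the byte values observed above (the 107269324800-byte total is taken from the error message in the Issue section and assumed to be the bucket's capacity):

```shell
# NooBaa_bucket_used_bytes: amount of data stored in the bucket, in bytes
used_bytes=521863053
# Total capacity of the bucket's backing store (~120GB PV), in bytes
total_bytes=107269324800
# NooBaa_bucket_capacity: integer usage percentage, 0-100
pct=$(( used_bytes * 100 / total_bytes ))
echo "$pct"   # 0 -- under 1% used, matching the value seen in the console
```

This is why NooBaa_bucket_capacity reads 0 here even though roughly 500 MB is stored: the integer percentage only becomes non-zero once usage crosses 1% of the total capacity.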
- Then create a bucket-lifecycle file "lifecycle.json" with this content:
{
"Rules": [
{
"Expiration": {
"Days": 1
},
"ID": "data-expire-withoutprefix",
"Filter": {
"Prefix": ""
},
"Status": "Enabled"
}
]
}
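A rule can also be scoped to a subset of objects. The sketch below is a hypothetical variant (the `logs/` prefix and the 7-day expiration are assumptions, not part of the example above) that would expire only objects under that prefix, leaving the rest of the bucket untouched:

```shell
# Hypothetical variant of lifecycle.json: expire only keys under "logs/"
cat > lifecycle-prefix.json <<'EOF'
{
  "Rules": [
    {
      "Expiration": { "Days": 7 },
      "ID": "data-expire-logs-prefix",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled"
    }
  ]
}
EOF
cat lifecycle-prefix.json
```

It would be applied with the same `put-bucket-lifecycle-configuration` call shown below, pointing at `file://lifecycle-prefix.json` instead.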
- Create a new alias:
alias s3api-120='AWS_ACCESS_KEY_ID=xxxaaaabbbb AWS_SECRET_ACCESS_KEY=xxxxccccdddd aws s3api --no-verify-ssl --endpoint-url https://s3-openshift-storage.apps.myclustername'
NOTE. The above keys can be extracted from the secret with the same name as the OBC:
% oc get secret -n test
NAME          TYPE     DATA   AGE
obc-on120gb   Opaque   2      38d
% oc extract secret/obc-on120gb   <<--- this creates two files with the keys AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
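As an alternative to `oc extract` (which writes the keys to files on disk), the values can be read directly with a `jsonpath` query; the secret stores them base64-encoded. The sketch below simulates the decode step with a fake value, since the real command (`oc get secret obc-on120gb -n test -o jsonpath='{.data.AWS_ACCESS_KEY_ID}'`) needs a live cluster:

```shell
# Simulate the base64-encoded value held in the secret's .data field
encoded=$(printf '%s' 'xxxaaaabbbb' | base64)
# Decode it as the real pipeline would:
#   oc get secret obc-on120gb -n test \
#     -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d
AWS_ACCESS_KEY_ID=$(printf '%s' "$encoded" | base64 -d)
echo "$AWS_ACCESS_KEY_ID"   # xxxaaaabbbb
```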
- At around 12:12 CET we apply the lifecycle configuration to the bucket:
% s3api-120 put-bucket-lifecycle-configuration --bucket obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9 --lifecycle-configuration file://lifecycle.json
- Check that it was applied:
% s3api-120 get-bucket-lifecycle-configuration --bucket obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9
{
    "Rules": [
        {
            "Expiration": {
                "Days": 1
            },
            "ID": "data-expire-withoutprefix",
            "Filter": {
                "Prefix": ""
            },
            "Status": "Enabled"
        }
    ]
}
- The data was deleted about 30 hours later, not before; at Wed Jan 31 14:34 the file was still present:
% date; s3api-120 ls s3://obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9
Wed Jan 31 14:34:57 CET 2024
2024-01-30 11:13:27 521863053 file498mb.tgz
- At "Wed Jan 31 17:32:40 CET" we found no file:
% date; s3api-120 ls s3://obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9
Wed Jan 31 17:32:40 CET 2024
- In the noobaa-core logs we found the moment when the data was deleted/reclaimed:
Jan-31 16:25:15.042 [BGWorkers/34] [L0] core.server.bg_services.agent_blocks_reclaimer:: AGENT_BLOCKS_RECLAIMER: BEGIN
Jan-31 16:25:15.316 [BGWorkers/34] [L0] core.server.bg_services.objects_reclaimer:: object_reclaimer: starting batch work on objects: file498mb.tgz
Jan-31 16:25:15.350 [BGWorkers/34] [L0] core.server.object_services.md_store:: find_object_parts_unreferenced_chunk_ids: chunk_ids 165 referenced_chunks_ids 0 unreferenced_chunks_ids 165
Jan-31 16:25:15.394 [BGWorkers/34] [L0] core.server.object_services.map_deleter:: delete_blocks_from_node: node 65b8c0a676cf740026c5c0da n2n://65b8c0a676cf740026c5c0db block_ids 165
Jan-31 16:25:15.711 [BGWorkers/34] [L0] core.server.object_services.map_deleter:: delete_blocks_from_node: node 65b8c0a676cf740026c5c0da n2n://65b8c0a676cf740026c5c0db succeeded_block_ids 165
Jan-31 16:25:15.726 [BGWorkers/34] [L0] core.server.object_services.md_store:: update_object_by_id: 65b8cb7919f07e000d4c4f63 { '$set': { reclaimed: 2024-01-31T16:25:15.725Z } }
Jan-31 16:25:15.834 [BGWorkers/34] [L0] core.server.bg_services.objects_reclaimer:: no objects in "unreclaimed" state. nothing to do
Root Cause
- This was analyzed in the internal Bug 2251897 - [GSS][ODF 4.13.3] noobaa - ERROR srv block_store_api.write_block - used space exceeded the total capacity of
Diagnostic Steps
- You can check whether your lifecycle configuration was applied properly by looking directly into the NooBaa DB:
% oc rsh noobaa-db-pg-0
Defaulted container "db" out of: db, initialize-database (init)
sh-5.1$ psql -d nbcore
psql (15.6)
Type "help" for help.
nbcore=# SELECT data->>'_id', data->>'name', data->>'lifecycle_configuration_rules' FROM buckets WHERE data ? 'lifecycle_configuration_rules';
         ?column?         |                     ?column?                     |                                                                  ?column?
--------------------------+--------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------
 65b8c48f76cf740026c5c0e1 | obc-on120gb-0f408d04-5980-4c4d-903f-a0c879ed0aa9 | [{"id": "data-expire-withoutprefix", "filter": {"prefix": ""}, "status": "Enabled", "last_sync": 1712712972497, "expiration": {"days": 1}}]
(1 row)
nbcore=#
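The `last_sync` field in the stored rule is an epoch timestamp in milliseconds, which appears to record when the lifecycle background worker last processed the rule. Assuming GNU `date` is available, it can be converted to a readable UTC timestamp:

```shell
# last_sync value from the buckets row above, in milliseconds
last_sync=1712712972497
# Drop the millisecond part and format as UTC (GNU date syntax)
date -u -d "@$(( last_sync / 1000 ))" +%Y-%m-%dT%H:%M:%SZ
```

A `last_sync` far in the past, or absent entirely, is a hint that the background worker has not picked up the rule yet.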
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.