Is there an ideal value for RGW bucket index sharding in RHCS/Ceph?
Environment
- Red Hat Ceph Storage 2
- Red Hat Ceph Storage 3
- Rados Gateway (RGW)
Issue
- Is there an ideal value for Rados Gateway bucket index sharding in Red Hat Ceph Storage?
Resolution
NOTE: Please refer to the documentation guide Ceph Object Gateway for Production if a new Ceph cluster is being installed for Rados Gateway.
- Sharding is the process of splitting a data store or a collection of data into multiple partitions. It is a feature of distributed systems and can improve performance when accessing large volumes of data.
- The Ceph Object Gateway, otherwise known as Rados Gateway (RGW), stores the bucket index data in the index pool, which by default is .rgw.buckets.index.
- If a large number of objects are stored in a single bucket (for example, when the bucket has no quota set for the maximum number of objects per bucket), index lookups can take longer and cause performance degradation.
- Sharding the RGW bucket index helps prevent performance bottlenecks when a bucket holds a high number of objects. It increases speed because operations can run in parallel, i.e., one per shard.
- Sharding the RGW bucket index is supported from RHCS 1.3 (Ceph Hammer) onwards.
- By default, bucket sharding is set to 0, i.e., disabled. This can be changed with the tunable rgw_override_bucket_index_max_shards.
- If enabled, it applies to new buckets only, not to existing buckets.
- It needs to be changed only when the RGW buckets will store a very high number of objects, in the millions or more. Please read the Root Cause section for more information.
- The right sharding value depends on the environment, i.e., no two environments necessarily need the same value, and hence there are no preset sharding values for a specific number of objects.
- It's recommended to reshard the bucket directly to the expected shard count. For example, if the current shard count is 11 and you want to increase it to 3001, do it in one go.
- It's always recommended to use the prime number closest to the calculated shard count, to spread the bucket index entries more evenly across the bucket index shards.
- A side effect of greatly increasing the sharding value is overall slowness of I/O operations on the RGW pools.
- Performance tests for normal operations should be run on the RGW buckets after changing the sharding value, to measure the difference between the default and the new value.
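As a rough illustration of how the tunable mentioned above might be set, the sketch below prints a ceph.conf fragment for a chosen shard count. The helper name shard_conf_fragment is hypothetical, and placing the option in the [global] section is an assumption; check the Rados Gateway documentation for your release before editing the configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a ceph.conf fragment that sets the default
# bucket index shard count for NEW buckets.
# Assumption: the option lives in the [global] section; adjust to your layout.
shard_conf_fragment() {
    local shards=$1
    printf '[global]\nrgw_override_bucket_index_max_shards = %s\n' "$shards"
}

# Example: print the fragment for 11 shards and review it before
# editing ceph.conf by hand.
shard_conf_fragment 11
```

The function only prints text; it does not modify any file, so the output can be reviewed and merged into the configuration manually.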
How to check if sharding is set
Commands:
radosgw-admin metadata get bucket:<bucket name> | grep bucket_id
radosgw-admin metadata get bucket.instance:<bucket name>:<bucket_id> | grep num_shards
For example:
# radosgw-admin metadata get bucket:mybucket | grep bucket_id
"bucket_id": "default.1970130.1"
# radosgw-admin metadata get bucket.instance:mybucket:default.1970130.1 | grep num_shards
"num_shards": 8,
OR
# radosgw-admin zonegroup get
{
"zones": [
{
"name": "default",
"endpoints": [
"http:\/\/storage.example.com:80\/"
],
"log_meta": "true",
"log_data": "true",
"bucket_index_max_shards": 8 <=== enabled with 8 shards
},
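The two radosgw-admin metadata get commands shown earlier in this section can be chained so that the bucket_id does not have to be copied by hand. The following is a minimal sketch; the function names json_value and bucket_num_shards are hypothetical, and the parsing assumes the one-key-per-line JSON layout seen in the example output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read radosgw-admin JSON on stdin and print the value
# of the first line containing "key": value (quotes and trailing comma removed).
json_value() {
    local key=$1
    sed -n "s/.*\"${key}\": *\"\{0,1\}\([^\",]*\)\"\{0,1\},\{0,1\}.*/\1/p" | head -n 1
}

# Hypothetical wrapper: look up the bucket_id, then read num_shards from the
# bucket instance metadata. Needs a node where radosgw-admin can run.
bucket_num_shards() {
    local bucket=$1 id
    id=$(radosgw-admin metadata get "bucket:${bucket}" | json_value bucket_id)
    radosgw-admin metadata get "bucket.instance:${bucket}:${id}" | json_value num_shards
}

# Usage on a cluster node: bucket_num_shards mybucket
```

For anything beyond a quick check, a real JSON parser is a safer choice than sed, since this sketch relies on the output formatting.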
- The Rados Gateway process should be restarted after setting a new shard value:
# sudo systemctl restart ceph-radosgw.target
- Read more on Rados Gateway bucket sharding in the Red Hat Ceph Storage Rados Gateway documentation.
Important note: The maximum number of shards up to and including RHCS 3 is 7877. RHCS 4 supports up to 65521 shards.
- Use the following formula to calculate the recommended number of shards:
'number of objects expected in a bucket / 100,000' -> this will give you the shard count.
- From Red Hat Ceph Storage 1.3.3 and Red Hat Ceph Storage 2.0 onwards, RGW supports offline bucket resharding, which is useful if you have very large index shard objects. Please remember that this is an offline reshard: all operations on the bucket must be stopped while it runs.
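The formula, the closest-prime recommendation, and the shard limit can be combined into a small calculation. The sketch below is only an illustration: the function names are hypothetical, rounding the division up and preferring the larger prime on a tie are assumptions, and the default cap of 7877 is the RHCS 3 limit quoted above. It finishes by printing, not running, a radosgw-admin bucket reshard command for the example bucket mybucket.

```shell
#!/usr/bin/env bash
# Return success (0) if $1 is a prime number.
is_prime() {
    local n=$1 i
    (( n < 2 )) && return 1
    for (( i = 2; i * i <= n; i++ )); do
        (( n % i == 0 )) && return 1
    done
    return 0
}

# Print the prime closest to $1. Assumption: on a tie the larger prime wins.
closest_prime() {
    local n=$1 d
    for (( d = 0; ; d++ )); do
        if is_prime $(( n + d )); then echo $(( n + d )); return; fi
        if is_prime $(( n - d )); then echo $(( n - d )); return; fi
    done
}

# $1 = expected objects in the bucket, $2 = maximum shards (default 7877).
recommended_shards() {
    local objects=$1 max=${2:-7877} per_shard=100000 raw prime
    raw=$(( (objects + per_shard - 1) / per_shard ))   # assumption: round up
    (( raw < 1 )) && raw=1
    prime=$(closest_prime "$raw")
    (( prime > max )) && prime=$max                    # stay within the limit
    echo "$prime"
}

bucket="mybucket"                          # example bucket name
shards=$(recommended_shards 300000000)     # 300 million expected objects
# Print the offline reshard command for review instead of executing it.
echo "radosgw-admin bucket reshard --bucket=${bucket} --num-shards=${shards}"
# prints: radosgw-admin bucket reshard --bucket=mybucket --num-shards=3001
```

Pass 65521 as the second argument to recommended_shards to apply the RHCS 4 limit instead.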
Root Cause
- In Rados Gateway, an index is maintained internally, which helps in listing the RGW objects in a bucket.
- Without sharding, this index is kept in a single RADOS object within the respective index pool for that bucket.
- For example, all the objects within the .rgw.pool are indexed within a RADOS object in the .rgw.buckets.index pool.
- The index contains the name of each object in the bucket and a reference to its latest version.
- Every time an object is created, deleted, or updated, the index has to be updated as well.
- When the number of objects in a bucket increases drastically, the size of the index increases as well. Maintaining the index in a single RADOS object can slow down lookups as well as other operations, which can in turn become a bottleneck.
- To work around this, the index is sharded (split) across multiple RADOS objects within the index pool. Splitting the index makes parallel operations possible since they act on different RADOS objects.
- Additionally, RADOS does not handle large individual objects well. In a bucket that holds thousands of objects and is still growing, the index grows along with it, which makes the index object difficult to handle. Hence, it's a good idea to split the index.
- NOTE: Do not increase the bucket sharding to a very large number; this can badly impact lookup time and hence object listing. Always test with small increments and understand the difference before moving to production.
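To make the idea of splitting the index more concrete, the sketch below assigns hypothetical object names to shards by hashing each name and taking the remainder modulo the shard count, then prints how many entries each shard would hold. This is a toy model only: cksum is a stand-in hash and is not the hash function RGW uses, and the function names and object names are invented for illustration.

```shell
#!/usr/bin/env bash
# Map an object name to a shard number.
# cksum (a CRC) is a stand-in here, NOT the hash RGW actually uses.
shard_for() {
    local name=$1 shards=$2 crc
    crc=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    echo $(( crc % shards ))
}

# Count how many of $1 made-up object names land on each of $2 shards.
shard_histogram() {
    local objects=$1 shards=$2 i s
    local -a counts
    for (( s = 0; s < shards; s++ )); do counts[s]=0; done
    for (( i = 0; i < objects; i++ )); do
        s=$(shard_for "object-${i}" "$shards")
        counts[s]=$(( counts[s] + 1 ))
    done
    for (( s = 0; s < shards; s++ )); do
        echo "shard ${s}: ${counts[s]}"
    done
}

# Example: spread 200 object names over 11 index shards.
shard_histogram 200 11
```

Each shard ends up holding only a fraction of the entries, which is why updates and lookups for different objects can proceed on different RADOS objects in parallel.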
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.