Ceph: Throttling backfill, recovery, and rebalance down or up.
When you add or remove OSDs (disks) in a Ceph cluster, the CRUSH algorithm rebalances the cluster by moving placement groups to or from OSDs to restore balance. Migrating placement groups and the objects they contain can reduce the cluster’s operational performance considerably. To maintain operational performance, Ceph performs this migration with backfilling, and backfill operations are set to a lower priority than read and write requests. To further reduce the impact on client operations, several tuning parameters can be applied.
Ceph clusters consisting of rotational disks under a heavy workload will likely want to reduce the impact of backfilling (backfill, recovery, and rebalance) by throttling down these tuning parameters. Clusters with solid state disks, or customers migrating data in a maintenance window, may instead want to throttle up backfilling (backfill, recovery, and rebalance) in order to get the work done sooner.
This KCS covers RHCS 3.x through RHCS 6.x. Read this article all the way to the bottom before making any changes.
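Before throttling in either direction, it can help to record the cluster's current values so they can be restored afterwards. A minimal sketch, assuming RHCS 4.x or later (the `ceph config get` subcommand is not available on RHCS 3.x, where `ceph daemon osd.<id> config get` on an OSD node serves the same purpose):

```shell
# Record the current central-config values before changing anything
ceph config get osd osd_max_backfills
ceph config get osd osd_recovery_max_active
ceph config get osd osd_recovery_op_priority
```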
Decreasing Backfilling (Throttling down)
Make these 3 changes to reduce the impact of backfill activity (backfill, recovery, and rebalance):
(Run these commands twice)
# ceph tell osd.\* injectargs '--osd_max_backfills=1'
# ceph tell osd.\* injectargs '--osd_recovery_max_active=1'
# ceph tell osd.\* injectargs '--osd_recovery_op_priority=1'
Here's what each argument means
osd_max_backfills
The maximum number of backfills allowed to or from a single OSD. Note that this is applied separately for read and write operations.
osd_recovery_max_active
The number of active recovery requests per OSD at one time. More requests will accelerate recovery, but they also place an increased load on the cluster.
osd_recovery_op_priority
The priority of recovery operations vs client operations, if not specified by the pool’s recovery_op_priority. The default value prioritizes client ops (see above) over recovery ops. You may adjust the tradeoff of client impact against the time to restore cluster health by lowering this value for increased prioritization of client ops, or by increasing it to favor recovery.
Please be aware that these settings require osd_mclock_override_recovery_settings to be set to true when using mClock (links to the relevant documentation are at the bottom of this page, in the RHCS 6.x section).
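If the cluster runs the mClock scheduler (the default on RHCS 6.x), the override switch mentioned above can be toggled as follows. A sketch, assuming the Centralized Configuration Database is in use:

```shell
# Allow manually set backfill/recovery limits to take effect under mClock
ceph config set osd osd_mclock_override_recovery_settings true

# Later, hand control of these limits back to mClock
ceph config set osd osd_mclock_override_recovery_settings false
```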
Note: All three parameters can be injected at the same time with the following command:
(Run this command twice)
# ceph tell osd.\* injectargs '--osd_max_backfills=1 --osd_recovery_max_active=1 --osd_recovery_op_priority=1'
Trust, but verify on RHCS 4.x (example from a system with 12 OSDs):
# ceph tell osd.\* config get osd_max_backfills | awk '{print $NF}' | sort | uniq -c
12 1
# ceph tell osd.\* config get osd_recovery_max_active | awk '{print $NF}' | sort | uniq -c
12 1
# ceph tell osd.\* config get osd_recovery_op_priority | awk '{print $NF}' | sort | uniq -c
12 1
Trust, but verify on RHCS 5.x (example from a system with 12 OSDs):
# ceph tell osd.\* config get osd_max_backfills | grep osd_max_backfills | awk '{print $NF}' | sort | uniq -c
12 "1"
# ceph tell osd.\* config get osd_recovery_max_active | grep osd_recovery_max_active | awk '{print $NF}' | sort | uniq -c
12 "1"
# ceph tell osd.\* config get osd_recovery_op_priority | grep osd_recovery_op_priority | awk '{print $NF}' | sort | uniq -c
12 "1"
The above commands change the running OSD services and reduce the impact of backfilling (backfill, recovery, and rebalance).
If you want to make these settings persistent, follow the steps below.
For RHCS 2.x or RHCS 3.x, use the editor of your choice and update /etc/ceph/ceph.conf.
This ensures that these settings persist across a service restart or node reboot:
[osd]
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
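After restarting an OSD service or rebooting the node, the running daemon can be checked against ceph.conf from its own node via the admin socket. A sketch; osd.0 is a placeholder ID:

```shell
# Run on the node hosting the OSD; replace osd.0 with a real OSD ID
ceph daemon osd.0 config get osd_max_backfills
ceph daemon osd.0 config get osd_recovery_max_active
ceph daemon osd.0 config get osd_recovery_op_priority
```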
For RHCS 4.x and RHCS 5.x, do not modify ceph.conf. Instead, update the cluster’s Centralized Configuration Database.
Use of ceph.conf is deprecated; see the upstream Ceph documentation on Config Sources.
Update Cluster’s Centralized Configuration Database:
# ceph config set osd osd_max_backfills 1
# ceph config set osd osd_recovery_max_active 1
# ceph config set osd osd_recovery_op_priority 1
Trust, but verify:
# ceph config dump | grep osd
osd advanced osd_max_backfills 1
osd advanced osd_recovery_max_active 1
osd advanced osd_recovery_op_priority 1
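When the throttled values are no longer needed, the overrides can be removed from the Centralized Configuration Database so the OSDs fall back to their defaults. A sketch, under the same RHCS 4.x/5.x assumption as above:

```shell
# Remove the persistent overrides; OSDs revert to built-in defaults
ceph config rm osd osd_max_backfills
ceph config rm osd osd_recovery_max_active
ceph config rm osd osd_recovery_op_priority
```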
Also, consider disabling scrubbing and deep scrubbing. See steps and caveat below.
Increasing Backfilling (Throttling up)
Ceph clusters with solid state disks, or customers migrating data in a maintenance window, may want to throttle up backfilling (backfill, recovery, and rebalance) in order to get the work done sooner. There are 3 ways to throttle up backfilling. This list is ordered from most impactful to least impactful:
- Ensure solid state drives are seen as such, not seen as rotational, HDD.
- Disable scrubbing and deep scrubbing, addition by subtraction
- Increase Backfill and Recovery parameters
Steps:
Ensure solid state (SSD/NVMe) drives are seen as such, not seen as rotational (HDD): Follow KCS #3937321, SSD device detected as rotational (HDD). If KCS #3937321 applies to your Ceph cluster, resolving this issue gives the biggest performance increase for backfill, recovery, and rebalance activity.
Addition by subtraction: disabling scrubbing and deep scrubbing frees up I/O resources for backfilling (backfill, recovery, and rebalance), making it go faster.
Note: Scrubbing and Deep Scrubbing should never be disabled for more than a week.
Disable Scrubbing and Deep Scrubbing:
# ceph osd set nodeep-scrub
nodeep-scrub is set
# ceph osd set noscrub
noscrub is set
Enable Scrubbing and Deep Scrubbing:
# ceph osd unset nodeep-scrub
nodeep-scrub is unset
# ceph osd unset noscrub
noscrub is unset
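To confirm which flags are currently set (and avoid leaving scrubbing disabled by accident), the cluster flags can be inspected. A sketch:

```shell
# The flags line lists noscrub/nodeep-scrub while they are set
ceph osd dump | grep flags

# Cluster health also warns while these flags are set
ceph health detail
```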
For an additional backfill performance increase, the parameters below can be adjusted. Some of these settings are extremely aggressive and should only be used if the Ceph cluster is not in production or is in a maintenance window.
- Only valid for RHCS 3.x, RHCS 4.x and RHCS 5.x
- NOT recommended for HDD systems unless used for a migration during a maintenance window
- Do NOT update ceph.conf or the Cluster’s Centralized Configuration Database to make these persistent
Pick only one of these:
# ceph tell osd.\* injectargs --osd_max_backfills=2 --osd_recovery_max_active=8 # 2x Increase
# ceph tell osd.\* injectargs --osd_max_backfills=3 --osd_recovery_max_active=12 # 3x Increase
# ceph tell osd.\* injectargs --osd_max_backfills=4 --osd_recovery_max_active=16 # 4x Increase
# ceph tell osd.\* injectargs --osd_max_backfills=1 --osd_recovery_max_active=0 # Back to Defaults
Trust, but verify:
# ceph tell osd.\* config get osd_recovery_max_active
osd.0: 8
osd.1: 8
{Only 2 of 12 lines shown}
# ceph tell osd.\* config get osd_recovery_max_active | awk '{print $NF}' | sort | uniq -c
12 8
# ceph tell osd.\* config get osd_max_backfills
osd.0: 2
osd.1: 2
{Only 2 of 12 lines shown}
# ceph tell osd.\* config get osd_max_backfills | awk '{print $NF}' | sort | uniq -c
12 2
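While backfilling runs at the increased rate, progress can be watched to judge whether the chosen multiplier is acceptable for the cluster. A sketch:

```shell
# One-shot summary: recovery io rate and misplaced/degraded object counts
ceph -s

# Compact placement-group summary
ceph pg stat
```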
Update (Aug-2023): Starting with RHCS 6.x and upstream Quincy and Reef, there is a new feature called mClock. This feature allows a Ceph administrator to tune their systems for better backfill, rebalance, and recovery performance. With mClock, an administrator may also slow down backfill, rebalance, and recovery to reduce the impact on production.
A secondary effect is that all the parameters above, plus more, are openly discussed in the Ceph documentation and can be adjusted. For RHCS 6.x and later, or Quincy and Reef systems, see these documentation links:
Red Hat 6.x Administration Guide - Chapter 10 - The mClock OSD scheduler
Upstream Documentation - Quincy - mClock Config Reference
Upstream Documentation - Reef - mClock Config Reference
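Under mClock, the usual coarse-grained control is the built-in profile rather than the individual parameters. A sketch, assuming RHCS 6.x / Quincy or later; the profile names are from the upstream mClock documentation:

```shell
# Favor recovery/backfill over client I/O (e.g. during a maintenance window)
ceph config set osd osd_mclock_profile high_recovery_ops

# Favor client I/O over recovery/backfill
ceph config set osd osd_mclock_profile high_client_ops

# Return to the default profile
ceph config set osd osd_mclock_profile balanced
```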