Ceph Scrubbing And Its Parameters
Ceph OSDs are responsible for storing, retrieving, protecting and checking the coherence of the data stored in the Reliable Autonomic Distributed Object Store (RADOS). To check this coherence, the different copies of each object must be periodically compared to verify that all copies are identical. This verification process is driven by the primary OSD assigned to the PG and is known as scrubbing.
Scrubbing types
- Light Scrubbing
Light scrubbing consists in checking, for each object within a Placement Group (PG), that every copy stored across the OSDs protecting the PG has the same size and the same digest.
- Deep Scrubbing
Deep scrubbing consists in physically reading each object within a Placement Group (PG) on all the OSDs protecting the PG in order to recalculate the digest and compare the freshly recalculated values between all OSDs protecting the PG. Note that the entire PG is physically read during deep scrubbing, which ensures both that data once written can still be read (protection against undetected write failures or corruption) and that the data re-read is identical on all OSDs protecting the PG.
Automated scrubbing
Each PG is automatically light and deep scrubbed. The parameters influencing the scrubbing process are:
- osd_scrub_min_interval: Minimum interval between two light scrubs when the load is low. Defaults to 1 day.
- osd_scrub_max_interval: Maximum interval between two light scrubs, irrespective of load. Defaults to 1 week.
- osd_scrub_load_threshold: Light scrubbing occurs only if the system load is below this value. Defaults to 0.5.
- osd_deep_scrub_interval: Interval between two deep scrubs, irrespective of load. Defaults to 1 week.
- osd_max_scrubs: Maximum number of concurrent PG scrub operations per OSD. Defaults to 1.
- osd_scrub_sleep: Delay between two light or deep scrub operations. Defaults to 0.0.
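As a sketch, these parameters can be set in the [osd] section of ceph.conf; the values below are illustrative assumptions, not tuning recommendations.

```ini
[osd]
# Intervals are expressed in seconds.
osd_scrub_min_interval = 86400        ; 1 day
osd_scrub_max_interval = 604800       ; 1 week
osd_scrub_load_threshold = 0.5
osd_deep_scrub_interval = 604800      ; 1 week
osd_max_scrubs = 1
osd_scrub_sleep = 0.1                 ; slightly throttle scrub I/O
```

Changing ceph.conf requires restarting the OSDs for the new values to take effect.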
It is possible to disable automated light and deep scrubbing and prefer a manual scheduling mechanism in order to better control exactly when those operations take place.
- ceph osd set noscrub: Disable automated light scrubbing.
- ceph osd set nodeep-scrub: Disable automated deep scrubbing.
The corresponding ceph osd unset commands re-enable the automated behaviour.
These special flags are shown by the ceph -s command in order to remind the cluster administrator of this particular situation.
cluster 8c5d3515-4dab-436d-92b8-267bd4f1185c
health HEALTH_WARN noscrub,nodeep-scrub flag(s) set
monmap e1: 1 mons at {daisy=192.168.122.114:6789/0}, election epoch 1, quorum 0 daisy
osdmap e22: 3 osds: 3 up, 3 in
flags noscrub,nodeep-scrub
pgmap v151: 192 pgs, 3 pools, 0 bytes data, 0 objects
102 MB used, 27512 MB / 27614 MB avail
192 active+clean
Manual scrubbing
It is possible to trigger a manual scrub operation at any time using the following commands.
- ceph pg scrub {pgid}: Trigger a one-shot light scrub of the specified PG.
- ceph pg deep-scrub {pgid}: Trigger a one-shot deep scrub of the specified PG.
- ceph osd scrub {osdid}: Trigger a one-shot light scrub of all the PGs managed by the specified OSD.
- ceph osd deep-scrub {osdid}: Trigger a one-shot deep scrub of all the PGs managed by the specified OSD.
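When scheduling scrubs yourself, one reasonable policy is to deep-scrub first the PGs whose last deep scrub is oldest. A minimal sketch, assuming you have already extracted "pgid timestamp" pairs (for instance from ceph pg dump; the exact column layout varies by release, so the input format here is an assumption):

```shell
#!/bin/sh
# Sketch: select the N placement groups whose last deep scrub is oldest,
# given input lines of the form "pgid timestamp".
oldest_pgs() {
    n="$1"
    # ISO timestamps sort correctly as plain text: sort ascending on the
    # second field, keep the first N lines, print only the pgid column.
    sort -k2 | head -n "$n" | awk '{print $1}'
}

# Example: pick the 2 least recently deep-scrubbed PGs (sample data).
printf '%s\n' \
    '0.1a 2015-06-01T10:00:00' \
    '0.2b 2015-05-01T10:00:00' \
    '0.3c 2015-07-01T10:00:00' |
oldest_pgs 2 |
while read -r pgid; do
    # In a real script this echo would be the actual command:
    echo "would run: ceph pg deep-scrub $pgid"
done
```

Spreading such runs over time keeps the deep-scrub I/O load away from peak hours while still covering every PG eventually.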
Extra parameters for controlling scrubbing
- osd_disk_threads: Number of disk threads available at run time. Defaults to 1.
- osd_disk_thread_ioprio_class: CFQ class assigned to each disk thread. Defaults to none ("").
- osd_disk_thread_ioprio_priority: Priority within the CFQ class assigned to each disk thread. Defaults to none (-1).
- osd_deep_scrub_stride: Read size used during deep scrubbing operations. Defaults to 524288 (512 KB).
Appropriate values for osd_disk_thread_ioprio_class are: "rt" (Real Time), "be" (Best Effort) and "idle".
Appropriate values for osd_disk_thread_ioprio_priority are integers between 0 (highest priority) and 7 (lowest priority).
The above two parameters require the OSD devices to use the CFQ disk elevator.
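As an illustrative sketch, deprioritising the scrubbing disk thread relative to client I/O could look like this in ceph.conf (the chosen class and priority are assumptions, not recommendations):

```ini
[osd]
; Put the disk thread in the Best Effort class, at the lowest priority.
osd_disk_thread_ioprio_class = be
osd_disk_thread_ioprio_priority = 7
```

Note that the "idle" class takes no priority level, so osd_disk_thread_ioprio_priority only matters with "rt" or "be".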
Scheduled scrubbing operations via cron
If you have decided to go for home-grown scheduling of the scrubbing process, do not hesitate to read the following article for details about its implementation.
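As a sketch of what such home scheduling could look like once automated scrubbing has been disabled with the noscrub and nodeep-scrub flags, an /etc/cron.d entry could trigger the scrubs at a quiet time (the schedule and OSD id below are illustrative assumptions):

```
# /etc/cron.d/ceph-scrub (sketch): deep-scrub all PGs managed by osd.0
# every Sunday at 03:00, outside business hours.
0 3 * * 0 root ceph osd deep-scrub 0
```

A real deployment would spread the OSDs across different time slots so that not all deep scrubs run at once.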
Checking a device disk elevator
The disk elevator used by a particular device can be displayed with the command "cat /sys/block/{OSDDevice}/queue/scheduler".
$ sudo cat /sys/block/sdb/queue/scheduler
noop [deadline] cfq
The value displayed between brackets indicates the disk elevator configured for the device; in the above example, deadline.
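When checking many OSD devices, extracting the bracketed entry programmatically is convenient. A small sketch (the helper name is mine):

```shell
#!/bin/sh
# Extract the active disk elevator (the bracketed entry) from a
# scheduler line such as "noop [deadline] cfq".
active_elevator() {
    # $1: path to the scheduler file, e.g. /sys/block/sdb/queue/scheduler
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

# Example against a sample file standing in for the sysfs entry.
echo 'noop [deadline] cfq' > /tmp/scheduler.sample
active_elevator /tmp/scheduler.sample   # prints "deadline"
```

The elevator can also be changed at runtime by writing one of the listed values into the same sysfs file as root, e.g. echo cfq > /sys/block/sdb/queue/scheduler.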