ceph df MAX AVAIL is incorrect for simple replicated pool

Solution Verified - Updated

Environment

  • Red Hat Ceph Storage 1.3.z
  • Upstream Hammer release
  • Red Hat Ceph Storage 2.y
  • Upstream Jewel release
  • Red Hat Ceph Storage 3.y
  • Upstream Luminous release

Issue

  • ceph df MAX AVAIL is incorrect for simple replicated pool

Resolution

The MAX AVAIL value does not represent the amount of free space. Rather it represents the amount that can be written until the highest used OSD will get full. It is a complicated function of the replication or erasure code used, the CRUSH rule that maps storage to devices, the utilization of those devices, and the configured mon_osd_full_ratio.

Over its history, Ceph has used different formulas to calculate this value:

  • min(osd.avail for osd in OSD_up) : the minimum space left on any OSD in the up set of the pool's CRUSH ruleset. This matches what Sage suggested in upstream tracker #13844: your usage is bounded by osd.X.

  • len(osd.avail for osd in OSD_up) : the number of OSDs in the up set of the pool's CRUSH ruleset

  • pool.size() : pool replication size

  • Ceph counts every OSD that can be selected by the CRUSH map following the rule used by the pool in question.

  • In RHCS 2.5 the MAX AVAIL value calculation was changed so that 'mon_osd_full_ratio' is taken into account too:

      [min(osd.avail for osd in OSD_up) - (min(osd.avail for osd in OSD_up).total_size * (1 - mon_osd_full_ratio))] * len(osd.avail for osd in OSD_up) / pool.size()
    

    The default is mon_osd_full_ratio = 0.95.

  • In RHCS 3 the MAX AVAIL value calculation was changed so that the 'full_ratio' from the OSD map is taken into account - default value 0.95:

      [min(osd.avail for osd in OSD_up) - (min(osd.avail for osd in OSD_up).total_size * (1 - full_ratio))] * len(osd.avail for osd in OSD_up) / pool.size()
    
  • You can check your current 'full_ratio' value with:

# ceph osd dump | grep full_ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
  • and change it with the following command:
# ceph osd set-full-ratio 0.9
# ceph osd dump | grep full_ratio
full_ratio 0.9
backfillfull_ratio 0.9
nearfull_ratio 0.85
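The formulas above can be sketched in Python. This is an illustrative model only, not Ceph's actual implementation; the function and parameter names are mine:

```python
def max_avail(osd_avail_gb, fullest_osd_total_gb, pool_size, full_ratio=0.95):
    """Estimate 'ceph df' MAX AVAIL per the RHCS 2.5 / RHCS 3 formula (sketch).

    osd_avail_gb         -- free space (GB) of each OSD reachable by the pool's CRUSH rule
    fullest_osd_total_gb -- total size (GB) of the OSD with the least free space
    pool_size            -- pool replication size
    full_ratio           -- configured full ratio (default 0.95)
    """
    min_avail = min(osd_avail_gb)
    # Subtract the headroom that full_ratio keeps unusable on the fullest OSD.
    usable = min_avail - fullest_osd_total_gb * (1 - full_ratio)
    return usable * len(osd_avail_gb) / pool_size
```

Passing full_ratio=1.0 reduces this to the older Hammer-era formula, min(avail) * num_osds / pool_size.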

Root Cause

Diagnostic Steps

  • Output of ceph df :

GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    3195T     2436T         759T         23.77

POOLS:
    NAME    ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd     57     252T       7.90          345T     66220069
  • Now check the rbd pool ruleset :
pool 57 'rbd' replicated size 3 min_size 2 crush_ruleset 1 object_hash rjenkins pg_num 16384 pgp_num 16384 last_change 205704 flags hashpspool stripe_width 0

# rules
rule mkt_ext_ruleset {
	ruleset 1 <===========================================
	type replicated
	min_size 1
	max_size 7
	step take default
	step chooseleaf firstn 0 type host
	step emit
}

root default {
	id -2		# do not change unnecessarily
	# weight 2096.640
	alg straw
	hash 0	# rjenkins1
	item node_a weight 174.720
	item node_b weight 174.720
	item node_c weight 174.720
	item node_d weight 174.720
	item node_e weight 174.720
	item node_f weight 174.720
	item node_g weight 174.720
	item node_h weight 174.720
	item node_i weight 174.720
	item node_j weight 174.720
	item node_k weight 174.720
	item node_l weight 174.720
}
  • I am not including the full host_group and host bucket list for this rule here.

  • But I have checked that this rule covers OSDs from OSD.0 to OSD.575, a total of 576 OSDs.

  • Now let us find the OSD with the least space left among OSD.0 to OSD.575 by checking ceph osd df :

ID WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR
31 3.64000  1.00000 3723G 1881G 1842G 50.53 2.13 <------------- OSD.31 has the highest VAR (2.13) and the least space left: *1842G*
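On a large cluster, scanning ceph osd df by eye is tedious; the fullest OSD can be picked out programmatically. A sketch that parses the plain-text output, assuming the column layout shown above (ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR) - adjust the column index for your release; the sample rows are invented:

```python
# Sample 'ceph osd df' rows (hypothetical values for illustration).
SAMPLE = """\
30 3.64000 1.00000 3723G 1500G 2223G 40.29 1.70
31 3.64000 1.00000 3723G 1881G 1842G 50.53 2.13
32 3.64000 1.00000 3723G 1200G 2523G 32.23 1.36
"""

def to_gb(s):
    """Convert a size like '1842G' or '1.8T' to GB."""
    units = {"M": 1 / 1024, "G": 1, "T": 1024}
    return float(s[:-1]) * units[s[-1]]

def fullest_osd(osd_df_text):
    """Return the row with the least AVAIL (6th column, index 5)."""
    rows = [line.split() for line in osd_df_text.strip().splitlines()]
    return min(rows, key=lambda r: to_gb(r[5]))

osd = fullest_osd(SAMPLE)
print(osd[0], osd[5])  # -> 31 1842G
```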
  • The rbd pool has replication size 3.

  • Now let us apply the formula :

min(osd.avail for osd in OSD_up)* len(osd.avail for osd in OSD_up) / pool.size() 

(1842 * 576) / 3 = 1060992 / 3 = 353664 GB; converted to TB: 353664 / 1024 = 345.375 TB
  • This matches the current MAX AVAIL shown by ceph df.

  • For RHCS 3 and RHCS 2.5

[min(osd.avail for osd in OSD_up) - (min(osd.avail for osd in OSD_up).total_size * (1 - mon_osd_full_ratio))] * len(osd.avail for osd in OSD_up) / pool.size()

((1842 - (3723 * (1 - 0.95))) * 576) / 3 = ((1842 - 186) * 576) / 3 = 317952 GB; converted to TB: 317952 / 1024 = 310.5 TB
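The arithmetic in both worked examples can be verified quickly with the values from the ceph osd df output above (this just checks the math; the intermediate 186.15 GB is rounded to 186 in the text, which does not change the result to one decimal place):

```python
min_avail_gb = 1842   # AVAIL of OSD.31, the fullest OSD
osd_total_gb = 3723   # SIZE of OSD.31
num_osds = 576        # OSDs reachable via ruleset 1
pool_size = 3         # rbd pool replication size
full_ratio = 0.95

# Pre-RHCS 2.5 (Hammer-era) formula:
hammer_tb = (min_avail_gb * num_osds / pool_size) / 1024
print(round(hammer_tb, 3))   # -> 345.375

# RHCS 2.5 / RHCS 3 formula with full_ratio:
usable = min_avail_gb - osd_total_gb * (1 - full_ratio)
rhcs3_tb = (usable * num_osds / pool_size) / 1024
print(round(rhcs3_tb, 1))    # -> 310.5
```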

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.