A Ceph cluster shows a 'HEALTH_WARN' status with the message "too many PGs per OSD", why?

Solution Verified - Updated

Environment

  • Red Hat Ceph Storage 1.3.x
  • Red Hat Ceph Storage 2.x

Issue

  • An RHCS/Ceph cluster shows a 'HEALTH_WARN' status with the message "too many PGs per OSD". Why?

  • This normally happens in two cases:

    • A perfectly healthy RHCS cluster (usually 1.3.x) may see this after adding new pools, or after adding more placement groups to existing pools.

    • A healthy RHCS 1.2.3 cluster upgraded to RHCS 1.3.x, where the PG density (PGs per OSD) was already high before the upgrade.

  • Running ceph -s prints the cluster health:

# ceph -s
    cluster cccccccc-eeee-eeee-ffff-7aceb97642a0
     health HEALTH_WARN
            too many PGs per OSD (496 > max 300)
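
The 496 figure above is the per-OSD count of PG replicas. As a rough back-of-the-envelope sketch (all numbers below are hypothetical; on a live cluster they would come from `ceph osd pool get <pool> pg_num`, `ceph osd pool get <pool> size`, and `ceph osd stat`), the ratio can be estimated as:

```shell
# Rough sketch of how the per-OSD PG density is derived: every PG replica
# that maps to an OSD is counted, so the figure is approximately
#   (sum of pg_num across pools * replica size) / number of OSDs.
pg_num_total=1984   # sum of pg_num across all pools (hypothetical)
pool_size=3         # replica count (hypothetical)
num_osds=12         # OSDs in the cluster (hypothetical)

pgs_per_osd=$(( pg_num_total * pool_size / num_osds ))
echo "PGs per OSD: ${pgs_per_osd}"   # 1984*3/12 = 496, above the 300 default
```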

Resolution

  • This message is a warning and does not pose a direct threat to the health of the cluster, but it should be investigated and fixed. It means the number of PGs per OSD is high; in other words, the density of PGs per OSD is higher than what is configured.

  • Too many Placement Groups can increase resource consumption on the cluster nodes, since most cluster operations on objects are performed at the placement-group level. Hence it is better to keep the placement group count within the suggested numbers.

  • As per the Ceph documentation, the suggested number of Placement Groups depends on the total number of OSDs in the cluster:

    • Less than 5 OSDs: set pg_num to 128
    • Between 5 and 10 OSDs: set pg_num to 512
    • Between 10 and 50 OSDs: set pg_num to 4096
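
The PG calculators referenced later in this article refine these fixed tiers with a per-OSD target of roughly 100 PGs, rounded up to a power of two. A minimal sketch of that rule (suggest_pg_num is a hypothetical helper for illustration, not a Ceph command):

```shell
# Sketch of the PG-calculator sizing rule: target roughly 100 PGs per OSD,
# i.e. pg_num ~= (OSDs * 100) / replica count, rounded up to the next
# power of two. suggest_pg_num is a hypothetical helper name.
suggest_pg_num() {
    osds=$1
    size=$2
    raw=$(( osds * 100 / size ))
    pg=1
    while [ "$pg" -lt "$raw" ]; do
        pg=$(( pg * 2 ))
    done
    echo "$pg"
}

suggest_pg_num 9 3    # 9 OSDs, 3 replicas -> prints 512
```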
  • To check the currently configured value of mon_pg_warn_max_per_osd, run the following command on a Monitor node:
# ceph daemon /var/run/ceph/ceph-mon.*.asok config show | grep mon_pg_warn_max_per_osd
  • There are a few ways to overcome this warning; the first is temporary, while the others are more permanent.

1. The number of Placement Groups per OSD is checked against the tunable mon_pg_warn_max_per_osd, which defaults to 300. Increasing this value will suppress the warning for the time being.

  • Append the tunable, with a value higher than the current one, to /etc/ceph/ceph.conf, and restart the Monitors one by one for the new value to take effect.
mon_pg_warn_max_per_osd = 600    # or any value higher than the current one

2. Increase the number of OSDs in the cluster so that the Placement Groups spread across the new OSDs and the per-OSD density decreases.

NOTE: The Placement Group count of an existing pool can only be increased, never decreased. Hence, in this specific situation, adding OSDs is the viable and permanent solution: existing Placement Groups spread out over the new OSDs, reducing the number of PGs per OSD.
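
As a hypothetical illustration of why adding OSDs clears the warning: the cluster-wide PG replica count stays fixed while the OSD count grows, so the per-OSD density drops proportionally.

```shell
# Hypothetical illustration: the total PG replica count is fixed, so
# doubling the OSD count halves the per-OSD density.
pg_copies=5952                 # total PG replicas in the cluster (hypothetical)

before=$(( pg_copies / 12 ))   # 12 OSDs -> 496 PGs/OSD, above the 300 default
after=$(( pg_copies / 24 ))    # 24 OSDs -> 248 PGs/OSD, below the threshold
echo "before: ${before} PGs/OSD, after: ${after} PGs/OSD"
```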

3. Another method, though not practical in all cases, is to delete unwanted or unused pools from the cluster. This removes those pools' Placement Groups and reduces the density of PGs per OSD. Alternatively, if a pool exists with an excessive number of placement groups, it is good practice to create another pool with a saner, suggested number of PGs and migrate the data over. Read the article How to copy or migrate a Ceph pool? for details on copying pool data to a new pool.

Root Cause

  • The message "too many PGs per OSD" can appear in a normal, healthy cluster when new pools are added (which adds more Placement Groups), pushing the number of PGs per OSD above the configured default of 300 (set via mon_pg_warn_max_per_osd).

  • A specific case is an RHCS 1.2.3 cluster that is upgraded to RHCS 1.3.x. The warning "too many PGs per OSD" is expected after the upgrade if the PG count was already on the high side beforehand, because RHCS 1.2.3 did not check the number of PGs per OSD, while later versions (RHCS 1.3.x) do check and report it.

  • This is explained in Sage Weil's mail to the ceph-users mailing list (archived on www.spinics.net).

  • In general, it is essential to plan ahead and choose a suitable pg_num value for each pool beforehand. Using a Placement Group Calculator is the suggested way to determine the number of PGs per pool.

  • Red Hat Labs PG Calculator

  • Upstream Ceph PG Calculator


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.