A Ceph cluster shows a status of 'HEALTH_WARN' with the message "pool <pool> has too few pgs"


Environment

  • Red Hat Ceph Storage 1.3.x

  • Red Hat Ceph Storage 2.x

  • Red Hat Ceph Storage 3.x

Issue

  • The output of 'ceph -s' shows a HEALTH_WARN state with one of the following messages:
pool <pool-name> has too few pgs
pool <pool-name> has many more objects per pg than average (too few pgs?)

Resolution

  • There are three possible ways to resolve this.

1. Delete unused pools from the cluster.

  • If the cluster contains unused pools with relatively high pg_num, removing them can bring the skew below 'mon_pg_warn_max_object_skew' and clear the warning.

  • In a Ceph cluster used to back OpenStack, it is not uncommon for the "metrics" pool to be the offending pool. This is because the "metrics" pool backs telemetry services such as OpenStack's Gnocchi service, which makes a very large number of small writes.

  • In the case where the "metrics" pool is the offender, first evaluate whether the telemetry service is needed. If it is, we recommend ensuring it is tuned properly, with appropriate retention periods and similar settings. If it is not needed, it is recommended to disable the service, then remove the unneeded pool from the Ceph cluster.
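As a sketch, an unused pool can be removed with commands like the following ("metrics" is used as an illustrative pool name; pool deletion is irreversible, so confirm the pool is genuinely unused first):

```shell
# Confirm the pool's usage before deleting anything
ceph df

# On RHCS 3.x the monitors must explicitly allow pool deletion first
# (earlier releases may not require this flag)
ceph tell mon.\* injectargs '--mon_allow_pool_delete=true'

# Delete the pool; the name must be given twice, plus the safety flag
ceph osd pool delete metrics metrics --yes-i-really-really-mean-it
```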

2. Increase the pg_num for the affected pool.
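For example, assuming a pool named "metrics" whose pg_num should grow from 64 to 128 (hypothetical numbers; PG count increases cannot be reverted, so choose the target value carefully):

```shell
# Raise the number of placement groups for the pool
ceph osd pool set metrics pg_num 128

# Then raise pgp_num to match, so data actually rebalances onto the new PGs
ceph osd pool set metrics pgp_num 128
```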

3. Increase the 'mon_pg_warn_max_object_skew' setting.

  • In the event where a cluster already has too many pgs or adding pgs cannot be accomplished safely, increasing this value will prevent the message from being displayed.
  • This warning exists to notify administrators that their cluster may have an unbalanced object distribution, so alter this parameter cautiously: changing it affects whether Ceph will warn about this situation in the future.

Root Cause

  • The warning compares the 'objects per pg in that pool' to the 'objects per pg in the entire cluster' and, if the skew between the two is too large, issues a warning.

Diagnostic Steps

  • Checking the Ceph cluster health in detail provides more information:
# ceph health detail
  • The following command shows how many objects are in the affected pool:
# ceph df
  • To see how many placement groups are in the affected pool:
# ceph osd dump | grep pool
  • The output of the above commands shows why the warning is being displayed: the warning compares the objects per pg in the affected pool to the objects per pg in the entire cluster, and if the skew is too large, the warning is issued.

  • This is controlled by the setting:

mon_pg_warn_max_object_skew = 10.0
  • By default, if ((objects / pg_num) of the affected pool) / ((objects / pg_num) of the entire cluster) >= 10.0, the warning is displayed.
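The skew calculation above can be sketched as follows (the object and PG counts here are hypothetical; substitute the real values from your own 'ceph df' and 'ceph osd dump' output):

```shell
# Hypothetical values from `ceph df` / `ceph osd dump`:
objects_pool=1500000    # objects in the affected pool
pg_pool=64              # pg_num of the affected pool
objects_total=2000000   # objects in the entire cluster
pg_total=1024           # total PGs in the cluster

# skew = (objects per PG in the pool) / (objects per PG cluster-wide)
skew=$(awk -v op="$objects_pool" -v pp="$pg_pool" \
           -v ot="$objects_total" -v pt="$pg_total" \
           'BEGIN { printf "%.2f", (op / pp) / (ot / pt) }')
echo "skew = $skew"   # 23437.50 / 1953.13 = 12.00, above 10.0, so the warning fires
```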
  • NOTE: In RHCS 3.x and above, the 'mon_pg_warn_max_object_skew' parameter is handled by the ceph-mgr daemon introduced in those versions.
  • As a result, starting with RHCS 3.x you will need to either inject this parameter with:
# ceph daemon mgr.`hostname -s` config set mon_pg_warn_max_object_skew <value>
  • Or set it in ceph.conf under the [mgr] section for persistence, so that it is loaded on any restart of the ceph-mgr daemon.
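A minimal sketch of the persistent setting (20.0 is an illustrative value; choose one appropriate for your cluster):

```
[mgr]
mon_pg_warn_max_object_skew = 20.0
```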

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.