A Ceph cluster shows a status of 'HEALTH_WARN' with the message "pool <pool> has too few pgs"
Environment
- Red Hat Ceph Storage 1.3.x
- Red Hat Ceph Storage 2.x
- Red Hat Ceph Storage 3.x
Issue
- A 'ceph -s' shows a 'HEALTH_WARN' state with one of the following messages:
pool <pool-name> has too few pgs
pool <pool-name> has many more objects per pg than average (too few pgs?)
Resolution
- There are three possible ways to resolve this.
1. Delete unused pools from the cluster.
- If the cluster contains a number of unused pools with relatively high 'pg_num', removing these pools raises the cluster-wide average of objects per PG, which can bring the skew below 'mon_pg_warn_max_object_skew' and clear the warning.
- In a Ceph cluster used to back OpenStack, it is not uncommon for the "metrics" pool to be the offending pool. This is because the "metrics" pool backs telemetry services such as OpenStack's Gnocchi service, which makes a large number of very small writes.
- If the "metrics" pool is the offender, it is best to evaluate whether the telemetry service is needed. If so, we recommend ensuring it is tuned properly with appropriate retention periods, etc. If it is not needed, disable the service, then remove the unneeded pool from the Ceph cluster.
2. Increase the pg_num for the affected pool.
- Most likely the pool simply does not have enough placement groups (PGs). Increasing the 'pg_num' and 'pgp_num' for the pool lowers its objects-per-PG count, so that the skew between the pool's objects per PG and the cluster-wide average falls below 'mon_pg_warn_max_object_skew' and the warning goes away. Please ensure you choose proper PG counts using the Ceph Placement Groups (PGs) per Pool Calculator. To increase PG counts, see the article "Ceph: How do I increase Placement Group (PG) count in a Ceph Cluster" for increasing the 'pg_num' and 'pgp_num' values.
3. Increase the 'mon_pg_warn_max_object_skew' setting.
- In the event where a cluster already has too many pgs or adding pgs cannot be accomplished safely, increasing this value will prevent the message from being displayed.
- This warning exists to notify admins that their cluster may have an unbalanced distribution, so alter this parameter cautiously: a change in this value affects how Ceph will warn about future occurrences of this scenario.
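To illustrate how options 2 and 3 interact with the skew check, the sketch below computes the skew ratio and estimates how far 'pg_num' would have to grow to clear the warning. All numbers are invented for illustration, and the powers-of-two growth is a common sizing convention rather than a Ceph requirement; this is a rough model, not the monitor's exact implementation.

```python
MAX_SKEW = 10.0  # default mon_pg_warn_max_object_skew


def skew(pool_objects, pool_pgs, cluster_objects, cluster_pgs):
    """(objects per PG in the pool) / (objects per PG in the whole cluster)."""
    return (pool_objects / pool_pgs) / (cluster_objects / cluster_pgs)


def pg_num_to_clear(pool_objects, pool_pgs, cluster_objects, cluster_pgs,
                    max_skew=MAX_SKEW):
    """Double the pool's pg_num (pg_num is conventionally grown in powers of
    two) until the skew drops below max_skew. The cluster PG total is updated
    as the pool grows, since raising the pool's pg_num also raises it."""
    new_pgs = pool_pgs
    while skew(pool_objects, new_pgs,
               cluster_objects, cluster_pgs - pool_pgs + new_pgs) >= max_skew:
        new_pgs *= 2
    return new_pgs


# Hypothetical pool: 1,000,000 objects in 8 PGs, cluster-wide 2,000,000
# objects in 512 PGs. The pool is 32x denser than average, so it warns.
print(skew(1_000_000, 8, 2_000_000, 512))          # 32.0
print(pg_num_to_clear(1_000_000, 8, 2_000_000, 512))  # 32
```

With these made-up figures, growing the pool from 8 to 32 PGs would be enough to drop the skew under the default threshold of 10.0.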
Root Cause
- The warning compares the 'objects per pg in that pool' vs the 'objects per pg in the entire system', and if there is too much of a skew, issues a warning.
Diagnostic Steps
- Checking the Ceph cluster health should give more details:
# ceph health detail
- The following command will show how many objects are in the affected pool.
# ceph df
- To see how many placement groups are in the affected pool:
# ceph osd dump | grep pool
- From the above command output, the information is available to see why the warning is being displayed. The warning compares the objects per PG in that pool to the objects per PG in the entire cluster. If there is too much of a skew, a warning is issued.
- This is controlled by the setting:
mon_pg_warn_max_object_skew = 10.0
- By default, if ((objects / pg_num) in the affected pool) / ((objects / pg_num) in the entire cluster) >= 10.0, the warning is displayed.
- NOTE: In RHCS 3.x and above, the 'mon_pg_warn_max_object_skew' parameter is handled by the ceph-mgr daemon introduced in those versions.
- As a result, starting with RHCS 3.x you will need to either inject this parameter with:
# ceph daemon mgr.`hostname -s` config set mon_pg_warn_max_object_skew <value>
- Or you will have to set it in ceph.conf under the [mgr] section for persistence, so that it is loaded upon any restart of the ceph-mgr daemon.
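For the persistent option, a minimal ceph.conf fragment could look like the following. The value 20.0 is purely an example; choose a threshold appropriate for your cluster, keeping in mind the caution above about masking genuine imbalance.

```
[mgr]
mon_pg_warn_max_object_skew = 20.0
```

After editing ceph.conf, the ceph-mgr daemon must be restarted for the new value to take effect.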
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.