What are the possible Placement Group states in an RHCS/Ceph cluster?
Environment
- Red Hat Ceph Storage 1.3.x
- Red Hat Ceph Storage 2.x
Issue
- What are some of the possible states the Placement Groups in an RHCS/Ceph cluster can be in?
Resolution
- When checking a cluster's status (e.g., running ceph -s or ceph -w), Ceph will report the status of placement groups.
- A placement group can have one or more states. The optimum state for placement groups in the PG map is active+clean.
- Placement groups in active+<other-state> should ideally still serve data.
- Placement groups with a status of down will not serve data. Use ceph health detail to map the backing OSDs for such PGs and investigate the OSD states further.
- An example of a HEALTHY cluster:
$ ceph -s
cluster <UUID>
health HEALTH_OK
monmap e3: 1 mons at {<MONITOR-NODE-1>=<IP-ADDRESS>:<PORT>,...}
election epoch 5, quorum 0 hp-m300-4
osdmap e1850: 18 osds: 18 up, 18 in
pgmap v1411049: 1312 pgs, 12 pools, 91017 kB data, 52 objects
591 GB used, 308 GB / 899 GB avail
             1312 active+clean <== (number of placement groups and their state)
- An example of a cluster that is not HEALTHY:
# ceph -s
cluster <UUID>
health HEALTH_WARN
230 pgs backfill
65 pgs backfilling
216 pgs degraded
7 pgs peering
35 pgs recovering
118 pgs recovery_wait
37 pgs stuck inactive
472 pgs stuck unclean
17 pgs undersized
recovery 1562/194386 objects degraded (0.804%)
recovery 14899/194386 objects misplaced (7.665%)
monmap e3: 1 mons at {<MONITOR-NODE-1>=<IP-ADDRESS>:<PORT>,...}
osdmap e1021: 36 osds: 36 up, 36 in; 266 remapped pgs
pgmap v12274993: 5056 pgs, 17 pools, 560 GB data, 92756 objects
1170 GB used, 39031 GB / 40201 GB avail
1562/194386 objects degraded (0.804%)
14899/194386 objects misplaced (7.665%)
4508 active+clean
210 active+remapped+wait_backfill
69 active+recovery_wait+degraded
49 active+recovery_wait+degraded+remapped
48 active+remapped+backfilling
37 activating
23 active+recovering+degraded
15 active+remapped
14 active+undersized+degraded+remapped+wait_backfill
14 activating+remapped
14 active+degraded+remapped+backfilling
14 activating+degraded+remapped
12 active+recovering+degraded+remapped
9 activating+degraded
7 peering
6 active+degraded+remapped+wait_backfill
3 active+undersized+degraded+remapped+backfilling
3 active+degraded+remapped
1 inactive
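As a sanity check, the degraded and misplaced percentages reported above are plain ratios of object counts (affected object replicas divided by total object replicas), so they can be recomputed by hand. A minimal sketch using awk and the figures shown in the example output:

```shell
# Recompute the percentages reported by 'ceph -s' above:
# objects degraded (or misplaced) / total object replicas * 100
awk 'BEGIN { printf "degraded:  %.3f%%\n", 1562  / 194386 * 100 }'
awk 'BEGIN { printf "misplaced: %.3f%%\n", 14899 / 194386 * 100 }'
```

This reproduces the 0.804% and 7.665% values shown in the status output.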
IMPORTANT:
- The network is a critical aspect of a distributed system, so make sure the network configuration is consistent throughout the cluster, on both the cluster_network and public_network interfaces.
- For example, a uniform MTU (either 9000 or 1500) should be used across the network interfaces on the OSD and MON nodes.
- Mismatched MTUs in a cluster can cause unexpected behaviours such as OSD flapping, heartbeat_check failures, OSDs being wrongly marked down, placement groups stuck in peering, etc.
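As a quick consistency check, the MTU of each interface can be collected from every node (for example with ip -o link show) and compared across the cluster. A minimal sketch over saved sample data; the node and interface names below are illustrative:

```shell
# MTUs collected from the cluster nodes (illustrative sample; on each node,
# 'ip -o link show' prints lines containing '... mtu <value> ...')
cat <<'EOF' > /tmp/mtus.txt
node1 eth0 mtu 9000
node2 eth0 mtu 9000
node3 eth0 mtu 1500
EOF
# Print the distinct MTU values; more than one line indicates a mismatch
awk '{print $4}' /tmp/mtus.txt | sort -u
```

Here two values (1500 and 9000) are printed, flagging the mismatch on node3.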
- The ceph -s example output above shows a few of the various states a placement group can be in.
- An explanation of each PG state is given below:
Creating
- Ceph is still creating the placement group.
Activating
- The placement group is peered but not yet active.
Active
- Ceph will process requests to the placement group. Active Placement Groups will serve data.
Clean
- Ceph has replicated all objects in the placement group the correct number of times. active+clean is the ideal PG state.
Down
- A replica with necessary data is down, so the placement group is offline.
- A PG with fewer than min_size replicas will be marked as down. Use ceph health detail to understand the backing OSD state.
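ceph health detail lists each problem PG together with its current state and last acting OSD set, which can then be filtered for down PGs. A minimal sketch over a saved sample; the PG IDs, timings and OSD numbers below are illustrative:

```shell
# Saved 'ceph health detail' sample (illustrative); real output lists each
# problem PG with its current state and last acting OSD set
cat <<'EOF' > /tmp/health-detail.txt
pg 2.5 is stuck inactive for 614.147, current state down+peering, last acting [1,4]
pg 3.a is stuck unclean for 308.120, current state active+degraded, last acting [0,2]
EOF
# Pick out PGs in a down state; the acting OSDs are the ones to investigate
grep 'state down' /tmp/health-detail.txt
```

In this sample only pg 2.5 is printed, pointing at OSDs 1 and 4 for further investigation.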
Laggy
- An OSD replica is not acknowledging new leases from the primary OSD in a timely manner. I/O is temporarily paused.
Wait
The set of OSDs for this PG has just changed and I/O is temporarily paused until the previous interval’s leases expire.
Replay
The placement group is waiting for clients to replay operations after an OSD crashed.
Splitting
Ceph is splitting the placement group into multiple placement groups.
Scrubbing
Ceph is checking the placement group for inconsistencies.
Deep
Ceph is checking the placement group data against stored checksums.
Degraded
Ceph has not replicated some objects in the placement group the correct number of times yet.
Inconsistent
Ceph detects inconsistencies in one or more replicas of an object in the placement group (e.g. objects are the wrong size, objects are missing from one replica after recovery finished, etc.).
Peering (peering)
- The placement group is undergoing the peering process.
- Peering should complete without much delay; if the number of PGs in the peering state does not decrease over time, the peering may be stuck.
- To understand why a PG is stuck in peering, query the placement group and check if it is waiting on any other OSDs. To query a PG, use:
# ceph pg <pg.id> query
- If the PG is waiting on another OSD for the peering to finish, bringing up that OSD should solve this.
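The recovery_state section of the query output includes a blocked_by field naming the OSDs the PG is waiting on. A minimal sketch over a saved fragment of the output; the state name and OSD ID shown are illustrative:

```shell
# Illustrative fragment of 'ceph pg <pg.id> query' output saved to a file;
# the real command needs a live cluster
cat <<'EOF' > /tmp/pg-query.json
    "recovery_state": [
        { "name": "Started/Primary/Peering",
          "blocked_by": [ 2 ] }
    ],
EOF
# OSD IDs listed under "blocked_by" are the peers holding up peering
grep '"blocked_by"' /tmp/pg-query.json
```

Here the PG is waiting on osd.2; bringing that OSD up should let peering finish.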
Repair
Ceph is checking the placement group and repairing any inconsistencies it finds (if possible).
Recovering
Ceph is migrating/synchronising objects and their replicas.
Backfill
Ceph is scanning and synchronising the entire contents of a placement group instead of inferring what contents need to be synchronised from the logs of recent operations. Backfill is a special case of recovery.
Wait-backfill
The placement group is waiting in line to start backfill.
Backfill-toofull (backfill_toofull)
- A backfill operation is waiting because the destination OSD is over its full ratio.
- Placement groups in a backfill_toofull state have backing OSDs that have hit the osd_backfill_full_ratio (0.85 by default).
- Any OSD hitting this threshold will prevent data from backfilling onto itself from other OSDs.
- NOTE: PGs hitting osd_backfill_full_ratio will still serve reads and writes, and also rebalance. Only backfill is blocked, to prevent the OSD from reaching the full_ratio faster.
- To check the osd_backfill_full_ratio of the OSDs, use:
# ceph daemon /var/run/ceph/ceph-mon.*.asok config show | grep backfill_full_ratio
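The check the OSD applies is a simple threshold comparison of its utilisation against the ratio. A minimal sketch of that comparison with illustrative values:

```shell
# Illustrative values: OSD at 87% utilisation vs. the 0.85 default ratio
used=0.87; ratio=0.85
awk -v u="$used" -v r="$ratio" \
    'BEGIN { s = (u > r) ? "backfill blocked (backfill_toofull)" : "backfill allowed"; print s }'
```

With these numbers the OSD is over the ratio, so backfill onto it is blocked until utilisation drops (or the ratio is raised).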
Backfill-unfound (backfill_unfound)
- Backfill has stopped due to unfound objects.
Incomplete
Ceph detects that a placement group is missing information about writes that may have occurred, or does not have any healthy copies. If any of the Placement Groups are in this state, try starting any failed OSDs that may contain the needed information or temporarily adjust min_size to allow recovery.
Remapped
The placement group is temporarily mapped to a different set of OSDs from what CRUSH specified.
Undersized
The placement group has fewer copies than the configured pool replication level.
When the number of replicas falls below the pool's 'size' setting, the PG state will show something similar to 'active+undersized'.
However, when the number of replicas falls below the pool's 'min_size' setting, the PG state will show 'undersized' without the 'active+' prefix and the PG will be in a read-only state.
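Per the rules above, the reported state follows from comparing the number of available replicas with the pool's size and min_size. A minimal decision sketch with illustrative numbers:

```shell
# Illustrative values: pool size 3, min_size 2, only 1 replica available
size=3; min_size=2; avail=1
awk -v a="$avail" -v s="$size" -v m="$min_size" 'BEGIN {
    if (a >= s)      print "active+clean"
    else if (a >= m) print "active+undersized"   # still serving I/O, short on copies
    else             print "undersized (not active, read-only)"
}'
```

With one replica of a size-3/min_size-2 pool, the PG drops below min_size and stops serving writes.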
Peered
The placement group has peered but cannot serve client IO due to not having enough copies to reach the pool’s configured min_size parameter. Recovery may occur in this state, so the pg may heal up to min_size eventually.
IMPORTANT
A placement group can be in any of the above states; a PG that is not active+clean does not necessarily indicate a problem. It should ultimately reach an active+clean state automatically, but manual intervention may sometimes be needed. Placement groups in active+<some-state-other-than-clean> should still serve data, since the PG is active.
Usually, Ceph tries to fix/repair placement groups and bring them back to active+clean, but PGs can end up in a stuck state in certain cases. The stuck states include:
Inactive
Placement groups in the inactive state won't accept any I/O. They are usually waiting for an OSD with the most up-to-date data to come back up. If the UP set and the ACTING set are the same, and the OSDs are not blocked on any other OSDs, this can be a problem with peering. Manually marking the primary OSD down will force peering to restart, since Ceph automatically brings the primary OSD back up and the peering process is kickstarted once an OSD comes up.
Stale
The placement group is in an unknown state because the OSDs that host it have not reported to the monitor cluster in a while (configured by mon_osd_report_timeout).
Unclean
Placement groups contain objects that are not replicated the desired number of times. A very common reason is OSDs that are down, or OSDs with a CRUSH weight of 0, which prevents the PGs from replicating data onto those OSDs and thus achieving a clean state.
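OSDs with a CRUSH weight of 0 can be spotted in ceph osd tree output, where the weight is the second column in RHCS-era releases. A minimal sketch over a saved sample; the OSD IDs and weights are illustrative:

```shell
# Saved 'ceph osd tree' fragment (illustrative); column 2 is the CRUSH weight
cat <<'EOF' > /tmp/osd-tree.txt
 0 0.90999         osd.0       up  1.00000          1.00000
 1       0         osd.1       up  1.00000          1.00000
EOF
# OSDs with weight 0 receive no data and can leave PGs stuck unclean
awk '$2 == 0 {print $3}' /tmp/osd-tree.txt
```

In this sample osd.1 is printed; reweighting it (or removing it from the CRUSH map) would let the PGs reach a clean state.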
The following two PG states were added in the Jewel release for the snapshot trimming feature.
snaptrim:
The PGs are currently being trimmed.
snaptrim_wait:
The PGs are waiting to be trimmed.
- To identify stuck placement groups, execute the following:
# ceph pg dump_stuck [unclean|inactive|stale|undersized|degraded]
Note:
For a more detailed explanation of placement group states, please check the Red Hat Ceph Storage documentation section on monitoring placement group states.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.