A volume group managed by an LVM-activate resource is active on multiple nodes in a Pacemaker cluster

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux 8, 9 (with the High Availability Add-on)

Issue

  • A volume group managed by an LVM-activate resource is active on node1 even though it has node2's system ID attached to it. It should not be allowed to activate on node1.
  • A cluster-managed volume group gets activated at boot time by the lvm2-pvscan service.
  • After Pacemaker started on a node that had just been fenced, the scheduler logged a message like the following for an LVM-activate resource with vg_access_mode=system_id:
Mar 18 14:56:13 node1 pacemaker-schedulerd[4757]: error: Resource halvm is active on 2 nodes (attempting recovery)

Resolution

Add all local (i.e., non-cluster-managed) volume groups to the auto_activation_volume_list parameter in the activation section of /etc/lvm/lvm.conf. For example, if there are two local volume groups named rhel and local_vg, and there are two cluster-managed volume groups named clus_vg1 and clus_vg2, then the auto_activation_volume_list parameter should look like this:

    auto_activation_volume_list = [ "rhel", "local_vg" ]

If there are no non-cluster-managed volume groups, then auto_activation_volume_list should be explicitly set to an empty list, as shown below.

    auto_activation_volume_list = [ ]

Then rebuild the initramfs. It is recommended that the cluster nodes be rebooted after rebuilding the initramfs, to verify that only local volumes are activated at boot.
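As a sketch, the initramfs for the currently running kernel can be rebuilt with dracut (the path shown is the RHEL default), and the activation state checked after the node comes back up:

```shell
# Rebuild the initramfs for the running kernel so the updated
# /etc/lvm/lvm.conf is copied into the early-boot image.
dracut -f /boot/initramfs-$(uname -r).img $(uname -r)

# After rebooting the node, confirm that only local LVs are active
# (an 'a' in the fifth character of lv_attr means the LV is active):
lvs -o vg_name,lv_name,lv_attr
```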

For more information, see Configuring and managing logical volumes (Red Hat Enterprise Linux 8), section 14.1, "Controlling autoactivation of logical volumes".

Root Cause

When a RHEL system boots, the lvm2-pvscan systemd service runs pvscan --cache -aay, which automatically activates complete volume groups. From the pvscan(8) man page:

       When the --cache and -aay options are used, pvscan records which PVs are available on the system, and activates LVs in completed VGs. A VG is complete when pvscan sees that the final PV in the VG has appeared. This is used by event-based system startup (systemd, udev) to activate LVs.
...
       pvscan --cache

       This first clears all existing PV online records, then scans all devices on the system, adding PV online records for any PVs that are found.

       pvscan --cache -aay

       This begins by performing the same steps as pvscan --cache.  Afterward, it activates LVs in any complete VGs.
...
       Auto-activation of VGs or LVs can be enabled/disabled using:
       lvm.conf(5) activation/auto_activation_volume_list

       For more information, see:
       lvmconfig --withcomments activation/auto_activation_volume_list

       To disable auto-activation, explicitly set this list to an empty list, i.e. auto_activation_volume_list = [ ].

       When this setting is undefined (e.g. commented), then all LVs are auto-activated.

An LVM-activate resource, when configured with the option vg_access_mode=system_id, manages a shared volume group or logical volume in an active/passive manner by setting the volume's LVM systemid to the systemid of the node where that volume should be active.
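For illustration, such a resource might be defined and its system-ID assignment inspected as follows (the resource name halvm and VG name clus_vg1 are hypothetical examples, not taken from any particular cluster):

```shell
# Create an active/passive LVM-activate resource using the
# system_id access mode (names here are illustrative):
pcs resource create halvm ocf:heartbeat:LVM-activate \
    vgname=clus_vg1 vg_access_mode=system_id

# Show this host's LVM system ID:
lvm systemid

# Show the system ID attached to each VG; for a cluster-managed VG
# it should match the node where Pacemaker has started the resource:
vgs -o vg_name,systemid
```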

When a node that's running an LVM-activate resource reboots, it runs pvscan --cache -aay as described above. If no other node has recovered the resource and assigned its own systemid to the volume, then the volume still belongs to the rebooted node, and the rebooted node activates it.

This is logically incorrect behavior, since only Pacemaker should activate or deactivate the volume. However, it doesn't cause a problem in and of itself.

The problem arises if another node (e.g., node2) starts the LVM-activate resource that manages the volume after the rebooted node (e.g., node1) has already activated it. At that point, the volume is active on both nodes but has node2's systemid attached to it, leaving its data vulnerable to corruption.

The issue has been observed when fencing takes a long time to return a successful result. In this scenario, node2 fenced node1, which was running the LVM-activate resource. node1 rebooted and activated the managed volume during boot-up, while node1's systemid was still attached to the volume. A few seconds later, the fence action was declared a success, and node2 was allowed to recover the LVM-activate resource. node2 then started the resource and assigned its own systemid to the volume. At that point, the volume was active on both nodes with node2's systemid.

The solution is to configure the auto_activation_volume_list parameter in /etc/lvm/lvm.conf and to exclude from the list all volumes that are managed by LVM-activate resources with vg_access_mode=system_id. That way, only Pacemaker activates those volumes; they cannot be activated automatically at boot or by LVM commands with the -aay option.

Diagnostic Steps

  1. Run lvs on all cluster nodes and check whether the cluster-managed active/passive volume is active on multiple nodes.

  2. Check the auto_activation_volume_list parameter in the activation section of /etc/lvm/lvm.conf and confirm that either the parameter is not configured or it contains the cluster-managed volume.

  3. Check /var/log/messages and determine that the volume was activated automatically during boot.

     Mar 18 18:34:49 node1 lvm[1678]:  pvscan[1678] PV /dev/sdb1 online, VG KBP is complete.
     Mar 18 18:34:49 node1 lvm[1678]:  pvscan[1678] VG KBP run autoactivation.
     ...
     Mar 18 18:34:50 node1 lvm[1678]:  1 logical volume(s) in volume group "KBP" now active
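The steps above can be sketched as commands (the VG name KBP follows the example log messages; substitute the cluster-managed VG in question):

```shell
# 1. On every node, check whether the managed VG's LVs are active
#    (an 'a' in the fifth character of lv_attr means active):
lvs -o vg_name,lv_name,lv_attr KBP

# 2. Show the effective auto_activation_volume_list; if the setting
#    is commented out, all VGs are auto-activated:
lvmconfig activation/auto_activation_volume_list

# 3. Check the boot logs for pvscan autoactivation of the VG:
grep pvscan /var/log/messages | grep KBP
```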
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.