LVM command hangs or takes a long time to complete in RHEL


Environment

  • Red Hat Enterprise Linux (RHEL) 4, 5, 6

Issue

  • LVM commands hang or take a long time to complete.
  • System boot hangs at an LVM command.
  • LVM commands get stuck when scanning a multipath map with all paths failed and queue_if_no_path or no_path_retry enabled.
  • LVM commands get stuck when scanning a device-mapper map that is in the suspended state.
  • LVM commands get stuck when scanning a device which is not responsive or functional.
  • Server hangs after running the pvs, vgs, or lvs commands.

Resolution

The specific resolution depends on the underlying cause, which can be any of a number of different conditions. See the Root Cause and Diagnostic Steps sections below to narrow down the cause of the hang before making any changes.

To resolve performance-related issues during device scans and lvm2 commands, update to the latest available version of the lvm2 package. In RHEL 5, the performance of scans and other LVM commands was significantly improved in lvm2-2.02.84-6.el5 (RHEL 5.7) and lvm2-2.02.88-7.el5 (RHEL 5.8).
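As a quick check, the installed lvm2 version can be compared against the first fixed RHEL 5 release using sort -V (available in coreutils on RHEL 6 and later). This is a sketch: the 'installed' value below is a hypothetical stand-in for the output of rpm -q lvm2.

```shell
# Hypothetical stand-in for: rpm -q --qf '%{VERSION}-%{RELEASE}\n' lvm2
installed='2.02.83-3.el5'
fixed='2.02.84-6.el5'

# sort -V orders version strings numerically; if the installed version
# sorts first (and differs from the fixed one), an update is recommended.
oldest=$(printf '%s\n%s\n' "$installed" "$fixed" | sort -V | head -n1)
if [ "$installed" != "$fixed" ] && [ "$oldest" = "$installed" ]; then
    echo "update recommended"
fi
```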

The following tips may help prevent LVM commands from hanging:

  • Modify the default 'filter' line in /etc/lvm/lvm.conf to avoid scanning any devices other than those on which there are physical volumes.
    • For example, if the physical volumes are on device-mapper-multipath devices and cciss devices, the following line could be used:
   filter = [ "a|/dev/cciss/.*|", "a|/dev/mapper/mpath.*|", "r|.*|" ]
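To sanity-check such a filter before relying on it, the accept patterns can be tried against a sample device list with grep -E. This is only a sketch: the device paths below are illustrative, and the ^ anchors are for the illustration only (lvm.conf filter patterns are unanchored regular expressions).

```shell
# Hypothetical device list; on a live system this might come from
# listing /dev/cciss, /dev/mapper, and /dev/sd*
devices='/dev/cciss/c0d0
/dev/mapper/mpath0
/dev/sda'

# The two "a|...|" accept patterns from the filter above, joined for
# grep -E. Devices that match neither would fall through to "r|.*|"
# (reject) and not be scanned.
printf '%s\n' "$devices" | grep -E '^/dev/cciss/.*|^/dev/mapper/mpath.*'
```

After changing the filter, removing /etc/lvm/cache/.cache forces LVM to rebuild its device cache on the next scan, so stale entries do not linger.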

Root Cause

When an LVM command is run, it issues IO to many devices on the system, subject to the 'filter' rule in /etc/lvm/lvm.conf. By default, all devices are included, which means LVM will issue IO to all block devices on the system to determine the LVM configuration on the machine.

There are a few very common scenarios which lead to LVM commands hanging:

  1. LVM is issuing I/O to devices that do not contain a physical volume label, and the LVM process hangs waiting for the I/O to complete.
  2. One or more block devices that were previously available for I/O are now unavailable. The root cause may be path failures to the storage, complete LUN removal at the storage side, or other storage-related failures.
  3. A previous LVM command is hung or crashed holding a lock, and subsequent LVM commands hang waiting for the lock.
  4. An LVM command is blocked on a specific device mapper ioctl in the kernel, such as DM_TABLE_LOAD. This failure may be seen when doing more complex operations, such as "vgchange" or "lvchange", which must do kernel operations in addition to IO to block devices.
  5. LVM by default stores a copy of the metadata on every physical volume. On systems with a large number of physical volumes, this can slow down LVM commands, as each command must read all physical volumes to determine the full LVM configuration. If this is the underlying cause, reducing the number of physical volumes that contain metadata may dramatically improve performance and remove the appearance of hangs. See https://access.redhat.com/kb/docs/DOC-62651 and https://access.redhat.com/kb/docs/DOC-5542 for further information.
  6. LVM may be scanning a device-mapper map (a multipath device, logical volume, dmraid device, snapshot, etc.) that is in a suspended state. Follow the diagnostic steps below to determine whether a specific device is causing the hang. If it is, and it is a device-mapper device, check the output of dmsetup info -c for an 's' in the attributes column (indicating the map is suspended):
# dmsetup info -c
Name                Maj Min Stat Open Targ Event  UUID
mpath28             253   3 L-sw    1    1      1 mpath-0QEMU_QEMU_HARDDISK_drive01
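To list only suspended maps, the attributes column can be filtered for the 's' flag. The pipeline below is a sketch run against captured sample output; on a live system, the sample variable would be replaced by the output of dmsetup info -c --noheadings -o name,attr.

```shell
# Captured sample of: dmsetup info -c --noheadings -o name,attr
# (mpath28 is suspended: its attribute string contains 's')
sample='mpath28  L-sw
vg00-root  L--w'

# Print only the maps whose attribute field contains the suspended flag
printf '%s\n' "$sample" | awk '$2 ~ /s/ {print $1}'

# Once the underlying problem is resolved, a suspended map can be
# resumed with, e.g.:  dmsetup resume mpath28
```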

Diagnostic Steps

  • Run the LVM command that hangs with the very verbose flags (-vvvv). For example:
# vgscan -vvvv

This often gives an indication of what device is being read at the time of the hang.
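When reproducing the hang, wrapping the command in timeout (from coreutils, available on RHEL 6 and later) converts an indefinite hang into exit status 124, which is easy to detect in a script. In the sketch below, sleep 5 stands in for the hanging vgscan -vvvv.

```shell
# 'sleep 5' is a stand-in for the potentially hanging LVM command.
# timeout kills it after 1 second and returns exit status 124.
status=0
timeout 1 sleep 5 || status=$?
if [ "$status" -eq 124 ]; then
    echo "command hung (killed by timeout)"
fi
```

Note that the -vvvv trace goes to stderr, so redirecting it (for example, 2> /tmp/vgscan.trace) keeps the trace available for inspection after the command is killed.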

  • Check /etc/lvm/lvm.conf 'filter' line.  The filter line should be set to only scan devices which contain LVM metadata.    
    • If set to default (filter = [ "a/.*/" ]) LVM will scan all devices which may lead to hung I/O and LVM processes.

    • Check the filter line with the following command:

                 grep -v \# /etc/lvm/lvm.conf | grep filter
      
  • Check the /etc/lvm/cache/.cache file. If it contains many devices which do not have PVs on them, LVM is scanning unnecessarily. A more restrictive LVM filter should fix the problem.
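On RHEL 5/6 the .cache file stores the accepted devices in a valid_devices=[...] list. The sketch below extracts the cached device paths from a hypothetical sample so their count can be compared with the number of actual PVs (for example, from pvs --noheadings | wc -l); a large gap suggests the filter is too permissive.

```shell
# Hypothetical contents of /etc/lvm/cache/.cache (RHEL 5/6 format)
sample='persistent_filter_cache {
	valid_devices=[
		"/dev/sda2",
		"/dev/mapper/mpath0",
		"/dev/sdb"
	]
}'

# List the device paths LVM will scan; many entries without PVs on them
# means LVM is doing unnecessary scanning.
printf '%s\n' "$sample" | grep -o '"/dev/[^"]*"' | tr -d '"'
```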
  • Determine whether the LVM command is truly hung by using 'ps' or 'top' to see if it is blocked or still running, and by looking in /var/lock/lvm (the directory LVM uses to store volume group lock files).
    • If the command is blocked rather than running, use strace or attach gdb to determine precisely where it is hung (for example, in a read() of a block device, or in a DM_TABLE_LOAD ioctl).
    • If another LVM process, other than the one being traced, is hung or stopped, it may be holding a lock in /var/lock/lvm and may be the cause of the traced command hanging.
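A quick way to check the process state from /proc without attaching a debugger is sketched below; $$ (the current shell) stands in for the PID of the hung LVM command. A process stuck in uninterruptible I/O shows state D, and /proc/<pid>/wchan names the kernel function it is waiting in.

```shell
# $$ (this shell) stands in for the hung LVM command's PID.
pid=$$

# Field 3 of /proc/<pid>/stat is the process state:
#   R = running, S = sleeping, D = uninterruptible sleep (blocked on I/O)
awk '{print "state:", $3}' /proc/"$pid"/stat

# The kernel function the process is blocked in (empty when runnable)
cat /proc/"$pid"/wchan; echo
```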
  • Check for a large number of physical volumes present on the system. A large number of physical volumes may lead to slow LVM commands. See https://access.redhat.com/kb/docs/DOC-62651 for more information, including its Diagnostic Steps section for determining the impact of a large number of physical volumes on the execution time of an LVM command.
  • If the command is just taking a long time, specifying the volume group(s) of interest on the command line may improve execution time. This is a consequence of LVM's internal design: if no volume group is given, LVM must do extra work to determine all volume groups on the system.
    • The example below shows the execution time of a 'vgs' command without any volume group listed versus one with a single volume group listed. In both cases, the same information is displayed; however, qualifying the 'vgs' command with a specific volume group improves the execution time from roughly 17 seconds to 12 seconds:

                # time vgs
                  VG     #PV #LV #SN Attr   VSize VFree
                  vgtest 350   0   0 wz--n- 2.05G 2.05G
                
                real     0m17.897s
                user     0m1.168s
                sys     0m0.632s
                # time vgs vgtest
                  VG     #PV #LV #SN Attr   VSize VFree
                  vgtest 350   0   0 wz--n- 2.05G 2.05G
                
                real     0m12.522s
                user     0m0.884s
                sys     0m0.472s
      

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.