Is it possible to flush a multipath map when multipath -f reports that the map is in use?
Environment
- Red Hat Enterprise Linux 5 and later
- Device mapper multipath
- SAN connected storage
Issue
- Unable to remove a multipath device in order to proceed with unmapping the LUN from the server.
- Attempting to flush a multipath map with "multipath -f" or "multipath -F" results in "map in use":
      # multipath -f mpath7
      mpath7: map in use
- Unable to remove the multipath device, even after unmapping the LUN from the server.
Resolution
Note: This document focuses on how to handle the failure of "multipath -f" when the multipath device is busy. For general instructions on the recommended procedure to remove a LUN, its paths and the corresponding multipath device, refer to the "Removing a Storage Device" section of the solution "How to rescan the SCSI bus to add or remove a SCSI device without rebooting the computer".

It is not possible to safely flush (delete) a multipath map that is in use. First, identify any subsystem or process holding the multipath device open, then take steps to release the multipath map. Only then can the map be removed.

See the Diagnostic Steps section for tools and techniques that can reveal potential holders.

Note: It is not always possible to identify the holders of a multipath device from userspace. In particular, if the multipath device is held busy by a kernel component (e.g. a kernel thread or a kernel module), it may be impossible to gain visibility without crashing the server and generating a vmcore. Crashing the server also triggers a reboot, which clears the problem, but at the cost of interrupting all services.

For any subsystem or process holding the multipath device open, stop the process or issue commands to release the multipath device.

Some examples of possible holders of a multipath device, and the commands to release them:
A filesystem exists on the multipath device and is currently mounted.
- Unmount the filesystem and, if it is listed in /etc/fstab, remove its entry.
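Finding the mount points that still use the device can be scripted. The following is a minimal sketch, assuming a POSIX shell and awk; the function name mounts_for_dev, the map name mpath7 and the mount point /data are hypothetical. It matches the source device exactly as given, so a filesystem mounted through /dev/dm-7 or through a partition map has to be checked under that name too, and /etc/fstab entries written as UUID= or LABEL= will not be found by a grep for the map name.

```shell
# mounts_for_dev DEVICE: read /proc/mounts-formatted lines on stdin and
# print the mount point of every entry whose source is exactly DEVICE.
# Note: /proc/mounts escapes spaces in mount points as \040.
mounts_for_dev() {
    awk -v dev="$1" '$1 == dev { print $2 }'
}

# Example (mpath7 and /data are placeholders):
#   mounts_for_dev /dev/mapper/mpath7 < /proc/mounts
#   umount /data                   # for each mount point printed above
#   grep -n mpath7 /etc/fstab      # locate the entry to remove, if any
```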
One or more partition mappings still exist on the multipath device.
- Use "kpartx -d" on the multipath device to remove the partition mappings.
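Before running "kpartx -d", the partition maps stacked on the multipath map can be listed from the device-mapper tables, which reference underlying devices by major:minor number. A minimal sketch, assuming the partition maps use the linear target (as kpartx maps do) and that "dmsetup table" run without a map name prints one "name: start length target args" line per target; the function name maps_on_devnum and the values 253:7 and mpath7 are hypothetical. Linear LVs built directly on the map are reported as well; striped or other targets are not.

```shell
# maps_on_devnum MAJ:MIN: read "dmsetup table" output (all maps) on stdin
# and print the name of every map that has a linear target on MAJ:MIN.
maps_on_devnum() {
    awk -v dev="$1" '
        $4 == "linear" && $5 == dev {
            sub(/:$/, "", $1)      # strip the colon after the map name
            print $1
        }
    ' | sort -u
}

# Example (253:7 is a placeholder; use the Maj/Min columns of "dmsetup info -c"):
#   dmsetup table | maps_on_devnum 253:7
#   kpartx -d /dev/mapper/mpath7
```

Anything stacked on a partition map (for example an LV on a partition) refers to the partition map's own major:minor, so the check may need to be repeated one level up.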
The multipath device was used by LVM and still has device-mapper state in the kernel.
- Use "lvchange -an vg_name/lv_name" to deactivate any logical volumes associated with the multipath device. A list of logical volumes associated with the multipath device can be found in the output of "lvs -o +devices".
- If "lvchange -an vg_name/lv_name" fails, review what prevents the LV from being deactivated. In most cases, this indicates that the LV is open and busy (e.g. used by an application or holding a mounted filesystem).
- In some cases, "lvchange -an vg_name/lv_name" may fail because of the initial checks LVM performs before trying to deactivate the LV map. In such cases, "dmsetup remove vg_name-lv_name" can directly request removal of the LV map (hyphens inside the VG or LV name are doubled in the device-mapper name). Even then, the map must be unused in order to be removed.
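The logical volumes with segments on the multipath device can be extracted from "lvs" output. A minimal sketch, assuming the report is requested with --noheadings, --separator '|' and -o vg_name,lv_name,devices, and that LVM prints the devices list comma separated with an "(extent)" suffix on each entry; the function name lvs_on_dev and the names vg01/lv01 are hypothetical. LVM may report the PV as /dev/mapper/mpath7, /dev/dm-7 or a partition such as /dev/mapper/mpath7p1, so each name under which the device may appear should be queried.

```shell
# lvs_on_dev DEVICE: read the output of
#   lvs --noheadings --separator '|' -o vg_name,lv_name,devices
# on stdin and print vg_name/lv_name for every LV with a segment on DEVICE.
lvs_on_dev() {
    awk -F'|' -v dev="$1" '
        {
            gsub(/^[ \t]+/, "", $1)               # lvs indents each row
            n = split($3, devs, ",")              # devices are comma separated
            for (i = 1; i <= n; i++) {
                sub(/\([0-9]+\)$/, "", devs[i])   # drop the (extent) suffix
                if (devs[i] == dev) { print $1 "/" $2; break }
            }
        }
    ' | sort -u
}

# Example (placeholders):
#   lvs --noheadings --separator '|' -o vg_name,lv_name,devices | lvs_on_dev /dev/mapper/mpath7
#   lvchange -an vg01/lv01
```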
If the LUN backing the multipath map has already been removed from the storage array, but the map is still in use by an application and I/O is still queued for the map, the application is likely hanging in D state and impossible to stop until the queued data is written or the application aborts the write, potentially after an I/O error.
- In such a case, the LUN needs to be presented to the server again so that the pending I/O can be written.
- If presenting the LUN again is not possible, the queued data has to be discarded (causing potential data loss) to allow releasing and flushing the map. This is possible by disabling queueing with a command similar to the following, where mpathX is the name of the multipath map that needs to be flushed:
      multipathd disablequeueing map mpathX
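Whether a map still queues I/O when no path is available can be read from the features field of "multipath -ll". A minimal sketch, assuming the field is printed either as features='N ...' or in the bracketed [features=N ...] style shown for mpath7 in the Diagnostic Steps section; the function name map_is_queueing is hypothetical. The command in the comment discards queued writes and should only be run once the decision described above has been made.

```shell
# map_is_queueing: read "multipath -ll MAP" output on stdin and succeed
# (exit status 0) if the features field contains queue_if_no_path.
map_is_queueing() {
    grep -Eq "features='?[0-9]+ [^]']*queue_if_no_path"
}

# Example (mpathX is a placeholder; the multipathd command DISCARDS queued I/O):
#   if multipath -ll mpathX | map_is_queueing; then
#       multipathd disablequeueing map mpathX
#   fi
```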
Dangerous step which can lead to data loss: if the cause keeping the multipath map busy is known, it is also known that there is no way to release the map, the data on the multipath map is not important, and any cached data can be lost so that blocked processes can continue, then it is possible to use "dmsetup remove -f" followed by "dmsetup clear" on the multipath device. These commands are explained in more detail in the dmsetup(8) man page ("man dmsetup").
Once all holders of the device have been removed, the multipath device can be flushed with "multipath -f".
Root Cause
- The open count of the multipath device is not 0.
- At least one process or subsystem is keeping the multipath device busy.
Diagnostic Steps
Review the output of "dmsetup info -c" to determine whether the map is in use (open) and how many times it is held open by processes or subsystems. The open count (the Open column) must be 0 in order to flush the map:

    # dmsetup info -c | awk 'NR==1||/mpathc/'
    Name     Maj Min Stat Open Targ Event  UUID
    mpathc   253   6 L--w    1    1    14  mpath-3600000000000000000cccccccccccccc
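When checking the open count repeatedly (for example after each holder is released), the Open column can be extracted for a single map. A minimal sketch, assuming the default "dmsetup info -c" layout with a header line, as in the output above; the function name open_count is hypothetical, and the column is located by its header name rather than by position.

```shell
# open_count MAP: read "dmsetup info -c" output on stdin and print the
# value of the Open column for the map named MAP.
open_count() {
    awk -v map="$1" '
        NR == 1 {
            # Locate the "Open" column from the header line.
            for (i = 1; i <= NF; i++) if ($i == "Open") col = i
            next
        }
        $1 == map { print $col }
    '
}

# Example (mpathc is a placeholder):
#   dmsetup info -c | open_count mpathc
```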
Determine whether the LUN backing the multipath map is still presented to the system. This can only be confirmed by reviewing the storage array configuration. However, if the multipath device is in a state similar to one of the following (every path failed, or no paths listed at all), this is a strong indication that the LUN is no longer mapped to the server:

    # multipath -ll mpath7
    mpath7 (3600000000000000000aaaaaaaaaaaaaa) dm-7 ,
    [size=4.4T][features=1 queue_if_no_path][hwhandler=0][rw]
    \_ round-robin 0 [prio=0][enabled]
     \_ #:#:#:# - #:# [failed][faulty]

    # multipath -ll mpathb
    mpathb (3600000000000000000bbbbbbbbbbbbbb) dm-9 ##,##
    size=150G features='1 queue_if_no_path' hwhandler='0' wp=rw

    # multipath -ll mpathc
    mpathc (3600000000000000000cccccccccccccc) dm-6 VENDOR_ID,PRODUCT_ID
    size=300G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
    `-+- policy='service-time 0' prio=0 status=active
      |- 3:0:1:3 sds 65:32  failed faulty running
      |- 4:0:1:3 sde 8:64   failed faulty running
      |- 3:0:0:3 sdk 8:160  failed faulty running
      `- 4:0:0:3 sdn 8:208  failed faulty running
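The pattern above (no path left in a non-failed state) can be checked mechanically. A minimal sketch, assuming path lines are recognizable by their H:C:T:L SCSI address (or the "#:#:#:#" placeholder in the mpath7 example) and that a failed path carries the word "failed" on its line, as in all three outputs above; the function name all_paths_failed is hypothetical. As stated above, only the storage array configuration can confirm that the LUN is really unmapped.

```shell
# all_paths_failed: read "multipath -ll MAP" output on stdin and succeed
# (exit status 0) if no path line is in a non-failed state, including
# the case where the map lists no paths at all.
all_paths_failed() {
    awk '
        # Path lines carry an H:C:T:L address ("#:#:#:#" when unknown).
        /[0-9#]+:[0-9#]+:[0-9#]+:[0-9#]+/ && !/failed/ { active++ }
        END { exit (active > 0) }
    '
}

# Example (mpathc is a placeholder):
#   multipath -ll mpathc | all_paths_failed && echo "no usable paths"
```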
In order to understand what is keeping the map busy, it is necessary to gather information on how this LUN was expected to be used. For example:
- Was there a filesystem on it?
- Was there a partition on it?
- Was it used by any VMs?
- Was it used as a raw block device?
- Was it used by LVM?
- What applications were expected to be using it?
Depending on the way the LUN is used, different tools can provide information on what is using the multipath map. Some common examples are:
If the LUN is expected to be used by LVM (or by other device-mapper volumes):
- Review the output of "lsblk -s" or "dmsetup ls --tree" for device-mapper maps corresponding to LVs which are built on top of the multipath device.
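As an alternative to "lsblk -s" or "dmsetup ls --tree", the kernel lists the block devices stacked directly on a device in /sys/block/<dev>/holders/. A minimal sketch, assuming the standard sysfs layout in which device-mapper devices expose their map name in /sys/block/dm-N/dm/name; the function name dm_holders and the device dm-7 are hypothetical. Only other block devices (LVs, partition maps and similar) appear there; processes and mounted filesystems do not.

```shell
# dm_holders DEV [SYSFS]: print the names of the block devices stacked
# directly on DEV (for example dm-7), read from SYSFS/block/DEV/holders/.
# SYSFS defaults to /sys; it is a parameter only so the function can be
# exercised against a copy of the tree.
dm_holders() {
    dev=$1
    sys=${2:-/sys}
    for h in "$sys/block/$dev/holders"/*; do
        # An empty holders/ directory leaves the glob unexpanded.
        [ -e "$h" ] || [ -L "$h" ] || continue
        name=${h##*/}
        # Device-mapper holders expose their map name in dm/name.
        if [ -r "$sys/block/$name/dm/name" ]; then
            cat "$sys/block/$name/dm/name"
        else
            echo "$name"
        fi
    done
}

# Example (dm-7 is a placeholder; take the dm-N name from "multipath -ll"):
#   dm_holders dm-7
```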
If the LUN is expected to be used by a filesystem:
- Review the contents of /proc/mounts to determine whether the filesystem is still mounted.
- Use "lsof" to reveal processes that may be keeping files open on the filesystem.
- XFS specific: in the output of "ps aux", search for XFS kernel threads related to the dm-X device matching the multipath map (e.g. xfsaild/dm-6). These threads can exist even though the filesystem does not appear in /proc/mounts, for example after a lazy unmount or when the filesystem is mounted in a container.
- EXT4 specific: in the output of "ps aux", search for EXT4 kernel threads related to the dm-X device matching the multipath map (e.g. jbd2/dm-9-8).
- Search the mountinfo and mounts files of each process in /proc for references to the map or to its mount point. This step can reveal use of the map by processes running in containers. A relatively simple one-line command for such a search is:
      grep -e "mpathX" -e "dm-Y" -e <WWID> -e <mount_point> /proc/*/mountinfo
  where mpathX is the multipath map name, dm-Y is the corresponding device-mapper device and <WWID> is the WWID of the LUN; all three are visible in the output of "multipath -ll". <mount_point> is the expected mount point of the multipath map (based on system and application configuration).
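The same mountinfo search can report process IDs directly, which is convenient when many processes share a mount namespace. A minimal sketch, assuming the patterns are matched as fixed strings; the function name pids_with_mount_refs and the values mpath7, dm-7, the WWID and /data are hypothetical. A short pattern such as dm-7 also matches dm-70, so each hit should be checked.

```shell
# pids_with_mount_refs PROC PATTERN...: print the PID of every process whose
# PROC/<PID>/mountinfo contains any of the fixed-string PATTERNs.
# PROC is normally /proc; it is a parameter only to allow testing.
pids_with_mount_refs() {
    proc=$1
    shift
    for f in "$proc"/[0-9]*/mountinfo; do
        # Processes can exit between the glob and the read.
        [ -r "$f" ] || continue
        for pat in "$@"; do
            if grep -qF -- "$pat" "$f" 2>/dev/null; then
                pid=${f%/mountinfo}
                echo "${pid##*/}"
                break
            fi
        done
    done
}

# Example (placeholders taken from "multipath -ll" and the expected mount point):
#   pids_with_mount_refs /proc mpath7 dm-7 3600000000000000000aaaaaaaaaaaaaa /data
```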
If the LUN is expected to be used as a raw block device, it is important to know which applications are expected to be using it (and which were expected to be using it in the past) and to review the state of these applications. In this case, "lsof", the symbolic links in /proc/<PID>/fd and application-specific tools can reveal that an application is using the multipath device.
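The /proc/<PID>/fd check can be scripted as a scan over all processes. A minimal sketch, assuming it runs as root (other processes' fd directories are otherwise unreadable) and that readlink is available; the function name pids_with_open_dev and the targets /dev/dm-7 and /dev/mapper/mpath7 are hypothetical. Where /dev/mapper entries are symlinks to ../dm-N, the fd links usually show the resolved /dev/dm-N path, so both names are worth passing.

```shell
# pids_with_open_dev PROC TARGET...: print the PID of every process with a
# file descriptor whose symlink in PROC/<PID>/fd points exactly to a TARGET.
# PROC is normally /proc; it is a parameter only to allow testing.
pids_with_open_dev() {
    proc=$1
    shift
    for d in "$proc"/[0-9]*/fd; do
        for fd in "$d"/*; do
            # Unreadable or vanished descriptors are skipped.
            link=$(readlink "$fd" 2>/dev/null) || continue
            for target in "$@"; do
                if [ "$link" = "$target" ]; then
                    pid=${d%/fd}
                    echo "${pid##*/}"
                    continue 3      # report each process only once
                fi
            done
        done
    done
}

# Example (placeholders):
#   pids_with_open_dev /proc /dev/dm-7 /dev/mapper/mpath7
```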
If the LUN is expected to be used by a VM managed by libvirt/KVM/QEMU, the logs in /var/log/libvirt/qemu/ can provide additional information. Also, identifying the qemu process running the VM and searching for the files it uses with "lsof" or in /proc/<PID>/fd can reveal whether the VM is keeping the map busy.
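The libvirt logs can be searched for the map name or WWID to narrow down which VMs were started with the device. A minimal sketch, assuming one log per VM named <vm>.log in /var/log/libvirt/qemu/; the function name vms_mentioning and the pattern mpath7 are hypothetical. A match only shows that the VM was started with the device at some point; whether its qemu process still holds the map open has to be confirmed through "lsof" or /proc/<PID>/fd.

```shell
# vms_mentioning LOGDIR PATTERN...: print the name of every VM whose log
# (LOGDIR/<vm>.log) mentions any of the fixed-string PATTERNs.
vms_mentioning() {
    logdir=$1
    shift
    for log in "$logdir"/*.log; do
        [ -r "$log" ] || continue
        for pat in "$@"; do
            if grep -qF -- "$pat" "$log"; then
                vm=${log##*/}
                echo "${vm%.log}"
                break
            fi
        done
    done
}

# Example (placeholders):
#   vms_mentioning /var/log/libvirt/qemu mpath7 3600000000000000000aaaaaaaaaaaaaa
```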
If there are partitions on the LUN, ensure that the partitions are not in use either, in the same way as described above.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.