RHV: Hosts boot with Guest LVs activated
Environment
- Red Hat Virtualization 3.5, 3.6, 4.0, 4.1 and 4.2
- Red Hat Enterprise Linux 7 Hosts
- Red Hat Virtualization Hosts (RHV-H) (RHEL 7 Based)
- Preallocated/RAW Disks with Guest PVs using the whole disks (not on partitions)
- FibreChannel or iSCSI storage
Issue
- Guest Logical Volumes (LVs) are activated when the host is rebooted, leading to stale LVs.
- On the host, LVM sees all the LVs of the storage domain(s), scans these devices, and finds any LVM metadata that exists inside a guest whose Volume Group was created directly on a raw disk. It then activates and opens the VM's LVs on the host. The device-mapper devices created on the host for the guest's LVs prevent the VM's LVs from being deactivated or removed, leaving stale LVs, which can lead to several other issues, including data corruption.
- The host takes too long to boot.
-
- The criteria for encountering this are:
- The storage domain is of type Block, such as FibreChannel or iSCSI, and the VM's disk created in this storage domain is Preallocated; or the disk is a Direct LUN.
- Within the Guest, a PV, VG and at least one LV are created on the full disk, i.e. not in a disk partition.
- Within the Guest, creating the PV on top of a partition (e.g. vda1, vdb2) avoids this problem; creating the PV on top of a whole block device (e.g. vda, vdb) makes the Guest susceptible to it.
- If a snapshot is taken that includes the preallocated disk, the resultant active volume will be thin-provisioned. If the VG is created after the snapshot was taken, the problem described here will not occur; if the VG was created before the first snapshot was taken, it will.
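The whole-disk vs. partition distinction above can be checked from inside the Guest. The sketch below is illustrative only: the classification rule is an assumption based on common virtio/SCSI naming, where partition names end in a digit (vda1, sdb2) and whole disks do not (vda, sdb); NVMe device names break this rule and would need extra handling.

```shell
#!/bin/sh
# Classify each PV device name as a whole disk (susceptible) or a
# partition (not susceptible). Assumption: under virtio/SCSI naming,
# partitions end in a digit and whole disks do not. NVMe names
# (e.g. nvme0n1) do not follow this rule.
classify_pv() {
    case "$1" in
        *[0-9]) echo "$1: partition - not susceptible" ;;
        *)      echo "$1: whole disk - susceptible" ;;
    esac
}

# In a real guest, feed this from: pvs --noheadings -o pv_name
for pv in /dev/vda1 /dev/vdb; do
    classify_pv "$pv"
done
```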
Resolution
It is recommended to upgrade to 4.1 or later and set up LVM filters on the hosts. Please follow How to configure LVM filters in RHV 4.1 or 4.2 environment? to set up the filters, then reboot the hosts.
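For reference, the filter produced by that procedure whitelists only the devices the host itself needs and rejects everything else, so the storage domain LUNs (and any guest VGs inside them) are never scanned or auto-activated. A minimal sketch of such an entry in /etc/lvm/lvm.conf follows; the device path is hypothetical and must be replaced with the actual device(s) backing the host's own VGs. On RHV 4.2 hosts, `vdsm-tool config-lvm-filter` can propose a suitable filter automatically.

```
filter = [ "a|^/dev/sda2$|", "r|.*|" ]
```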
For Hosts that cannot be rebooted or switched to maintenance mode, remediation consists of deactivating the Guest LVs that are active within the Host, and also deactivating the RHV LVs if the Guest is not running on that particular host. However, please note that:
- this does not fix the problem; it only provides some relief until the proper resolution above can be applied. The steps above still have to be applied.
- the Disk Volume ID LV should only be active on the Host running the VM (if any)
- Internal Guest LVs should not be active or mapped by device-mapper on any host.
The following commands are useful for cleaning up a Host affected by this, but as stated above, this will not prevent it from happening again. If you are unsure how to collect the information required to run these commands, please open a case with Red Hat Support.
dmsetup ls --tree
lvchange -an /dev/<VM VG>/<VM LV>
dmsetup remove /dev/<VM VG>/<VM LV>
dmsetup clear /dev/<VM VG>/<VM LV>
lvchange -an /dev/<Storage Domain>/<Disk Volume ID>
Note: see the Diagnostic Steps section for how to collect the information needed above.
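The cleanup sequence above can be sketched as a script. All names below are placeholders taken from the example in the Diagnostic Steps (vg-guest/lv-guest and the storage domain/volume UUIDs); substitute the values found on your host. By default (DRY_RUN=1) the script only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run sketch of the cleanup commands above. The VG/LV names and
# UUIDs are placeholders from the example elsewhere in this article;
# substitute real values. With DRY_RUN=1 (default) nothing is executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

GUEST_VG=vg-guest
GUEST_LV=lv-guest
SD_UUID=76dfe909-20a6-4627-b6c4-7e16656e89a4
VOL_UUID=6aacc711-0ecf-4c68-b64d-990ae33a54e3

# 1. Deactivate the guest LV that was activated on the host
run lvchange -an "/dev/$GUEST_VG/$GUEST_LV"
# 2. Remove any leftover device-mapper node for it
run dmsetup remove "/dev/$GUEST_VG/$GUEST_LV"
# 3. Deactivate the RHV LV (only if the VM is not running on this host)
run lvchange -an "/dev/$SD_UUID/$VOL_UUID"
```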
Root Cause
In RHEL 7, systemd and LVM eagerly scan and auto-activate any volume group found on the system's devices. On a host, this includes the Guest-internal VGs visible through the storage domain's LVs, leading to activation of Guest-internal LVs.
Diagnostic Steps
- vdsm skips deactivating open LVs on startup:
storageRefresh::DEBUG::2016-09-12 10:33:25,177::lvm::661::Storage.LVM::(bootstrap) Skipping open lv: vg=b442d48e-0398-4cad-b9bf-992a3e663573 lv=6b34b7a5-9159-430c-9ace-5329f0cdf667
- vdsm fails to deactivate the LV when the VM is shut down:
jsonrpc.Executor/4::ERROR::2016-09-06 19:53:08,141::task::866::Storage.TaskManager.Task::(_setError) Task=`e3bb4a53-cbee-4f10-b6c8-b936ec8cf999`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 873, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 3274, in teardownImage
dom.deactivateImage(imgUUID)
File "/usr/share/vdsm/storage/blockSD.py", line 1037, in deactivateImage
lvm.deactivateLVs(self.sdUUID, volUUIDs)
File "/usr/share/vdsm/storage/lvm.py", line 1199, in deactivateLVs
_setLVAvailability(vgName, toDeactivate, "n")
File "/usr/share/vdsm/storage/lvm.py", line 826, in _setLVAvailability
raise error(str(e))
CannotDeactivateLogicalVolume: Cannot deactivate Logical Volume: ('General Storage Exception: ("5 [] [\' Logical volume b442d48e-0398-4cad-b9bf-992a3e663573/5853cdf8-7b84-487e-ab70-827bf5b00140 is used by another device.\']\\nb442d48e-0398-4cad-b9bf-992a3e663573/[\'ffd27b7d-5525-4126-9f59-5a26dedad157\', \'5853cdf8-7b84-487e-ab70-827bf5b00140\']",)',)
Example:
The following example shows a guest VG called 'vg-guest' that contains an LV called 'lv-guest'. The LV on the host that contains this is '/dev/76dfe909-20a6-4627-b6c4-7e16656e89a4/6aacc711-0ecf-4c68-b64d-990ae33a54e3'.
dmsetup ls --tree on the host:
This shows that the device 'vg--guest-lv--guest' is stacked on '76dfe909--20a6--4627--b6c4--7e16656e89a4-6aacc711--0ecf--4c68--b64d--990ae33a54e3', i.e. the guest's LV and the VM's LV are associated.
# dmsetup ls --tree
......
vg--guest-lv--guest (253:52)
└─76dfe909--20a6--4627--b6c4--7e16656e89a4-6aacc711--0ecf--4c68--b64d--990ae33a54e3 (253:37)
└─360014380125989a10000400000480000 (253:8)
├─ (8:64)
└─ (8:16)
......
lvs on the host:
This shows that the guest's LV is active on the host and the VM's LV is active/open.
# lvs --config 'global { use_lvmetad=0 }' | egrep '6aacc711-0ecf-4c68-b64d-990ae33a54e3|guest'
6aacc711-0ecf-4c68-b64d-990ae33a54e3 76dfe909-20a6-4627-b6c4-7e16656e89a4 -wi-ao---- 1.00g
lv-guest vg-guest -wi-a----- 4.00m
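On hosts with a large device-mapper tree, the affected guest devices can be picked out mechanically. A minimal sketch, assuming that RHV LV names in dmsetup output always embed a storage-domain UUID fragment (eight hex digits, a doubled hyphen, then four hex digits); grep -B1 also prints the line above each match, which is the guest device stacked on top of the RHV LV:

```shell
#!/bin/sh
# Flag device-mapper devices stacked directly on an RHV LV. Assumption:
# RHV LV dm names embed a storage-domain UUID, which in dmsetup output
# looks like 8 hex digits, '--', then 4 hex digits. grep -B1 prints the
# line above the match too, i.e. the guest device sitting on top of it.
find_guest_devs() {
    grep -B1 -E '[0-9a-f]{8}--[0-9a-f]{4}--'
}

# Example input mirroring the tree shown above; against a live host run:
#   dmsetup ls --tree | find_guest_devs
printf '%s\n' \
  'vg--guest-lv--guest (253:52)' \
  ' └─76dfe909--20a6--4627--b6c4--7e16656e89a4-6aacc711--0ecf--4c68--b64d--990ae33a54e3 (253:37)' \
  '  └─360014380125989a10000400000480000 (253:8)' \
  | find_guest_devs
```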
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.