Unfence operation fails with fence_scsi when 'devices' parameter is undefined and cluster is configured with lvmlockd in RHEL 8 or 9
Environment
- Red Hat Enterprise Linux Server 8, 9 (with the High Availability Add On)
- fence_scsi fence agent
Issue
- The unfence operation fails in a RHEL 8/9 pacemaker cluster with fence_scsi when the devices attribute is undefined, with the following error:
2023-04-04 07:21:29,724 ERROR: Failed: No devices found
2023-04-04 07:21:29,724 ERROR: Please use '-h' for usage
- In a RHEL 7 pacemaker cluster configured with clvmd and the fence_scsi fence agent, the devices parameter can be omitted because the underlying shared devices are populated automatically. With the same approach in RHEL 8/9, the unfence operation fails when the setup uses lvmlockd.
Resolution
Red Hat Enterprise Linux 8
- The issue (Bugzilla bug 2187329) has been resolved with the errata RHBA-2023:6927 in the following packages: fence-agents-common-4.2.1-121.el8, fence-agents-all-4.2.1-121.el8, or later.
Red Hat Enterprise Linux 9
- The issue (Bugzilla bug 2187327) has been resolved with the errata RHBA-2023:6362 in the following packages: fence-agents-common-4.10.0-55.el9, fence-agents-all-4.10.0-55.el9, or later.
Root Cause
In a RHEL 7 pacemaker cluster configured with clvmd and fence_scsi, the fence_scsi agent can discover the underlying shared storage for fence and unfence operations even when the devices parameter is undefined. In RHEL 8 and RHEL 9, the configuration moved to shared VGs managed by lvmlockd (the 6th character of the Attr field in vgs output is s), and the agent fails to populate the devices associated with the shared VG.
As per the source code, line 295 selects only the devices whose VG attributes contain the 'c' flag:
283 def get_clvm_devices(options):
284     devs = []
285     cmd = options["--vgs-path"] + " " +\
286         "--noheadings " +\
287         "--separator : " +\
288         "--sort pv_uuid " +\
289         "--options vg_attr,pv_name "+\
290         "--config 'global { locking_type = 0 } devices { preferred_names = [ \"^/dev/dm\" ] }'"
291     out = run_cmd(options, cmd)
292     if out["err"]:
293         fail_usage("Failed: Cannot get clvm devices")
294     for line in out["out"].split("\n"):
295         if 'c' in line.split(":")[0]:
296             devs.append(line.split(":")[1])
297     return devs
...
571     if not ("--devices" in options and options["--devices"].split(",")):
572         options["devices"] = get_clvm_devices(options)
573     else:
574         options["devices"] = options["--devices"].split(",")
In RHEL 8/9, clvmd has been replaced by lvmlockd, so the Attr field for shared VGs reads wz--ns instead of wz--nc. The 'c' check therefore matches no devices, and the unfence operation fails with the following error:
2023-04-04 07:21:29,724 ERROR: Failed: No devices found
2023-04-04 07:21:29,724 ERROR: Please use '-h' for usage
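The detection gap can be reproduced without a cluster. The following is a minimal sketch, not the agent's code and not the fix shipped in the errata: the helper name devices_with_flag and the sample device paths are hypothetical, and the sketch inspects only the 6th attribute character, whereas the agent at line 295 tests for 'c' anywhere in the attribute string.

```python
# Sample lines in the "vg_attr:pv_name" layout requested by the vgs call above.
# The device paths are made up for illustration.
SAMPLE_VGS_OUTPUT = "\n".join([
    "  wz--n-:/dev/sda2",  # local VG
    "  wz--nc:/dev/sdb",   # clustered VG (clvmd)
    "  wz--ns:/dev/sdc",   # shared VG (lvmlockd)
])


def devices_with_flag(vgs_output, flags):
    """Return PVs whose VG has one of `flags` as the 6th vg_attr character.

    Hypothetical helper for illustration only.
    """
    devs = []
    for line in vgs_output.split("\n"):
        fields = line.strip().split(":")
        if len(fields) < 2 or len(fields[0]) < 6:
            continue  # skip blank or malformed lines
        if fields[0][5] in flags:
            devs.append(fields[1])
    return devs


# Matching only the clustered flag misses the lvmlockd shared VG.
print(devices_with_flag(SAMPLE_VGS_OUTPUT, "c"))   # ['/dev/sdb']
# Accepting the shared flag as well picks it up.
print(devices_with_flag(SAMPLE_VGS_OUTPUT, "cs"))  # ['/dev/sdb', '/dev/sdc']
```

On a RHEL 8/9 cluster where every shared VG reports wz--ns, the first call would return an empty list, which corresponds to the "No devices found" error.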
Diagnostic Steps
See Issue and Root Cause.