Unfence operation fails with fence_scsi when 'devices' parameter is undefined and cluster is configured with lvmlockd in RHEL 8 or 9

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux Server 8, 9 (with the High Availability Add-On)
  • fence_scsi fence agent

Issue

  • In a RHEL 8/9 Pacemaker cluster, the fence_scsi unfence operation fails with the following error when the devices parameter is undefined:
2023-04-04 07:21:29,724 ERROR: Failed: No devices found
2023-04-04 07:21:29,724 ERROR: Please use '-h' for usage
  • In a RHEL 7 Pacemaker cluster that uses clvmd and the fence_scsi fence agent, the devices parameter can be omitted because the underlying shared devices are detected automatically. In RHEL 8/9, where the setup uses lvmlockd, omitting the parameter in the same way causes the unfence operation to fail.

Resolution

Red Hat Enterprise Linux 8

  • The issue (Bugzilla bug 2187329) is resolved by erratum RHBA-2023:6927, which provides the following packages: fence-agents-common-4.2.1-121.el8 and fence-agents-all-4.2.1-121.el8, or later.

Red Hat Enterprise Linux 9

  • The issue (Bugzilla bug 2187327) is resolved by erratum RHBA-2023:6362, which provides the following packages: fence-agents-common-4.10.0-55.el9 and fence-agents-all-4.10.0-55.el9, or later.

Root Cause

In a RHEL 7 Pacemaker cluster configured with clvmd and fence_scsi, the fence agent can discover the underlying shared storage devices for fence and unfence operations even when the devices parameter is undefined. RHEL 8 and RHEL 9 instead use shared VGs managed by lvmlockd (the sixth character of the Attr field in vgs output is s), and fence_scsi fails to discover the devices associated with a shared VG.

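In the fence_scsi source code, line 295 selects only the devices whose VG attribute string contains the 'c' flag, and line 572 calls this function when the devices parameter is not set: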

   283  def get_clvm_devices(options):
   284          devs = []
   285          cmd = options["--vgs-path"] + " " +\
   286          "--noheadings " +\
   287          "--separator : " +\
   288          "--sort pv_uuid " +\
   289          "--options vg_attr,pv_name "+\
   290          "--config 'global { locking_type = 0 } devices { preferred_names = [ \"^/dev/dm\" ] }'"
   291          out = run_cmd(options, cmd)
   292          if out["err"]:
   293                  fail_usage("Failed: Cannot get clvm devices")
   294          for line in out["out"].split("\n"):
   295                  if 'c' in line.split(":")[0]:
   296                          devs.append(line.split(":")[1])
   297          return devs
...
   571          if not ("--devices" in options and options["--devices"].split(",")):
   572                  options["devices"] = get_clvm_devices(options)
   573          else:
   574                  options["devices"] = options["--devices"].split(",")
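
The effect of the 'c' filter can be reproduced outside the agent. The following is a minimal sketch, not the agent's code: devices_with_c_flag is a hypothetical helper that mirrors the check at line 295, and the sample output and device names are made up for illustration, using the vg_attr:pv_name format that the vgs command above requests.

```python
# Sample lines in the "vg_attr:pv_name" format requested by get_clvm_devices().
# The device names are invented for illustration only.
SAMPLE_VGS_OUTPUT = """\
  wz--n-:/dev/sda2
  wz--nc:/dev/mapper/mpatha
  wz--ns:/dev/mapper/mpathb
"""


def devices_with_c_flag(vgs_output):
    """Mirror the filter at line 295: keep PVs whose vg_attr contains 'c'."""
    devs = []
    for line in vgs_output.split("\n"):
        fields = line.strip().split(":")
        # Skip blank or malformed lines, then apply the same 'c' test.
        if len(fields) == 2 and "c" in fields[0]:
            devs.append(fields[1])
    return devs


print(devices_with_c_flag(SAMPLE_VGS_OUTPUT))
```

Only the PV of the clustered VG (wz--nc) is returned. The PV of the shared VG (wz--ns) is skipped, so a cluster that has only shared VGs ends up with an empty device list.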

In RHEL 8/9, clvmd has been replaced by lvmlockd, so the Attr field of a shared VG reads wz--ns instead of wz--nc. Because the attribute string no longer contains the 'c' flag, no devices are detected and the unfence operation fails with the following error:

2023-04-04 07:21:29,724 ERROR: Failed: No devices found
2023-04-04 07:21:29,724 ERROR: Please use '-h' for usage
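
To illustrate the kind of check that would cover both cases, here is a minimal sketch that accepts either 'c' (clustered) or 's' (shared) in the sixth character of vg_attr. The function names is_cluster_vg and cluster_devices are hypothetical, and this is not necessarily how the packages listed under Resolution implement the fix.

```python
def is_cluster_vg(vg_attr):
    """Return True if the sixth vg_attr character is 'c' (clustered) or 's' (shared)."""
    return len(vg_attr) >= 6 and vg_attr[5] in ("c", "s")


def cluster_devices(vgs_output):
    """Collect PV names from 'vg_attr:pv_name' lines that belong to a clustered or shared VG."""
    devs = []
    for line in vgs_output.splitlines():
        # partition() leaves sep empty when the line has no ':' separator.
        attr, sep, pv_name = line.strip().partition(":")
        if sep and is_cluster_vg(attr):
            devs.append(pv_name)
    return devs


# Invented sample: one local VG and one shared VG.
print(cluster_devices("  wz--n-:/dev/sda2\n  wz--ns:/dev/mapper/mpathb\n"))
```

Testing the sixth character specifically, rather than searching the whole string, is a choice made for this sketch so that a 'c' or 's' in another attribute position is not mistaken for the cluster flag.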

Diagnostic Steps

See Issue and Root Cause.

