fence_scsi device parameter update (addition/deletion) causes restart of all resources

Solution Verified - Updated 14 Jun 2024

Environment

Red Hat Enterprise Linux (RHEL) 7, 8 with the High Availability Add-On
fence_scsi

Issue

Is it possible to create an independent fence_scsi device for each resource group that will be added or removed in the future?
Every update to a fence_scsi device parameter causes a restart of all resources.
Are there any other fence_scsi configuration options that avoid a cluster-wide resource restart after the fence_scsi configuration is updated?

node2    pengine:     info: rsc_action_digest_cmp:      Parameters to ig-scsi-fnc_start_0 on node1 changed: was 5c6b7753cebc986370b0b74dd3e45abb vs. now 6f915787eee272807dae881764428afd (reload:3.0.14) 0:0;4:3:0:3de2b23c-09d6-4f7c-8709-5ceae047e66f



node2    pengine:   notice: LogNodeActions:      * Fence (on) node1 'Device parameters changed (reload)'
node2    pengine:   notice: LogAction:   * Restart    ig-scsi-fnc     (                   node1 )   due to resource definition change
node2    pengine:   notice: LogAction:   * Restart    db1_igt_lvm     (                   node1 )   due to required stonith
node2    pengine:   notice: LogAction:   * Restart    bck_igt_lvm     (                   node1 )   due to required stonith
node2    pengine:   notice: LogAction:   * Restart    db2_igt_lvm     (                   node1 )   due to required stonith
node2    pengine:   notice: LogAction:   * Restart    db1_igt_fs      (                   node1 )   due to required stonith
node2    pengine:   notice: LogAction:   * Restart    online_igt_fs   (                   node1 )   due to required stonith
node2    pengine:   notice: LogAction:   * Restart    exp_igt_fs      (                   node1 )   due to required stonith
node2    pengine:   notice: LogAction:   * Restart    db2_igt_fs      (                   node1 )   due to required stonith
node2    pengine:   notice: LogAction:   * Restart    ora_igt_vip     (                   node1 )   due to required stonith
node2    pengine:   notice: LogAction:   * Restart    ora_igt_ap      (                   node1 )   due to required stonith
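
To see at a glance which resources the scheduler plans to restart and why, log lines like the ones above can be summarized with a short script. This is a minimal sketch: the regular expression is derived only from the excerpt shown here, so other Pacemaker versions may need a different pattern, and `parse_restarts` and `SAMPLE` are illustrative names.

```python
import re

# Matches scheduler lines of the form shown in the excerpt, for example:
#   ... LogAction:   * Restart    db1_igt_lvm   (   node1 )   due to required stonith
# The pattern is an assumption based on that excerpt only.
RESTART_RE = re.compile(
    r"LogAction:\s+\*\s+Restart\s+(?P<resource>\S+)"
    r"\s+\(\s*(?P<node>\S+)\s*\)\s+due to (?P<reason>.+)$"
)


def parse_restarts(log_text):
    """Return (resource, node, reason) tuples for each planned restart."""
    restarts = []
    for line in log_text.splitlines():
        match = RESTART_RE.search(line)
        if match:
            restarts.append(
                (match["resource"], match["node"], match["reason"].strip())
            )
    return restarts


SAMPLE = """\
node2    pengine:   notice: LogAction:   * Restart    ig-scsi-fnc     (                   node1 )   due to resource definition change
node2    pengine:   notice: LogAction:   * Restart    db1_igt_lvm     (                   node1 )   due to required stonith
"""

for resource, node, reason in parse_restarts(SAMPLE):
    print(f"{resource} on {node}: {reason}")
```

In the excerpt, only the fence device itself restarts because of a definition change; every other resource restarts with the reason "required stonith".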

Resolution

As per Bug 1590273, this is expected product behavior.

Root Cause

When a fence_scsi device is modified, the unfence action must be repeated, since new configuration values will be passed to it. Because all other resources are implicitly ordered after unfencing, they must first be stopped, then unfencing can proceed, then the resources can be started.

fence_scsi works by having each node register a unique key with each SCSI device. It then uses a SCSI-3 persistent reservation to ensure that only registrants can write to the device. Fencing a node simply means removing its key, after which the node can no longer write to the device.

To make that model work, fence_scsi has to support "unfencing", which in its case is the key registration. Whenever a node first joins the cluster, it must be unfenced (i.e. be able to write to the devices) before it can run resources.
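
The registration model described above can be illustrated with a small toy simulation. It does not talk to real hardware or issue any SCSI commands; the `SharedDisk` class, the device path, and the key values are all made up for illustration.

```python
class SharedDisk:
    """Toy stand-in for a SCSI device that only accepts writes from registrants."""

    def __init__(self, name):
        self.name = name
        self.registered_keys = set()

    def register(self, key):
        # "Unfencing": the node registers its unique key with the device.
        self.registered_keys.add(key)

    def remove_key(self, key):
        # "Fencing": the node's key is removed from the device.
        self.registered_keys.discard(key)

    def write(self, key, data):
        # Only nodes whose key is registered may write.
        if key not in self.registered_keys:
            raise PermissionError(f"key {key} is not registered on {self.name}")
        return len(data)


# Hypothetical device path and keys, for illustration only.
disk = SharedDisk("/dev/sdb")
node_keys = {"node1": "key-0001", "node2": "key-0002"}

# Each node is unfenced (registers its key) when it joins the cluster.
for key in node_keys.values():
    disk.register(key)

disk.write(node_keys["node1"], b"data")   # allowed: node1 is a registrant

disk.remove_key(node_keys["node1"])       # node1 is fenced
try:
    disk.write(node_keys["node1"], b"data")
except PermissionError as err:
    print("write rejected:", err)
```

After its key is removed, node1 must be unfenced again (re-registered) before it can write, which is why unfencing has to precede starting any resources.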

When a fence_scsi device's configuration changes, unfencing must be reapplied to all nodes. Keep in mind that Pacemaker doesn't know the details of how a fence device works; the fence agent is the abstract interface to the actual device. Pacemaker only knows that the previous unfencing may now be insufficient and must be redone with the new parameters. Due to the usual ordering of actions, all resources must be stopped before unfencing can be done, and they can be restarted only after unfencing succeeds.
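
The sequence described above (detect a parameter change, stop resources, redo unfencing, start resources) can be sketched as follows. This is a simplified model, not Pacemaker's scheduler: the digest here is a SHA-256 of a JSON dump rather than Pacemaker's own digest format, and the function names, device paths, and node and resource lists are assumptions for illustration.

```python
import hashlib
import json


def params_digest(params):
    """Digest of a parameter dict (illustrative; not Pacemaker's real digest format)."""
    canonical = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def plan_after_update(old_params, new_params, nodes, resources):
    """Return the ordered actions needed when the fence device parameters change."""
    if params_digest(old_params) == params_digest(new_params):
        return []  # parameters unchanged: previous unfencing is still valid
    # Resources are ordered after unfencing, so they stop first (reverse order),
    plan = [("stop", name) for name in reversed(resources)]
    # then every node is unfenced again with the new parameters,
    plan += [("unfence", node) for node in nodes]
    # and only then can the resources start again.
    plan += [("start", name) for name in resources]
    return plan


# Hypothetical parameter values, for illustration only.
old = {"devices": "/dev/mapper/mpatha"}
new = {"devices": "/dev/mapper/mpatha,/dev/mapper/mpathb"}

for action, target in plan_after_update(
    old, new, ["node1", "node2"], ["db1_igt_lvm", "db1_igt_fs"]
):
    print(action, target)
```

The model makes no attempt to work out which resources actually depend on the changed device, which mirrors the cluster's lack of that knowledge.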

In fence_scsi's case, when you add a new device, it's easy to see that every node must register a key with that device. So, unfencing must be done. There may be resources that do not depend on that particular device and thus do not really need to be restarted, but that knowledge is not available to the cluster.
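
As a rough illustration of why adding a device implies new registrations on every node, the following sketch compares an old and a new comma-separated device list and lists the (node, device) registrations that would be missing. The helper names and device paths are hypothetical, and the comma-separated format is assumed to match how the fence_scsi device list is written.

```python
def split_devices(value):
    """Split a comma-separated device list into a clean list of paths."""
    return [dev.strip() for dev in value.split(",") if dev.strip()]


def missing_registrations(old_value, new_value, nodes):
    """Return (node, device) pairs that need a key registration after the update."""
    old_devices = set(split_devices(old_value))
    added = [dev for dev in split_devices(new_value) if dev not in old_devices]
    # Every node must register its key with every newly added device.
    return [(node, dev) for dev in added for node in nodes]


# Hypothetical device paths and node names, for illustration only.
pairs = missing_registrations(
    "/dev/mapper/mpatha",
    "/dev/mapper/mpatha, /dev/mapper/mpathb",
    ["node1", "node2"],
)
for node, dev in pairs:
    print(f"{node} must register with {dev}")
```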

Can I add or remove devices to a fence_scsi or fence_mpath device without restarting the cluster or all the resources?

SBR

Clusterha

Product(s)

Red Hat Enterprise Linux

Components

cluster

Category

Troubleshoot

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
