How can I diagnose the cause of SCSI reservation conflicts in a RHEL cluster that uses fence_scsi?

Solution Verified - Updated 14 Sept 2025

Environment

Red Hat Enterprise Linux 5 (with the High Availability or Resilient Storage Add On)
Red Hat Enterprise Linux 6 (with the High Availability or Resilient Storage Add On)
Red Hat Enterprise Linux 7 (with the High Availability or Resilient Storage Add On)
Red Hat Enterprise Linux 8 (with the High Availability or Resilient Storage Add On)
Red Hat High Availability Cluster with 2 or more nodes.
One or more cluster nodes configured to use fence_scsi in /etc/cluster/cluster.conf:

$ grep fence_scsi /etc/cluster/cluster.conf
           <fencedevice agent="fence_scsi" name="myfencing"/>

Issue

A cluster node logs "reservation conflict" messages during boot, and shared storage is inaccessible or a GFS/GFS2 file system withdraws:

kernel: sd 4:0:0:9: reservation conflict
kernel: sd 2:0:0:9: reservation conflict
kernel: sd 2:0:0:9: reservation conflict
kernel: sd 4:0:0:9: reservation conflict
kernel: sd 4:0:0:9: reservation conflict
kernel: sd 2:0:0:9: reservation conflict
kernel: sd 2:0:0:9: reservation conflict
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0: fatal: I/O error
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0:   block = 86382275
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0:   function = gfs_logbh_wait
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0:   file = /builddir/build/BUILD/gfs-kmod-0.1.34/_kmod_build_/src/gfs/dio.c, line = 816
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0:   time = 1359814274
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0: about to withdraw from the cluster
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0: telling LM to withdraw
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0: withdrawn

The cluster nodes report reservation conflict errors:

kernel: sd 2:0:5:0: reservation conflict
kernel: sd 2:0:5:0: SCSI error: return code = 0x00000018
kernel: end_request: I/O error, dev sdq, sector 423224328
kernel: device-mapper: multipath: Failing path 65:0.
kernel: sd 2:0:5:0: reservation conflict
kernel: sd 2:0:5:0: SCSI error: return code = 0x00000018
kernel: end_request: I/O error, dev sdq, sector 423224336

A GFS file system fails to mount at boot on a two-node Red Hat cluster:

/sbin/mount.gfs: error mounting /dev/mapper/SMC_VG-smcdata on /isg/smc: No such file or directory

Resolution

If a node has been fenced, it will continue to receive SCSI reservation conflicts and be unable to access shared storage devices until it is rebooted and starts scsi_reserve (RHEL 5) or is "unfenced" (RHEL 6 or later) as it rejoins the cluster.
If the errors in question are happening during boot, make sure each node has a correct fence_scsi configuration.

Red Hat Enterprise Linux 5

Ensure scsi_reserve is executed after the failed node is rebooted:
- If cluster services start on boot, ensure that scsi_reserve is enabled with chkconfig:
```
# chkconfig scsi_reserve on
```
- Or if manually starting cluster services, start it after cman and before clvmd:
```
# service cman start
# service scsi_reserve start
# service clvmd start
```

Red Hat Enterprise Linux 6 (cman based cluster without pacemaker)

Ensure each node has a proper "unfence" configuration in /etc/cluster/cluster.conf.
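
As a quick way to spot nodes that lack this configuration, the following sketch lists every `<clusternode>` in a cluster.conf that contains no `<unfence>` element. It assumes the usual layout, in which each `<clusternode>` carries an `<unfence>` section containing a `<device name="myfencing" action="on"/>` entry, with one tag per line. The function name `nodes_without_unfence` is hypothetical, and the matching is line-based rather than a real XML parse, so treat the result as a hint and confirm it by reading the file.

```shell
# Rough line-based check (not an XML parser): read cluster.conf text on stdin
# and print the name of every <clusternode> that has no <unfence> element.
nodes_without_unfence() {
    awk '
        /<clusternode[ >]/ {
            # Remember the node name and reset the per-node flag.
            name = $0
            sub(/.*name="/, "", name)
            sub(/".*/, "", name)
            has_unfence = 0
        }
        /<unfence>/ { has_unfence = 1 }
        /<\/clusternode>/ { if (!has_unfence) print name }
    '
}
```

Example usage: `nodes_without_unfence < /etc/cluster/cluster.conf`. No output means every node entry the check recognised contains an `<unfence>` section.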

Red Hat Enterprise Linux 6 or later (pacemaker cluster)

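Ensure the fence_scsi stonith device is properly configured, including unfencing (for example, the provides=unfencing meta attribute), so that a node re-registers its key when it rejoins the cluster.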

Root Cause

fence_scsi uses SCSI persistent reservations of the "Write Exclusive, Registrants Only" type.
- A reservation is first placed on the device to set the reservation type. From then on, only registered hosts are allowed to write to the device (but any host can read from it).
- Once the reservation exists, every node creates a registration on every path to the device, so a multipath device can carry several registrations per node.
When a cluster node initiates a fence_scsi event:
- The registration keys of the failed node are removed from the device by the fencing node.
- As a result, the fenced node has no registration and is denied write access by the "Write Exclusive, Registrants Only" reservation type.
A fenced node must first be rebooted, and must then re-register (RHEL 5) or unfence (RHEL 6 or later) itself before it can write to the shared storage again.
- In RHEL 5, re-registration occurs when the scsi_reserve script is run (it should be executed immediately after cman and clvmd have started).
- In RHEL 6, unfencing occurs when cman starts and joins the fence domain, provided that unfencing has been configured.
If re-registration or unfencing does not occur, the failed node cannot write to the shared device, even after rebooting.
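
Because a fenced node is recognisable by its missing registration, one way to test for this state is to look for the node's key in the list of registered keys. The sketch below assumes the `sg_persist --in -k` output format shown in this article (one hexadecimal key per line) and takes the key to look for as an argument. The function name `key_registered` is hypothetical, and how a node's own key is determined depends on the release and is not covered here.

```shell
# Succeeds (exit status 0) if the key given as $1 appears among the registered
# keys in `sg_persist --in -k` output read from stdin; fails otherwise.
key_registered() {
    key=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    awk -v want="$key" '
        # Key lines consist of a single field; compare case-insensitively.
        NF == 1 && tolower($1) == want { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}
```

Example usage, with the device and key taken from the diagnostic output in this article: `sg_persist --in -k -d /dev/mapper/wcmsshared | key_registered 0x63b40002 || echo "key is not registered"`.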

Diagnostic Steps

If the following symptoms are present on your system, then this solution may apply:

When the system boots, the filesystem (GFS in this example) fails to mount:

Mounting GFS filesystems:  /sbin/mount.gfs: error mounting /dev/mapper/SMC_VG-smcdata on /isg/smc:
 No such file or directory

The following logs are shown when the filesystem fails to mount:

kernel: Lock_Nolock (built Sep 22 2010 10:18:25) installed
kernel: Trying to join cluster "lock_nolock", "mycluster:datafs2"       [1]
kernel: Joined cluster. Now mounting FS...
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: jid=0: Trying to acquire journal lock...
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: jid=0: Looking at journal...
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: jid=0: Acquiring the transaction lock...
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: jid=0: Replaying journal...
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: jid=0: Replayed 1 of 4 blocks
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: jid=0: replays = 1, skips = 0, sames = 3
kernel: sd 4:0:0:170: reservation conflict                              [2]
kernel: sd 2:0:0:170: reservation conflict
kernel: sd 2:0:0:170: reservation conflict
kernel: sd 4:0:0:170: reservation conflict
kernel: sd 4:0:0:170: reservation conflict
kernel: sd 2:0:0:170: reservation conflict
kernel: sd 2:0:0:170: reservation conflict
kernel: Buffer I/O error on device dm-5, logical block 239170602        [3]
kernel: lost page write due to I/O error on dm-5
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: fatal: I/O error              [4]
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0:   block = 239170602
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0:   function = gfs_replay_wait
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0:   file = /builddir/build/BUILD/gfs-kmod-0.1.34/_kmod_build_/src/gfs/dio.c, line = 928
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0:   time = 1368572952
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: about to withdraw from the cluster
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: telling LM to withdraw
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: withdrawn                     [5]
kernel: 
kernel: Call Trace:
kernel: [<ffffffff88a000a0>] :gfs:gfs_lm_withdraw+0xd4/0x101
kernel: [<ffffffff800bff65>] delayacct_end+0x5d/0x86
kernel: [<ffffffff80063a16>] __wait_on_bit+0x60/0x6e
kernel: [<ffffffff8001559a>] sync_buffer+0x0/0x3f
kernel: [<ffffffff80063a90>] out_of_line_wait_on_bit+0x6c/0x78
kernel: [<ffffffff88a17f8f>] :gfs:gfs_io_error_bh_i+0x32/0x37
kernel: [<ffffffff889edf33>] :gfs:gfs_replay_wait+0x16a/0x18e
kernel: [<ffffffff88a12710>] :gfs:gfs_recover_journal+0x263/0x37d
kernel: [<ffffffff889f6d45>] :gfs:gfs_glock_nq+0x3aa/0x3ea
kernel: [<ffffffff889f83a7>] :gfs:gfs_glock_nq_num+0x3b/0x90
kernel: [<ffffffff88a0bd02>] :gfs:fill_super+0x0/0x642
kernel: [<ffffffff88a0bd02>] :gfs:fill_super+0x0/0x642
kernel: [<ffffffff88a0b6bd>] :gfs:init_journal+0x1d0/0x34c
kernel: [<ffffffff88a0c190>] :gfs:fill_super+0x48e/0x642
kernel: [<ffffffff800e6c8a>] get_sb_bdev+0x10a/0x16c
kernel: [<ffffffff801305cb>] selinux_sb_copy_data+0x1a1/0x1c5
kernel: [<ffffffff800e6627>] vfs_kern_mount+0x93/0x11a
kernel: [<ffffffff800e66f0>] do_kern_mount+0x36/0x4d
kernel: [<ffffffff800f0fa1>] do_mount+0x6a9/0x719
kernel: [<ffffffff8002395d>] __pagevec_free+0x21/0x2e
kernel: [<ffffffff8000b26f>] release_pages+0x14d/0x15a
kernel: [<ffffffff80007691>] find_get_page+0x21/0x51
kernel: [<ffffffff80013987>] filemap_nopage+0x193/0x360
kernel: [<ffffffff800ce783>] zone_statistics+0x3e/0x6d
kernel: [<ffffffff8005c3b3>] cache_alloc_refill+0x106/0x186
kernel: [<ffffffff800ce783>] zone_statistics+0x3e/0x6d
kernel: [<ffffffff8000f41e>] __alloc_pages+0x78/0x308
kernel: [<ffffffff8004c7b8>] sys_mount+0x8a/0xcd
kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
kernel: 
kernel: GFS: fsid=mycluster:datafs2.0: jid=0: Failed
kernel: GFS: fsid=mycluster:datafs2.0: error recovering journal 0: -5

From the log above:
- [1] The node begins mounting the filesystem.
- [2] Reservation conflict messages appear, indicating that this node is not registered on the device or has been fenced.
- [3] A write to the device fails and is lost, which triggers a filesystem withdraw.
- [4] GFS detects the I/O error and begins to withdraw the filesystem.
- [5] The filesystem is withdrawn.
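
To see which devices are affected, the reservation conflict messages can be tallied per SCSI address. The sketch below assumes kernel log lines in the format shown above (`sd H:C:T:L: reservation conflict`); the function name `conflicts_by_device` is hypothetical.

```shell
# Read kernel log lines on stdin and print "H:C:T:L count" for every SCSI
# address that logged a reservation conflict, sorted by address.
conflicts_by_device() {
    awk '
        /reservation conflict/ {
            for (i = 1; i < NF; i++) {
                if ($i == "sd") {
                    addr = $(i + 1)
                    sub(/:$/, "", addr)   # drop the trailing colon
                    count[addr]++
                    break
                }
            }
        }
        END { for (a in count) print a, count[a] }
    ' | sort
}
```

Example usage: `conflicts_by_device < /var/log/messages`. Each reported address can then be matched to a device name, and for multipath devices to a path, with tools such as `lsscsi` or `multipath -ll`.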

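Check the SCSI reservation status on all the cluster nodes. Every node should report the same registered keys and the same reservation; a node whose key is missing from the list has been fenced or has never registered: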

Node 1:

    # sg_persist --in -k -d /dev/mapper/wcmsshared
      IBM       2145              0000
      Peripheral device type: disk
      PR generation=0x1b92, 32 registered reservation keys follow:
        0x63b40001
        0x63b40001
        0x63b40001
        0x63b40001
        0x63b40004
        0x63b40004
        0x63b40003
        0x63b40003
        0x63b40002
        0x63b40002
        0x63b40002
        0x63b40002
        0x63b40002
        0x63b40002
        0x63b40002
        0x63b40002
        0x63b40002
        0x63b40002
        0x63b40002
        0x63b40002
        0x63b40002
        0x63b40002
        0x63b40002
        0x63b40002
        0x63b40003
        0x63b40003
        0x63b40004
        0x63b40004
        0x63b40001
        0x63b40001
        0x63b40001
        0x63b40001

    # sg_persist --in -r -d /dev/mapper/wcmsshared
      IBM       2145              0000
      Peripheral device type: disk
      PR generation=0x1b92, Reservation follows:
        Key=0x63b40002
        scope: LU_SCOPE,  type: Write Exclusive, registrants only

Node 2:

# sg_persist --in -k -d /dev/mapper/wcmsshared
  IBM       2145              0000
  Peripheral device type: disk
  PR generation=0x1b92, 32 registered reservation keys follow:
    0x63b40001
    0x63b40001
    0x63b40001
    0x63b40001
    0x63b40004
    0x63b40004
    0x63b40003
    0x63b40003
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40003
    0x63b40003
    0x63b40004
    0x63b40004
    0x63b40001
    0x63b40001
    0x63b40001
    0x63b40001

# sg_persist --in -r -d /dev/mapper/wcmsshared
  IBM       2145              0000
  Peripheral device type: disk
  PR generation=0x1b92, Reservation follows:
    Key=0x63b40002
    scope: LU_SCOPE,  type: Write Exclusive, registrants only
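
Long key lists like the ones above are easier to compare when summarised. The sketch below counts how many registrations each key holds, assuming the `sg_persist --in -k` output format shown here; the function name `count_keys` is hypothetical.

```shell
# Read `sg_persist --in -k` output on stdin and print "key count" for every
# distinct registered key, sorted by key.
count_keys() {
    awk '
        # Key lines are a single hexadecimal field such as 0x63b40001.
        NF == 1 && $1 ~ /^0x[0-9a-fA-F]+$/ { n[$1]++ }
        END { for (k in n) print k, n[k] }
    ' | sort
}
```

Example usage: `sg_persist --in -k -d /dev/mapper/wcmsshared | count_keys`. For the output above, this reports 8 registrations for 0x63b40001, 16 for 0x63b40002, and 4 each for 0x63b40003 and 0x63b40004, which matches the 32 registered keys. A key that appears on one run and is absent on a later one points to the node that was fenced.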

On a RHEL 5 host, check whether the scsi_reserve service has registered the node's devices, and restart the service if it has not:

# service scsi_reserve status
No registered devices found.
# service scsi_reserve restart
Restarting scsi_reserve:                                   [  OK  ]
# service scsi_reserve status
Found 1 registered device(s):
/dev/dm-8
