How can I diagnose the cause of SCSI reservation conflicts in a RHEL cluster using fence_scsi?
Environment
- Red Hat Enterprise Linux 5 (with the High Availability or Resilient Storage Add-On)
- Red Hat Enterprise Linux 6 (with the High Availability or Resilient Storage Add-On)
- Red Hat Enterprise Linux 7 (with the High Availability or Resilient Storage Add-On)
- Red Hat Enterprise Linux 8 (with the High Availability or Resilient Storage Add-On)
- Red Hat High Availability Cluster with 2 or more nodes.
- One or more cluster nodes configured to use fence_scsi in /etc/cluster/cluster.conf:
$ grep fence_scsi /etc/cluster/cluster.conf
<fencedevice agent="fence_scsi" name="myfencing"/>
Issue
- A cluster node logs "reservation conflict" messages during boot, and shared storage is inaccessible or a GFS/GFS2 file system withdraws:
kernel: sd 4:0:0:9: reservation conflict
kernel: sd 2:0:0:9: reservation conflict
kernel: sd 2:0:0:9: reservation conflict
kernel: sd 4:0:0:9: reservation conflict
kernel: sd 4:0:0:9: reservation conflict
kernel: sd 2:0:0:9: reservation conflict
kernel: sd 2:0:0:9: reservation conflict
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0: fatal: I/O error
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0: block = 86382275
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0: function = gfs_logbh_wait
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0: file = /builddir/build/BUILD/gfs-kmod-0.1.34/_kmod_build_/src/gfs/dio.c, line = 816
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0: time = 1359814274
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0: about to withdraw from the cluster
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0: telling LM to withdraw
kernel: GFS: fsid=isgruapp7n-prod:amgprod_gfs.0: withdrawn
- Cluster nodes report reservation conflict errors:
kernel: sd 2:0:5:0: reservation conflict
kernel: sd 2:0:5:0: SCSI error: return code = 0x00000018
kernel: end_request: I/O error, dev sdq, sector 423224328
kernel: device-mapper: multipath: Failing path 65:0.
kernel: sd 2:0:5:0: reservation conflict
kernel: sd 2:0:5:0: SCSI error: return code = 0x00000018
kernel: end_request: I/O error, dev sdq, sector 423224336
- A GFS file system fails to mount on a two-node Red Hat cluster:
Mounting GFS filesystems: /sbin/mount.gfs: error mounting /dev/mapper/SMC_VG-smcdata on /isg/smc: No such file or directory
Resolution
- If a node has been fenced, it will continue to receive SCSI reservation conflicts and be unable to access shared storage devices until it is rebooted and either starts scsi_reserve (RHEL 5) or is "unfenced" (RHEL 6 or later) as it rejoins the cluster.
- If the errors occur during boot, make sure each node has a correct fence_scsi configuration.
Red Hat Enterprise Linux 5
- Ensure scsi_reserve is executed after the failed node is rebooted:
  - If cluster services start on boot, ensure that scsi_reserve is enabled with chkconfig:
# chkconfig scsi_reserve on
  - Or, if cluster services are started manually, start scsi_reserve after cman and before clvmd:
# service cman start
# service scsi_reserve start
# service clvmd start
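On a RHEL 5 node, chkconfig --list scsi_reserve shows whether the service is enabled. As a minimal sketch, the hypothetical shell function below reads that output and reports whether scsi_reserve is on in runlevels 3 and 5. Both the choice of runlevels and the one-line "name 0:off 1:off ..." layout of chkconfig --list are assumptions.

```sh
# Hypothetical helper: read "chkconfig --list scsi_reserve" output on stdin
# and report whether the service is on in runlevels 3 and 5 (assumed to be
# the runlevels that matter for cluster nodes).
scsi_reserve_enabled() {
    awk '
        $1 == "scsi_reserve" {
            found = 1
            # Each remaining field looks like "3:on"; record state per runlevel.
            for (i = 2; i <= NF; i++) {
                split($i, kv, ":")
                state[kv[1]] = kv[2]
            }
        }
        END {
            if (found && state["3"] == "on" && state["5"] == "on") {
                print "enabled"
                exit 0
            }
            print "not enabled"
            exit 1
        }'
}

# Usage on a RHEL 5 node:
#   chkconfig --list scsi_reserve | scsi_reserve_enabled
```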
Red Hat Enterprise Linux 6 (cman-based cluster without pacemaker)
- Ensure each node has a proper "unfence" configuration in /etc/cluster/cluster.conf.
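In a cman-based RHEL 6 cluster, each clusternode entry is expected to carry an unfence block that references the fence_scsi device (for example a device element with name="myfencing" and action="on"). A quick consistency check is to compare the number of clusternode entries with the number of unfence blocks. The hypothetical function below is a minimal sketch of that check; it uses plain grep line counts rather than an XML parser, so it assumes the usual one-element-per-line layout of cluster.conf.

```sh
# Hypothetical helper: compare <clusternode> entries with <unfence> blocks
# in a cluster.conf file. Line-based grep counts, not real XML parsing.
check_unfence() {
    conf=${1:-/etc/cluster/cluster.conf}
    # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps
    # the printed count usable under "set -e".
    nodes=$(grep -c '<clusternode ' "$conf" || true)
    unfence=$(grep -c '<unfence>' "$conf" || true)
    echo "clusternodes=$nodes unfence_blocks=$unfence"
    # Succeed only if there is at least one node and every node has a block.
    [ "$nodes" -gt 0 ] && [ "$nodes" -eq "$unfence" ]
}

# Usage:
#   check_unfence /etc/cluster/cluster.conf || echo "unfence configuration incomplete"
```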
Red Hat Enterprise Linux 6 or later (pacemaker cluster)
- Ensure fence_scsi is properly configured.
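In a pacemaker cluster, Red Hat's documented RHEL 7 example creates the device with the meta attribute provides=unfencing, for instance pcs stonith create scsi-shooter fence_scsi devices=/dev/mapper/shared pcmk_host_list="node1 node2" meta provides=unfencing (the resource name, device path and host names here are hypothetical). Newer pacemaker releases may infer unfencing from the agent metadata, so a missing attribute is a hint rather than proof of a misconfiguration. The sketch below reads pcs output and looks for the attribute on a fence_scsi resource; it assumes the "Resource: ... type=fence_scsi" header layout, which can differ between pcs versions.

```sh
# Hypothetical helper: read "pcs stonith config" (newer pcs) or
# "pcs stonith show --full" (older pcs) output on stdin and check that a
# fence_scsi resource carries provides=unfencing. Output layout is assumed.
fence_scsi_unfencing() {
    awk '
        # A fence_scsi stonith resource header starts the section we inspect.
        /type=fence_scsi/ { in_scsi = 1; next }
        # Any other resource header ends that section.
        /Resource:/ { in_scsi = 0 }
        in_scsi && /provides=unfencing/ { ok = 1 }
        END {
            if (ok) { print "ok"; exit 0 }
            print "missing provides=unfencing"
            exit 1
        }'
}

# Usage:
#   pcs stonith config | fence_scsi_unfencing
```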
Root Cause
- fence_scsi uses "Write Exclusive, Registrants Only" SCSI persistent reservations.
- A registered node places a reservation on the device, which sets this reservation type. Effectively, only registered hosts are allowed to write to the device (but every host can read from it).
- In addition to the reservation, every node registers its key on every path to the device, so a multipath device may carry several registrations of the same key for each node.
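To confirm the reservation type on a shared device, the sg_persist --in -r output shown under Diagnostic Steps can be reduced to the holder key and type. The hypothetical function below is a minimal sketch that assumes the output layout seen in this article: a "Key=" line followed by a "scope: ..., type: ..." line.

```sh
# Hypothetical helper: read "sg_persist --in -r -d <device>" output on stdin
# and print the reservation holder key followed by the reservation type.
reservation_summary() {
    awk '
        /Key=/ {
            sub(/.*Key=/, "")
            key = $1
        }
        # Match the scope line only; "Peripheral device type:" also
        # contains "type:" and must be ignored.
        /scope:/ {
            sub(/.*type: */, "")
            type = $0
        }
        END {
            if (key == "") { print "no reservation"; exit 1 }
            print key " " type
        }'
}

# Usage:
#   sg_persist --in -r -d /dev/mapper/wcmsshared | reservation_summary
```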
- When a cluster node initiates a fence_scsi event:
- The registration keys of the failed node will be removed from the device by the fencing node.
- As a result, the fenced node will not have a registration, and will be denied write access by the "Write Exclusive, Registrants Only" reservation type.
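At the SCSI level, removing the failed node's registrations amounts to a PERSISTENT RESERVE OUT command with the PREEMPT AND ABORT service action, issued with the fencing node's own key and naming the victim's key. The sketch below only prints such an sg_persist command, because running it against a live device fences a node. The option names are real sg_persist options, but treating this as exactly what fence_scsi executes is an assumption, and the function name is hypothetical.

```sh
# Hypothetical helper: print (do NOT run) the sg_persist command that
# preempts a victim node's key on one shared device.
print_preempt_cmd() {
    own_key=$1     # key of the fencing (surviving) node, e.g. 0x63b40001
    victim_key=$2  # key of the node being fenced, e.g. 0x63b40002
    device=$3      # shared device, e.g. /dev/mapper/wcmsshared
    # Reservation type 5 = Write Exclusive, Registrants Only.
    echo "sg_persist --no-inquiry --out --preempt-abort --prout-type=5" \
         "--param-rk=$own_key --param-sark=$victim_key --device=$device"
}

# Usage (keys and device taken from the sample output in this article):
#   print_preempt_cmd 0x63b40001 0x63b40002 /dev/mapper/wcmsshared
```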
- A fenced node should be rebooted first, and then should re-register (RHEL 5) or unfence (RHEL 6 or later) itself before it can write to the shared storage again.
- In RHEL5, re-registration occurs when the scsi_reserve script is run (it should be executed immediately after cman and clvmd have started)
- In RHEL6, unfencing should occur when cman starts and joins the fence domain, assuming that unfencing has been configured.
- If re-registration or unfencing does not occur, the failed node will not be able to write to the shared device, even after rebooting.
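To check whether re-registration or unfencing actually happened for a given node, look for that node's key in the sg_persist --in -k output. A minimal sketch with a hypothetical function name; it compares keys case-insensitively and as whole words, so 0x63b40001 does not match a longer key that merely starts with it.

```sh
# Hypothetical helper: succeed if KEY appears in "sg_persist --in -k"
# output read from stdin.
key_registered() {
    # Normalise hex digits to lower case on both sides before comparing.
    want=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    tr 'A-F' 'a-f' | grep -qw -- "$want"
}

# Usage:
#   sg_persist --in -k -d /dev/mapper/wcmsshared | key_registered 0x63b40002 \
#       && echo "registered" || echo "NOT registered"
```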
More information on SCSI Reservations:
- Using SCSI Persistent Reservation Fencing (fence_scsi) with pacemaker in a Red Hat High Availability cluster
- Using SCSI Persistent Reservations with Red Hat Enterprise Linux 6
- Using SCSI Persistent Reservations with Red Hat Enterprise Linux 4 or 5
- How can I view, create, and remove SCSI reservations and keys?
Diagnostic Steps
If the following symptoms are present on your system, then this solution may apply:
- When the system boots, the filesystem (GFS in this example) fails to mount:
Mounting GFS filesystems: /sbin/mount.gfs: error mounting /dev/mapper/SMC_VG-smcdata on /isg/smc: No such file or directory
- The following logs are shown when the filesystem fails to mount:
kernel: Lock_Nolock (built Sep 22 2010 10:18:25) installed
kernel: Trying to join cluster "lock_nolock", "mycluster:datafs2" [1]
kernel: Joined cluster. Now mounting FS...
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: jid=0: Trying to acquire journal lock...
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: jid=0: Looking at journal...
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: jid=0: Acquiring the transaction lock...
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: jid=0: Replaying journal...
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: jid=0: Replayed 1 of 4 blocks
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: jid=0: replays = 1, skips = 0, sames = 3
kernel: sd 4:0:0:170: reservation conflict [2]
kernel: sd 2:0:0:170: reservation conflict
kernel: sd 2:0:0:170: reservation conflict
kernel: sd 4:0:0:170: reservation conflict
kernel: sd 4:0:0:170: reservation conflict
kernel: sd 2:0:0:170: reservation conflict
kernel: sd 2:0:0:170: reservation conflict
kernel: Buffer I/O error on device dm-5, logical block 239170602 [3]
kernel: lost page write due to I/O error on dm-5
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: fatal: I/O error [4]
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: block = 239170602
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: function = gfs_replay_wait
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: file = /builddir/build/BUILD/gfs-kmod-0.1.34/_kmod_build_/src/gfs/dio.c, line = 928
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: time = 1368572952
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: about to withdraw from the cluster
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: telling LM to withdraw
kernel: GFS: fsid=isgswmlu1-cl:datafs2.0: withdrawn [5]
kernel:
kernel: Call Trace:
kernel: [<ffffffff88a000a0>] :gfs:gfs_lm_withdraw+0xd4/0x101
kernel: [<ffffffff800bff65>] delayacct_end+0x5d/0x86
kernel: [<ffffffff80063a16>] __wait_on_bit+0x60/0x6e
kernel: [<ffffffff8001559a>] sync_buffer+0x0/0x3f
kernel: [<ffffffff80063a90>] out_of_line_wait_on_bit+0x6c/0x78
kernel: [<ffffffff88a17f8f>] :gfs:gfs_io_error_bh_i+0x32/0x37
kernel: [<ffffffff889edf33>] :gfs:gfs_replay_wait+0x16a/0x18e
kernel: [<ffffffff88a12710>] :gfs:gfs_recover_journal+0x263/0x37d
kernel: [<ffffffff889f6d45>] :gfs:gfs_glock_nq+0x3aa/0x3ea
kernel: [<ffffffff889f83a7>] :gfs:gfs_glock_nq_num+0x3b/0x90
kernel: [<ffffffff88a0bd02>] :gfs:fill_super+0x0/0x642
kernel: [<ffffffff88a0bd02>] :gfs:fill_super+0x0/0x642
kernel: [<ffffffff88a0b6bd>] :gfs:init_journal+0x1d0/0x34c
kernel: [<ffffffff88a0c190>] :gfs:fill_super+0x48e/0x642
kernel: [<ffffffff800e6c8a>] get_sb_bdev+0x10a/0x16c
kernel: [<ffffffff801305cb>] selinux_sb_copy_data+0x1a1/0x1c5
kernel: [<ffffffff800e6627>] vfs_kern_mount+0x93/0x11a
kernel: [<ffffffff800e66f0>] do_kern_mount+0x36/0x4d
kernel: [<ffffffff800f0fa1>] do_mount+0x6a9/0x719
kernel: [<ffffffff8002395d>] __pagevec_free+0x21/0x2e
kernel: [<ffffffff8000b26f>] release_pages+0x14d/0x15a
kernel: [<ffffffff80007691>] find_get_page+0x21/0x51
kernel: [<ffffffff80013987>] filemap_nopage+0x193/0x360
kernel: [<ffffffff800ce783>] zone_statistics+0x3e/0x6d
kernel: [<ffffffff8005c3b3>] cache_alloc_refill+0x106/0x186
kernel: [<ffffffff800ce783>] zone_statistics+0x3e/0x6d
kernel: [<ffffffff8000f41e>] __alloc_pages+0x78/0x308
kernel: [<ffffffff8004c7b8>] sys_mount+0x8a/0xcd
kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
kernel:
kernel: GFS: fsid=mycluster:datafs2.0: jid=0: Failed
kernel: GFS: fsid=mycluster:datafs2.0: error recovering journal 0: -5
- From the log above:
  - [1] The node begins mounting the filesystem.
  - [2] Reservation conflict messages appear, indicating that the node is not registered on the device or has been fenced.
  - [3] A page write is lost because the device rejected the I/O; this error triggers the filesystem withdraw.
  - [4] The filesystem detects the I/O error and begins to withdraw.
  - [5] The filesystem is withdrawn.
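The reservation conflict messages marked [2] name the SCSI address (host:channel:target:lun) of each rejected path, which can then be mapped to a device, for example with lsscsi. The hypothetical function below is a minimal sketch that counts these messages per address in a log file; the default path /var/log/messages and the "sd H:C:T:L: reservation conflict" message format are assumptions based on the logs in this article.

```sh
# Hypothetical helper: count "reservation conflict" kernel messages per
# SCSI address in a log file, most frequent first.
count_conflicts() {
    # Keep only the H:C:T:L address from matching lines.
    sed -n 's/.* sd \([0-9]*:[0-9]*:[0-9]*:[0-9]*\): reservation conflict.*/\1/p' \
        "${1:-/var/log/messages}" |
        sort | uniq -c | sort -rn
}

# Usage:
#   count_conflicts /var/log/messages
```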
- Check the SCSI reservation status on all the cluster nodes:
  - Node 1:
# sg_persist --in -k -d /dev/mapper/wcmsshared
  IBM 2145 0000
  Peripheral device type: disk
  PR generation=0x1b92, 32 registered reservation keys follow:
    0x63b40001
    0x63b40001
    0x63b40001
    0x63b40001
    0x63b40004
    0x63b40004
    0x63b40003
    0x63b40003
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40003
    0x63b40003
    0x63b40004
    0x63b40004
    0x63b40001
    0x63b40001
    0x63b40001
    0x63b40001
# sg_persist --in -r -d /dev/mapper/wcmsshared
  IBM 2145 0000
  Peripheral device type: disk
  PR generation=0x1b92, Reservation follows:
    Key=0x63b40002
    scope: LU_SCOPE, type: Write Exclusive, registrants only
  - Node 2:
# sg_persist --in -k -d /dev/mapper/wcmsshared
  IBM 2145 0000
  Peripheral device type: disk
  PR generation=0x1b92, 32 registered reservation keys follow:
    0x63b40001
    0x63b40001
    0x63b40001
    0x63b40001
    0x63b40004
    0x63b40004
    0x63b40003
    0x63b40003
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40002
    0x63b40003
    0x63b40003
    0x63b40004
    0x63b40004
    0x63b40001
    0x63b40001
    0x63b40001
    0x63b40001
# sg_persist --in -r -d /dev/mapper/wcmsshared
  IBM 2145 0000
  Peripheral device type: disk
  PR generation=0x1b92, Reservation follows:
    Key=0x63b40002
    scope: LU_SCOPE, type: Write Exclusive, registrants only
- For a RHEL 5 host, check whether the scsi_reserve service was started on boot:
# service scsi_reserve status
No registered devices found.
# service scsi_reserve restart
Restarting scsi_reserve: [ OK ]
# service scsi_reserve status
Found 1 registered device(s):
/dev/dm-8
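When comparing the sg_persist --in -k output from each node, it helps to tally registrations per key. The hypothetical function below does that and derives a node ID from the low 16 bits of each key. That key layout (cluster ID in the upper 16 bits, node ID in the lower 16 bits) is an assumption that matches the sample keys 0x63b40001 to 0x63b40004 above; other fence_scsi versions may build keys differently.

```sh
# Hypothetical helper: read "sg_persist --in -k -d <device>" output on stdin
# and print the number of registrations per key, plus a node ID ASSUMED to
# be the low 16 bits of the key.
tally_keys() {
    awk '
        # Keys are listed after the "... registered reservation key(s) follow"
        # header; this also skips the "PR generation=0x..." value.
        /registered reservation key/ { in_keys = 1; next }
        in_keys && $1 ~ /^0x[0-9a-fA-F]+$/ { count[$1]++ }
        END { for (k in count) print k, count[k] }' |
    sort |
    while read -r key n; do
        # $key expands to a hex literal such as 0x63b40002 before evaluation.
        echo "key=$key node_id=$(( $key & 0xFFFF )) registrations=$n"
    done
}

# Usage:
#   sg_persist --in -k -d /dev/mapper/wcmsshared | tally_keys
```

A node whose key is absent from this list has no registration on the device and will be denied write access by the "Write Exclusive, Registrants Only" reservation.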
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.