How to rescan the SCSI bus to add or remove a SCSI device without rebooting the computer


Environment

  • Red Hat Enterprise Linux 5.0 or above
    • SCSI devices over a Fibre Channel or iSCSI transport

Technical support for online storage reconfiguration is provided on Red Hat Enterprise Linux 5 and above. Limited tools for hot adding and removing storage are present in previous releases of Red Hat Enterprise Linux; however, they cannot be guaranteed to work correctly in all configurations. Red Hat Enterprise Linux 5 includes many enhancements to udev, the low-level device drivers, the SCSI midlayer, and device-mapper multipath, which together enable comprehensive support for online storage reconfiguration.

This article currently covers the FC and iSCSI transports. Future versions of this documentation will cover other SCSI transports, such as SAS and FCoE.

Hewlett-Packard Smart Array controllers that use the cciss driver provide a different interface for manipulating their devices. This is because the cciss driver is a block driver, not a SCSI driver, and its disks are not SCSI devices. Users of this hardware can see How do I rescan an HP Smart Array for new devices without rebooting? for additional information.

The procedures below also apply to hypervisors (i.e. "dom0" in Red Hat Enterprise Linux 5 virtualization), but the procedures for dynamically altering the storage of running virtual guests are different. For more information about adding storage to virtual guests, see the Virtualization Guide.

Issue

  • What is the recommended procedure for removing a disk from the system?
  • I need to remove a disk on a running system; what steps are required to do so without a reboot?
  • Is it possible to add or remove a SCSI device without rebooting a running system?
  • Can you scan a SCSI bus for new or missing SCSI devices without rebooting?
  • How can I make newly connected SCSI storage devices available without rebooting?
  • What is the Linux equivalent to the Solaris command devfsadm to add or remove storage devices?
  • I am trying to add a LUN to a live system but it is not recognized
  • I am trying to add a tape drive to a live system but it is not recognized
  • I am trying to add a disk drive to a live system but it is not recognized
  • How can I force a rescan of my SAN to find newly associated LUNs?
  • What to do if a newly allocated LUN on my SAN is not available?
  • Unable to probe for a newly allocated LUN
  • Some nodes can't see my new storage device, how can I make it available?
  • After SAN maintenance activity, not all devices returned - devices in multipath missing or remain in failed state.
  • After SAN failover testing completed, not all devices returned to running state as expected - devices in multipath missing or remain in failed state.
  • What is the best way to remove a SCSI disk from the system?

Resolution

Yes, as of Red Hat Enterprise Linux 5.0, it is possible to make changes to the SCSI I/O subsystem without rebooting. There are a number of methods that can be used to accomplish this. Some perform changes explicitly, one device or one bus at a time. Others are potentially more disruptive, causing bus resets or a large number of configuration changes at the same time. If the less-disruptive methods are used, it is not necessary to pause I/O while the change is being made. If one of the more disruptive methods is used, then, as a precaution, it is necessary to pause I/O on each of the SCSI buses involved in the change.

This article is a brief summary of the information contained in the Red Hat Enterprise Linux manuals. For Red Hat Enterprise Linux 5, refer to the Online Storage Reconfiguration Guide. For Red Hat Enterprise Linux 6 and 7, refer to the Storage Administration Guide. For Red Hat Enterprise Linux 8 and 9, refer to Managing storage devices. Refer to these documents for complete coverage of this topic.


Removing a Storage Device

Before removing access to the storage device itself, you may want to copy data from the device. When that is done, then you must stop and flush all I/O, and remove all operating system references to the device, as described below.  If this is a multipath device then you must do this for the multipath pseudo device, and each of the identifiers that represent a path to the device.

Removal of a storage device is not recommended when the system is under memory pressure, since the I/O flush will add to the load. To determine the level of memory pressure run the command:

# vmstat 1 100

Device removal is not recommended if swapping is active (non-zero "si" and "so" columns in the vmstat output), and free memory is less than 5% of the total memory in more than 10 samples per 100.  (The total memory can be obtained with the "free" command.)
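The checks above can be scripted. The following is a minimal sketch that reads /proc/meminfo directly (the same source behind the free command) and, when vmstat is available, counts one-second intervals with swap activity; the 5% threshold mirrors the guideline above, not a hard kernel limit.

```shell
# Rough memory-pressure check before device removal (sketch).
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
free_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
threshold_kb=$((total_kb / 20))        # 5% of total memory
if [ "$free_kb" -lt "$threshold_kb" ]; then
    echo "free memory below 5% of total; consider deferring device removal"
fi
# Count intervals with swap activity ("si"/"so" are vmstat columns 7 and 8);
# the article samples 100 intervals, a shorter run is used here:
if command -v vmstat >/dev/null 2>&1; then
    swap_samples=$(vmstat 1 10 | awk 'NR > 2 && ($7 > 0 || $8 > 0)' | wc -l)
    echo "intervals with swap-in/swap-out activity: $swap_samples"
fi
```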

The general procedure for removing all access to a device is as follows:

  1. Close all users (applications, etc.) of the device. Copy (backup) data from the device, as needed.
  2. Use umount to unmount any file systems that mounted the device.
  3. Remove the device from any md and LVM volume that is using it.
    • If the device is a member of an LVM Volume group, then it may be necessary to move data off the device using the pvmove command, then use the vgreduce command to remove the physical volume, and (optionally) pvremove to remove the LVM metadata from the disk.

  4. If you are removing a multipath device, run multipath -l and take note of all the paths to the device. When this has been done, use the command below to remove the multipath device, where the multipath-device name may be mpath0, for example.
    # multipath -f multipath-device
  5. Use the following command to flush any pending or outstanding I/O to all paths to the device. This is particularly important for raw devices, where there is no umount or vgreduce operation to cause an I/O flush.
    # blockdev --flushbufs device
  6. Remove any reference to the device's path-based name, like /dev/sd or /dev/disk/by-path or the major:minor number, in applications, scripts, or utilities on the system. This is important to ensure that a different device, when added in the future, will not be mistaken for the current device.
  7. The final step is to remove each path to the device from the SCSI subsystem. Use the command below, or any of the variations that follow, to remove a path where device-name may be sde, for example. If a multipath device is being removed, perform this step for each path of the multipath device. Only one of these commands needs to be used to delete any specific SCSI device. See Footnote 1 for further details.
    # echo 1 >  /sys/block/device-name/device/delete

    • Other variations of the same operation use different identifiers to reference and delete the same device:
      variant 1 -- # echo 1 >  /sys/class/scsi_device/h:c:t:l/device/delete
      variant 2 -- # echo 1 >  /sys/class/scsi_generic/sg-name/device/delete

      Variant 1 -- uses class/scsi_device/h:c:t:l (the SCSI address) to reference the device instead of block/device-name; it still references the same device. The h is the HBA number, c is the channel on the HBA, t is the SCSI target ID, and l is the LUN. You can determine the device-name and the h:c:t:l for a device from various commands, such as lsscsi, scsi_id, multipath -l, and ls -l /dev/disk/by-*

      Variant 2 -- use class/scsi_generic/sgN where sgN may be sg18, for example. The output of lsscsi -g can show the scsi generic names for different devices, along with the corresponding scsi addresses.

If each of the steps above is followed, then a device can safely be removed from a running system. It is not necessary to stop I/O to other devices while this is done.
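Steps 4 to 7 can be strung together in a short script. The following sketch uses placeholder names (mpath0, sdc, sdd); with DRY_RUN=1 it only prints the commands, so nothing is removed until DRY_RUN is set to 0 and the script is run as root.

```shell
# Sketch of removal steps 4-7 for a multipath device. All device names
# are placeholders for this illustration.
DRY_RUN=1
run() { if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

MPDEV=mpath0                 # multipath pseudo device (from 'multipath -l')
PATHS="sdc sdd"              # path devices noted from 'multipath -l'

run multipath -f "$MPDEV"                               # step 4: remove multipath map
for p in $PATHS; do
    run blockdev --flushbufs "/dev/$p"                  # step 5: flush outstanding I/O
done
for p in $PATHS; do
    run sh -c "echo 1 > /sys/block/$p/device/delete"    # step 7: delete each SCSI path
done
```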

Other procedures, such as the physical removal of the device, followed by a rescan of the SCSI bus using rescan-scsi-bus or issue_lip to cause the operating system state to be updated to reflect the change, are not recommended. This may cause delays due to I/O timeouts, and devices may be removed/replaced unexpectedly. If it is necessary to perform a rescan of an interconnect, it must be done while I/O is paused. Refer to Online Storage Reconfiguration Guide and Storage Administration Guide for more information.

As a point of clarification: the "physical removal" of a device referenced above means that a subsequent rescan will pause and go through error recovery, up to and including resetting the adapter, which causes the same side effects as issue_lip. This happens because the device is still known to the host. The proper procedure is to remove the device from the operating system first, before physically removing it on the storage side, so that error recovery is not triggered on the non-responsive device.

Footnotes

  • FN.1 - In step 7 there are three variants of the delete device command. Only one needs to be performed, as all are equivalent: they all reference the same device. For example, the following output shows three different identifiers for the same device: a SCSI h:c:t:l address, a SCSI disk sdX name, and a SCSI generic sgN name. Any one of these identifier forms can be used to delete device sda.
    $ lsscsi -g
    [0:1:0:0]    disk    HP       LOGICAL VOLUME   5.70  /dev/sda   /dev/sg1 
    In this case `block/sda`, `class/scsi_device/0:1:0:0` and `class/scsi_generic/sg1` all refer to the same device; the command variants simply use different identifiers for it. Therefore any one of the following commands will delete the sda device -- and only one need be performed:
    
    # echo 1 > /sys/block/sda/device/delete
    # echo 1 > /sys/class/scsi_device/0:1:0:0/device/delete
    # echo 1 > /sys/class/scsi_generic/sg1/device/delete

Adding a Storage Device or a Path

When adding a device, be aware that the path-based device name (the “sd” name, the major:minor number, and /dev/disk/by-path name, for example) that the system assigns to the new device may have been previously in use by a device that has since been removed. Ensure that all old references to the path-based device name have been removed. Otherwise the new device may be mistaken for the old device.

The first step is to physically enable access to the new storage device, or a new path to an existing device. This may involve installing cables, disks, and vendor-specific commands at the FC or iSCSI storage server. When you do this, take note of the LUN value for the new storage that will be presented to your host.

Next, make the operating system aware of the new storage device, or path to an existing device. The preferred command is:

# echo "c t l" >  /sys/class/scsi_host/hostH/scan

where H is the HBA number, c is the channel on the HBA, t is the SCSI target ID, and l is the LUN.

See How do you map between scsi address <0 0 0 0> (or 0:0:0:0) and scsi device name (sda)? for information on SCSI h:c:t:l addressing and mapping between sdN names and SCSI h:c:t:l addresses.

You can determine the H, c, and t values by referring to another device that is already configured on the same path as the new device. This can be done with commands such as lsscsi, scsi_id, multipath -l, and ls -l /dev/disk/by-*. This information, plus the LUN number of the new device, can be used as shown above to probe and configure that path to the new device.
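For example, the c and t values can be copied from an existing device's address and combined with the new LUN. In this sketch, the address 0:0:0:0 and LUN 2 are placeholders; the script only prints the scan command for review.

```shell
# Sketch: derive the scan command from an existing device's h:c:t:l address.
existing="0:0:0:0"   # address of a configured device on the same path (placeholder)
new_lun=2            # LUN assigned to the new device on the array (placeholder)

# Split h:c:t:l into its fields using parameter expansion:
h=${existing%%:*}; rest=${existing#*:}
c=${rest%%:*};     rest=${rest#*:}
t=${rest%%:*}

scan_cmd="echo \"$c $t $new_lun\" > /sys/class/scsi_host/host$h/scan"
echo "$scan_cmd"     # review, then run as root
```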

Note: In some Fibre Channel hardware configurations, when a new LUN is created on the RAID array it may not be visible to the operating system until after a LIP (Loop Initialization Protocol) operation is performed. Refer to the manuals for instructions on how to do this. If a LIP is required, it will be necessary to stop I/O while this operation is done.

As of Red Hat Enterprise Linux 5.6, it is also possible to use the wildcard character "-" in place of c, t and/or l in the command shown above. In this case, it is not necessary to stop I/O while this command executes. In versions prior to 5.6, the use of wildcards in this command requires that I/O be paused as a precaution.

After adding all the SCSI paths to the device, execute the multipath command, and check to see that the device has been properly configured. At this point, the device is available to be added to md, LVM, mkfs, or mount, for example.

Other commands that cause a SCSI bus reset, LIP, or a system-wide rescan, and that may result in multiple add/remove/replace operations, are not recommended. If these commands are used, then I/O to the affected SCSI buses must be paused and flushed prior to the operation. Refer to the [Online Storage Reconfiguration Guide](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/online_storage_reconfiguration_guide/index) and the Storage Administration Guide for more information.

As of release 5.4, a script called /usr/bin/rescan-scsi-bus.sh is available as part of the sg3_utils package. This can make rescan operations easier. Additionally, cleanup of unused devices can be made easier by using the -r flag, which enables device removal. This script is described in the manuals mentioned above ([RHEL5] "Online Storage Reconfiguration Guide - rescan-scsi-bus", [RHEL6] "Storage Administration Guide - logical unit add remove").

Diagnostic Steps

If scanning for new devices doesn't work -- that is, the expected lun or luns were not seen and new devices were not created -- then obtaining the output of sg_luns (from the sg3_utils package) may help in understanding the issue.

There are several common results after scanning for a newly added lun:

  1. new lun added within storage and is seen/discovered by a scan as expected,
  2. new lun added to existing list of luns within storage (lun 9 is added to lun 0-8 for example), is not seen by host, and no errors are logged,
  3. new lun added to existing list of luns within storage, is not seen by host, but errors are logged during scanning such as:
    scsi: host 0 channel 0 id 2 lun16643 has a LUN larger than allowed by the host adapter,
  4. new lun added to newly zoned/presented storage port (lun 0 is presented for example) and is not seen.

new lun added to existing list of luns, is not seen, no errors logged
In the case where the new lun isn't seen and no errors are being logged, the typical cause is that the lun isn't being returned by storage. The list of luns behind a storage port is obtained by the kernel by sending a SCSI REPORT LUNS command to a device on the storage port (the target, rather than the lun, answers this request, so all luns behind a storage target port return the same lun list). We can obtain the same list from storage using the sg_luns command.

For example, we have two HBAs connected to two storage ports each with the following luns.

$ lsscsi
[0:0:0:0]    disk    XYZ      TT3250310AS      N4.A  /dev/sda
[0:0:0:1]    disk    XYZ      TT3250311AS      N4.A  /dev/sdb
[0:0:1:0]    disk    XYZ      TT3250312AS      N4.A  /dev/sdc
[1:0:0:0]    disk    XYZ      TT3250315AS      N4.A  /dev/sdd
[1:0:1:0]    disk    XYZ      TT3250314AS      N4.A  /dev/sde

We've added and presented lun '2' to this host within storage. It is expected the new lun will show up as [0:0:0:2] after a scan but no new devices appear. To see the same information the kernel sees during a lun rescan we can run the following commands so that each of the storage ports ([0:0:0:*] [0:0:1:*] [1:0:0:*] and [1:0:1:*]) are checked. This way, if we don't know specifically which storage port the new lun is supposed to show up behind, we've covered all of the possibilities.

$ sg_luns -d -vv /dev/sda
$ sg_luns -d -vv /dev/sdc
$ sg_luns -d -vv /dev/sdd
$ sg_luns -d -vv /dev/sde

The output from the above commands is examined to see if lun 2 has shown up as expected. For example, if we expect the new lun to show up as 0:0:0:2, we'd query another device with the same 0:0:0:* SCSI address; such devices are all behind the same storage target. In this example, either sda or sdb can be used.

$ sg_luns -d -vv /dev/sda
:
                 
Report luns [select_report=0]:
    0000000000000000                        
      Peripheral device addressing: lun=0
    0001000000000000                        
      Peripheral device addressing: lun=1

In the above case, the expected lun 2 was not returned by storage, so the host can't see it during discovery. Check to make sure all the storage-vendor-specific steps needed to present the lun from storage to the host HBA were followed.
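A check like the analysis above can be scripted. This sketch greps the sg_luns output for the expected `lun=` line; the device path and LUN number in the commented usage line are placeholders from the example.

```shell
# Sketch: test whether a target's REPORT LUNS response, as printed by
# 'sg_luns -d', includes an expected LUN number.
lun_reported() {
    # $1: expected LUN number; stdin: output of 'sg_luns -d <device>'
    grep -q "lun=$1\$"
}

# Usage against a real device (placeholder path and LUN):
#   sg_luns -d /dev/sda | lun_reported 2 && echo "LUN 2 reported by target"

# Demonstration against the example output shown above (luns 0 and 1 only):
printf 'lun=0\nlun=1\n' | lun_reported 1 && echo "LUN 1 reported"
```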

new lun added to existing list of luns, is not seen, but errors are logged
If the error message has to do with the lun number being too big, but the lun added shouldn't have a lun id greater than 255, then the likely problem is that storage is returning the lun numbers in a format other than 00b. For more information see: Red Hat Enterprise Linux reports lunXXX has a LUN larger than allowed by the host adapter, where XXX is a very large number.

In this case the sg_luns output might include 01b-format luns, which aren't supported by the Linux kernel. This results in incorrect decoding of the lun number. The storage configuration needs to be changed to return the luns in the 00b format.

$ sg_luns -d -vv /dev/sda
:
                 
Report luns [select_report=0]:
    4000000000000000                        01b format
      Flat space addressing: lun=0
    4001000000000000                        01b format
      Flat space addressing: lun=1

Note that in this case the lun access method is 01b, "flat space addressing", versus the 00b lun access method ("peripheral device addressing") seen in the previous example. Linux currently only supports the 00b lun access method. See the T10 "SCSI Architecture Model" specification for more information on lun number formats/access methods.
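The access method can be read off the top two bits of an entry's first byte. This sketch classifies a 16-hex-digit entry as printed by sg_luns; the method names follow the SAM specification, and this is illustrative rather than a full LUN decoder.

```shell
# Sketch: classify a REPORT LUNS entry (16 hex digits) by its addressing
# method, taken from the top two bits of the entry's first byte.
classify_lun_entry() {
    first_byte=$(printf '%.2s' "$1")    # first byte, as two hex digits
    val=$(printf '%d' "0x$first_byte")  # convert hex to decimal
    case $(( (val / 64) % 4 )) in       # extract the top two bits
        0) echo "peripheral device addressing (00b)" ;;
        1) echo "flat space addressing (01b)" ;;
        2) echo "logical unit addressing (10b)" ;;
        3) echo "extended logical unit addressing (11b)" ;;
    esac
}
classify_lun_entry 0001000000000000    # peripheral device addressing (00b)
classify_lun_entry 4001000000000000    # flat space addressing (01b)
```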

new lun added to newly zoned/presented storage port
The normal methods for scanning for new luns won't work until the new port is discovered. Scanning only checks existing storage ports for new luns; it doesn't scan for new ports. Refer to the Online Storage Reconfiguration Guide and Storage Administration Guide for more information regarding issue_lip and scanning for interconnect port changes.

