How to set dev_loss_tmo and fast_io_fail_tmo persistently, using a udev rule
Environment
- Red Hat Enterprise Linux (RHEL) 6, 7
- Fibre Channel SAN storage
- Exceptions:
- For iSCSI based systems that do not use ISER, see: "How can I improve the failover time of a faulty path when using device-mapper-multipath over iSCSI in Red Hat Enterprise Linux?"
Issue
- Need to set `fast_io_fail_tmo` and `dev_loss_tmo`
- Setting must persist across reboot
Resolution
Both fast_io_fail_tmo and dev_loss_tmo are transport-layer timeouts: they are attributes of the fabric's remote port structure, which is exposed under the class fc_remote_ports. Because they follow the state of the remote port, the udev rule, and the udev database lookup that supplies its match attributes, must target an rport.
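Before changing anything, it can help to see what each remote port is currently set to. The following is a minimal sketch that reads the same attributes shown later in the udevadm output (roles, dev_loss_tmo, fast_io_fail_tmo) directly from sysfs. The function name show_rport_tmo and its optional base-directory argument are assumptions made for illustration, not part of any Red Hat tooling.

```shell
# Print the current transport timeouts of every FC remote port.
# The optional first argument (hypothetical, for illustration and testing)
# overrides the sysfs class directory; the default is the real one.
show_rport_tmo() {
    local base="${1:-/sys/class/fc_remote_ports}"
    local rport
    for rport in "$base"/rport-*; do
        [ -d "$rport" ] || continue    # glob did not match: no rports present
        printf '%s roles="%s" dev_loss_tmo=%s fast_io_fail_tmo=%s\n' \
            "${rport##*/}" \
            "$(cat "$rport/roles")" \
            "$(cat "$rport/dev_loss_tmo")" \
            "$(cat "$rport/fast_io_fail_tmo")"
    done
}
```

Calling show_rport_tmo with no arguments on a host with Fibre Channel storage prints one line per rport; on a host without any rports it prints nothing.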
You can target all rports or only specific ones and change the two parameters using udev rules.
- All rports. In this rule, we'll target all of our hosts, and every viable rport behind the hosts, and set each to a `dev_loss_tmo` of 10 and a `fast_io_fail_tmo` of 5. We match all viable rports by matching the role FCP Target. Create `/etc/udev/rules.d/99-tmo.rules` and include the below contents.

  ACTION!="add|change", GOTO="tmo_end"
  KERNELS=="rport-?*", SUBSYSTEM=="fc_remote_ports", ATTR{roles}=="FCP Target", ATTR{dev_loss_tmo}="10", ATTR{fast_io_fail_tmo}="5"
  LABEL="tmo_end"

- Target specific rport(s).
- Select devices of interest
- Convert device scsi address to rport address
- Lookup udev attributes of rport to obtain WWNN (node name) and WWPN (port name) of remote target port
- Create `/etc/udev/rules.d/99-tmo.rules` and include the udev rule of the following form

  ACTION!="add|change", GOTO="tmo_end"
  KERNELS=="rport-?*", SUBSYSTEM=="fc_remote_ports", ATTR{node_name}=="node-wwn", ATTR{port_name}=="port-wwn", \
    ATTR{dev_loss_tmo}="timeout-seconds", ATTR{fast_io_fail_tmo}="timeout-seconds"
  # Repeat udev rules for each remote port...
  LABEL="tmo_end"
After creating the new udev rules:
- To apply, reload the rules and database:

  #RHEL6
  [root@host]# udevadm control --reload-rules
  #RHEL7
  [root@host]# udevadm control --reload-rules

- Then trigger against the appropriate subsystem:

  [root@host ~]# udevadm trigger --type=devices --action=change
  [root@host ~]# udevadm trigger --subsystem-match=fc_remote_ports
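After the trigger, it is worth confirming that the values actually landed. The sketch below assumes the "all rports" rule was used, so it only inspects rports whose role is FCP Target. The function name check_rport_tmo and its argument order are hypothetical, chosen for this illustration.

```shell
# Check that every "FCP Target" rport carries the expected timeouts.
# Usage: check_rport_tmo <dev_loss_tmo> <fast_io_fail_tmo> [sysfs-class-dir]
# The third argument is a hypothetical override used for testing.
check_rport_tmo() {
    local want_dev_loss="$1" want_fast_io="$2"
    local base="${3:-/sys/class/fc_remote_ports}"
    local rport have_dev_loss have_fast_io rc=0
    for rport in "$base"/rport-*; do
        [ -d "$rport" ] || continue
        # The rule only matches rports with this role, so skip the rest.
        [ "$(cat "$rport/roles")" = "FCP Target" ] || continue
        have_dev_loss="$(cat "$rport/dev_loss_tmo")"
        have_fast_io="$(cat "$rport/fast_io_fail_tmo")"
        if [ "$have_dev_loss" != "$want_dev_loss" ] ||
           [ "$have_fast_io" != "$want_fast_io" ]; then
            printf 'MISMATCH %s: dev_loss_tmo=%s fast_io_fail_tmo=%s\n' \
                "${rport##*/}" "$have_dev_loss" "$have_fast_io"
            rc=1
        fi
    done
    return "$rc"
}
```

For the rule shown earlier, `check_rport_tmo 10 5` returns 0 when every target rport has the new values and prints one MISMATCH line per rport that does not.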
Note: To set `eh_deadline` and `eh_timeout`, see [How to set eh_deadline and eh_timeout persistently, using a udev rule](https://access.redhat.com/solutions/3209481). To set `dev_loss_tmo` on a Cisco UCS system using the `fnic` driver, see [Does the fnic driver have a "dev_loss_tmo" setting?](https://access.redhat.com/solutions/3164771).
Example - Target Select Remote Ports
In this example we'll target individual rports by using the node_name and the port_name of the rport.
**Select devices of interest**
The device of interest is a dm-multipath device named /dev/mapper/test_lun. It has 8 paths presented through hosts 2, 3, 4, and 5, and each host reaches lun 0 through two backend target ports, 0 and 1. In this example we choose devices sde (2:0:0:0) and sdh (2:0:1:0) and want the changes applied to the remote storage ports associated with these devices.
test_lun (wwid_omitted) dm-4 NETAPP,LUN
size=30G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| |- 2:0:0:0 sde 8:64 active ready running <<<<<<<<<<<<<<<
| |- 3:0:1:0 sdn 8:208 active ready running
| |- 4:0:0:0 sdq 65:0 active ready running
| `- 5:0:0:0 sdw 65:96 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
  |- 2:0:1:0 sdh 8:112 active ready running <<<<<<<<<<<<<<<
  |- 3:0:0:0 sdm 8:192 active ready running
  |- 4:0:1:0 sdt 65:48 active ready running
  `- 5:0:1:0 sdad 65:208 active ready running
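When a multipath device has many paths, picking the H:B:T:L addresses out by eye is error-prone. As a small sketch, the filter below reads `multipath -ll` style output on standard input and prints one "H:B:T:L device" pair per path. The function name list_paths is hypothetical, and the pattern assumes path lines look like the ones in the listing above.

```shell
# Read `multipath -ll` style output on stdin and print "H:B:T:L sdX" pairs.
# Only tokens with four colon-separated numbers followed by an sd device
# name are kept, so major:minor fields such as 8:64 are ignored.
list_paths() {
    grep -oE '[0-9]+:[0-9]+:[0-9]+:[0-9]+ +sd[a-z]+' | tr -s ' '
}
```

A typical invocation would be `multipath -ll test_lun | list_paths`.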
**Convert device scsi address to rport address**
In many, but not all, cases the scsi target id (index) 'T' within the scsi H:B:T:L device address is also the assigned remote port index value. Using the selected scsi host number (H) in the 'rport-2*' string in the following command, look up the scsi target id 'T' assigned to each remote port on host 2. Notice that in this case the assigned scsi target id is also the remote port index (the '1' in 'rport-2:0-1' is the remote port index).
$ grep -Hv "zz" /sys/class/fc_remote_ports/rport-2*/scsi_target_id
/sys/class/fc_remote_ports/rport-2:0-0/scsi_target_id:0
/sys/class/fc_remote_ports/rport-2:0-1/scsi_target_id:1 << scsi 'T' target id is 1,
                                     ^
                                     +------------------------<< the remote port index is also 1
/sys/class/fc_remote_ports/rport-2:0-2/scsi_target_id:-1
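Because the target id and the remote port index do not always agree, it is safer to look the rport up by its scsi_target_id than to assume the numbers match. The following sketch does that lookup for one H:B:T:L address. The function name rport_for_hbtl and the optional base-directory argument are assumptions made for illustration.

```shell
# Print the rport whose scsi_target_id equals T on scsi host H.
# Usage: rport_for_hbtl <H:B:T:L> [sysfs-class-dir]
# The second argument is a hypothetical override used for testing.
rport_for_hbtl() {
    local hbtl="$1" base="${2:-/sys/class/fc_remote_ports}"
    local host target f
    host="${hbtl%%:*}"                          # H from H:B:T:L
    target="$(echo "$hbtl" | cut -d: -f3)"      # T from H:B:T:L
    # The trailing ':' keeps host 2 from also matching host 20, 21, ...
    for f in "$base"/rport-"$host":*/scsi_target_id; do
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "$target" ]; then
            f="${f%/scsi_target_id}"
            echo "${f##*/}"
            return 0
        fi
    done
    return 1                                    # no rport carries this target id
}
```

With the sysfs contents listed above, `rport_for_hbtl 2:0:1:0` prints rport-2:0-1, and an address whose target id is not assigned to any rport makes the function return non-zero.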
**Lookup udev attributes of rport to obtain WWNN (node name) and WWPN (port name) of remote target port**
Using the identified rports above, pull the udev information using the udevadm info command.
[root@host ~]# udevadm info --attribute-walk --path=/sys/class/fc_remote_ports/rport-2\:0-0/
looking at device '/devices/pci0000:00/0000:00:09.0/0000:04:00.0/host2/rport-2:0-0/fc_remote_ports/rport-2:0-0':
KERNEL=="rport-2:0-0"
SUBSYSTEM=="fc_remote_ports"
DRIVER==""
ATTR{supported_classes}=="Class 3"
ATTR{dev_loss_tmo}=="30"
ATTR{node_name}=="0x500a09808607eec3" << WWNN, world wide node name
ATTR{port_name}=="0x500a09819607eec3" << WWPN, world wide port name
ATTR{port_id}=="0x610400"
ATTR{roles}=="FCP Target"
ATTR{port_state}=="Online"
ATTR{scsi_target_id}=="0"
ATTR{fast_io_fail_tmo}=="5"
looking at parent device '/devices/pci0000:00/0000:00:09.0/0000:04:00.0/host2/rport-2:0-0':
KERNELS=="rport-2:0-0"
SUBSYSTEMS==""
DRIVERS==""
[ ... snip ... ]
[root@host ~]# udevadm info --attribute-walk --path=/sys/class/fc_remote_ports/rport-2\:0-1/
looking at device '/devices/pci0000:00/0000:00:09.0/0000:04:00.0/host2/rport-2:0-1/fc_remote_ports/rport-2:0-1':
KERNEL=="rport-2:0-1"
SUBSYSTEM=="fc_remote_ports"
DRIVER==""
ATTR{supported_classes}=="Class 3"
ATTR{dev_loss_tmo}=="2147483647"
ATTR{node_name}=="0x500a09808607eec3" << WWNN, world wide node name
ATTR{port_name}=="0x500a09828607eec3" << WWPN, world wide port name
ATTR{port_id}=="0x610500"
ATTR{roles}=="FCP Target"
ATTR{port_state}=="Online"
ATTR{scsi_target_id}=="1"
ATTR{fast_io_fail_tmo}=="5"
looking at parent device '/devices/pci0000:00/0000:00:09.0/0000:04:00.0/host2/rport-2:0-1':
KERNELS=="rport-2:0-1"
SUBSYSTEMS==""
DRIVERS==""
[ ... snip ... ]
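The two values needed for the rule, node_name and port_name, are plain sysfs attributes, so they can also be read without walking the full udev attribute tree. This is a minimal sketch under that assumption; the function name rport_wwns and the optional base-directory argument are hypothetical.

```shell
# Print "WWNN WWPN" for one rport by reading its sysfs attributes.
# Usage: rport_wwns <rport-name> [sysfs-class-dir]
# The second argument is a hypothetical override used for testing.
rport_wwns() {
    local rport="$1" base="${2:-/sys/class/fc_remote_ports}"
    [ -r "$base/$rport/node_name" ] || return 1    # unknown rport
    [ -r "$base/$rport/port_name" ] || return 1
    echo "$(cat "$base/$rport/node_name") $(cat "$base/$rport/port_name")"
}
```

For the first rport in the output above, `rport_wwns rport-2:0-0` would print the node_name followed by the port_name on a single line.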
**Create `/etc/udev/rules.d/99-tmo.rules` and include the udev rule of the following form**
From this information we can build udev rules that set both dev_loss_tmo and fast_io_fail_tmo for the identified remote ports. Each rule matches one rport by its node_name and port_name.
ACTION!="add|change", GOTO="tmo_end"
KERNELS=="rport-?*", SUBSYSTEM=="fc_remote_ports", ATTR{node_name}=="0x500a09808607eec3", ATTR{port_name}=="0x500a09819607eec3", ATTR{dev_loss_tmo}="10", ATTR{fast_io_fail_tmo}="5"
KERNELS=="rport-?*", SUBSYSTEM=="fc_remote_ports", ATTR{node_name}=="0x500a09808607eec3", ATTR{port_name}=="0x500a09828607eec3", ATTR{dev_loss_tmo}="10", ATTR{fast_io_fail_tmo}="5"
LABEL="tmo_end"
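With more than a handful of remote ports, writing each rule line by hand invites copy-and-paste mistakes. The sketch below generates a rules file of the same form from "WWNN WWPN" pairs read on standard input. The function name emit_tmo_rules and its argument order are hypothetical; review the generated text before installing it.

```shell
# Emit a udev rules file for "WWNN WWPN" pairs read from stdin.
# Usage: emit_tmo_rules <dev_loss_tmo> <fast_io_fail_tmo>
emit_tmo_rules() {
    local dev_loss="$1" fast_io="$2" wwnn wwpn
    echo 'ACTION!="add|change", GOTO="tmo_end"'
    while read -r wwnn wwpn; do
        [ -n "$wwpn" ] || continue    # skip blank or incomplete lines
        printf 'KERNELS=="rport-?*", SUBSYSTEM=="fc_remote_ports", ATTR{node_name}=="%s", ATTR{port_name}=="%s", ATTR{dev_loss_tmo}="%s", ATTR{fast_io_fail_tmo}="%s"\n' \
            "$wwnn" "$wwpn" "$dev_loss" "$fast_io"
    done
    echo 'LABEL="tmo_end"'
}
```

For example, feeding it the two pairs from this article with `emit_tmo_rules 10 5 > /etc/udev/rules.d/99-tmo.rules` produces the same four lines as the rules file above.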
Also see the following for additional details on shortening failover time to surviving paths in a Fibre Channel environment:
- How to set dev_loss_tmo and fast_io_fail_tmo persistently, using a udev rule
- Is there a way to limit multipath failover times in order to avoid Oracle RAC cluster evictions?
- Multipath is not detecting path failures fast enough which results in application failure and system reboots
To lengthen timeout failure to help prevent filesystems entering read-only mode: