Boot takes a long time with multipath SAN storage containing many paths and partitions on the LUNs

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux 5
  • Connected to SAN with many LUNs and paths

Issue

  • Booting takes an hour or more, with udev accounting for much of that time.

Resolution

  • Disabling the pam_console_apply udev rule in /etc/udev/rules.d/95-pam-console.rules reduces boot times to an acceptable level.

  • Alternatively, modify the rule by adding SYSFS{type}!="0" to its match criteria, which avoids firing the rule for disk-type devices.

    Note: If you are running a udev version older than udev-095-14.29.el5 (from RHBA-2013-0091), the modification to 95-pam-console.rules may be overwritten when udev is updated.
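
Either change can be scripted. The sketch below is one possible way to do it, assuming the rule that calls pam_console_apply sits on a single line and uses a RUN+= key. The function names are hypothetical and the exact rule text differs between udev versions, so review the file before and after running anything like this.

```shell
#!/bin/sh
# Sketch only. On a real system the target would be
# /etc/udev/rules.d/95-pam-console.rules; function names are hypothetical.
# $1 = rules file, $2 = backup path (defaults to "$1.bak").

# Option 1: comment out every active line that calls pam_console_apply.
disable_pam_console_rule() {
    bak=${2:-$1.bak}
    cp -p "$1" "$bak" &&
    awk '!/^[ \t]*#/ && /pam_console_apply/ { $0 = "#" $0 }
         { print }' "$bak" > "$1"
}

# Option 2: keep the rule but add SYSFS{type}!="0" in front of RUN+=,
# so it no longer fires for disk-type devices. Lines that already
# contain SYSFS{type} are left alone.
skip_disks_in_pam_console_rule() {
    bak=${2:-$1.bak}
    cp -p "$1" "$bak" &&
    awk '!/^[ \t]*#/ && /pam_console_apply/ && index($0, "SYSFS{type}") == 0 {
             sub(/RUN\+=/, "SYSFS{type}!=\"0\", RUN+=")
         }
         { print }' "$bak" > "$1"
}
```

For example, running `disable_pam_console_rule /etc/udev/rules.d/95-pam-console.rules /root/95-pam-console.rules.orig` as root would apply the first option while keeping the backup copy outside the rules directory.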

Root Cause

* On system boot, the udev helper application, pam_console_apply, was called
for every disk on the system. This was unnecessary, for example, for SCSI
disks, which do not have default pam console permissions. As a consequence,
the boot process was significantly slowed down if the system contained a
large number of disks. To fix this problem, the
/etc/udev/rules.d/95-pam-console.rules file has been marked as a
configuration file and it will not be automatically updated with newer udev
versions. System administrators should now comment out the pam_console_apply
call in this file on systems that do not need non-root user access to
devices. (BZ#736475)
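
To get a feel for why this scales badly, the number of block devices that udev has to process at boot can be estimated from the storage layout. The sketch below is a rough back-of-the-envelope calculation, not a measurement. It assumes every LUN has the same number of paths and partitions, that each path and each partition on a path appears as its own sd device, and that each multipath map and each of its partitions appears as a device-mapper device. The function name and the sample numbers are hypothetical.

```shell
#!/bin/sh
# estimate_block_devices LUNS PATHS PARTITIONS
# Prints the approximate number of block devices created at boot.
estimate_block_devices() {
    luns=$1 paths=$2 parts=$3
    sd=$(( luns * paths * (1 + parts) ))   # one sd node per path, plus its partitions
    dm=$(( luns * (1 + parts) ))           # one multipath map per LUN, plus its partition maps
    echo $(( sd + dm ))
}

# Illustrative numbers only: 100 LUNs, 4 paths each, 3 partitions per LUN.
estimate_block_devices 100 4 3   # prints 2000
```

Each of these devices generates at least one udev add event, so any helper that is run per device is multiplied by this count.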

Diagnostic Steps

  • Known issues from earlier RHEL 5.x releases were checked, and updating the following packages was suggested:
    • BZ#456447: udev vol_id probe of passive/unreadable devices causes slow bootup
      • Installed: udev-095-14.21.el5
      • Fixed    : udev-095-14.24.el5
    • BZ#579789: System hangs on boot if multipath disks are in Not Ready state.
      • Installed: device-mapper-multipath-0.4.7-34.el5_5.6
      • Fixed    : device-mapper-multipath-0.4.7-42
    • BZ#672985: blkid runs very slow with many dm devices and a large blkid.tab file
      • Installed: e2fsprogs-1.39-23.el5
      • Fixed    : e2fsprogs-1.39-33.el5
  • After updating to the above packages, the long boot times persisted. The problem also occurs in a test environment as soon as a large amount of storage (with partitions) is allocated.
  • Increase the kernel boot log buffer via the log_buf_len kernel parameter, as detailed in http://access.redhat.com/kb/docs/DOC-61888.
  • udev appeared to be the main cause, but this is a broad area. The following udev rules seem to create a lot of processes and activity:
    1. udev rule which fires vol_id: /etc/udev/rules.d/50-udev.rules:IMPORT{program}="/lib/udev/vol_id --export $tempnode"
    2. udev rule which fires kpartx: /etc/udev/rules.d/40-multipath.rules:PROGRAM=="/sbin/dmsetup info -c --noheadings -o name -j %M -m %m", RESULT=="?*", NAME="%k", SYMLINK="mpath/%c", RUN+="/bin/bash -c '/sbin/mpath_wait /dev/mapper/%c; /sbin/kpartx -a -p p /dev/mapper/%c
  • Various tests showed boot times increasing as the number of paths and partitions increased.
  • A breakdown of the boot time based on console activity did not pinpoint the problem.
  • kpartx was suspected of contributing to the boot time, so a wrapper script was proposed that timestamped the start and end of each kpartx run. kpartx turned out not to be the problem in the customer environment.
  • A third-party qla2xxx driver had been installed (/lib/modules/2.6.32-279.19.1.el6.x86_64/updates/lsb-ft/extra/qla2xxx/qla2xxx.ko) and has now been removed. Work in progress: waiting for a new sosreport and boot logs.
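
The kpartx timing wrapper mentioned above is not included in this article. A minimal sketch of how such a wrapper might look is shown here. The log location, the renamed kpartx.real binary, and the function name are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical timing wrapper. LOG and REAL are assumed paths,
# not taken from the article.
LOG=${KPARTX_TIMING_LOG:-/dev/shm/kpartx-timing.log}
REAL=${KPARTX_REAL:-/sbin/kpartx.real}

# timed_run LOGFILE COMMAND [ARGS...]
# Appends a begin and an end line (epoch seconds) to LOGFILE around
# the command and returns the command's exit status.
timed_run() {
    tr_log=$1; shift
    echo "$(date '+%s') begin pid=$$ $*" >> "$tr_log"
    tr_rc=0
    "$@" || tr_rc=$?
    echo "$(date '+%s') end pid=$$ status=$tr_rc $*" >> "$tr_log"
    return "$tr_rc"
}

# When installed in place of kpartx, the script would end with:
# timed_run "$LOG" "$REAL" "$@"; exit $?
```

Comparing the begin and end timestamps for each invocation shows how much wall-clock time kpartx itself consumes during boot.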

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.