Configuring device-mapper-multipath for an EMC DGC storage configured in ALUA mode with RHEL 5

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux (RHEL) 5 Update 4 or later
  • EMC CLARiiON storage configured in active-active ALUA mode
  • device-mapper-multipath

Issue

  • How to configure multipath for EMC CLARiiON and VNX for ALUA
  • Multipath issues on EMC CLARiiON or VNX storage
  • An EMC DGC storage device has been configured to use ALUA mode, and device-mapper-multipath on the host must be configured to use explicit ALUA mode
  • My EMC DGC storage SAN paths are configured in active-active ALUA mode, what configuration do I need to apply to device-mapper-multipath?
  • My EMC Clariion SAN paths are configured in active-active ALUA mode, what configuration do I need to apply to device-mapper-multipath?
  • If I try to access paths (/dev/sdN) directly, the disks in the lower priority group will fail. Multipath device access still works OK when disabling the paths in the high priority group.
  • Multipath displaying "kernel: Buffer I/O error on device sdX, logical block 0" with EMC CLARiiON SAN
  • My system mounts LUNs on an EMC Clariion SAN. I see entries in /var/log/messages stating: kernel: Buffer I/O error on device sdX, logical block 0
  • Storage LUNs automatically disappear when loaded.

Resolution

Add and/or update the devices clause for DGC (CLARiiON) arrays within multipath.conf[1]. Please see the information within the latest available "EMC Host Connectivity Guide for Linux" from EMC. Failing to follow EMC guidelines can result in the storage configuration being unsupported by EMC.

As an example, the following is the defined clause from "EMC Host Connectivity Guide for Linux", Rev A36 dated Jan 2014 pgs 250/251 under "ALUA RHEL 5/ RHEL6" section:


devices {
:
.
# Device attributed for EMC CLARiiON and VNX series ALUA
  device {
    vendor "DGC"
    product ".*"
    prio_callout "/sbin/mpath_prio_alua /dev/%n"
    path_grouping_policy group_by_prio
    features "1 queue_if_no_path"
    failback immediate
    hardware_handler "1 alua"
    path_checker readsector0
  }
:
.
}
NOTE: Three items in the clause above differ between active/standby (PNR) and active/active (ALUA) CLARiiON configurations: prio_callout, hardware_handler, and path_checker. The path_checker changes from emc_clariion to the default (that is, an explicit path checker type does not need to be specified). The path checkers expected by EMC are currently either readsector0 or tur. An explicit path_checker selection is shown above to eliminate ambiguity.

You should update your initrd image to ensure the changes are picked up at boot. See "Why are changes I made to /etc/multipath.conf not taking effect on boot on my boot-from-multipath RHEL system?" for specific steps.

After changing multipath.conf, reboot or restart the multipath services for the changes to take effect.
# service multipathd reload

 

 

Footnotes:

[1] To clarify the instructions, there are two starting states for DGC/CLARiiON storage and multipath:
  1. a customized clause is present within /etc/multipath.conf, or
  2. no customized clause is present within /etc/multipath.conf and the built-in default clause is being used.

In case #1, the three fields prio_callout, hardware_handler, and path_checker in /etc/multipath.conf need to be updated to their new values.

In case #2, get a copy of the built-in default clause by using the multipathd -k"show config" command, for example:

# multipathd -k"show config"
:
        device {
                vendor "DGC"
                product ".*"
                product_blacklist LUNZ
                path_grouping_policy group_by_prio
                getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
                path_selector "round-robin 0"
                path_checker emc_clariion
                features "1 queue_if_no_path"
                hardware_handler "1 emc"
                prio_callout "/sbin/mpath_prio_emc /dev/%n"
                failback immediate
                rr_weight uniform
                no_path_retry 60
                rr_min_io 1000
        }

Copy that clause into /etc/multipath.conf and change the three fields that need to be changed (path_checker, hardware_handler, and prio_callout):


devices {
:
        device {
                vendor "DGC"
                product ".*"
                product_blacklist LUNZ
                path_grouping_policy group_by_prio
                getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
                path_selector "round-robin 0"
                path_checker readsector0
                features "1 queue_if_no_path"
                hardware_handler "1 alua"
                prio_callout "/sbin/mpath_prio_alua /dev/%n"
                failback immediate
                rr_weight uniform
                no_path_retry 60
                rr_min_io 1000
       }
}
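
The three substitutions can also be scripted. The following is only a minimal sketch: it assumes the clause uses exactly the default values shown above, and dgc_clause_to_alua is a hypothetical helper name, not part of device-mapper-multipath. It reads a clause on stdin and writes the modified clause to stdout, so the result can be reviewed before it is placed in /etc/multipath.conf.

```shell
# Sketch only: rewrite the three ALUA-related fields of a DGC device clause.
# Assumes the starting values are the built-in defaults shown in this article.
# dgc_clause_to_alua is a hypothetical helper, not a multipath tool.
dgc_clause_to_alua() {
    sed -e 's|^\([[:space:]]*path_checker[[:space:]]\{1,\}\)emc_clariion|\1readsector0|' \
        -e 's|^\([[:space:]]*hardware_handler[[:space:]]\{1,\}\)"1 emc"|\1"1 alua"|' \
        -e 's|^\([[:space:]]*prio_callout[[:space:]]\{1,\}"/sbin/mpath_prio_\)emc |\1alua |'
}
```

For example, with the copied clause saved in a scratch file (the file name is an assumption), running dgc_clause_to_alua < dgc-clause.txt prints the converted clause. Lines that do not match one of the three fields pass through unchanged.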

Some fields in the device clause above are somewhat redundant because they have the same value as the multipath defaults, which are used by any device clause that does not specify its own value (compare getuid_callout, path_selector, rr_weight, and rr_min_io with the defaults below). For example:

# multipathd -k"show config"
defaults {
        verbosity 2
        polling_interval 5
        udev_dir "/dev"
        path_selector "round-robin 0"
        path_grouping_policy failover
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout none
        features "0"
        path_checker readsector0
        failback manual
        rr_min_io 1000
        max_fds max
        rr_weight uniform
        queue_without_daemon no
        flush_on_last_del no
        user_friendly_names yes
        pg_prio_calc avg
        log_checker_err always
        bindings_file "/var/lib/multipath/bindings"
        file_timeout 90
}

NOTE: Always use multipathd -k"show config" to obtain the information on the system, as different multipath versions can differ in their defaults and in their compiled-in DGC storage clauses.
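
Because the "show config" output lists every built-in device clause, it can help to filter out just the DGC one. The following is a minimal sketch, assuming the output is laid out as in the examples in this article (one "device {" line, one field per line, a closing "}" on its own line). extract_dgc_clause is a hypothetical helper name.

```shell
# Sketch only: print the device { ... } clause(s) whose vendor is "DGC".
# Reads "show config" style text on stdin; assumes one field per line.
# extract_dgc_clause is a hypothetical helper, not a multipath tool.
extract_dgc_clause() {
    awk '
        # Start of a device clause: reset the buffer and the vendor flag.
        $1 == "device" && $2 == "{" { buf = ""; inside = 1; dgc = 0 }
        inside {
            buf = buf $0 "\n"
            if ($1 == "vendor" && $2 == "\"DGC\"") dgc = 1
            # End of the clause: print it only if the vendor was DGC.
            if ($1 == "}") {
                if (dgc) printf "%s", buf
                inside = 0
            }
        }
    '
}
```

Typical use would be multipathd -k"show config" | extract_dgc_clause, with the result checked by eye before it is copied into /etc/multipath.conf.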

Root Cause

The built-in device-mapper-multipath defaults for EMC CLARiiON storage are set up for active-passive configurations and use the "1 emc" hardware handler, the emc_clariion path checker, and the mpath_prio_emc priority callout.

When CLARiiON storage is configured in active-active (ALUA) mode, multipath.conf needs to be modified to use the matching hardware handler, path checker, and priority callout.

Asymmetric Logical Unit Access (ALUA) support in device-mapper-multipath was updated in Red Hat Enterprise Linux 5.4, adding explicit ALUA support for Clariion storage. Earlier versions of Red Hat Enterprise Linux 5 added support for implicit ALUA (i.e. the operating system is not aware of which storage device paths have optimized performance and which have non-optimized performance). If the operating system consistently sends I/O on a non-optimized path, then the storage device may transparently make that path optimized, improving performance and causing idle paths to become non-optimized.

Red Hat Enterprise Linux 5.4 introduces explicit ALUA support for Clariion storage (i.e. the operating system exchanges information with the storage device and is able to select the paths that have optimized performance).

Diagnostic Steps

Reviewing multipath -ll output shows hwhandler=1 emc present for CLARiiON storage, which is for CLARiiON active-passive (passive not ready, PNR) mode. Alternatively, multipathd -k"show config" can be used to show which values are currently in use. If the "1 emc" handler is still present in the "show config" output even after changing multipath.conf to "1 alua", then multipath has not been restarted correctly.

In that case, paths show up as something like the following:

mpath1 (360000000000000000000000000000001) dm-2 DGC,VRAID
[size=20G][features=1 queue_if_no_path][hwhandler=1 emc][rw]
\_ round-robin 0 [prio=2][active]
\_ 0:0:0:16 sdc 8:32 [active][ready]
\_ 1:0:1:16 sdo 8:224 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 0:0:1:16 sdg 8:96 [active][ready]
\_ 1:0:0:16 sdk 8:160 [active][ready]

The priorities shown above (prio=2 and prio=0) are incorrect for an active-active (ALUA) configuration. A priority of 0 is the nominal priority for a standby path, so priorities like these indicate that multipath is configured in active-passive mode. Typical priority pairs for active-active (ALUA) would be something like 50 and 10.

To verify that the CLARiiON storage is configured in active/active (ALUA) mode, an sg_rtpg command can be performed. The sg_rtpg command is available from the optional sg3_utils package. A SCSI Report Target Port Groups (RTPG) command is sent to the specified device and the returned data is decoded. This is the same command/data used within multipath for ascertaining port status within ALUA configurations.

# sg_rtpg -d /dev/sdN
:
    target port group asymmetric access state : 0x01 (active/non optimized)
:
    target port group asymmetric access state : 0x00 (active/optimized)
:

The full set of asymmetric access state values defined by the SCSI specification is:

  • 0h Active/Optimized
  • 1h Active/Non-optimized
  • 2h Standby
  • 3h Unavailable
  • 4h-Eh Reserved
  • Fh Transitioning between states

These values will be shown within the RTPG command output. See "Engineering Notes - scsi INQUIRY and REPORT TARGET PORT GROUPS commands" for more information on RTPG.
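
As an illustration of how the state values map onto sg_rtpg output, the following sketch decodes the two-digit hex codes into the names listed above. It assumes the output lines look like the sg_rtpg example in this article ("... asymmetric access state : 0xNN ..."). alua_state_name and rtpg_states are hypothetical helper names.

```shell
# Sketch only: decode asymmetric access state codes as printed by sg_rtpg.
# Assumes codes appear as two-digit hex values such as 0x01.
# alua_state_name and rtpg_states are hypothetical helpers.
alua_state_name() {
    case "$1" in
        0x00) echo "Active/Optimized" ;;
        0x01) echo "Active/Non-optimized" ;;
        0x02) echo "Standby" ;;
        0x03) echo "Unavailable" ;;
        0x0[4-9] | 0x0[a-eA-E]) echo "Reserved" ;;
        0x0[fF]) echo "Transitioning between states" ;;
        *) echo "Unknown" ;;
    esac
}

# Read sg_rtpg output on stdin and print each state code with its name.
rtpg_states() {
    sed -n 's/.*asymmetric access state : \(0x[0-9a-fA-F]\{1,\}\).*/\1/p' |
    while read -r code; do
        printf '%s %s\n' "$code" "$(alua_state_name "$code")"
    done
}
```

Typical use would be sg_rtpg -d /dev/sdN | rtpg_states. A mix of Active/Optimized and Active/Non-optimized states is what the ALUA example above shows.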

Once multipath.conf is set up correctly, the paths will show up something like the following:

mpath1 (360000000000000000000000000000001) dm-2 DGC,VRAID
[size=20G][features=1 queue_if_no_path][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=100][active]
\_ 0:0:0:16 sdc 8:32 [active][ready]
\_ 1:0:1:16 sdo 8:224 [active][ready]
\_ round-robin 0 [prio=50][enabled]
\_ 0:0:1:16 sdg 8:96 [active][ready]
\_ 1:0:0:16 sdk 8:160 [active][ready]

The handler is alua and the path priorities show active access via non-zero priority values. The actual priority values may be different - 50,20 or 50,10 for example, or other similar pairs of values.
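
To compare the handler and priorities across many maps, the relevant fields can be pulled out of the multipath -ll output. This is a minimal sketch that assumes the RHEL 5 output layout shown in this article, with the hardware handler in a [hwhandler=...] field and each path group priority in a [prio=N] field. mpath_summary is a hypothetical helper name.

```shell
# Sketch only: summarise RHEL 5 style "multipath -ll" output read from stdin.
# Prints each map name, its hardware handler, and each path group priority.
# mpath_summary is a hypothetical helper, not a multipath tool.
mpath_summary() {
    sed -n \
        -e 's/^\([^ ]*\) .*dm-[0-9][0-9]* .*/map \1/p' \
        -e 's/.*\[hwhandler=\([^]]*\)].*/  hwhandler \1/p' \
        -e 's/.*\[prio=\([0-9]*\)].*/  group prio \1/p'
}
```

Typical use would be multipath -ll | mpath_summary. A DGC map that still reports "hwhandler 1 emc" or a group priority of 0 suggests the active-passive settings are still in effect.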


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.