How to make sure Oracle ASM devices point to multipath devices, not to the underlying SCSI paths (sd devices), when using ASMLib to manage ASM disks?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux (RHEL) 5, 6, 7, 8
  • device-mapper-multipath
  • Oracle ASM using ASMLib

Issue

  • The Oracle application crashes when a single path of a multipath device fails. The application should be unaware of underlying path failures.
  • ASM crashed, and with it the Oracle database.
  • Device Mapper Multipath is used for the Oracle database, and the Oracle LUNs are expected to use the multipath devices, not the sd devices.
  • How to make sure Oracle ASM devices point to multipath devices, not to the underlying SCSI paths (sd devices), when using ASMLib to manage ASM disks?
  • I/O to the SAN is not distributed across all paths of the multipath device on an Oracle application server configured with oracleasm.

Resolution

ORACLEASM_SCANORDER should be configured to force the use of the multipath pseudo-device. Because ASMLib scans the entries in /proc/partitions, which lists both the multipath devices and their underlying sd paths, ORACLEASM_SCANEXCLUDE must also be set to exclude the underlying paths.
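As an illustration only, the sketch below models how the two settings narrow the scan. It is not ASMLib's code: it assumes each setting is a space-separated list of device-name prefixes matched against the name column of /proc/partitions, and scan_candidates is a hypothetical function name. Names matching a SCANEXCLUDE prefix are dropped; the rest are printed with SCANORDER matches first, in pattern order.

```shell
#!/usr/bin/env bash
# Simplified model of ASMLib disk scanning.
# ASSUMPTION: patterns are space-separated name prefixes; this is not
# ASMLib's actual implementation. scan_candidates is a hypothetical name.
set -euo pipefail

# Usage: scan_candidates "<scanorder>" "<scanexclude>" [partitions-file]
scan_candidates() {
    local order=$1 exclude=$2 file=${3:-/proc/partitions}
    awk -v order="$order" -v exclude="$exclude" '
        # Return the index of the first pattern that prefixes name, or 0.
        function has_prefix(name, list,    i, n, p) {
            n = split(list, p, " ")
            for (i = 1; i <= n; i++)
                if (index(name, p[i]) == 1) return i
            return 0
        }
        # Data lines start with a numeric major; skip header and blank lines.
        $1 ~ /^[0-9]+$/ && !has_prefix($4, exclude) { names[++count] = $4 }
        END {
            np = split(order, pat, " ")
            # First: names matching each SCANORDER pattern, in pattern order.
            for (k = 1; k <= np; k++)
                for (i = 1; i <= count; i++)
                    if (has_prefix(names[i], order) == k) print names[i]
            # Then: every remaining name that matched no SCANORDER pattern.
            for (i = 1; i <= count; i++)
                if (!has_prefix(names[i], order)) print names[i]
        }' "$file"
}
```

Under this model, with ORACLEASM_SCANORDER="dm" and ORACLEASM_SCANEXCLUDE="sd", the sd paths listed under Diagnostic Steps are all dropped and only dm-24 remains.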

  1. Edit /etc/sysconfig/oracleasm: add dm to ORACLEASM_SCANORDER and sd to ORACLEASM_SCANEXCLUDE, as follows:
# ORACLEASM_SCANORDER: Matching patterns to order disk scanning
ORACLEASM_SCANORDER="dm"

# ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
ORACLEASM_SCANEXCLUDE="sd"

If a third-party MPIO package is used, set ORACLEASM_SCANORDER to the device-name prefix that package uses instead (for example, emcpower for EMC PowerPath).

  2. The oracleasm configuration then needs to be updated:
# oracleasm configure
# oracleasm scandisks
  3. /etc/sysconfig/oracleasm is a symbolic link to /etc/sysconfig/oracleasm-_dev_oracleasm, which is the file oracleasm actually reads. Verify that the link exists:
# ls -al /etc/sysconfig/oracleasm
lrwxrwxrwx 1 root root 39 Feb 22 15:54 /etc/sysconfig/oracleasm -> /etc/sysconfig/oracleasm-_dev_oracleasm
  4. Changes to the oracleasm configuration file take effect only after the oracleasm service is restarted, which can be disruptive in a production environment.

Note: It is recommended to schedule a reboot after setting ORACLEASM_SCANORDER and ORACLEASM_SCANEXCLUDE in /etc/sysconfig/oracleasm, rather than only restarting the service. Normally a system reboot is not required for oracleasm to start using the multipath devices. However, in (private) RHBZ#1683606 it was observed that, while oracleasm was still allowed to detect single paths (before the configuration change and the oracleasm restart), it could change counters in kernel device structures (block_device.bd_holders) to invalid (negative) values, making the paths appear to be in use. Restarting only oracleasm does not clear these counters, so the devices continue to appear in use and multipath cannot add the paths to the corresponding maps until the system is rebooted. When this happens, messages similar to the following appear in the system logs whenever multipath tries to add one of those paths to its map:

    device-mapper: table: 253:<dm_num>: multipath: error getting device
    device-mapper: ioctl: error adding target to table

The problem can occur even while multipath is using the paths: the counters can be changed "silently" while the paths are in use. In that case the problem surfaces only after an outage removes the paths from the maps; when the paths return, multipath fails to add them back to the corresponding maps.

  5. Once the service is restarted (or the system rebooted), verify that the multipath device is being used; a major number of 253 should be returned:
# oracleasm querydisk -d <ASM_DISK_NAME>
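Where the two settings are applied by a script rather than by hand, a small idempotent helper can set them. The sketch below is hypothetical (set_sysconfig_var is not part of oracleasm-support). It resolves the /etc/sysconfig/oracleasm symlink with readlink -f before editing, because GNU sed -i otherwise replaces a symlink with a regular file, leaving /etc/sysconfig/oracleasm-_dev_oracleasm unchanged. It assumes the values contain no | or & characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY="VALUE" in a sysconfig-style file,
# replacing an existing uncommented KEY= line or appending one.
set -euo pipefail

set_sysconfig_var() {
    local file=$1 key=$2 value=$3 target
    # Edit the symlink target so the link itself is preserved.
    target=$(readlink -f "$file")
    if grep -q "^${key}=" "$target"; then
        # Replace the existing assignment (comment lines are not matched).
        sed -i "s|^${key}=.*|${key}=\"${value}\"|" "$target"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$target"
    fi
}

# Example use (not run here):
#   set_sysconfig_var /etc/sysconfig/oracleasm ORACLEASM_SCANORDER dm
#   set_sysconfig_var /etc/sysconfig/oracleasm ORACLEASM_SCANEXCLUDE sd
```

The helper only edits the file; oracleasm configure, oracleasm scandisks and the restart or reboot described above are still required.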

Root Cause

When devices were added to the DISKGROUP, the underlying sd* device was used instead of the multipath pseudo device.

The dm-* device names are intended for internal use and are not persistent. However, creating the DISKGROUP writes ASM metadata to the device, and ASM identifies the disk by checking that header, regardless of which dm-* name the device is assigned. The goal here is to force ASM to read through the multipath devices.
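Because dm-* names are not persistent, it can help to see which dm device currently sits on top of a given sd path and which multipath map name it carries. The sketch below reads the sysfs entries /sys/block/<sd>/holders/ and /sys/block/dm-N/dm/name (on kernels that expose the latter; otherwise it prints "unknown"). The function name sd_to_multipath and the SYSFS_ROOT override, which exists only so the function can be tested against a fake tree, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: show which device-mapper device holds a given sd path and the
# map name it carries. SYSFS_ROOT is a hypothetical test override;
# on a real system it defaults to /sys.
set -euo pipefail

sd_to_multipath() {
    local sd=$1 sysfs=${SYSFS_ROOT:-/sys} holder dm name
    for holder in "$sysfs/block/$sd/holders/"*; do
        # With no holders the glob stays unexpanded; skip it.
        [ -e "$holder" ] || continue
        dm=${holder##*/}
        # dm/name holds the device-mapper map name (for multipath, the alias
        # or WWID-based name); fall back to "unknown" if it is unavailable.
        name=$(cat "$sysfs/block/$dm/dm/name" 2>/dev/null || echo "unknown")
        printf '%s -> %s (%s)\n' "$sd" "$dm" "$name"
    done
}
```

On the system shown under Diagnostic Steps, sd_to_multipath sdb would be expected to print a line such as "sdb -> dm-24 (ASM_DATA1)", confirming that sdb is only one path of that map.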

Diagnostic Steps

  • Query the disk to obtain the major:minor number of the disk being used by the disk group:

    # /etc/init.d/oracleasm querydisk -d ASM_DATA1
    Disk "ASM_Data1" is a valid ASM disk on device [8,16]
    
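  • We can see that 8:16 is the underlying sdb path, not the ASM_DATA1 multipath pseudo-device shown in the multipath -ll output below. Failover would not occur with this configuration: when sdb fails (as it has below), ASM I/O fails instead of continuing on sdc or sdd.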

    ASM_DATA1 (3600500000000000001) dm-24 IBM,2107900
    [size=100G][features=1 queue_if_no_path][hwhandler=0][rw]
    \_ round-robin 0 [prio=0][active]
     \_ 3:0:1:1 sdb 8:16   [failed][faulty]
     \_ 5:0:0:1 sdc 8:32   [active][ready] 
     \_ 5:0:1:1 sdd 8:48   [active][ready] 
     \_ 3:0:0:1 sde  8:64  [failed][faulty]
    
  • This can also be seen in /proc/partitions:

       8     0  142577664 sda
       8     1     514048 sda1
       8     2   24579450 sda2
       8     3   12289725 sda3
       8    16   52428800 sdb
       8    32   52428800 sdc
       8    48   52428800 sdd
       8    64   52428800 sde
    
  • The major:minor of the ASM_DATA1 multipath pseudo-device is 253:24 (dm-24). This is the device that should be used:

     253    24   52428800 dm-24
    
  • Note: To check if an oracleasm device is mapped correctly in a vmcore, please see How to map an oracleasm path in a vmcore.
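The checks above can be combined in a short script. This is a sketch with hypothetical function names: querydisk_majmin extracts the major and minor numbers from oracleasm querydisk -d output (assuming the [major,minor] format shown above), majmin_to_name looks them up in /proc/partitions, and dm_major reads the block major registered for device-mapper from /proc/devices, because that major is allocated dynamically and is typically, but not necessarily, 253.

```shell
#!/usr/bin/env bash
# Sketch: check whether an ASM disk sits on a device-mapper (multipath) device.
# Function names are hypothetical; they are not part of oracleasm-support.
set -euo pipefail

# Read querydisk output on stdin and print "major minor" from the last
# "[major,minor]" on the line (a space after the comma is tolerated).
querydisk_majmin() {
    sed -n 's/.*\[\([0-9][0-9]*\), *\([0-9][0-9]*\)].*/\1 \2/p'
}

# Print the device name for a major/minor pair from a file in
# /proc/partitions format (default: the live /proc/partitions).
majmin_to_name() {
    local major=$1 minor=$2 file=${3:-/proc/partitions}
    awk -v ma="$major" -v mi="$minor" '$1 == ma && $2 == mi { print $4 }' "$file"
}

# Print the block-device major registered as "device-mapper" in a file in
# /proc/devices format (default: the live /proc/devices).
dm_major() {
    awk '/^Block devices:/ { blk = 1; next }
         blk && $2 == "device-mapper" { print $1; exit }' "${1:-/proc/devices}"
}

# Example use on a live system (not run here):
#   read -r maj min < <(oracleasm querydisk -d ASM_DATA1 | querydisk_majmin)
#   majmin_to_name "$maj" "$min"
#   if [ "$maj" = "$(dm_major)" ]; then echo multipath; else echo "single path"; fi
```

For the output shown above, querydisk_majmin yields "8 16", majmin_to_name maps that to sdb, and the major does not match device-mapper, which is the misconfiguration this article addresses.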


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.