System crashed with the message `device-mapper: io: Unaligned struct io pointer`

Solution Unverified - Updated

Environment

  • Red Hat Enterprise Linux 7
    • kernel 3.10.0-514.16.1.el7

Issue

The system crashed and the following messages were reported in the kernel log buffer:

[23453302.315266] blk_update_request: critical target error, dev sdd, sector 3354584576
[23453302.315298] sd 0:0:6:0: [sdj] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[23453302.315299] sd 0:0:6:0: [sdj] Sense Key : Illegal Request [current] 
[23453302.315301] sd 0:0:6:0: [sdj] Add. Sense: Invalid command operation code
[23453302.315302] sd 0:0:6:0: [sdj] CDB: Write same(16) 93 08 00 00 00 00 c7 f2 e6 00 00 00 02 00 00 00
[23453302.315303] blk_update_request: critical target error, dev sdj, sector 3354584576
[23453302.585455] device-mapper: io: Unaligned struct io pointer ffff880357f89802
[23453302.585610] ------------[ cut here ]------------
[23453302.585628] kernel BUG at drivers/md/dm-io.c:94!
[23453302.585644] invalid opcode: 0000 [#1] SMP 

Resolution

The reported issue looks similar to the one described in BZ1461519. With this bug, a discard that is failed by one path of a mirror can cause a double completion of a bio. This can lead directly to crashes, or it can cause corruption through effects such as double-freeing memory. In this case, the bug appears to have led to the use of memory that was freed through the double completion. The issue described in the Bugzilla was fixed with the release of the erratum https://access.redhat.com/errata/RHSA-2017:1842.

It is therefore recommended to update to kernel-3.10.0-693.el7 or a later version.

Diagnostic Steps

The kernel log buffer shows that an unaligned struct io pointer was encountered:

[23453302.315261] sd 5:0:2:0: [sdd] Add. Sense: Invalid command operation code
[23453302.315264] sd 5:0:2:0: [sdd] CDB: Write same(16) 93 08 00 00 00 00 c7 f2 e6 00 00 00 02 00 00 00
[23453302.315266] blk_update_request: critical target error, dev sdd, sector 3354584576
[23453302.315298] sd 0:0:6:0: [sdj] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[23453302.315299] sd 0:0:6:0: [sdj] Sense Key : Illegal Request [current] 
[23453302.315301] sd 0:0:6:0: [sdj] Add. Sense: Invalid command operation code
[23453302.315302] sd 0:0:6:0: [sdj] CDB: Write same(16) 93 08 00 00 00 00 c7 f2 e6 00 00 00 02 00 00 00
[23453302.315303] blk_update_request: critical target error, dev sdj, sector 3354584576
[23453302.585455] device-mapper: io: Unaligned struct io pointer ffff880357f89802    <-----------
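
The failed command in the log can be tied to a discard by decoding its CDB. The following is a minimal sketch, not kernel code: the struct and function names are made up for illustration, and the field offsets are those of the SCSI WRITE SAME(16) command (UNMAP flag in byte 1 bit 3, big-endian LBA in bytes 2-9, big-endian block count in bytes 10-13).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a WRITE SAME(16) CDB that matter for this analysis.
 * Hypothetical helper type, not a kernel structure. */
struct write_same16 {
    uint8_t  opcode;     /* byte 0: 0x93 for WRITE SAME(16) */
    int      unmap;      /* byte 1, bit 3: blocks are to be unmapped */
    uint64_t lba;        /* bytes 2-9: starting LBA, big-endian */
    uint32_t nr_blocks;  /* bytes 10-13: number of blocks, big-endian */
};

/* Decode the 16 CDB bytes as they are printed in the kernel log. */
static struct write_same16 decode_write_same16(const uint8_t cdb[16])
{
    struct write_same16 ws = {0};
    size_t i;

    assert(cdb != NULL);
    ws.opcode = cdb[0];
    ws.unmap = (cdb[1] & 0x08) != 0;
    for (i = 2; i < 10; i++)
        ws.lba = (ws.lba << 8) | cdb[i];
    for (i = 10; i < 14; i++)
        ws.nr_blocks = (ws.nr_blocks << 8) | cdb[i];
    return ws;
}

/* Bytes copied from the "[sdj] CDB: Write same(16)" log line. */
static const uint8_t logged_cdb[16] = {
    0x93, 0x08, 0x00, 0x00, 0x00, 0x00, 0xc7, 0xf2,
    0xe6, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00
};
```

Calling decode_write_same16(logged_cdb) gives opcode 0x93 with the UNMAP bit set, a starting LBA of 3354584576 (the same sector that blk_update_request reports) and a length of 512 blocks. This is consistent with a discard that the target rejected with "Invalid command operation code".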


hexadecimal: ffff880357f89802  
    decimal: 18446612146675030018  (-131927034521598)

18446612146675030018/64
288228314791797344

288228314791797344*64
18446612146675030016

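Multiplying back gives a value that is 2 less than the original, so the pointer is not a multiple of 64 and the io was not aligned.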
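
The same check can be done with a bit mask instead of a division. This is a small user-space sketch; the function name is hypothetical, and the alignment of 64 is taken from the divisor used in the arithmetic above.

```c
#include <assert.h>
#include <stdint.h>

/* Return how many bytes addr lies past the previous align-byte boundary.
 * align must be a power of two; 0 means the address is aligned. */
static uint64_t misalignment(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & (align - 1);
}
```

For the pointer from the log, misalignment(0xffff880357f89802ULL, 64) returns 2, while the expected value 0xffff880357f89800 gives 0.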

The system crashed because the following BUG condition, which checks the alignment of the struct io pointer, was hit:

 81 
 82 /*-----------------------------------------------------------------
 83  * We need to keep track of which region a bio is doing io for.
 84  * To avoid a memory allocation to store just 5 or 6 bits, we
 85  * ensure the 'struct io' pointer is aligned so enough low bits are
 86  * always zero and then combine it with the region number directly in
 87  * bi_private.
 88  *---------------------------------------------------------------*/
 89 static void store_io_and_region_in_bio(struct bio *bio, struct io *io,
 90                                        unsigned region)
 91 {
 92         if (unlikely(!IS_ALIGNED((unsigned long)io, DM_IO_MAX_REGIONS))) {
 93                 DMCRIT("Unaligned struct io pointer %p", io);
 94                 BUG();                                      <------------- crashed here
 95         }
 96 
 97         bio->bi_private = (void *)((unsigned long)io | region);
 98 }
 99 
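
The comment in the kernel source explains why alignment matters: the low bits of the pointer are reused to carry the region number. The following user-space sketch illustrates that packing scheme under the assumption of 64 regions (6 low bits). The function names and the MAX_REGIONS macro are hypothetical and only mirror the idea, not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REGIONS 64u  /* assumption: 64 regions, so 6 low bits are free */

/* Combine an aligned pointer and a region number into one word. */
static uintptr_t pack_io_and_region(const void *io, unsigned region)
{
    uintptr_t p = (uintptr_t)io;

    /* The low bits must be zero, otherwise they would be mistaken
     * for part of the region number when unpacking. */
    assert((p & (MAX_REGIONS - 1)) == 0);
    assert(region < MAX_REGIONS);
    return p | region;
}

/* Split the combined word back into pointer and region number. */
static void unpack_io_and_region(uintptr_t packed, void **io, unsigned *region)
{
    *io = (void *)(packed & ~(uintptr_t)(MAX_REGIONS - 1));
    *region = (unsigned)(packed & (MAX_REGIONS - 1));
}
```

With a pointer such as ffff880357f89802, the stray 0x2 would be indistinguishable from region bits, which is why the kernel refuses to continue.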

The io pointer is set at:

446 io = mempool_alloc(client->pool, GFP_NOIO);

Slab corruption had occurred:

crash> kmem ffff880357f89802
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
kmem: kmalloc-64: slab: ffffea000d5fe240 invalid freepointer: ffff880357f8
ffff88017fc07b00 kmalloc-64                64     735893    737600  11525     4k
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
kmem: kmalloc-64: slab: ffffea000d5fe240 invalid freepointer: ffff880357f8
  ffffea000d5fe240  ffff880357f89000     0     64         61     3
  FREE / [ALLOCATED]
kmem: kmalloc-64: slab: ffffea000d5fe240 invalid freepointer: ffff880357f8
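
As a cross-check of the kmem output above, the bad pointer can be placed within the slab. This sketch assumes that the 64-byte objects are laid out back to back starting at the MEMORY address of the slab, which may not hold if slab debugging adds per-object metadata. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Position of an address inside a slab of fixed-size objects. */
struct slab_pos {
    uint64_t index;   /* which object the address falls in */
    uint64_t offset;  /* byte offset inside that object */
};

/* Assumes objects are packed contiguously from slab_base. */
static struct slab_pos locate_in_slab(uint64_t addr, uint64_t slab_base,
                                      uint64_t objsize)
{
    struct slab_pos pos;

    assert(objsize != 0 && addr >= slab_base);
    pos.index = (addr - slab_base) / objsize;
    pos.offset = (addr - slab_base) % objsize;
    return pos;
}
```

With addr 0xffff880357f89802, slab_base 0xffff880357f89000 and objsize 64, the address falls in object 32 at offset 2, that is, 2 bytes past the start of an otherwise valid kmalloc-64 object.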

For the slab being used for dm_io allocations, CPU 0's kmem_cache_cpu struct for the slab contained a corrupted freelist entry.

struct kmem_cache_cpu {
  freelist = 0xffff880357f8, 
  tid = 912598889, 
  page = 0xffffea000d5fe240, 
  partial = 0xffffea00086d2780
}

The value in freelist appears shifted by 2 bytes, losing the 2 least significant bytes. This was most likely a side effect of the bad pointer ffff880357f89802 being used: the extra 0x2 placed the access 2 bytes into the intended memory, leaving the freelist pointer with the invalid, shifted value. So something originally corrupted the location where the pointer ffff880357f89800 was stored, adding the extra 0x2, which led both to the corrupted kmem_cache_cpu freelist and to the bad pointer failing the sanity check.
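
The numeric relationship between the two values can be confirmed with a simple shift. This sketch only verifies that the corrupted freelist equals the bad pointer with its 2 low bytes dropped; it does not model how the corruption happened. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Discard the nbytes least significant bytes of a 64-bit value. */
static uint64_t drop_low_bytes(uint64_t value, unsigned nbytes)
{
    assert(nbytes < sizeof(value));
    return value >> (8 * nbytes);
}
```

Here drop_low_bytes(0xffff880357f89802ULL, 2) yields 0xffff880357f8, which matches the freelist value in the kmem_cache_cpu struct.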

Checking the vmcore for possible causes showed that, in device-mapper, there was a mirror (dm-15) for a pvmove in progress.

NUMBER  NAME                   MAPPED_DEVICE       FIELDS
dm-0    rhel_ocglusteru2-root  0xffff880036a4a000  flags: 0x40      
dm-1    rhel_ocglusteru2-tmp   0xffff880427bc9000  flags: 0x40      
dm-2    rhel_ocglusteru2-vartmp 0xffff880427bc8800  flags: 0x40      
dm-3    rhel_ocglusteru2-varlogaudit 0xffff880427bc8000  flags: 0x40      
dm-4    rhel_ocglusteru2-varlog 0xffff880427bcd800  flags: 0x40      
dm-5    rhel_ocglusteru2-var   0xffff880427bce000  flags: 0x40      
dm-6    rhel_ocglusteru2-home  0xffff880427bce800  flags: 0x40      
dm-7    rhel_ocglusteru2-swap  0xffff880427bcf000  flags: 0x40      
dm-8    gfs_vg-lvol0           0xffff880427bcd000  flags: 0x40      
dm-9    gfs_vg-gfs_pool_tmeta  0xffff880036a50800  flags: 0x40      
dm-10   gfs_vg-gfs_pool_tdata  0xffff88042a9c7800  flags: 0x40      
dm-11   gfs_vg-gfs_pool-tpool  0xffff88042ab3f000  flags: 0x40      
dm-12   gfs_vg-gfs_pool        0xffff88042ab3e800  flags: 0x40      
dm-13   gfs_vg-lv_brick1       0xffff880424242000  flags: 0x40      
dm-14   gfs_vg-lvol1_pmspare   0xffff880421f5c000  flags: 0x40      
