[Troubleshooting] How do I turn on additional qla2xxx or qla4xxx driver extended logging and what logging is available?

Issue

  • How do I set up the QLogic driver 'extended_error_logging' parameter?
  • What are the steps for enabling this logging at boot time?
  • How can I get additional debug information from the driver?
  • How to enable extended logging on QLogic HBAs?
  • How to turn on Fibre Channel HBA extended logging?
  • How to set the logging level for a SAN driver?
  • How to set the logging level of an HBA SAN driver to get better error messages?

Environment

  • Red Hat Enterprise Linux (RHEL) 4, 5, 6, 7, 8, 9
  • QLogic qla2xxx or qla4xxx drivers

Resolution


Caution! Turning on significant amounts of extended error logging (e.g. 0x7fffffff) under moderate to heavy I/O loads can cause soft lockups! The likelihood of a soft lockup increases if a serial console is in use. Check /proc/cmdline to see whether console=ttyS0 is present, as a serial console slows down logging. The debug code logs information about the I/O being processed to /var/log/messages. These debug messages generate additional I/O which, if the /var filesystem is on a disk controlled by the QLogic driver, causes more logging, followed by more I/O and even more logging, to the point where the system is essentially locked up because it spends all of its time logging debug messages. It is strongly suggested that the messages file be moved off any QLogic-controlled disks, either to a local disk or over the network to a remote logging host, to avoid this issue.
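The serial console check mentioned above can be scripted. The following is a minimal sketch, not a supported tool: the function name has_serial_console is hypothetical, and the file to inspect is passed as an argument so the check can be tried against a saved copy of /proc/cmdline.

```shell
# Hypothetical helper: succeed (exit 0) if the given kernel command line
# file contains a serial console entry such as console=ttyS0 or
# console=ttyS1,115200.
has_serial_console() {
    # console=ttyS<N> must start the line or follow whitespace.
    grep -Eq '(^|[[:space:]])console=ttyS[0-9]+' "$1"
}

# Example use on a live system:
#   if has_serial_console /proc/cmdline; then
#       echo "serial console in use: keep the logging mask small"
#   fi
```

A positive result does not mean logging must be avoided, only that large masks such as 0x7fffffff are more likely to cause trouble on that system.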

 

Selecting a $MASK value (which extended events to enable)

  • qla2xxx

    • RHEL 4, 5, and 6.0-6.2, only values 0 or 1 supported/available:
      • off: $MASK=0
      • on : $MASK=1
    • RHEL 6.3+ (2.6.32-279.el6 and later) plus all RHEL 7 and later releases: different classes of event messages can be turned on or off separately, per the table below (qla2xxx only). If unsure which event classes are needed, use 0x1e400000, which is equivalent to setting 1 in RHEL 4, 5, or 6.0-6.2. See the driver sources for exactly what is logged within each class. This mask value typically does not cause excessive logging except on very large configurations (hundreds of LUNs or more).
  • qla4xxx

    • RHEL 4, 5, and 6.0-6.2, only values 0 or 1 supported/available:
      • off: $MASK=0
      • on : $MASK=1
    • RHEL 6.3+ (2.6.32-279.el6 and later) plus all RHEL 7 and later releases, only on ($MASK=2, instead of a value of 1 previously) or off ($MASK=0).
qla2xxx driver RHEL 6.3+ (2.6.32-279.el6 and later including all RHEL 7, 8, 9 versions)
$MASK        Source
Value        Code            Description
----------   --------------  ----------------------------------------------------------------
0x00000000                   default, no logging
0x00001000   ql_dbg_tgt_tmr  Target mode TMF (Task Management Functions such as reset target)
0x00002000   ql_dbg_tgt_mgt  Target mode management 
0x00004000   ql_dbg_tgt      Target mode
0x00008000   ql_dbg_verbose  More verbosity, might not apply to all levels       
0x00010000   ql_dbg_misc     Miscellaneous, everything not covered by other categories
0x00020000   ql_dbg_buffer   Buffer/registers Dump   
0x00040000   ql_dbg_vport    Virtual port debug
0x00080000   ql_dbg_p3p      P3P Specific 
0x00100000   ql_dbg_multiq   MultiQ
0x00200000   ql_dbg_aer      AER/EEH      
0x00400000   ql_dbg_taskm    Task Management
0x00800000   ql_dbg_user     User space    
0x01000000   ql_dbg_timer    Timer routines
0x02000000   ql_dbg_async    Async events
0x04000000   ql_dbg_dpc      DPC Thread
0x08000000   ql_dbg_io       IO tracing
0x10000000   ql_dbg_disc     Device Discovery
0x20000000   ql_dbg_mbx      Mailbox Cmnds
0x40000000   ql_dbg_init     Module Init & Probe

0x7fffffff - Enables all log classes; can produce too many log messages
0x1e400000 - Preferred value for capturing essential debug information
             (equivalent to the old ql2xextended_error_logging=1; setting
             the parameter to 1 in RHEL 7, 8, and 9 results in this mask
             value being used)
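To see which event classes a given mask enables, the table above can be turned into a small lookup. This is an illustrative sketch only: the function name decode_mask is hypothetical, and the bit/name pairs are copied from the table rather than read from the driver.

```shell
# decode_mask MASK
# Print the source-code name of every event class whose bit is set in MASK.
decode_mask() {
    mask=$(( $1 ))
    while read -r bit name; do
        if [ $(( mask & $bit )) -ne 0 ]; then
            echo "$name"
        fi
    done <<'EOF'
0x00001000 ql_dbg_tgt_tmr
0x00002000 ql_dbg_tgt_mgt
0x00004000 ql_dbg_tgt
0x00008000 ql_dbg_verbose
0x00010000 ql_dbg_misc
0x00020000 ql_dbg_buffer
0x00040000 ql_dbg_vport
0x00080000 ql_dbg_p3p
0x00100000 ql_dbg_multiq
0x00200000 ql_dbg_aer
0x00400000 ql_dbg_taskm
0x00800000 ql_dbg_user
0x01000000 ql_dbg_timer
0x02000000 ql_dbg_async
0x04000000 ql_dbg_dpc
0x08000000 ql_dbg_io
0x10000000 ql_dbg_disc
0x20000000 ql_dbg_mbx
0x40000000 ql_dbg_init
EOF
}

# The preferred mask enables task management, async events, the DPC
# thread, IO tracing and device discovery.
decode_mask 0x1e400000
```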

To create the $MASK value for the qla2xxx driver in RHEL 6.3 or later, perform a bitwise OR of the individual mask values to enable more than one class of events.
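Combining mask values can be done with shell arithmetic. The sketch below assumes a POSIX shell; the function name or_masks is hypothetical and is not part of any QLogic or Red Hat tooling.

```shell
# or_masks VALUE...
# OR all arguments together and print the result as 8-digit hex.
or_masks() {
    result=0
    for value in "$@"; do
        result=$(( result | $value ))
    done
    printf '0x%08x\n' "$result"
}

# Device discovery + IO tracing + DPC thread + async events + task management
or_masks 0x10000000 0x08000000 0x04000000 0x02000000 0x00400000
# prints 0x1e400000
```

The printed value can be used directly as $MASK in the steps that follow.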

Note: Target mode debug should not be required, as the driver runs in initiator mode rather than target mode in the majority of RHEL installations.

  1. Enable/disable/view during run-time:
    The following paths and parameter names apply to RHEL 5.6 and later, including RHEL 6 and 7 kernels. See the NOTE section below to confirm the path and parameter name on earlier kernel versions.
  • qla2xxx:
    • Enable:  # echo $MASK > /sys/module/qla2xxx/parameters/ql2xextended_error_logging
    • Disable: # echo 0     > /sys/module/qla2xxx/parameters/ql2xextended_error_logging
    • View:    # cat          /sys/module/qla2xxx/parameters/ql2xextended_error_logging

  • qla4xxx:
    • Enable:  # echo $MASK > /sys/module/qla4xxx/parameters/ql4xextended_error_logging
    • Disable: # echo 0     > /sys/module/qla4xxx/parameters/ql4xextended_error_logging
    • View:    # cat          /sys/module/qla4xxx/parameters/ql4xextended_error_logging
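The enable and view steps above can be wrapped in one helper that writes a value and reads it back. This is a sketch under stated assumptions: the function name set_qla_logging is hypothetical, and the parameter file is passed in as an argument because the path differs between releases (see the NOTE section).

```shell
# set_qla_logging PARAM_FILE MASK
# Write MASK to the parameter file, then print what the file now contains.
set_qla_logging() {
    param_file=$1
    mask=$2
    if [ ! -w "$param_file" ]; then
        echo "cannot write $param_file (missing, or read-only?)" >&2
        return 1
    fi
    echo "$mask" > "$param_file" || return 1
    # Read the value back so the caller can confirm what is in effect.
    cat "$param_file"
}

# Example (as root, with the qla2xxx module loaded):
#   set_qla_logging /sys/module/qla2xxx/parameters/ql2xextended_error_logging 0x1e400000
# Disable again:
#   set_qla_logging /sys/module/qla2xxx/parameters/ql2xextended_error_logging 0
```

On a real sysfs file the value read back may be formatted differently from what was written (for example in decimal), so compare numerically rather than as text.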

  2. Enable at boot time: change the driver options, then recreate the initrd image file on RHEL 5 or the initramfs image file on RHEL 6 and later.
  • RHEL 4 and 5.0-5.5: in /etc/modprobe.conf, update the appropriate qla driver options line, adding the line if necessary:
    • qla2xxx: options qla2xxx {extended-logging-parameter-name}=1
    • qla4xxx: options qla4xxx {extended-logging-parameter-name}=1
  • RHEL 5.6-5.*: in /etc/modprobe.conf, update the appropriate qla driver options line, add the line if necessary:
    • qla2xxx: options qla2xxx ql2xextended_error_logging=1
    • qla4xxx: options qla4xxx ql4xextended_error_logging=1
  • RHEL 6.0-6.2: in /etc/modprobe.d/qla*.conf, update the qla options line. If the file doesn't exist, create it.
    • qla2xxx: options qla2xxx ql2xextended_error_logging=1
    • qla4xxx: options qla4xxx ql4xextended_error_logging=1
  • RHEL 6.3+ : in /etc/modprobe.d/qla*.conf, update the qla options line. If the file doesn't exist, create it.
    • qla2xxx: options qla2xxx ql2xextended_error_logging=$MASK
    • qla4xxx: options qla4xxx ql4xextended_error_logging=2
  • RHEL 7, 8, 9 : in /etc/modprobe.d/qla*.conf, update the qla options line. If the file doesn't exist, create it.
    • qla2xxx: options qla2xxx ql2xextended_error_logging=$MASK
    • qla4xxx: options qla4xxx ql4xextended_error_logging=2
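Updating the options line can be scripted. The following sketch is one possible approach, not the documented procedure: the function name set_modprobe_option is hypothetical, and the file name in the usage comment is only an example that matches the /etc/modprobe.d/qla*.conf pattern above.

```shell
# set_modprobe_option CONF_FILE MODULE PARAM VALUE
# If CONF_FILE already has an "options MODULE" line that sets PARAM,
# replace the value in place; otherwise append a new options line.
set_modprobe_option() {
    conf=$1
    module=$2
    param=$3
    value=$4
    if grep -q "^options[[:space:]]\{1,\}$module[[:space:]].*$param=" "$conf" 2>/dev/null; then
        tmp=$(mktemp) || return 1
        sed "/^options[[:space:]]\{1,\}$module[[:space:]]/ s/$param=[^[:space:]]*/$param=$value/" \
            "$conf" > "$tmp" && cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        echo "options $module $param=$value" >> "$conf"
    fi
}

# Example (as root, RHEL 6.3 or later):
#   set_modprobe_option /etc/modprobe.d/qla2xxx.conf qla2xxx \
#       ql2xextended_error_logging 0x1e400000
```

The change only takes effect at boot after the initrd or initramfs image has been rebuilt.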

 
See knowledge base article How do I rebuild the initial ramdisk image in Red Hat Enterprise Linux for steps to rebuild the initrd or initramfs image file.
 

NOTE:
{path} and {extended-logging-parameter-name} are version specific: they differ between driver versions and RHEL revisions. The information below helps identify the values for the path and the extended error logging parameter name. Run the following commands to verify the path to the parameter and its current value:

# find /sys -name '*extended_error_logging' | grep qla | xargs ls -l
# find /sys -name '*extended_error_logging' | grep qla | xargs cat

Some common path values include:

  • '/sys/module/qla2xxx/', or
  • '/sys/module/qla2xxx/parameters/', or
  • '/sys/module/qla4xxx/parameters/'.

Some common parameter name values include:

  • 'extended_error_logging', or
  • 'ql2xextended_error_logging'.

If unsure of the parameter name or path, you can use one of the following three methods to find them.

1) Use the following command, or equivalent, to locate the parameter(s) within the sysfs tree:
# find /sys -name '*extended_error_logging' | grep qla | xargs ls -l 
-r--r--r-- 1 root root 0 Jul 30 05:07 /sys/module/qla2xxx/parameters/ql2xextended_error_logging

2) Use the following command, or equivalent, to identify the parameter(s) within a git source tree:

# find . -name 'ql*.c' | xargs grep extended_error_logging | grep MODULE_PARM | grep -v vanilla 
 ./kernel/kernel-2.6.18/linux-2.6.18.noarch/drivers/scsi/qla4xxx/ql4_os.c:MODULE_PARM_DESC(extended_error_logging,
 ./kernel/kernel-2.6.18/linux-2.6.18.noarch/drivers/scsi/qla2xxx/qla_os.c:MODULE_PARM_DESC(ql2xextended_error_logging,

3) Use the following command, or equivalent, to identify the parameter(s) within a specific *.ko module:

# find /lib -name 'qla2xxx.ko'  | xargs -I {} /sbin/modinfo {} | grep extended_error_logging
parm:           ql2xextended_error_logging:Option to enable extended error logging, 
              Default is 0 - no logging. 1 - log errors. (int)

Note which path and parameter name(s) are present, as you will need them for the steps outlined above. Be sure to use the correct name for the driver whose debug flags you want to turn on.
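The sysfs lookup from method 1 can be kept as a reusable function. This is a minimal sketch: the function name find_qla_logging_params is hypothetical, and the optional root argument exists only so the lookup can be tried against a test directory instead of /sys.

```shell
# find_qla_logging_params [ROOT]
# List every qla extended error logging parameter file under ROOT
# (default /sys), one path per line, in sorted order.
find_qla_logging_params() {
    root=${1:-/sys}
    find "$root" -name '*extended_error_logging' 2>/dev/null | grep qla | sort
}

# Example: show each parameter file with its current value.
#   for p in $(find_qla_logging_params); do
#       echo "$p = $(cat "$p")"
#   done
```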

Note: The default file permissions on the extended error logging parameter are sometimes set to read-only by system administrators to prevent inadvertent changes. If you get "Permission denied" when attempting to enable extended error logging, check the file permissions and enable root write permission if needed.
{as root}

# find /sys -name '*extended_error_logging' | grep qla | xargs ls -l
-r--r--r-- 1 root root    0 Sep 28 13:03 /sys/module/qla2xxx/parameters/ql2xextended_error_logging

# find /sys -name '*extended_error_logging' | grep qla | xargs -I {} chmod u+w {}

# find /sys -name '*extended_error_logging' | grep qla | xargs ls -l
-rw-r--r-- 1 root root    0 Sep 28 13:03 /sys/module/qla2xxx/parameters/ql2xextended_error_logging

How does extended_error_logging work?

Within the source code there are DEBUG macros such as the following:

    DEBUG2(printk("scsi(%ld): Asynchronous PORT UPDATE.\n",
        ha->host_no));
    DEBUG(printk(KERN_INFO
        "scsi(%ld): Port database changed %04x %04x %04x.\n",
        ha->host_no, mb[1], mb[2], mb[3]));

By default, only the DEBUG() macros are compiled into the driver within RHEL kits. The other macros (DEBUG2(), DEBUG3(), etc.) are not compiled into the shipped driver. When you turn on extended error logging, you are setting a flag within the driver to output the DEBUG() messages. Which conditions and information are placed within DEBUG() messages is determined by the driver maintainers, and this changes across revisions of the driver.

    #define DEBUG(x)        do { if (extended_error_logging) { x; } } while (0)

    #if defined(QL_DEBUG_LEVEL_1)
    #define DEBUG1(x)      do {x;} while (0)
    #else
    #define DEBUG1(x)      do {} while (0)
    #endif

Note that when extended_error_logging is set, the statement within the DEBUG() macro is executed. In most or all cases the statement to be executed will be a printk() that outputs extended information into the console log file to be used for diagnostic purposes. Furthermore, it can be seen that DEBUG1 and, by extension, the DEBUG2/3/4/... macros are only compiled in if the appropriate debug level is defined within the driver.

File: qla_dbg.h

    /*
     * Driver debug definitions.
     */
    /* #define QL_DEBUG_LEVEL_1  */ /* Output register accesses to COM1 */
    /* #define QL_DEBUG_LEVEL_2  */ /* Output error msgs to COM1 */
    /* #define QL_DEBUG_LEVEL_3  */ /* Output function trace msgs to COM1 */
    /* #define QL_DEBUG_LEVEL_4  */ /* Output NVRAM trace msgs to COM1 */
    /* #define QL_DEBUG_LEVEL_5  */ /* Output ring trace msgs to COM1 */
    /* #define QL_DEBUG_LEVEL_6  */ /* Output WATCHDOG timer trace to COM1 */
    /* #define QL_DEBUG_LEVEL_7  */ /* Output RISC load trace msgs to COM1 */
    /* #define QL_DEBUG_LEVEL_8  */ /* Output ring saturation msgs to COM1 */
    /* #define QL_DEBUG_LEVEL_9  */ /* Output IOCTL trace msgs */
    /* #define QL_DEBUG_LEVEL_10 */ /* Output IOCTL error msgs */
    /* #define QL_DEBUG_LEVEL_11 */ /* Output Mbx Cmd trace msgs */
    /* #define QL_DEBUG_LEVEL_12 */ /* Output IP trace msgs */
    /* #define QL_DEBUG_LEVEL_13 */ /* Output fdmi function trace msgs */
    /* #define QL_DEBUG_LEVEL_14 */ /* Output RSCN trace msgs */
    /* #define QL_DEBUG_LEVEL_15 */ /* Output NPIV trace msgs */
    /* #define QL_DEBUG_LEVEL_16 */ /* Output ISP84XX trace msgs */

These additional debug levels are turned off by default. For testing purposes, if additional information is needed, enable the required areas by uncommenting the appropriate defines. Note that turning on extended error logging in a driver instrumented this way provides not only the base DEBUG() information but also all the additional information that has been compiled in. It is all or nothing, although you could change that when building your test package.

From the QLogic web site:

"
The QLogic drivers for the 2.4 (RHEL 3) kernel have the parameter "extended_error_logging" that defines
whether to enable (1) or disable (0) writing the debug information to /var/log/messages. This
parameter is passed to the driver from the command line using insmod or from modprobe 'option'
directive found in the modprobe.conf[.local] file. The parameter is in the simple
<keyword>=value format, i.e. extended_error_logging=1

"

Note that the 2.4 driver uses DEBUG2() for some messages that later drivers emit through DEBUG(). In some of these cases there are also plain printk() calls. As a result, some messages, such as RSCN notifications, cannot be turned off in the log files on RHEL 3.

Aug 13 05:40:48 rbrzw2001 kernel: scsi(2): RSCN database changed -0x300,0x0.
Aug 13 05:40:48 rbrzw2001 kernel: scsi(2): Waiting for LIP to complete...
Aug 13 05:40:48 rbrzw2001 kernel: scsi(2): Waiting for LIP to complete...
Aug 13 05:40:48 rbrzw2001 kernel: scsi(2): Topology - (F_Port), Host Loop address 0xffff

See the appropriate source files for more info.

How do I configure additional debug or trace information from the driver beyond what is offered in a standard release?

This is done by rebuilding the driver with debug flag changes: either removing some levels of debug or adding some.

By default all debug macros are turned off during compilation except DEBUG() statements.

File: qla_dbg.h

/*
 * Driver debug definitions.
 */
/* #define QL_DEBUG_LEVEL_1  */ /* Output register accesses to COM1 */
/* #define QL_DEBUG_LEVEL_2  */ /* Output error msgs to COM1 */
/* #define QL_DEBUG_LEVEL_3  */ /* Output function trace msgs to COM1 */
/* #define QL_DEBUG_LEVEL_4  */ /* Output NVRAM trace msgs to COM1 */
/* #define QL_DEBUG_LEVEL_5  */ /* Output ring trace msgs to COM1 */
/* #define QL_DEBUG_LEVEL_6  */ /* Output WATCHDOG timer trace to COM1 */ << no macros
/* #define QL_DEBUG_LEVEL_7  */ /* Output RISC load trace msgs to COM1 */
/* #define QL_DEBUG_LEVEL_8  */ /* Output ring saturation msgs to COM1 */ << no macros
/* #define QL_DEBUG_LEVEL_9  */ /* Output IOCTL trace msgs */
/* #define QL_DEBUG_LEVEL_10 */ /* Output IOCTL error msgs */
/* #define QL_DEBUG_LEVEL_11 */ /* Output Mbx Cmd trace msgs */
/* #define QL_DEBUG_LEVEL_12 */ /* Output IP trace msgs */
/* #define QL_DEBUG_LEVEL_13 */ /* Output fdmi function trace msgs */
/* #define QL_DEBUG_LEVEL_14 */ /* Output RSCN trace msgs */
/*         QL_DEBUG_LEVEL_15                                              << doesn't exist */
/* #define QL_DEBUG_LEVEL_16 */ /* Output ISP84XX trace msgs */ 
:  
.

>> In addition within the same file are trace debug macros

:
.  
/*
 * Macros use for debugging the driver.
 */

#undef ENTER_TRACE
#if defined(ENTER_TRACE)
#define ENTER(x)       do { printk("qla2100 : Entering %s()\n", x); } while (0)
#define LEAVE(x)       do { printk("qla2100 : Leaving %s()\n", x);  } while (0)
#define ENTER_INTR(x)   do { printk("qla2100 : Entering %s()\n", x); } while (0)
#define LEAVE_INTR(x)   do { printk("qla2100 : Leaving %s()\n", x);  } while (0)
#else
#define ENTER(x)        do {} while (0)
#define LEAVE(x)        do {} while (0)
#define ENTER_INTR(x)   do {} while (0)
#define LEAVE_INTR(x)   do {} while (0)
#endif

The routine trace macros cannot be turned on or off at run time, but may come in handy in some debug or testing scenarios.

So the first step is to uncomment the DEBUG* levels you are interested in, then recompile the driver. Note that the levels cannot all be compiled in and then selected individually at run time: once compiled in, ALL compiled-in DEBUG* macros are turned on together by the 'extended_error_logging' switch. You therefore have to decide beforehand which levels will help, and this is often largely a guess. Turn on enough to gather the information you need, but not so many that the log is flooded with messages you then have to filter through and interpret.
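Uncommenting a level before the rebuild can be done with sed. The sketch below assumes the define lines in qla_dbg.h look exactly like the listing above; the function name enable_debug_level is hypothetical, and the header path in the usage comment is a placeholder for wherever the driver source tree is unpacked.

```shell
# enable_debug_level HEADER LEVEL
# Turn "/* #define QL_DEBUG_LEVEL_<LEVEL> */" into an active #define,
# leaving the trailing description comment and all other levels untouched.
enable_debug_level() {
    header=$1
    level=$2
    tmp=$(mktemp) || return 1
    sed "s|^/\* #define QL_DEBUG_LEVEL_${level}[[:space:]]\{1,\}\*/|#define QL_DEBUG_LEVEL_${level}|" \
        "$header" > "$tmp" && cat "$tmp" > "$header"
    rm -f "$tmp"
}

# Example: enable error messages (level 2) and mailbox command tracing
# (level 11) in a working copy of the header before recompiling.
#   enable_debug_level path/to/qla_dbg.h 2
#   enable_debug_level path/to/qla_dbg.h 11
```

Requiring whitespace after the level number keeps level 1 from also matching levels 10 through 16.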
