"kernel: EDAC k8 MC1: extended error code: ECC chipkill x4 error" in messages log

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux Server 6
  • Red Hat Enterprise Linux Server 5
  • Red Hat Enterprise Linux 4.7 and later

Issue

  • EDAC logs "ECC chipkill x4 error" messages in /var/log/messages, similar to the following:
kernel: EDAC k8 MC1: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic) 
kernel: EDAC MC1: CE page 0xa16f8, offset 0x80, grain 8, syndrome 0xf858, row 0, channel 1, label "": k8_edac 
kernel: EDAC k8 MC1: extended error code: ECC chipkill x4 error
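
To see how often each location is affected, the corrected-error (CE) lines can be tallied per memory controller, row and channel. The following is a minimal sketch, not part of edac-utils: the function name edac_ce_summary is hypothetical, and the pattern assumes CE lines formatted exactly like the sample above.

```shell
# Hypothetical helper: count CE lines per controller/row/channel.
# Reads syslog text on stdin. The pattern is based only on the sample
# CE line shown above and may need adjusting for other EDAC drivers.
edac_ce_summary() {
    sed -n 's/.*EDAC MC\([0-9][0-9]*\): CE .*row \([0-9][0-9]*\), channel \([0-9][0-9]*\),.*/mc\1 row\2 channel\3/p' \
        | sort | uniq -c \
        | awk '{ print $2, $3, $4 ": " $1 " CE" }'
}
# Usage (as root): edac_ce_summary < /var/log/messages
```

A location that accumulates most of the corrected errors is the first candidate to check.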

Resolution

  • To identify which memory module may be faulty, install the edac-utils package and check the output of:
# edac-util -vvv

or

# edac-util --report=full
  • The faulty module shows a nonzero corrected-error count for its memory controller (mc), chip-select row (csrow) and channel (ch), in output similar to:
mc1: csrow1: ch0: 2 Corrected Errors

or

mc1:csrow1:ch0:CE:2
  • If errors are reported only intermittently, check the memory modules by physically swapping them and seeing whether the errors follow a module to its new slot.
  • If the memory modules are found to be okay, the EDAC implementation itself may be broken. If a BIOS update does not fix the issue, prevent the EDAC modules from loading by blacklisting them, then unload them from the running kernel. Each entry needs the "blacklist" keyword. On Red Hat Enterprise Linux 6, modprobe configuration files are expected to end in .conf, for example /etc/modprobe.d/blacklist.conf.
# echo "blacklist k8_edac" >> /etc/modprobe.d/blacklist
# echo "blacklist edac_mc" >> /etc/modprobe.d/blacklist
# modprobe -r k8_edac edac_mc
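
When the full report is long, it can help to show only the counters that have recorded errors. The following is a minimal sketch, assuming colon-separated report lines whose last field is the count, as in "mc1:csrow1:ch0:CE:2" above. The function name edac_nonzero is hypothetical and is not part of edac-utils.

```shell
# Hypothetical helper: keep only report lines with a nonzero count.
# Assumes colon-separated lines ending in a numeric count,
# for example "mc1:csrow1:ch0:CE:2".
edac_nonzero() {
    awk -F: 'NF >= 2 && $NF ~ /^[0-9]+$/ && $NF > 0'
}
# Usage: edac-util --report=full | edac_nonzero
```

No output means that no counter in the report was above zero at the time of the check.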

Root Cause

  • The cause is possibly a faulty memory module. EDAC (Error Detection and Correction) tries to detect and correct hardware errors. In this case, chipkill ECC appears to be detecting the error and correcting it. Significant hardware problems may not be experienced in the short term; however, it is recommended to have the DIMMs checked and the faulty one replaced.

Diagnostic Steps

  • Run hardware and/or memory diagnostic software with ECC disabled, so that errors are reported rather than silently corrected.

  • The MC, row and channel values in the log excerpt above help point to the defective memory module, but the tools in the edac-utils package present the same information in a more readable format.

  • edac-util -vvv may not always show correctable errors. It should preferably be run while the error messages are being logged to /var/log/messages.


