How do I disable the MCE (Machine Check Exception) function?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux

Issue

  • How do I disable the MCE (Machine Check Exception) function?
  • Why does /var/log/mcelog still receive new content after the mcelogd service is turned off?

Resolution

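  • Add the following parameter to the kernel command line in the GRUB configuration (/boot/grub/grub.conf on RHEL 6; GRUB_CMDLINE_LINUX in /etc/default/grub on RHEL 7 and 8, followed by regenerating grub.cfg), then reboot: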
mce=off
  • For reference, from the kernel-doc package, Documentation/x86/x86_64/boot-options.txt:
Machine check

   Please see Documentation/x86/x86_64/machinecheck for sysfs runtime tunables.

   mce=off
		Disable machine check
   mce=no_cmci
		Disable CMCI(Corrected Machine Check Interrupt) that
		Intel processor supports.  Usually this disablement is
		not recommended, but it might be handy if your hardware
		is misbehaving.
		Note that you'll get more problems without CMCI than with
		due to the shared banks, i.e. you might get duplicated
		error logs.
   mce=dont_log_ce
		Don't make logs for corrected errors.  All events reported
		as corrected are silently cleared by OS.
		This option will be useful if you have no interest in any
		of corrected errors.
   mce=ignore_ce
		Disable features for corrected errors, e.g. polling timer
		and CMCI.  All events reported as corrected are not cleared
		by OS and remained in its error banks.
		Usually this disablement is not recommended, however if
		there is an agent checking/clearing corrected errors
		(e.g. BIOS or hardware monitoring applications), conflicting
		with OS's error handling, and you cannot deactivate the agent,
		then this option will be a help.
   mce=bootlog
		Enable logging of machine checks left over from booting.
		Disabled by default on AMD because some BIOS leave bogus ones.
		If your BIOS doesn't do that it's a good idea to enable though
		to make sure you log even machine check events that result
		in a reboot. On Intel systems it is enabled by default.
   mce=nobootlog
		Disable boot machine check logging.
   mce=tolerancelevel[,monarchtimeout] (number,number)
		tolerance levels:
		0: always panic on uncorrected errors, log corrected errors
		1: panic or SIGBUS on uncorrected errors, log corrected errors
		2: SIGBUS or log uncorrected errors, log corrected errors
		3: never panic or SIGBUS, log all errors (for testing only)
		Default is 1
		Can be also set using sysfs which is preferable.
		monarchtimeout:
		Sets the time in us to wait for other CPUs on machine checks. 0
		to disable.

   nomce (for compatibility with i386): same as mce=off

   Everything else is in sysfs now.
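
As a rough sketch of applying and verifying the mce=off parameter described above: the helper name has_mce_off is hypothetical and not part of any Red Hat package, and the grubby invocation in the comments assumes a system where grubby manages the boot entries. The function only inspects a command-line string, so it can be pointed at /proc/cmdline after a reboot.

```shell
# Hypothetical helper: succeeds if the given kernel command line contains
# mce=off or its alias nomce as a whole, space-separated parameter.
has_mce_off() {
    case " $1 " in
        *" mce=off "*|*" nomce "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed way to add the parameter to all installed kernels (run as root):
#   grubby --update-kernel=ALL --args="mce=off"
# After rebooting, check what the running kernel was booted with:
#   has_mce_off "$(cat /proc/cmdline)" && echo "MCE disabled at boot"
```

Other mce= values such as mce=no_cmci or mce=ignore_ce are deliberately not matched, since they only change how corrected errors are handled and do not disable machine check.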

Root Cause

  • On RHEL 6, an hourly cron job appends mcelog output to /var/log/mcelog, so stopping the mcelogd service alone is not enough.
    After mce=off is added to the kernel command line, the /usr/sbin/mcelog command can no longer run.
$ cat /etc/cron.hourly/mcelog.cron
#!/bin/bash
/usr/sbin/mcelog --ignorenodev --filter >> /var/log/mcelog
  • On RHEL 7 and 8, the cron job is commented out by default; the mcelog.service systemd unit runs mcelog as a daemon and logs to syslog instead.
$ cat /etc/cron.hourly/mcelog.cron 
#!/bin/bash
# Disabled by default on Fedora since this is run as daemon
# using the mcelog.service systemd configuration entries.
#/usr/sbin/mcelog --ignorenodev --filter >> /var/log/mcelog

# systemctl status mcelog
● mcelog.service - Machine Check Exception Logging Daemon
   Loaded: loaded (/usr/lib/systemd/system/mcelog.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-11-05 18:52:38 EST; 2 weeks 6 days ago
 Main PID: 1174 (mcelog)
    Tasks: 1
   CGroup: /system.slice/mcelog.service
           └─1174 /usr/sbin/mcelog --ignorenodev --daemon --syslog

Nov 05 18:52:38 testjay systemd[1]: Starting Machine Check Exception Logging Daemon...
Nov 05 18:52:38 testjay systemd[1]: Started Machine Check Exception Logging Daemon.
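
To tell which of the two mechanisms above is active on a given host, a check along these lines may help. It is only a sketch: cron_runs_mcelog is a hypothetical helper name, and the path and unit name in the comments are taken from the output shown above.

```shell
# Hypothetical helper: succeeds if FILE has an active (uncommented) line
# that invokes /usr/sbin/mcelog, as the RHEL 6 cron script does.
# A missing or unreadable file counts as "not active".
cron_runs_mcelog() {
    grep -Eq '^[[:space:]]*/usr/sbin/mcelog([[:space:]]|$)' "$1" 2>/dev/null
}

# Example usage (run as root):
#   cron_runs_mcelog /etc/cron.hourly/mcelog.cron \
#       && echo "hourly cron job still appends to /var/log/mcelog"
#   systemctl is-enabled mcelog    # RHEL 7/8: is the daemon enabled?
```

With the RHEL 6 script the helper succeeds; with the RHEL 7/8 script, where the mcelog line is commented out, it fails and the systemd unit is the thing to inspect.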
