fence_ipmilan leaves server powered-off despite 'stonith-action' property set to "reboot"

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux 7 with High Availability Add-On
  • Red Hat Enterprise Linux 7 with Resilient Storage Add-On
  • pacemaker cluster
  • Dell iDRAC 9

Issue

Although the cluster property stonith-action is set to reboot, a node is left powered off after being fenced by fence_ipmilan. The fencing method is set to onoff (the default) rather than cycle:

man fence_ipmilan:

       -m, --method=[method]
              Method to fence (onoff|cycle) (Default Value: onoff)

Resolution

Set the power_wait parameter on the affected stonith device. The right value depends on your environment; 5-10 seconds is a good starting point.

# pcs stonith update <your_stonith_id> power_wait=10
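
The update can be wrapped with a check before and after. The following is a minimal sketch, not a verified procedure: the stonith ID my_stonith_id, the STONITH_ID, POWER_WAIT and DRY_RUN variables, and the run helper are illustrative assumptions. By default it only prints the pcs commands; set DRY_RUN=0 on a cluster node to execute them.

```shell
#!/bin/sh
# Sketch: apply and verify power_wait on a stonith device.
# STONITH_ID and POWER_WAIT are placeholders; adjust them for your cluster.
STONITH_ID="${STONITH_ID:-my_stonith_id}"
POWER_WAIT="${POWER_WAIT:-10}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Confirm the cluster-wide fencing action.
run pcs property show stonith-action
# Lengthen the wait after each power ON/OFF instruction.
run pcs stonith update "$STONITH_ID" power_wait="$POWER_WAIT"
# Display the device configuration to confirm the new value.
run pcs stonith show "$STONITH_ID"
```

If the node still stays powered off after the next fence event, raise POWER_WAIT in small steps and repeat.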

Root Cause

As the fencing method is set to onoff, the node is first powered off and then powered back on through iDRAC during fencing. In some environments, the default wait of 2 seconds between these two instructions is not sufficient, so the time between the power-off and power-on commands issued by iDRAC to the server must be increased.

man fence_ipmilan:

       --power-wait=[seconds]
              Wait X seconds after issuing ON/OFF (Default Value: 2)
