fence_drac and fence_drac5 will not fence my cluster node with an iDRAC6.
Environment
- Red Hat Enterprise Linux 5.4+ Advanced Platform (Clustering) or High Availability Add-on
- Red Hat Enterprise Linux Server 6 with the High Availability Add-On
- iDRAC6 firmware 1.03.10; all iDRAC firmware versions are probably affected by this issue.
- This includes Dell PowerEdge R710 & R910 blade servers
Issue
- Getting error "unable to login" using fence_drac.
- When fencing with fence_drac5, the system receives an init 0 (shutdown) instead of being power cycled.
- When trying to fence my node with iDRAC6 fencing, we repeatedly see fencing fail:
fenced[6326]: fencing node "rhel5clu3.example.net"
fenced[6326]: fence "rhel5clu3.example.net" failed
fenced[6326]: fencing node "rhel5clu3.example.net"
fenced[6326]: fence "rhel5clu3.example.net" failed
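To see how often fencing has failed for each node, the log messages above can be tallied. The following is a minimal sketch, assuming the messages keep exactly the format shown (fenced[PID]: fence "NODE" failed); the function name count_fence_failures is hypothetical, and the log path in the usage note is an assumption about where syslog writes on your system.

```shell
# count_fence_failures: read syslog-style lines on stdin and print
# "NODE COUNT" for every node that has at least one failed fence attempt.
# Assumes the message format shown in this article.
count_fence_failures() {
    awk '
        /fenced\[[0-9]+\]: fence "[^"]+" failed/ {
            # The node name is the text between the first pair of double quotes.
            split($0, parts, "\"")
            failures[parts[2]]++
        }
        END { for (node in failures) print node, failures[node] }
    ' | sort
}
```

For example, count_fence_failures < /var/log/messages (the path is an assumption) prints one line per affected node.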
Resolution
IPMI fencing solution:
The supported fencing method for an iDRAC6 or newer device is ipmilan fencing via the fence_ipmilan fencing agent (supported as of RHEL 5 with cman-2.0.115-34.el5, although it may work on RHEL 5.4). Ensure that "IPMI over LAN" is enabled on the iDRAC6 itself before testing fencing. According to the current iDRAC user's guide, this setting is disabled by default.
Please see the Dell configuration guide for more information on this setting: http://support.dell.com/support/edocs/software/smdrac3/idrac/idrac22modular/en/ug/pdf/ug.pdf (page 91) or Configuring the iDRAC6 via the Web Interface.
You can test ipmilan fencing using the command-line tool fence_ipmilan (see man fence_ipmilan for more information about possible options). See also: How can I diagnose fence_ipmilan failures in RHEL 5, 6, or 7?
Unsupported solution with fence_drac5:
An unsupported alternate method is to use fence_drac5 and set the cmd_prompt option in cluster.conf to reflect the command prompt of your iDRAC6. If ssh is required, also add the secure option. Please note that the command prompt is usually /admin1-> on iDRAC/DRAC card versions 6, 7, and 8, which is why the cmd_prompt option is needed.
This method will not be supported and is not recommended:
<fencedevices>
  <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="192.168.0.101" login="root" name="node1-drac" passwd="drac_password"/>
  <fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="192.168.0.102" login="root" name="node2-drac" passwd="drac_password" secure="1"/>
</fencedevices>
For more information about the supported fence devices, see the following article: Fence Device and Agent Information for Red Hat Enterprise Linux
Root Cause
- There are two varieties of iDRAC: Express and Enterprise. iDRAC Express is only an option on rackmount and tower servers.
- iDRAC Express is merely a BMC (baseboard management controller) and does not have the ability to power servers on or off.
- iDRAC6 uses a different command prompt than previous versions of the DRAC firmware.
- More information can be found here: http://linux.dell.com/wiki/index.php/Products/HA/DellRedHatHALinuxCluster/Cluster#Configure_iDRAC6_Fencing
Diagnostic Steps
Troubleshooting steps for fence_ipmilan:
- Ensure that the ipmilan port is activated on the iDRAC6 device. The configuration may differ depending on the model, but it is generally enabled on the networking configuration screen of the iDRAC6 web interface. See How can I diagnose fence_ipmilan failures in RHEL 5, 6, or 7?
- Determine the IP address, username, and password for the iDRAC6, then test whether you can read from the device using fence_ipmilan (this command will return the status and NOT reboot the host):
  # fence_ipmilan -a 10.10.10.10 -l admin -p redhat -o status -vvvv
- Once you can read the status from the fence device, you can optionally test fencing the device by changing the operation from "status" to "reboot" (CAUTION: THIS WILL REBOOT YOUR HOST):
  # fence_ipmilan -a 10.10.10.10 -l admin -p redhat -o reboot -vvvv
- Once you have confirmed you can connect to the fence device, configure the fence device for the cluster. You can use conga/luci to do this, or manually modify /etc/cluster/cluster.conf and propagate the changes.
- Test your configuration by using fence_node with the full name of your cluster node from /etc/cluster/cluster.conf to get status:
  # fence_node -S rhel5clu3.example.net
  # echo $?
  0
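The read-only status check from the steps above can be wrapped so that a failure produces a clear message. This is only a sketch: the function name check_idrac_status is hypothetical, and it uses nothing beyond the -a, -l, -p, and -o status options shown in this article. It never issues a reboot.

```shell
# check_idrac_status ADDRESS LOGIN PASSWORD
# Runs the read-only fence_ipmilan status query and reports the outcome.
# Returns 0 on success, 1 if the query fails, 2 on wrong usage.
check_idrac_status() {
    if [ "$#" -ne 3 ]; then
        echo "usage: check_idrac_status ADDRESS LOGIN PASSWORD" >&2
        return 2
    fi
    if fence_ipmilan -a "$1" -l "$2" -p "$3" -o status >/dev/null 2>&1; then
        echo "OK: status query to $1 succeeded"
    else
        echo "FAIL: status query to $1 failed; check that IPMI over LAN is enabled" >&2
        return 1
    fi
}
```

For example, check_idrac_status 10.10.10.10 admin redhat mirrors the status command above. When it fails, rerun the original command with -vvvv to see the details.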
Troubleshooting steps for fence_drac5:
Configure DRAC fencing and test it with either of the following commands:
$ fence_drac5 -a 192.168.0.101 -l root -p drac_password -c 'admin1->'
$ fence_node node1-drac
If the DRAC backend requires a secure ssh connection, add the -x switch:
$ fence_drac5 -a 192.168.0.101 -l root -p drac_password -c 'admin1->' -x
$ fence_node node1-drac
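The fence_drac5 commands above do not pass an explicit action, so depending on the agent's default they may power cycle the node. A gentler first test is to request only the power status. The sketch below assumes that the installed fence_drac5 accepts -o status, as fence agents generally do; the function name drac5_status and its optional "ssh" argument are hypothetical.

```shell
# drac5_status ADDRESS LOGIN PASSWORD [ssh]
# Queries power status through fence_drac5 using the iDRAC6 prompt.
# Pass "ssh" as the fourth argument to add -x for a secure connection.
# Assumption: the installed fence_drac5 supports "-o status".
drac5_status() {
    if [ "${4:-}" = "ssh" ]; then
        fence_drac5 -a "$1" -l "$2" -p "$3" -c 'admin1->' -o status -x
    else
        fence_drac5 -a "$1" -l "$2" -p "$3" -c 'admin1->' -o status
    fi
}
```

For example, drac5_status 192.168.0.101 root drac_password ssh corresponds to the -x variant above, with the action limited to a status query.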