Stonith device fence_cisco_ucs fails while communicating with UCS Blade

Environment

  • Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add-On
  • pacemaker
  • Cisco UCS Blade

Issue

  • A stonith device that uses the fence_cisco_ucs agent fails during testing.
  • pcs status shows "unknown error" for the stonith device.
  • The command pcs stonith fence <nodename> fails with "Command failed: No route to host".
  • When fence_cisco_ucs is tested manually, the status and list actions work successfully.

Resolution

Ensure that the following steps have been performed on the UCS manager:

  • Log in to the UCS manager as a user that has Admin privileges.
  • Click the admin tab and select user management from the drop-down menu.
  • Expand the user service option in the left column and click local authenticated users.
  • Select the user that was created for UCS fencing.
  • In the general tab for that user, make sure the following roles are assigned:
    • Admin
    • Server-equipment

Then manually test that the fence_cisco_ucs fencing agent can perform the following actions: status, list, on, off, reboot.
After making the changes above, verify that they work by fencing the opposite node using two different methods. First, call the fencing agent directly:

# fence_cisco_ucs --ip="X.X.X.X" --username="<username>" --passwd="<password>" -z 1 --plug="UCSPROFILE2" --suborg="/org-RHEL/" -o reboot -vvv

If that succeeds, verify that the fencing agent is properly configured in pacemaker, then manually fence a cluster node through the cluster to confirm that the node is fenced successfully:

# pcs stonith fence <nodename>

Root Cause

The user that was configured for fencing on the UCS fence device did not have the correct privileges assigned.

Diagnostic Steps

The command pcs status displays the below errors in "Failed Actions":

Failed Actions:
* fence_ucs_start_0 on <nodename> 'unknown error' (1): call=38, status=Error, exitreason='none',
    last-rc-change='Wed Dec  7 15:34:00 2016', queued=0ms, exec=1098ms
* fence_ucs_start_0 on <nodename> 'unknown error' (1): call=102, status=Error, exitreason='none',
    last-rc-change='Wed Dec  7 15:33:58 2016', queued=0ms, exec=1093ms
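
On a cluster with many resources, it may help to pull only the failed start operations out of the pcs status output. The following is a minimal sketch, not part of pacemaker or pcs: failed_starts is a hypothetical helper name, and the matching assumes the "Failed Actions" line layout shown above.

```shell
#!/bin/sh
# failed_starts (hypothetical helper): read `pcs status` output on stdin and
# print "<operation> <node>" for every failed start operation.
# Assumes lines of the form: "* <resource>_start_0 on <node> '...' ..."
failed_starts() {
    awk '$1 == "*" && $3 == "on" && $2 ~ /_start_0$/ { print $2, $4 }'
}

# Usage (on a cluster node):
#   pcs status | failed_starts
```

A failed start of the stonith resource, as in the output above, means the cluster could not start the fence device on that node.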

An attempt to fence a node prints the error "No route to host":

# pcs stonith fence <nodename>
Error: unable to fence <nodename>
Command failed: No route to host

Manually test the fence_cisco_ucs fencing agent to verify that the following actions work: status, list, on, off, reboot. If they all succeed, the problem is likely a configuration issue within pacemaker. If they do not, the problem is likely a configuration issue on the fence device or in one of the parameters passed to the agent.
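
The manual test can be scripted roughly as follows. This is only a sketch: check_agent, FENCE_CMD and the UCS_* variables are hypothetical names, and the option string is copied from the fence_cisco_ucs command shown in the Resolution section. Because status and list can succeed while fencing still fails (see the Issue section), the power actions have to be included to exercise the user's privileges. Be aware that off and reboot really do power off or restart the target blade.

```shell
#!/bin/sh
# Hypothetical settings; replace with the values for the UCS environment.
# FENCE_CMD may be overridden, for example with a stub for a dry run.
FENCE_CMD=${FENCE_CMD:-fence_cisco_ucs}

# check_agent [action ...]
# Runs each action against the configured plug and prints "<action>: ok" or
# "<action>: FAILED". Defaults to the non-destructive actions status and list.
# Returns 1 if any action failed, 0 otherwise.
check_agent() {
    [ "$#" -gt 0 ] || set -- status list
    rc=0
    for action in "$@"; do
        if $FENCE_CMD --ip="$UCS_IP" --username="$UCS_USER" \
                --passwd="$UCS_PASS" -z 1 --plug="$UCS_PLUG" \
                --suborg="$UCS_SUBORG" -o "$action" >/dev/null 2>&1; then
            echo "$action: ok"
        else
            echo "$action: FAILED"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (WARNING: off/on/reboot act on the real blade):
#   UCS_IP=X.X.X.X UCS_USER=<username> UCS_PASS=<password> \
#   UCS_PLUG=UCSPROFILE2 UCS_SUBORG=/org-RHEL/ check_agent status list off on
```

If every action reports ok, look at the pacemaker configuration next. If only the power actions fail, check the roles assigned to the fencing user.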
