How can I configure power fencing for the IBM Power platform using an HMC in a RHEL High Availability cluster?

Solution Verified

Environment

  • Red Hat Enterprise Linux 7 (with the High Availability Add-on)
  • Red Hat Enterprise Linux 8 (with the High Availability Add-on)
  • IBM Power8 processor-based systems
  • IBM Power9 processor-based systems
  • IBM Power10 processor-based systems

Issue

  • We need to configure fencing in our cluster that runs on LPARs on IBM Power architecture with an HMC.
  • What is the fence agent for LPARs in IBM Power processor-based systems, and how can we implement it in our cluster?
  • How can I configure fence_lpar for IBM Power?

Resolution

The fence_lpar fence agent, which connects to the Hardware Management Console (HMC), can be used to fence IBM Power processor-based LPARs. The agent requires several parameters to connect to the HMC and execute fencing operations. Test the parameters by running the fence agent directly from the command line. Once the parameters have been verified to work, use them to create a stonith device within the Pacemaker cluster.

Note: If you later migrate an LPAR to another physical server, be sure to update the configured stonith devices so that they use the new physical system. Note also that live migration of an active cluster member is not currently supported.


Configuring an HMC user with minimal permissions

For increased security, configure a user on the HMC with only the permissions required for fencing. The user should not have access to LPARs or other HMC resources that are not part of the Pacemaker cluster.
  1. Create a new resource role with only the LPARs that will be used as cluster nodes.

  2. Create a new task role based on hmcoperator. Include the following tasks in the role:

    • Under Managed System:
      • View Managed System
    • Under Partition -> View Partitions:
      • View Profile
      • Activate Partition
      • Shutdown Partition
      • Reboot Partition
  3. Create a new user with the resource and task roles created in steps 1 and 2. Ideally, the user will be dedicated to this cluster.


Testing connection to the HMC

For the initial connection testing, assume the following:

  • IP/hostname of the HMC: 10.10.10.10
  • Username for login to HMC: operator
  • Password for login to HMC: password
  • Managed system where node1 resides: ibm-power-system1
  • Managed system where node2 resides: ibm-power-system2
  • LPAR name of node1 known to HMC: lpar1
  • LPAR name of node2 known to HMC: lpar2

Note: The examples below will use these values. They are likely to differ on your systems. Replace them with the appropriate values for your environment.

Running the fence agent with the status action against a specific LPAR should display its power status. Precede the fence_lpar command with the time command so that you will know how long the command takes to complete (including any connection delays). This will be useful later when choosing timeouts for the stonith device.

# time fence_lpar -a  10.10.10.10 -s ibm-power-system1 -p password -l operator -o status -n lpar1
Status: ON

real	0m0.658s
user	0m0.139s
sys	0m0.000s

# time fence_lpar -a  10.10.10.10 -s ibm-power-system2 -p password -l operator -o status -n lpar2
Status: ON

real	0m0.672s
user	0m0.141s
sys	0m0.000s
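
Once working parameters are known, a small loop can run the same status check against every node's LPAR in one pass. The sketch below is a dry run: it only echoes the fence_lpar commands so it can be reviewed safely. The values are the example placeholders from the list above; substitute your own.

```shell
# Dry-run status sweep over every managed-system:LPAR pair.
# All values are the example placeholders from this article.
HMC=10.10.10.10
HMC_USER=operator

# managed_system:lpar_name pairs, one per cluster node
PAIRS="ibm-power-system1:lpar1 ibm-power-system2:lpar2"

for pair in $PAIRS; do
    system=${pair%%:*}
    lpar=${pair##*:}
    echo "checking $lpar on $system via $HMC"
    # Remove the leading 'echo' to actually query the HMC
    echo time fence_lpar -a "$HMC" -s "$system" -l "$HMC_USER" -p password -o status -n "$lpar"
done
```

Removing the leading echo turns the sweep into a real connectivity test of every node's fencing path.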

Testing with verbose output (notice the -v flag) shows some of the steps in connecting to the HMC and querying LPAR power status. fence_lpar connects to the HMC via SSH and then runs the lssyscfg command with appropriate options. (If the output looks somewhat mangled, it's due to carriage return (\r) characters in the output altering the displayed text.)

# time fence_lpar -a  10.10.10.10 -s ibm-power-system1 -p password -l operator -o status -n lpar1 -v
2019-05-31 10:33:55,324 INFO: Running command: /usr/bin/ssh  operator@10.10.10.10 -p 22 -o PubkeyAuthentication=no
2019-05-31 10:33:55,412 DEBUG: Received: operator@10.10.10.10's password:
2019-05-31 10:33:55,412 DEBUG: Sent: password

2019-05-31 10:33:55,529 DEBUG: Received:  
Last login: Fri May 31 14:33:53 2019 from IP
operator@hmc1:~>
2019-05-31 10:33:55,529 DEBUG: Sent: lssyscfg -r lpar -m ibm-power-system1 --filter 'lpar_names=lpar1'

lpar1':33:55,713 DEBUG: Received:  lssyscfg -r lpar -m ibm-power-system1 --filter 'lpar_names=lpar1'
name=lpar1 [... Detailed info on the LPAR...]
operator@hmc1:~>
Status: ON
2019-05-31 10:33:55,713 DEBUG: Sent: quit

real	0m0.657s
user	0m0.133s
sys	0m0.010s
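
The measured real times above (roughly 0.7 seconds) feed directly into timeout selection for the stonith device. As a rough sketch, allow a generous multiple of the observed worst case; the factor of 10 below is an assumption for illustration, not a Red Hat recommendation.

```shell
# Derive a conservative lower bound for the stonith timeout from the
# worst 'real' time measured above (0.672s), expressed in milliseconds.
MEASURED_MS=672
FACTOR=10                                              # assumed safety factor
TIMEOUT_S=$(( (MEASURED_MS * FACTOR + 999) / 1000 ))   # round up to whole seconds
echo "allow at least ${TIMEOUT_S}s for fencing operations"
```

A loaded HMC or congested network can be much slower than an idle test run, which is why the margin is deliberately large.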

Checking infrastructure requirements


Check that the cluster meets all infrastructure requirements, such as a documented cabling plan and an operations guide describing how to perform rolling HMC maintenance.

Redundant HMCs


A dual-HMC configuration with fully redundant cabling is recommended, so that the cluster can still fence nodes during HMC or network maintenance or outages. This requires fully separate cabling end to end.

Creating the stonith device

When creating the stonith devices, use the Pacemaker node names as shown near the top of the output of pcs config. This may be different from the nodes' hostnames. For more information, see: What format should I use to specify node mappings to stonith devices in pcmk_host_list and pcmk_host_map in a RHEL 6, 7, or 8 High Availability cluster?
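
To confirm which names to use on the left-hand side of pcmk_host_map, the following commands print the Pacemaker node names when run on a cluster node. They are shown here as a dry run (assigned and echoed) so the sketch runs anywhere.

```shell
# Dry run: these print the commands to run on a live cluster node.
cmd1="crm_node -n"          # prints this node's Pacemaker node name
cmd2="pcs status nodes"     # lists all node names known to the cluster
echo "$cmd1"
echo "$cmd2"
```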

The stonith devices can be created as follows:

# pcs stonith create lpar1_fence fence_lpar ipaddr=10.10.10.10 managed=ibm-power-system1 login=operator passwd=password pcmk_host_map='node1:lpar1' pcmk_delay_base=5s
# pcs stonith create lpar2_fence fence_lpar ipaddr=10.10.10.10 managed=ibm-power-system2 login=operator passwd=password pcmk_host_map='node2:lpar2'

Note: In two-node clusters, the pcmk_delay_base parameter is useful for preventing fence death scenarios, in which both nodes fence each other simultaneously. The first command above sets this parameter to 5s on lpar1_fence only.

View the resulting configuration:

# pcs config
...
Stonith Devices:
 Resource: lpar1_fence (class=stonith type=fence_lpar)
  Attributes: ipaddr=10.10.10.10 managed=ibm-power-system1 login=operator passwd=password pcmk_host_map=node1:lpar1 pcmk_delay_base=5s
  Operations: monitor interval=60s (lpar1_fence-monitor-interval-60s)
 Resource: lpar2_fence (class=stonith type=fence_lpar)
  Attributes: ipaddr=10.10.10.10 managed=ibm-power-system2 login=operator passwd=password pcmk_host_map=node2:lpar2
  Operations: monitor interval=60s (lpar2_fence-monitor-interval-60s)
...

Verify that the device is running (that is, that Pacemaker's monitor actions for the device are succeeding):

# pcs status
...
 lpar1_fence	(stonith:fence_lpar):	Started node1
 lpar2_fence	(stonith:fence_lpar):	Started node1
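
Before relying on the devices, it is worth verifying that fencing works end to end. Note that pcs stonith fence is disruptive: it powers the target node off. The sketch below echoes the commands as a dry run; remove the leading echo and run it from a node other than the one being fenced when you are ready.

```shell
# Disruptive end-to-end test: fences the target node through its
# stonith device. Echoed as a dry run -- remove 'echo' to execute.
TARGET=node2
echo pcs stonith fence "$TARGET"   # power-fence the target via the HMC
echo pcs status                    # afterwards, confirm the node rejoined
```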

Redundant HMCs


If your environment is set up with redundant HMCs (e.g., hmc1 and hmc2) as recommended, create one stonith device for each HMC and configure the stonith devices into separate fencing levels (see solution 891323). That way, if a fence action fails to connect to hmc1 (for example, due to a network or hardware failure), Pacemaker will try hmc2 as a backup to fence the unhealthy cluster node.

Example commands are below. Remember to replace the parameters with values appropriate to your environment.

# # Devices for lpar1
# pcs stonith create lpar1_fence_hmc1 fence_lpar ipaddr=10.10.10.10 managed=ibm-power-system1 login=operator passwd=password pcmk_host_map='node1:lpar1' pcmk_delay_base=5s

# pcs stonith create lpar1_fence_hmc2 fence_lpar ipaddr=10.10.10.20 managed=ibm-power-system1 login=operator passwd=password pcmk_host_map='node1:lpar1' pcmk_delay_base=5s

# # Devices for lpar2
# pcs stonith create lpar2_fence_hmc1 fence_lpar ipaddr=10.10.10.10 managed=ibm-power-system2 login=operator passwd=password pcmk_host_map='node2:lpar2'

# pcs stonith create lpar2_fence_hmc2 fence_lpar ipaddr=10.10.10.20 managed=ibm-power-system2 login=operator passwd=password pcmk_host_map='node2:lpar2'

# pcs stonith level add 1 node1 lpar1_fence_hmc1
# pcs stonith level add 2 node1 lpar1_fence_hmc2

# pcs stonith level add 1 node2 lpar2_fence_hmc1
# pcs stonith level add 2 node2 lpar2_fence_hmc2
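
The eight commands above follow a regular pattern, so they can be generated from a small table. The helper below is a convenience sketch, not a pcs feature; it echoes the commands for review, and removing the leading echoes would apply them.

```shell
# Generate the per-node stonith devices and fencing levels for a
# redundant-HMC layout. Echoed as a dry run; remove 'echo' to apply.
HMC1=10.10.10.10
HMC2=10.10.10.20

make_devices() {
    node=$1 lpar=$2 system=$3 extra=$4
    echo pcs stonith create "${lpar}_fence_hmc1" fence_lpar ipaddr="$HMC1" \
         managed="$system" login=operator passwd=password \
         pcmk_host_map="${node}:${lpar}" $extra
    echo pcs stonith create "${lpar}_fence_hmc2" fence_lpar ipaddr="$HMC2" \
         managed="$system" login=operator passwd=password \
         pcmk_host_map="${node}:${lpar}" $extra
    echo pcs stonith level add 1 "$node" "${lpar}_fence_hmc1"
    echo pcs stonith level add 2 "$node" "${lpar}_fence_hmc2"
}

# Only the first node's devices get pcmk_delay_base, as in the commands above
make_devices node1 lpar1 ibm-power-system1 pcmk_delay_base=5s
make_devices node2 lpar2 ibm-power-system2 ""
```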

View the resulting configuration:

# pcs config
...
Stonith Devices:
 Resource: lpar1_fence_hmc1 (class=stonith type=fence_lpar)
  Attributes: ipaddr=10.10.10.10 login=operator managed=ibm-power-system1 passwd=password pcmk_delay_base=5s pcmk_host_map=node1:lpar1
  Operations: monitor interval=60s (lpar1_fence_hmc1-monitor-interval-60s)
 Resource: lpar1_fence_hmc2 (class=stonith type=fence_lpar)
  Attributes: ipaddr=10.10.10.20 login=operator managed=ibm-power-system1 passwd=password pcmk_delay_base=5s pcmk_host_map=node1:lpar1
  Operations: monitor interval=60s (lpar1_fence_hmc2-monitor-interval-60s)
 Resource: lpar2_fence_hmc1 (class=stonith type=fence_lpar)
  Attributes: ipaddr=10.10.10.10 login=operator managed=ibm-power-system2 passwd=password pcmk_host_map=node2:lpar2
  Operations: monitor interval=60s (lpar2_fence_hmc1-monitor-interval-60s)
 Resource: lpar2_fence_hmc2 (class=stonith type=fence_lpar)
  Attributes: ipaddr=10.10.10.20 login=operator managed=ibm-power-system2 passwd=password pcmk_host_map=node2:lpar2
  Operations: monitor interval=60s (lpar2_fence_hmc2-monitor-interval-60s)
Fencing Levels:
  Target: node1
    Level 1 - lpar1_fence_hmc1
    Level 2 - lpar1_fence_hmc2
  Target: node2
    Level 1 - lpar2_fence_hmc1
    Level 2 - lpar2_fence_hmc2
...

Verify that the devices are running (that is, that Pacemaker's monitor actions for the devices are succeeding):

# pcs status
...
 lpar1_fence_hmc1	(stonith:fence_lpar):	Started node1
 lpar1_fence_hmc2	(stonith:fence_lpar):	Started node2
 lpar2_fence_hmc1	(stonith:fence_lpar):	Started node1
 lpar2_fence_hmc2	(stonith:fence_lpar):	Started node2
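
A small scripted check can confirm that every stonith device reports Started. In the sketch below, sample text stands in for real pcs status output so it can be tried anywhere; on a cluster node, replace the sample with the actual output of pcs status.

```shell
# Count fence_lpar devices that are not 'Started'. The sample text is a
# stand-in for real 'pcs status' output, taken from the example above.
status='
 lpar1_fence_hmc1  (stonith:fence_lpar):  Started node1
 lpar1_fence_hmc2  (stonith:fence_lpar):  Started node2
 lpar2_fence_hmc1  (stonith:fence_lpar):  Started node1
 lpar2_fence_hmc2  (stonith:fence_lpar):  Started node2
'
not_started=$(printf '%s\n' "$status" | grep 'stonith:fence_lpar' | grep -cv 'Started') || true
if [ "$not_started" -eq 0 ]; then
    echo "all stonith devices Started"
else
    echo "$not_started stonith device(s) not Started"
fi
```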


Root Cause

The Hardware Management Console (HMC) is used to manage IBM Power processor-based systems. A fence_lpar stonith device connects to an HMC and issues the commands necessary to fence an unhealthy cluster node.

For more information about fencing, see: Fencing in a Red Hat High Availability Cluster.


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.