How can I use the ping resource in Pacemaker?

Environment

  • Red Hat Enterprise Linux 6, 7, 8 and 9 with High-Availability or Resilient Storage Add-on
  • Pacemaker cluster
  • ping resource

Issue

  • How can I use the ping resource in Pacemaker?
  • Is there a replacement for rgmanager ping heuristics in a Pacemaker cluster?

Resolution

Configuring a resource:

In a Pacemaker cluster, the pingd node attribute provides behavior similar to heuristics in an rgmanager cluster.
The ping cluster resource (ocf:pacemaker:ping) updates the pingd attribute on the node where it runs, and a location constraint then uses that attribute to decide where other resources may run. For details, refer to the product documentation.
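
As an illustration of the two pieces described above, the following sketch creates a cloned ping resource and a location rule keyed on pingd. The resource names (ping-gw, my-service), the address 192.0.2.1, and the dampen and multiplier values are assumptions chosen for the example, not values from this article. The script only prints the commands unless APPLY=1 is set, so it can be reviewed before anything is changed:

```shell
#!/bin/sh
# Dry-run sketch: prints the pcs commands instead of running them.
# Set APPLY=1 to execute them on a cluster node as root.
# All names and addresses below are illustrative assumptions.
PING_RSC="ping-gw"            # hypothetical ping resource name
GATEWAY="192.0.2.1"           # hypothetical host to ping
PROTECTED_RSC="my-service"    # hypothetical resource tied to connectivity

run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Clone the ping resource so that every node maintains its own pingd value.
run pcs resource create "$PING_RSC" ocf:pacemaker:ping \
    dampen=5s multiplier=1000 host_list="$GATEWAY" clone

# Keep the protected resource off nodes that cannot reach the gateway.
run pcs constraint location "$PROTECTED_RSC" rule score=-INFINITY \
    pingd lt 1 or not_defined pingd
```

The location rule is what actually moves resources; the ping resource on its own only maintains the attribute.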


Allowing fail-over:

By default, the monitor operation of the ping resource fails only if the pidfile used to track the ping command no longer exists or if the resource cannot update its attribute within Pacemaker. Any other issue observed during a monitor is reported in the "Node Attributes" section of pcs status and logged as an error, but the resource otherwise continues to run, as the following example shows:

$ pcs resource create testping ping host_list=nodec.local.com   # valid hostname
$ pcs resource create testping2 ping host_list=not.a.host       # invalid hostname
$ pcs resource create testping3 ping host_list=10.0.0.5         # down IP
$ pcs status --full

Full list of resources:
-------------------------------------->8--------------------------------------------
 testping	(ocf::pacemaker:ping):	Started nodeb.local.com
 testping2	(ocf::pacemaker:ping):	Started nodec.local.com
 testping3	(ocf::pacemaker:ping):	Started noded.local.com

Node Attributes:
* Node nodeb.local.com (1):
    + pingd                           	: 1         
* Node nodec.local.com (2):
    + pingd                           	: 0         	: Connectivity is lost
* Node noded.local.com (3):
    + pingd                           	: 0         	: Connectivity is lost
$ grep testping2 /var/log/messages
-------------------------------------->8--------------------------------------------
Jun 13 13:14:20 nodec pengine[1387]:  notice:  * Start      testping2      (                    nodec.local.com )
Jun 13 13:14:20 nodec crmd[1388]:  notice: Initiating start operation testping2_start_0 locally on nodec.local.com
Jun 13 13:14:20 nodec ping(testping2)[14740]: ERROR: Unexpected result for 'ping -n -q -W 18 -c 3  not.a.host' 2: ping: not.a.host: Name or service not known
Jun 13 13:14:20 nodec crmd[1388]:  notice: Result of start operation for testping2 on nodec.local.com: 0 (ok)
--------------------------------------[ Probe removed ]-------------------------------------------- 
Jun 13 13:14:30 nodec ping(testping2)[14947]: ERROR: Unexpected result for 'ping -n -q -W 18 -c 3  not.a.host' 2: ping: not.a.host: Name or service not known <--- monitors continue despite the error
Jun 13 13:14:40 nodec ping(testping2)[14999]: ERROR: Unexpected result for 'ping -n -q -W 18 -c 3  not.a.host' 2: ping: not.a.host: Name or service not known
Jun 13 13:14:50 nodec ping(testping2)[15040]: ERROR: Unexpected result for 'ping -n -q -W 18 -c 3  not.a.host' 2: ping: not.a.host: Name or service not known
Jun 13 13:15:01 nodec ping(testping2)[15093]: ERROR: Unexpected result for 'ping -n -q -W 18 -c 3  not.a.host' 2: ping: not.a.host: Name or service not known
Jun 13 13:15:11 nodec ping(testping2)[15144]: ERROR: Unexpected result for 'ping -n -q -W 18 -c 3  not.a.host' 2: ping: not.a.host: Name or service not known
Jun 13 13:15:21 nodec ping(testping2)[15183]: ERROR: Unexpected result for 'ping -n -q -W 18 -c 3  not.a.host' 2: ping: not.a.host: Name or service not known

If you want the ping resource to trigger a failover on missed pings or other errors, configure the failure_score option. The resource is then reported as failed once its score drops below that threshold:

# pcs resource describe ping
-------------------------------------->8--------------------------------------------
Resource options:
-------------------------------------->8--------------------------------------------
  failure_score: Resource is failed if the score is less than failure_score. Default never fails. <----
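
The threshold check can be sketched as follows. This assumes the score is the number of reachable hosts multiplied by the agent's multiplier parameter, which is how the agent is understood to compute the pingd value; the host count and the values 1000 and 2000 are hypothetical.

```shell
#!/bin/sh
# Sketch of the threshold check, ASSUMING score = reachable hosts * multiplier.
# Returns success (exit 0) when the resource would be reported as failed.
ping_would_fail() {
    active=$1; multiplier=$2; failure_score=$3
    score=$((active * multiplier))
    [ "$score" -lt "$failure_score" ]
}

# Hypothetical setup: 3 hosts in host_list, multiplier=1000, failure_score=2000.
for active in 3 2 1 0; do
    if ping_would_fail "$active" 1000 2000; then
        echo "reachable=$active score=$((active * 1000)) -> monitor fails"
    else
        echo "reachable=$active score=$((active * 1000)) -> ok"
    fi
done
```

Under the same assumption, a resource with a single host and a multiplier of 1 (believed to be the default) could be made to fail on a lost ping with a command such as pcs resource update testping3 failure_score=1.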

Configuring Additional Logging:

With the pacemaker erratum RHEA-2021:4267 (pacemaker-2.1.0-8.el8 or later), verbose logging can be enabled for the ocf:pacemaker:ping resource through the debug parameter. Setting debug to 2 produces highly detailed log messages, including the status of each individual ping.

For example:

# pcs resource update ping1 debug=1 host_list=10.1.1.100
# tail /var/log/messages
Aug 11 16:29:48 virt-520 ping(ping1)[1313630]: WARNING: 10.1.1.100 is inactive: PING 10.1.1.100 (10.1.1.100) 56(84) bytes of data.#012#012--- 10.1.1.100 ping statistics ---#0123 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2066ms#012pipe 3
Aug 11 16:29:48 virt-520 pacemaker-controld[1310410]: notice: Result of monitor operation for ping1 on virt-520: ok
Aug 11 16:30:01 virt-520 ping(ping1)[1313662]: WARNING: 10.1.1.100 is inactive: PING 10.1.1.100 (10.1.1.100) 56(84) bytes of data.#012#012--- 10.1.1.100 ping statistics ---#0123 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2038ms#012pipe 3
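
The #012 sequences in the messages above are escaped newlines (octal 012), which is how rsyslog typically records control characters. A small helper such as the one below can expand them again for reading; the function name expand_newlines is made up for this sketch.

```shell
#!/bin/sh
# Expand escaped newlines (#012) so multi-line ping output is readable again.
expand_newlines() {
    awk '{ gsub(/#012/, "\n"); print }'
}

# Fragment taken from the log message shown above.
printf '%s\n' '--- 10.1.1.100 ping statistics ---#0123 packets transmitted, 0 received' \
    | expand_newlines
```

On a cluster node it could be used as: grep 'ping(ping1)' /var/log/messages | expand_newlines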

Root Cause

# pcs resource describe ocf:pacemaker:ping
ocf:pacemaker:ping - node connectivity

Every time the monitor action is run, this resource agent records (in the CIB) the current number of ping nodes the host can connect to.
It is essentially the same as pingd except that it uses the system ping tool to obtain the results.
[...]

Diagnostic Steps

Once the ping resource is set up, you can simulate a loss of connectivity by blocking outgoing pings on the node in question. On physical hardware, however, pulling the network cable from the switch or changing the VLAN tag (if used) is a more representative test.

# firewall-cmd --direct --add-rule ipv4 filter OUTPUT 0 -p icmp --icmp-type echo-request -j DROP

To remove the rule blocking outgoing pings, run:

# firewall-cmd --direct --remove-rule ipv4 filter OUTPUT 0 -p icmp --icmp-type echo-request -j DROP
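
While the blocking rule is in place, the pingd value of the affected node is expected to drop. The helper below is a minimal sketch for pulling the per-node pingd values out of status output; it assumes the "Node Attributes" layout shown earlier in this article (other Pacemaker versions may format this section differently), and the function name pingd_by_node is made up for the example.

```shell
#!/bin/sh
# Print "<node> <pingd value>" for each node found in the "Node Attributes"
# section of status output read from standard input.
# ASSUMES the "* Node <name> (<id>):" / "+ pingd : <value>" layout shown above.
pingd_by_node() {
    awk '
        /^\* Node / { node = $3 }            # remember the current node name
        /^[ \t]*\+ pingd/ {                  # pingd attribute line
            split($0, f, ":")
            gsub(/[ \t]/, "", f[2])          # strip padding around the value
            print node, f[2]
        }
    '
}

# Sample input copied from the status output shown earlier in this article.
sample='* Node nodeb.local.com (1):
    + pingd                           	: 1         
* Node nodec.local.com (2):
    + pingd                           	: 0         	: Connectivity is lost'

printf '%s\n' "$sample" | pingd_by_node
```

On a live cluster the same function could be fed from pcs status --full.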