Introduction to Failover in a Pacemaker Cluster (RHEL 8 and later)
This article provides an introduction to creating a Pacemaker cluster running a service that fails over from one node to another when the node on which the service is running becomes unavailable. By working through this procedure, you can learn how to create a service in a two-node cluster, and you can then observe what happens to that service when it fails on the node on which it is running.
This example procedure configures a two-node Pacemaker cluster running an Apache HTTP server. You can then stop the Apache service on one node to see how the service remains available.
For a simple introduction to the basic Pacemaker commands, see Introduction to Pacemaker Cluster Tools (RHEL 8).
Note: These procedures do not create a supported Red Hat cluster, which requires at least two nodes and the configuration of a fencing device. For full information about Red Hat's support policies, requirements, and limitations for RHEL High Availability clusters, see Support Policies for RHEL High Availability Clusters.
In this example:
- The nodes are z1.example.com and z2.example.com.
- The floating IP address is 192.168.122.120.
Prerequisites
- Two nodes running RHEL 8 or later
- A floating IP address that resides on the same network as one of the node’s statically assigned IP addresses
- The name of the node on which you are running in your `/etc/hosts` file
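
For example, with the node names used in this article, the `/etc/hosts` file on each node would contain entries along these lines. The static addresses shown here are assumptions for illustration; substitute the addresses of your own nodes:

```
192.168.122.101   z1.example.com
192.168.122.102   z2.example.com
```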
Procedure
- On both nodes, install the Red Hat High Availability Add-On software packages from the High Availability channel, and start and enable the `pcsd` service.

  ```
  # dnf install pcs pacemaker fence-agents-all
  ...
  # systemctl start pcsd.service
  # systemctl enable pcsd
  ```

  If you are running the `firewalld` daemon, enable the ports that are required by the Red Hat High Availability Add-On on both nodes.

  ```
  # firewall-cmd --permanent --add-service=high-availability
  # firewall-cmd --reload
  ```
- On both nodes in the cluster, set a password for user `hacluster`.

  ```
  # passwd hacluster
  ```
- On both nodes in the cluster, authenticate user `hacluster` for each node in the cluster.

  ```
  # pcs host auth z1.example.com z2.example.com
  ```
- Create a cluster named `my_cluster` with both nodes as cluster members. This command creates and starts the cluster in one step. You need to run this from only one node in the cluster because `pcs` configuration commands take effect for the entire cluster.

  On one node in the cluster, run the following command:

  ```
  # pcs cluster setup my_cluster --start z1.example.com z2.example.com
  ```
- A Red Hat High Availability cluster requires that you configure fencing for the cluster. The reasons for this requirement are described in the Red Hat Knowledgebase solution Fencing in a Red Hat High Availability Cluster. For this introduction, however, which is intended to show only how to use the basic Pacemaker commands, disable fencing by setting the `stonith-enabled` cluster option to false.

  Warning: The use of `stonith-enabled=false` is completely inappropriate for a production cluster. It tells the cluster to simply pretend that failed nodes are safely fenced.

  On one node in the cluster, run the following command:

  ```
  # pcs property set stonith-enabled=false
  ```
- After creating a cluster and disabling fencing, check the status of the cluster.

  Note: When you run the `pcs cluster status` command, it may show output that temporarily differs slightly from the examples as the system components start up.

  ```
  # pcs cluster status
  Cluster Status:
   Stack: corosync
   Current DC: z1.example.com (version 2.0.0-10.el8-b67d8d0de9) - partition with quorum
   Last updated: Thu Oct 11 16:11:18 2018
   Last change: Thu Oct 11 16:11:00 2018 by hacluster via crmd on z1.example.com
   2 nodes configured
   0 resources configured

  PCSD Status:
    z1.example.com: Online
    z2.example.com: Online
  ```
- On both nodes, configure an Apache HTTP server and create a web page to display a simple text message. If you are running the `firewalld` daemon, enable the ports that are required by `httpd`.

  Note: Do not use `systemctl enable` to enable any services that will be managed by the cluster to start at system boot.

  ```
  # dnf install -y httpd wget
  ...
  # firewall-cmd --permanent --add-service=http
  # firewall-cmd --reload
  # cat <<-END >/var/www/html/index.html
  <html>
  <body>My Test Site - $(hostname)</body>
  </html>
  END
  ```

  In order for the Apache resource agent to get the status of Apache, on each node in the cluster create the following addition to the existing configuration to enable the status server URL.

  ```
  # cat <<-END > /etc/httpd/conf.d/status.conf
  <Location /server-status>
  SetHandler server-status
  Order deny,allow
  Deny from all
  Allow from 127.0.0.1
  Allow from ::1
  </Location>
  END
  ```
- Create `IPaddr2` and `apache` resources for the cluster to manage. The `IPaddr2` resource is a floating IP address that must not be one already associated with a physical node. If the `IPaddr2` resource's NIC device is not specified, the floating IP must reside on the same network as the statically assigned IP address used by the node.

  You can display a list of all available resource types with the `pcs resource list` command. You can use the `pcs resource describe resourcetype` command to display the parameters you can set for the specified resource type. For example, the following command displays the parameters you can set for a resource of type `apache`:

  ```
  # pcs resource describe apache
  ...
  ```

  In this example, the IP address resource and the apache resource are both configured as part of a group named `apachegroup`, which ensures that the resources are kept together to run on the same node.

  Run the following commands from one node in the cluster:

  ```
  # pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.122.120 cidr_netmask=32 op monitor interval=30s --group apachegroup
  # pcs resource create WebSite ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=1min --group apachegroup
  # pcs status
  Cluster name: my_cluster
  Last updated: Fri Feb  5 19:02:30 2016
  Last change: Fri Feb  5 19:02:06 2016 by hacluster via crmd on z1.example.com
  Stack: corosync
  Current DC: z1.example.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
  2 nodes and 2 resources configured

  Online: [ z1.example.com z2.example.com ]

  Full list of resources:

   Resource Group: apachegroup
       ClusterIP  (ocf::heartbeat:IPaddr2):       Started z1.example.com
       WebSite    (ocf::heartbeat:apache):        Started z1.example.com

  PCSD Status:
    z1.example.com: Online
    z2.example.com: Online
  ...
  ```

  Note that in this instance, the `apachegroup` service is running on node z1.example.com.
- Access the website you created, stop the service on the node on which it is running, and note how the service fails over to the second node.

  a) Point a browser to the website you created using the floating IP address you configured. This should display the text message you defined, along with the name of the node on which the website is running.

  b) Stop the apache web service. Using `killall -9` simulates an application-level crash.

  ```
  # killall -9 httpd
  ```

  Check the cluster status. You should see that stopping the web service caused a failed action, but that the cluster software restarted the service on the node on which it had been running, and you should still be able to access the website.

  ```
  # pcs status
  Cluster name: my_cluster
  Stack: corosync
  Current DC: z1.example.com (version 2.0.0-10.el8-b67d8d0de9) - partition with quorum
  Last updated: Fri Oct 12 09:54:33 2018
  Last change: Fri Oct 12 09:54:30 2018 by root via cibadmin on z1.example.com

  2 nodes configured
  2 resources configured

  Online: [ z1.example.com z2.example.com ]

  Full list of resources:

   Resource Group: apachegroup
       ClusterIP  (ocf::heartbeat:IPaddr2):       Started z1.example.com
       WebSite    (ocf::heartbeat:apache):        Started z1.example.com

  Failed Resource Actions:
  * WebSite_monitor_60000 on z1.example.com 'not running' (7): call=31, status=complete, exitreason='none',
      last-rc-change='Fri Feb  5 21:01:41 2016', queued=0ms, exec=0ms
  ```

  Clear the failure status once the service is up and running again.

  ```
  # pcs resource cleanup WebSite
  ```

  c) Put the node on which the service is running into standby mode. Note that because we have disabled fencing, we cannot effectively simulate a node-level failure (such as pulling a power cable); fencing is required for the cluster to recover from such situations.

  ```
  # pcs node standby z1.example.com
  ```

  d) Check the status of the cluster and note where the service is now running.

  ```
  # pcs status
  Cluster name: my_cluster
  Stack: corosync
  Current DC: z1.example.com (version 2.0.0-10.el8-b67d8d0de9) - partition with quorum
  Last updated: Fri Oct 12 09:54:33 2018
  Last change: Fri Oct 12 09:54:30 2018 by root via cibadmin on z1.example.com

  2 nodes configured
  2 resources configured

  Node z1.example.com: standby
  Online: [ z2.example.com ]

  Full list of resources:

   Resource Group: apachegroup
       ClusterIP  (ocf::heartbeat:IPaddr2):       Started z2.example.com
       WebSite    (ocf::heartbeat:apache):        Started z2.example.com
  ```

  e) Access the website. There should be no loss of service, although the display message should indicate the node on which the service is now running.
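One way to watch the failover from a client machine is to poll the floating IP address in a loop while you carry out steps b) through e). This is a sketch, assuming the floating IP address 192.168.122.120 from this example; because the test page embeds `$(hostname)`, a failover shows up as a change in the node name printed, and any brief interruption as a "no response" line:

```shell
# Poll the clustered website once per second; each successful response
# prints the test page, which includes the name of the serving node.
while true; do
    curl -s --max-time 2 http://192.168.122.120/ || echo "no response at $(date)"
    sleep 1
done
```

Press Ctrl+C to stop the loop once you have observed the failover.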
- To restore cluster services to the first node, take the node out of standby mode. This will not necessarily move the service back to that node.

  ```
  # pcs node unstandby z1.example.com
  ```
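
If you do want the service back on the first node after it leaves standby, you can relocate it explicitly; this step is not part of the original procedure, but uses standard `pcs` subcommands. Note that `pcs resource move` works by adding a location constraint, which `pcs resource clear` removes afterwards so that the cluster is again free to place the resource itself:

```
# pcs resource move WebSite z1.example.com
# pcs resource clear WebSite
```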
- For final cleanup, stop the cluster services on both nodes.

  ```
  # pcs cluster stop --all
  ```