Automating Cost-Optimized SAP HANA Scale-Up System Replication using the RHEL HA Add-On


1. Overview


This article describes how to configure the automated Cost-Optimized SAP HANA Scale-Up System Replication solution in a Pacemaker cluster on [supported RHEL releases](https://access.redhat.com/articles/3397471).

A Cost-Optimized configuration refers to a setup in which a Development/Quality Assurance (DEV/QA) test system runs on the standby/secondary side during normal operation. In the event of an error that requires the production side to fail over, this DEV/QA instance must be stopped automatically before the secondary side takes over production.

This article does NOT cover the preparation of a RHEL system for SAP HANA installation or the SAP HANA installation procedure itself. For more details on these topics, see SAP Note 2009879 - SAP HANA Guidelines for Red Hat Enterprise Linux (RHEL). The intention of this document is to describe the RHEL HA for SAP solution, with the goal of automating the management of a SAP HANA Scale-Up System Replication environment using the RHEL HA Pacemaker cluster stack provided by Red Hat, while focusing on the "cost-optimized" scenario, where a separate DEV/QA HANA instance is running on the secondary node. For information about the performance-optimized scenario, refer to this link. For more information on the different supported HA scenarios on RHEL, refer to Supported HA Scenarios for SAP HANA, SAP S/4HANA, and SAP NetWeaver.

The development of HA solutions to create automated SAP HANA System Replication environments falls within the scope of partner offerings. There is currently no official SAP HANA certification process for such solutions; therefore, SAP does not provide any requirement documents or guidelines on how an automated SAP HANA System Replication environment should look. The only documentation available from SAP is the how-to guide that provides the instructions for setting up SAP HANA System Replication: How to Perform System Replication for SAP HANA

For information on how to set up a Red Hat HA deployment for managing SAP HANA Scale-Out System Replication environments in general, see Red Hat Enterprise Linux HA Solution for SAP HANA Scale-Out System Replication. Note that this example focuses only on a Cost-Optimized setup in a 2-node SAP HANA Scale-Up environment with System Replication.

1.1. Important requirements and considerations


In your project, the following requirements should be met:
  • Define STONITH before adding other resources to the cluster and ensure it is tested. Refer to this link for more information.
  • Tune the operation timeouts of SAPHana and SAPHanaTopology.
  • Set up a test cluster for testing configuration changes and administrative procedure before applying them on the production cluster.
  • Start with the parameter values PREFER_SITE_TAKEOVER="false", AUTOMATED_REGISTER="false" and DUPLICATE_PRIMARY_TIMEOUT="7200". This is especially important if you are running tests.

In your project, the following should be avoided:

  • rapidly changing/changing back a cluster configuration, such as setting nodes to standby and online again or stopping/starting the multi-state resource.
  • creating a cluster without proper time synchronization or unstable name resolutions for hosts, users and groups.
  • adding location rules for the clone, multi-state or IP resource. Only location rules mentioned in this setup guide are allowed. For public clouds, refer to the cloud specific documentation.
  • using SAP tools for attempting start/stop/takeover actions on a database while the cluster is in charge of managing that database.

1.2. Solution Architecture Overview


![Solution Architecture Overview](https://access.redhat.com/sites/default/files/images/cost-optimised.png)

For more details on the various architectures, see Supported HA Scenarios for SAP HANA, SAP S/4HANA, and SAP NetWeaver.

1.3. Scope


This solution is supported with RHEL 8.2 and higher releases on which SAP HANA is supported.
SAP HANA 2.0 and all SPSs compatible with RHEL 8.2 are supported.
For more details see [Support Policies for RHEL High Availability Clusters - Management of SAP HANA in a Cluster](https://access.redhat.com/articles/3397471).

1.4. Subscription and Repositories


The following repos are required for RHEL 8.x:
  • RHEL BaseOS: provides the RHEL kernel packages
  • RHEL AppStream: provides all the applications needed to run in a given user space
  • RHEL High Availability: provides the Pacemaker framework
  • RHEL for SAP Solutions: provides the resource agents for the automation of HANA System Replication in Scale-Up
1.4.1. On-Premise or Bring Your Own Subscription through Cloud Access


For on-premise or Bring Your Own Subscription (BYOS) through Red Hat Cloud Access, the subscription to use is RHEL for SAP Solutions.

RHEL 8.x x86_64: below is an example of the repos enabled with RHEL for SAP Solutions 8.2, on-premise or through Cloud Access:
# yum repolist
repo id                                                  repo name                                    status
rhel-8-for-x86_64-appstream-rpms        Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)       8,603
rhel-8-for-x86_64-baseos-rpms           Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)          3,690
rhel-8-for-x86_64-highavailability-rpms Red Hat Enterprise Linux 8 for x86_64 - High Availability (RPMs)  156
rhel-8-for-x86_64-sap-solutions-rpms    Red Hat Enterprise Linux 8 for x86_64 - SAP Solutions (RPMs)      10

RHEL 8.x power9: below is an example of the repos enabled with RHEL for SAP Solutions 8.2 on power9:

# yum repolist
repo id                                       repo name                                                                                                         status
rhel-8-for-ppc64le-appstream-e4s-rpms         Red Hat Enterprise Linux 8 for Power, little endian - AppStream - Update Services for SAP Solutions (RPMs)         4,949
rhel-8-for-ppc64le-baseos-e4s-rpms            Red Hat Enterprise Linux 8 for Power, little endian - BaseOS - Update Services for SAP Solutions (RPMs)            1,766
rhel-8-for-ppc64le-highavailability-e4s-rpms  Red Hat Enterprise Linux 8 for Power, little endian - High Availability - Update Services for SAP Solutions (RPMs)    71
rhel-8-for-ppc64le-sap-solutions-e4s-rpms     Red Hat Enterprise Linux 8 for Power, little endian - SAP Solutions - Update Services for SAP Solutions (RPMs)         4

1.4.2. On-Demand on Public Clouds through RHUI


For deployment in on-demand images on a public cloud, the software packages are delivered in Red Hat Enterprise Linux for SAP with High Availability and Update Services, a variant of RHEL for SAP Solutions, customized for public clouds, available through [RHUI](https://access.redhat.com/products/red-hat-update-infrastructure).

2. SAP HANA System Replication


2.1. Requirements


Refer to this [link](https://access.redhat.com/articles/3397471#additional-requirements) for information on the technical requirements for Cost-Optimized SAP HANA environments.

2.2. Setup Process


The following example shows how to set up system replication between 2 nodes running SAP HANA.
Configuration used in the example:
SID:                   RH2
Instance Number:       02
node1 FQDN:            node1.example.com
node2 FQDN:            node2.example.com
node1 HANA site name:  DC1
node2 HANA site name:  DC2
SAP HANA 'SYSTEM' user password: <HANA_SYSTEM_PASSWORD>
SAP HANA administrative user:    rh2adm

Ensure that both systems can resolve each other's FQDN without any issues. To ensure that FQDNs can be resolved even without DNS, place them in /etc/hosts as in the example below.

# /etc/hosts
192.168.0.11 node1.example.com node1
192.168.0.12 node2.example.com node2
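
As a quick sanity check, the expected name-to-address mappings can be verified with getent, which consults the same name service switch order as the rest of the system, so it works whether the names come from DNS or /etc/hosts. The `check_host` helper below is a hypothetical convenience, not part of the official setup:

```shell
# Hypothetical helper: verify that a hostname resolves to the expected IP address.
check_host() {  # usage: check_host <fqdn> <expected-ip>
  resolved=$(getent hosts "$1" | awk '{print $1; exit}')
  if [ "$resolved" = "$2" ]; then
    echo "OK: $1 -> $resolved"
  else
    echo "MISMATCH: $1 -> ${resolved:-unresolved} (expected $2)"
  fi
}

# Check the two cluster nodes from the example configuration:
check_host node1.example.com 192.168.0.11
check_host node2.example.com 192.168.0.12
```

Run this on both nodes; any MISMATCH line should be resolved before continuing with the replication setup.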

For system replication to work, the SAP HANA log_mode parameter must be set to normal. This can be verified with the command below, run as the SAP HANA administrative user on both nodes.

[rh2adm]# hdbsql -u system -p <HANA_SYSTEM_PASSWORD> -i 02 "select value from \"SYS\".\"M_INIFILE_CONTENTS\" where key='log_mode'"
VALUE
"normal"
1 row selected
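
If log_mode is not set to normal, it can be changed with hdbsql. The statement below is a sketch based on standard SAP HANA configuration SQL; note that after changing log_mode, a full data backup is required before log backups can resume.

```shell
[rh2adm]# hdbsql -u system -p <HANA_SYSTEM_PASSWORD> -i 02 \
  "ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM') SET ('persistence','log_mode') = 'normal' WITH RECONFIGURE"
```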

Note that the designation of primary and secondary nodes applies only during setup; the roles (primary/secondary) may change during cluster operation based on the cluster configuration.
Many of the configuration steps are performed as the SAP HANA administrative user, whose name was selected during installation. The examples use rh2adm, matching the SID RH2. To become the SAP HANA administrative user, use the command below.

[root]# sudo -i -u rh2adm
[rh2adm]#

2.3. Configure HANA System Replication on the primary node


SAP HANA system replication will only work after an initial backup has been performed. The following command creates an initial backup in the /tmp/foo directory. Note that the size of the backup depends on the database size, and the backup may take some time to complete. The directory the backup is placed in must be writable by the SAP HANA administrative user.
[root]# chown rh2adm /tmp/foo

Note: The instructions here for setting up HANA System Replication are based on the official guidelines from SAP; refer to those for more information.
a) On single container systems, the following command can be used for backup:

[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> "BACKUP DATA USING FILE ('/tmp/foo')"
0 rows affected (overall time xx.xxx sec; server time xx.xxx sec)

b) On multiple container systems (MDC), SYSTEMDB and all tenant databases need to be backed up.
See the example below for the backup of SYSTEMDB and the RH2 tenant database.

[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> -d SYSTEMDB "BACKUP DATA USING FILE ('/tmp/foo')"
0 rows affected (overall time xx.xxx sec; server time xx.xxx sec)
[rh2adm]# hdbsql -i 02 -u system -p <HANA_SYSTEM_PASSWORD> -d SYSTEMDB "BACKUP DATA FOR RH2 USING FILE ('/tmp/foo-RH2')"
0 rows affected (overall time xx.xxx sec; server time xx.xxx sec)

After the initial backup, initialize the replication using the command below:

[rh2adm]# hdbnsutil -sr_enable --name=DC1
checking for active nameserver ...
nameserver is active, proceeding ...
successfully enabled system as system replication source site
done.

Verify that initialization is showing the current node as 'primary' and that SAP HANA is running successfully on it.

[rh2adm]# hdbnsutil -sr_state
checking for active or inactive nameserver ...
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 1
site name: DC1
Host Mappings:

2.4. Configure HANA System Replication on the secondary node


The secondary node needs to be registered with the primary node.
Copy the SAP HANA system PKI files SSFS_RH2.KEY and SSFS_RH2.DAT from the primary node to the secondary node.
[rh2adm]# scp root@node1:/usr/sap/RH2/SYS/global/security/rsecssfs/key/SSFS_RH2.KEY /usr/sap/RH2/SYS/global/security/rsecssfs/key/SSFS_RH2.KEY
[rh2adm]# scp root@node1:/usr/sap/RH2/SYS/global/security/rsecssfs/data/SSFS_RH2.DAT /usr/sap/RH2/SYS/global/security/rsecssfs/data/SSFS_RH2.DAT
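
Optionally, you can confirm that the copied key files match the originals. The loop below is a sketch using sha256sum over ssh, assuming ssh access from node2 to root@node1 as used in the scp commands above:

```shell
# Compare checksums of the PKI files on both nodes (run on node2).
for f in key/SSFS_RH2.KEY data/SSFS_RH2.DAT; do
  ssh root@node1 sha256sum /usr/sap/RH2/SYS/global/security/rsecssfs/$f
  sha256sum /usr/sap/RH2/SYS/global/security/rsecssfs/$f
done
```

The two checksum lines printed for each file should be identical.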

To register the secondary node, use the command below:

[rh2adm]# hdbnsutil -sr_register --remoteHost=node1 --remoteInstance=02 --replicationMode=<Rep-Mode> --name=DC2 --online
adding site ...
checking for inactive nameserver ...
nameserver node2:30201 not responding.
collecting information ...
updating local ini files ...
Done.

Where <Rep-Mode> is the replication mode that matches your requirements.
Verify that the secondary node is running and that the 'mode' is correct. The output should look similar to the one below:

[rh2adm]# hdbnsutil -sr_state
checking for active or inactive nameserver ...

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: syncmem
site id: 2
site name: DC2
active primary site: 1

Host Mappings:
~~~~~~~~~~~~~~
node2 -> [DC1] node1
node2 -> [DC2] node2

2.5. Test SAP HANA System Replication


To manually test the SAP HANA System Replication setup, you can follow the procedure described in SAP HANA 2.0: chapter "9. Testing" - [How to Perform System Replication for SAP HANA 2.0 guide](https://assets.cdn.sap.com/sapcom/docs/2016/06/0ec37684-7a7c-0010-82c7-eda71af511fa.pdf).

2.6. Check SAP HANA System Replication state


To check the current state of SAP HANA System Replication, you can execute the following command as the SAP HANA administrative user on the current primary SAP HANA node.
On a **single_container** system:
[rh2adm]# python /usr/sap/RH2/HDB02/exe/python_support/systemReplicationStatus.py

| Host  | Port  | Service Name | Volume ID | Site ID | Site Name | Secondary | Secondary | Secondary | Secondary | Secondary     | Replication | Replication | Replication    |
|       |       |              |           |         |           | Host      | Port      | Site ID   | Site Name | Active Status | Mode        | Status      | Status Details |
| ----- | ----- | ------------ | --------- | ------- | --------- | --------- | --------- | --------- | --------- | ------------- | ----------- | ----------- | -------------- |
| node1 | 30201 | nameserver   |         1 |       1 | DC1       | node2     |     30201 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |
| node1 | 30207 | xsengine     |         2 |       1 | DC1       | node2     |     30207 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |
| node1 | 30203 | indexserver  |         3 |       1 | DC1       | node2     |     30203 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |

status system replication site "2": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 1
site name: DC1

On a **multiple_containers** system (MDC):

[rh2adm]# python /usr/sap/RH2/HDB02/exe/python_support/systemReplicationStatus.py
| Database | Host  | Port  | Service Name | Volume ID | Site ID | Site Name | Secondary | Secondary | Secondary | Secondary | Secondary     | Replication | Replication | Replication    |
|          |       |       |              |           |         |           | Host      | Port      | Site ID   | Site Name | Active Status | Mode        | Status      | Status Details |
| -------- | ----- | ----- | ------------ | --------- | ------- | --------- | --------- | --------- | --------- | --------- | ------------- | ----------- | ----------- | -------------- |
| SYSTEMDB | node1 | 30201 | nameserver   |         1 |       1 | DC1       | node2     |     30201 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |
| RH2      | node1 | 30207 | xsengine     |         2 |       1 | DC1       | node2     |     30207 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |
| RH2      | node1 | 30203 | indexserver  |         3 |       1 | DC1       | node2     |     30203 |         2 | DC2       | YES           | SYNCMEM     | ACTIVE      |                |

status system replication site "2": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 1
site name: DC1

Also check the return code of the command: a return code of 15 indicates that the replication status is ACTIVE and everything is working as expected.

# echo $?
15
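
The return code of systemReplicationStatus.py can also be interpreted in a script. The helper below is a hypothetical sketch; the code-to-state mapping (10=NONE, 11=ERROR, 12=UNKNOWN, 13=INITIALIZING, 14=SYNCING, 15=ACTIVE) follows the commonly documented behavior of the script and should be verified against your HANA version:

```shell
# Hypothetical helper: map the systemReplicationStatus.py return code to a state name.
sr_state() {
  case "$1" in
    15) echo "ACTIVE" ;;
    14) echo "SYNCING" ;;
    13) echo "INITIALIZING" ;;
    12) echo "UNKNOWN" ;;
    11) echo "ERROR" ;;
    10) echo "NONE" ;;         # no system replication configured
     *) echo "UNEXPECTED ($1)" ;;
  esac
}

# Example usage on the primary node (as rh2adm):
# python /usr/sap/RH2/HDB02/exe/python_support/systemReplicationStatus.py >/dev/null
# sr_state $?
```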

3. Configure SAP HANA in a Pacemaker cluster


Please refer to the following documentation to set up a Pacemaker cluster. Note that the cluster must conform to the specifications detailed in [Support Policies for RHEL High Availability Clusters - General Requirements for Fencing/STONITH](https://access.redhat.com/articles/2881341).

This guide will assume that the following things are working properly:

  • Pacemaker cluster is configured according to the official documentation and has fully operable fencing
  • SAP HANA startup on boot is disabled on all cluster nodes, as the start and stop will be managed by the cluster
  • SAP HANA system replication and takeover using tools from SAP are working properly between cluster nodes
  • Both nodes are subscribed to the required repositories. For RHEL 8 these are the 'High Availability' and 'RHEL for SAP Solutions' repos; see section 1.4 above for further details.

3.1. Install resource agents and other components required for managing SAP HANA Scale-Up System Replication using the RHEL HA Add-On

[root]# yum install resource-agents-sap-hana

Note: This will only install the resource agents and additional components required to set up this HA solution. The configuration steps documented in the following sections must still be completed for a fully operable setup that is supported by Red Hat.

3.2. Enable the SAP HANA srConnectionChanged() hook


As documented in SAP's [Implementing a HA/DR Provider](https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.03/en-US/1367c8fdefaa4808a7485b09815ae0f3.html), recent versions of SAP HANA provide so-called "hooks" that allow SAP HANA to send out notifications for certain events. The `srConnectionChanged()` hook can be used to improve the ability of the cluster to detect when a change in the status of HANA System Replication occurs that requires the cluster to take action, and to avoid data loss or corruption by preventing accidental takeovers from being triggered in situations where this should be avoided.

3.2.1. Verify that the installed version of resource-agents-sap-hana package provides the components to enable the srConnectionChanged() hook


Please verify that the correct version of the resource-agents-sap-hana package is installed as documented in the following article: [Is the srConnectionChanged() hook supported with the Red Hat High Availability solution for SAP HANA Scale-up System Replication](https://access.redhat.com/solutions/4886161). Note that the resource-agents-sap-hana package is responsible for providing the components required to enable the srConnectionChanged() hook for your version of RHEL.
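
To check which version is currently installed, query the RPM database and compare the result against the minimum version documented in the article referenced above:

```shell
[root]# rpm -q resource-agents-sap-hana
```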

3.2.2. Activate the srConnectionChanged() hook on all SAP HANA instances


Note: The steps to activate the srConnectionChanged() hook need to be performed for each SAP HANA instance.

1. Stop the cluster on both nodes and verify that the HANA instances are stopped completely.

[root]# pcs cluster stop --all

2. Install the hook script into the /hana/shared/myHooks directory for each HANA instance and make sure it has the correct ownership on all nodes (replace rh2adm with the username of the admin user of the HANA instances).

[root]# mkdir -p /hana/shared/myHooks
[root]# cp /usr/share/SAPHanaSR/srHook/SAPHanaSR.py /hana/shared/myHooks
[root]# chown -R rh2adm:sapsys /hana/shared/myHooks

3. Update the global.ini file on each node to enable use of the hook script by both HANA instances (e.g., in file /hana/shared/RH2/global/hdb/custom/config/global.ini):

[ha_dr_provider_SAPHanaSR]
provider = SAPHanaSR
path = /hana/shared/myHooks
execution_order = 1

[trace]
ha_dr_saphanasr = info

4. On each cluster node create the file /etc/sudoers.d/20-saphana by running sudo visudo -f /etc/sudoers.d/20-saphana, and add the contents below to allow the hook script to update the node attributes when the srConnectionChanged() hook is called.

Replace rh2 with the lowercase SID of your HANA installation and replace DC1 and DC2 with your HANA site names.

Cmnd_Alias DC1_SOK   = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC1 -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias DC1_SFAIL = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC1 -v SFAIL -t crm_config -s SAPHanaSR
Cmnd_Alias DC2_SOK   = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC2 -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias DC2_SFAIL = /usr/sbin/crm_attribute -n hana_rh2_site_srHook_DC2 -v SFAIL -t crm_config -s SAPHanaSR
rh2adm ALL=(ALL) NOPASSWD: DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL
Defaults!DC1_SOK, DC1_SFAIL, DC2_SOK, DC2_SFAIL !requiretty
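
After creating the file, it is a good idea to validate it. The following sketch checks the syntax of just this fragment and lists the sudo rules that apply to the HANA administrative user:

```shell
# Syntax-check only the new sudoers fragment (visudo exits non-zero on errors):
[root]# visudo -c -f /etc/sudoers.d/20-saphana
# Confirm the crm_attribute command aliases are visible for the HANA admin user:
[root]# sudo -l -U rh2adm | grep -F crm_attribute
```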

For further information on why the Defaults setting is needed, see "The srHook attribute is set to SFAIL in a Pacemaker cluster managing SAP HANA system replication, even though replication is in a healthy state".

5. Start both HANA instances manually without starting the cluster.

6. Verify that the hook script is working as expected. Perform some action to trigger the hook, such as stopping a HANA instance. Then check whether the hook logged anything using the commands listed below.

[rh2adm]# cdtrace
[rh2adm]# awk '/ha_dr_SAPHanaSR.*crm_attribute/ { printf "%s %s %s %s\n",$2,$3,$5,$16 }' nameserver_*
2018-05-04 12:34:04.476445 ha_dr_SAPHanaSR SFAIL
2018-05-04 12:53:06.316973 ha_dr_SAPHanaSR SOK
[rh2adm]# grep ha_dr_ *

Note: For more information, see Install and Configure a HA/DR Provider Script.

7. When the functionality of the hook has been verified, the cluster can be started again.

[root]# pcs cluster start --all

3.3. Configure general cluster properties


To avoid unnecessary failovers of the resources during initial testing and in production, set the following default values for the resource-stickiness and migration-threshold parameters. Note that defaults do not apply to resources which override them with their own defined values.
[root]# pcs resource defaults resource-stickiness=1000
[root]# pcs resource defaults migration-threshold=5000

Warning: As of RHEL 8.4 (pcs-0.10.8-1.el8), the commands above are deprecated. Use the commands below:

[root]# pcs resource defaults update resource-stickiness=1000
[root]# pcs resource defaults update migration-threshold=5000

Notes:

  1. It is sufficient to run the commands above on one node of the cluster.
  2. Previous versions of this document recommended setting these defaults for the initial testing of the cluster setup, but removing them after production. Due to customer feedback and additional testing, it has been determined that it is beneficial to use these defaults for production cluster setups as well.
  3. The resource-stickiness=1000 setting encourages the resource to stay running where it is, while migration-threshold=5000 causes the resource to move to a new node only after 5000 failures. 5000 is generally sufficient to prevent the resource from prematurely failing over to another node. This also ensures that the resource failover time stays within a controllable limit.

Setting the no-quorum-policy to ignore is NOT supported; therefore, in the default configuration, the no-quorum-policy property of the cluster should not need to be modified. To achieve the behavior provided by this option, see Can I configure pacemaker to continue to manage resources after a loss of quorum in RHEL 6 or 7?

3.4. Create cloned SAPHanaTopology resource


The SAPHanaTopology resource gathers the status and configuration of SAP HANA System Replication on each node. In addition, it starts and monitors the local SAP HostAgent, which is required for starting, stopping, and monitoring the SAP HANA instances. It has the following attributes:

| Attribute Name | Required? | Default value | Description |
| -------------- | --------- | ------------- | ----------- |
| SID            | yes       | null          | The SAP System Identifier (SID) of the SAP HANA installation (must be identical for all nodes). Example: RH2 |
| InstanceNumber | yes       | null          | The Instance Number of the SAP HANA installation (must be identical for all nodes). Example: 02 |

Below is an example command to create the SAPHanaTopology cloned resource.
Note: The timeouts shown below for the resource operations are only examples and may need to be adjusted depending on the actual SAP HANA setup (for example, large HANA databases can take longer to start up, so the start timeout may have to be increased).

[root]# pcs resource create SAPHanaTopology_RH2_02 SAPHanaTopology SID=RH2 InstanceNumber=02 \
op start timeout=600 \
op stop timeout=300 \
op monitor interval=10 timeout=600 \
clone clone-max=2 clone-node-max=1 interleave=true

The resulting resource should look like the following.

[root]# pcs resource config SAPHanaTopology_RH2_02-clone

 Clone: SAPHanaTopology_RH2_02-clone
  Meta Attrs: clone-max=2 clone-node-max=1 interleave=true
  Resource: SAPHanaTopology_RH2_02 (class=ocf provider=heartbeat type=SAPHanaTopology)
   Attributes: SID=RH2 InstanceNumber=02
   Operations: start interval=0s timeout=600 (SAPHanaTopology_RH2_02-start-interval-0s)
               stop interval=0s timeout=300 (SAPHanaTopology_RH2_02-stop-interval-0s)
               monitor interval=10 timeout=600 (SAPHanaTopology_RH2_02-monitor-interval-10s)

Once the resource is started you will see the collected information stored in the form of node attributes that can be viewed with the command crm_mon -A1. Below is an example of what attributes can look like when only SAPHanaTopology is started.

[root]# crm_mon -A1
...
Node Attributes:
* Node node1:
    + hana_rh2_remoteHost               : node2
    + hana_rh2_roles                    : 1:P:master1::worker:
    + hana_rh2_site                     : DC1
    + hana_rh2_srmode                   : syncmem
    + hana_rh2_vhost                    : node1
* Node node2:
    + hana_rh2_remoteHost               : node1
    + hana_rh2_roles                    : 1:S:master1::worker:
    + hana_rh2_site                     : DC2
    + hana_rh2_srmode                   : syncmem
    + hana_rh2_vhost                    : node2
...

3.5. Create Master/Slave SAPHana resource


The SAPHana resource agent manages two SAP HANA instances (databases) that are configured in HANA System Replication.

| Attribute Name | Required? | Default value | Description |
| -------------- | --------- | ------------- | ----------- |
| SID | yes | null | The SAP System Identifier (SID) of the SAP HANA installation (must be identical for all nodes). Example: RH2 |
| InstanceNumber | yes | null | The Instance Number of the SAP HANA installation (must be identical for all nodes). Example: 02 |
| PREFER_SITE_TAKEOVER | no | null | Should the resource agent prefer to switch over to the secondary instance instead of restarting the primary locally? true: prefer takeover to the secondary site; false: prefer a local restart; never: under no circumstances do a takeover to the other node |
| AUTOMATED_REGISTER | no | false | If a takeover event has occurred, and the DUPLICATE_PRIMARY_TIMEOUT has expired, should the former primary instance be registered as secondary? ("false": no, manual intervention will be needed; "true": yes, the former primary will be registered by the resource agent as secondary) [1] |
| DUPLICATE_PRIMARY_TIMEOUT | no | 7200 | The time difference (in seconds) needed between two primary time stamps if a dual-primary situation occurs. If the time difference is less than the time gap, the cluster will hold one or both instances in a "WAITING" status. This gives the system admin a chance to react to a takeover. After the time difference has passed, if AUTOMATED_REGISTER is set to true, the failed former primary will be registered as secondary. After registration to the new primary, all data on the former primary will be overwritten by the system replication. |

[1] - As a good practice for test purposes, we recommend leaving AUTOMATED_REGISTER at its default value (AUTOMATED_REGISTER="false") to prevent a failed primary instance from automatically registering as a secondary instance. After testing, if the failover scenarios work as expected, especially for the production environment, we recommend setting AUTOMATED_REGISTER="true" so that after a takeover, system replication resumes in a timely manner, avoiding further disruption. Note that when AUTOMATED_REGISTER="false" is set, in case of a failure on the primary node, after investigation you will need to manually register it as the secondary HANA System Replication node.

Note: The timeouts shown below for the resource operations are only examples and may need to be adjusted depending on the actual SAP HANA setup (for example, large HANA databases can take longer to start up, so the start timeout may have to be increased).

3.5.1. Configuring promotable clone resources


The official documentation on configuring promotable clone resources can be found [here](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/configuring_and_managing_high_availability_clusters/index#assembly_creating-promotable-clone-resources-creating-multinode-resources). Below is an example command to create the promotable SAPHana resource.
[root]# pcs resource create SAPHana_RH2_02 SAPHana SID=RH2 InstanceNumber=02 \
PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=true \
op start timeout=3600 \
op stop timeout=3600 \
op monitor interval=61 role="Slave" timeout=700 \
op monitor interval=59 role="Master" timeout=700 \
op promote timeout=3600 \
op demote timeout=3600 \
promotable notify=true clone-max=2 clone-node-max=1 interleave=true

The resulting resource should look like the following:

[root]# pcs resource config SAPHana_RH2_02
 Clone: SAPHana_RH2_02-clone
  Meta Attrs: clone-max=2 clone-node-max=1 interleave=true notify=true promotable=true
  Resource: SAPHana_RH2_02 (class=ocf provider=heartbeat type=SAPHana)
   Attributes: AUTOMATED_REGISTER=true DUPLICATE_PRIMARY_TIMEOUT=7200 InstanceNumber=02 PREFER_SITE_TAKEOVER=true SID=RH2
   Operations: demote interval=0s timeout=3600 (SAPHana_RH2_02-demote-interval-0s)
               methods interval=0s timeout=5 (SAPHana_RH2_02-methods-interval-0s)
               monitor interval=61 role=Slave timeout=700 (SAPHana_RH2_02-monitor-interval-61)
               monitor interval=59 role=Master timeout=700 (SAPHana_RH2_02-monitor-interval-59)
               promote interval=0s timeout=3600 (SAPHana_RH2_02-promote-interval-0s)
               start interval=0s timeout=3600 (SAPHana_RH2_02-start-interval-0s)
               stop interval=0s timeout=3600 (SAPHana_RH2_02-stop-interval-0s)
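
If you run with AUTOMATED_REGISTER="false" instead, then after a takeover the former primary stays down until it is re-registered manually. Below is a sketch of that manual registration, run as the HANA administrative user on the former primary; it mirrors the hdbnsutil command from section 2.4, with <Rep-Mode> as chosen earlier:

```shell
[rh2adm]# hdbnsutil -sr_register --remoteHost=node2 --remoteInstance=02 \
  --replicationMode=<Rep-Mode> --name=DC1 --online
```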

3.6. Create Virtual IP address resource


The cluster will contain a Virtual IP address in order to reach the promoted instance of SAP HANA. Below is an example command to create an IPaddr2 resource with IP 192.168.0.15.
[root]# pcs resource create vip_RH2_02 IPaddr2 ip="192.168.0.15"

The resulting resource should look like the one below.

[root]# pcs resource config vip_RH2_02

 Resource: vip_RH2_02 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.15
  Operations: start interval=0s timeout=20s (vip_RH2_02-start-interval-0s)
              stop interval=0s timeout=20s (vip_RH2_02-stop-interval-0s)
              monitor interval=10s timeout=20s (vip_RH2_02-monitor-interval-10s)

3.7. Create constraints


For correct operation, we need to ensure that the SAPHanaTopology resource is started before the SAPHana resource, and also that the virtual IP address is present on the node where the Master resource of SAPHana is running. To achieve this, the following 2 constraints need to be created.

3.7.1. Constraint - Start SAPHanaTopology before SAPHana


The example command below will create the constraint that mandates the start order of these resources. There are 2 things worth mentioning here:
  • The symmetrical=false attribute defines that we care only about the start of the resources, and they do not need to be stopped in reverse order.
  • Both resources (SAPHana and SAPHanaTopology) have the attribute interleave=true, which allows parallel starts of these resources on the nodes. This means that, regardless of the ordering, we do not have to wait for all nodes to start SAPHanaTopology; the SAPHana resource can start on any node as soon as SAPHanaTopology is running there.

Create the constraint using the following command:

[root]# pcs constraint order SAPHanaTopology_RH2_02-clone then SAPHana_RH2_02-clone symmetrical=false

The resulting constraint should look like the one in the example below:

[root]# pcs constraint
...
Ordering Constraints:
  start SAPHanaTopology_RH2_02-clone then start SAPHana_RH2_02-clone (kind:Mandatory) (non-symmetrical)
...

3.7.2. Constraint - Colocate the IPaddr2 resource with the Master of the SAPHana resource

Below is an example command that will colocate the IPaddr2 resource with the SAPHana resource that was promoted as the Master.

[root]# pcs constraint colocation add vip_RH2_02 with master SAPHana_RH2_02-clone 2000

Note that the constraint is using a score of 2000 instead of the default INFINITY. This allows the IPaddr2 resource to be taken down by the cluster in case there is no Master promoted in the SAPHana resource so it is still possible to use this address with tools like SAP Management Console or SAP LVM that can use this address to query the status information about the SAP Instance.
The resulting constraint should look like one in the example below:

[root]# pcs constraint
...
Colocation Constraints:
  vip_RH2_02 with SAPHana_RH2_02-clone (score:2000) (rsc-role:Started) (with-rsc-role:Master)
...

3.8. Test the manual move of production SAPHana resources to another node (SAP HANA takeover by cluster)

Moving the SAPHana resource

To perform the failover, run the following command.

[root]# pcs resource move SAPHana_RH2_02-clone

Wait for the Secondary node to fully promote to Primary/Master.
With each pcs resource move command invocation, the cluster creates location constraints to cause the resource to move. These constraints must be removed in order to allow automatic failover in the future. Once the promote operation is complete, remove the constraints created by the move by running the command below.

[root]# pcs resource clear SAPHana_RH2_02-clone

This will start the production SAP HANA instance on the former primary node, re-establish the replication connection to the new primary, and then demote that node/instance to Secondary/Slave so it is ready for a subsequent takeover if necessary.

Wait for the replication to complete before moving ahead.
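
The replication state is exposed by the srHook as a cluster node attribute (it also appears later in this document under Cluster Properties, e.g. hana_rh2_site_srHook_DC2: SOK, where SOK means the secondary is in sync and SFAIL means it is not). As a sketch, the attribute can be extracted from crm_mon -A1 style output; the sample input below is inlined for illustration, and on a live cluster the real command output would be parsed instead:

```shell
# Sample node-attribute lines as printed by `crm_mon -A1`; on a live
# cluster, replace the sample with the real command output.
sample_attrs='hana_rh2_site_srHook_DC1: PRIM
hana_rh2_site_srHook_DC2: SOK'

# Extract the sync state of the secondary site (DC2 in this example).
sync_state=$(printf '%s\n' "$sample_attrs" | awk -F': ' '/srHook_DC2/ {print $2}')
echo "DC2 sync state: $sync_state"
```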

Note: In order to prepare the systems for the next steps, repeat the above test to fail the Prod SAP HANA Primary instance back to node 1 so that the DEV/QAS Instance can be configured.

4. Configuring the SAP HANA Instance for DEV/QAS on the secondary node for Cost-Optimization

SAP HANA HA/DR provider hook

To automatically reconfigure the productive SAP HANA instance when a takeover event occurs, the SAP HANA HA/DR provider hooks should be used. Before implementing the automation of the cost-optimized SAP HANA SR HA environment in the cluster, the takeover hook script must be tested manually by shutting down the DEV/QA SAP HANA instance on the secondary node and then performing a takeover test, to verify that the hook script correctly reconfigures the primary HANA DB; this avoids cluster failures later on due to incorrectly configured SAP HANA instances.
An example of how to set up the hook script for a 'cost-optimized' SAP HANA SR environment can be found in HOW TO SET UP SAPHanaSR IN THE COST OPTIMIZED SCENARIO, but this should normally be done by an experienced SAP HANA implementation partner (e.g., an SAP HANA appliance vendor). Dell, for example, provides its own SAP HANA takeover hook scripts. For more information see SAP Note 2196941 - SAP HANA Software Replication Takeover Hook Changes.

4.1 Add the srCostOptMemConfig hooks

4.1.1 Stop the Cluster first

[root]# pcs cluster stop --all

4.1.2 Enable the use of srCostOptMemConfig by updating global.ini on node 2 like the example shown below:

[ha_dr_provider_srCostOptMemConfig]
provider = srCostOptMemConfig
path = /hana/shared/srHook/
execution_order = 2

[trace]
ha_dr_saphanasr = info
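
For reference, on a cost-optimized secondary the productive instance is typically constrained before a takeover by limiting its memory and disabling column-table preload in global.ini; these are exactly the parameters that the hook unsets again after a takeover. The values below are illustrative only and must be sized for your hardware and the memory requirements of the DEV/QA instance:

```ini
[memorymanager]
global_allocation_limit = <limit_in_MB>

[system_replication]
preload_column_tables = false
```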

4.1.3 Create symlinks in the /hana/shared/srHook directory

Change into the directory and create symlinks to the required hdbcli modules using the commands below:

[root]# cd /hana/shared/srHook
[root]# ln -s /hana/shared/RH2/exe/linuxx86_64/hdb/python_support/hdbcli/dbapi.py .
[root]# ln -s /hana/shared/RH2/exe/linuxx86_64/hdb/python_support/hdbcli/__init__.py .
[root]# ln -s /hana/shared/RH2/exe/linuxx86_64/hdb/python_support/hdbcli/resultrow.py .

This hook must be installed on node 2 as /hana/shared/srHook/srCostOptMemConfig.py to undo the changes to global_allocation_limit and preload_column_tables in case of a takeover. Ensure that the file is owned by the rh2adm user and the related sapsys group.

#!/usr/bin/env python

"""
Sample for a HA/DR hook provider for method srPostTakeover().
When using your own code in here, please copy this file to location on /hana/shared
outside the HANA installation.

To configure your own changed version of this file, please add to your global.ini lines similar to this:

[ha_dr_provider_<className>]
provider = <className>
path = /hana/shared/srHook/
execution_order = 2

For all hooks, output needs to be 0 in case of success.
Set the following variables:
* dbinst Instance Number [e.g. 00 - 99]
* dbuser Username [e.g. SYSTEM]
* dbpwd  user password [e.g. RedHat4SAP]
* dbport port where the DB listens for SQL connections [e.g. 30013 or 30015]
"""
#
# parameter section
#
dbuser="SYSTEM"
dbpwd="<yourPassword1234>"       # password of the dbuser above
dbinst="<<InstanceNumber>>"      # instance number, e.g. "02"
dbport="3[InstanceNumber]13"     # SQL port, e.g. "30213" for instance number 02
#
# prepared SQL statements to remove memory allocation limit
#    and pre-load of column tables
#
stmnt1 = "ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM') UNSET ('memorymanager','global_allocation_limit') WITH RECONFIGURE"
stmnt2 = "ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM') UNSET ('system_replication','preload_column_tables') WITH RECONFIGURE"
#
# loading classes and libraries
#
import os, time, dbapi
from hdb_ha_dr.client import HADRBase, Helper
#
# class definition srCostOptMemConfig
#
class srCostOptMemConfig(HADRBase):
  def __init__(self, *args, **kwargs):
       # delegate construction to base class
       super(srCostOptMemConfig, self).__init__(*args, **kwargs)

  def about(self):
      return {"provider_company" : "<customer>",
              "provider_name" : "srCostOptMemConfig", # provider name = class name
              "provider_description" : "Replication takeover script to set parameters to default.",
              "provider_version" : "1.0"}

  def postTakeover(self, rc, **kwargs):
      """Post takeover hook."""
      self.tracer.info("%s.postTakeover method called with rc=%s" % (self.__class__.__name__, rc))
      if rc in (0, 1):
          # rc 0: normal takeover succeeded
          # rc 1: waiting for force takeover
          # In both cases, remove the memory limit and re-enable
          # column-table preload on the new primary.
          conn = dbapi.connect('localhost', dbport, dbuser, dbpwd)
          cursor = conn.cursor()
          cursor.execute(stmnt1)
          cursor.execute(stmnt2)
      # rc 2: error, something went wrong; nothing to reconfigure
      return 0
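
Copying the hook into place and setting its ownership can be scripted. The helper below is a hypothetical sketch; the rh2adm:sapsys owner/group pair matches this example's SID and would differ for another installation:

```shell
# Hypothetical helper: copy a hook script into the srHook directory
# with readable permissions, then set the requested owner and group.
install_hook() {
    src=$1; dest_dir=$2; owner_group=$3
    install -m 0644 "$src" "$dest_dir/srCostOptMemConfig.py" &&
        chown "$owner_group" "$dest_dir/srCostOptMemConfig.py"
}

# On node 2 this would be invoked as:
#   install_hook srCostOptMemConfig.py /hana/shared/srHook rh2adm:sapsys
```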

Start the HANA instances and, as the rh2adm user, check whether srCostOptMemConfig.py is working:

[rh2]# cdtrace
[rh2]# grep srCostOptMemConfig nameserver_*.trc

After verifying that the hook works, the cluster can be started again:

[root]# pcs cluster start --all 

4.2. Configure the DEV/QA HANA Instance on the secondary node


4.2.1 Add a second IPaddr2 resource to manage VIP DEV/QA HANA Instance

[root]# pcs resource create vip_DEV_QA IPaddr2 ip="192.168.0.16" --group rh3_dev_qa_group

This command will create an IPaddr2 resource in the same resource group as the DEV/QA instance, which will ensure that this IP stays with the DEV/QA instance itself.

4.2.2 Create the required constraints


4.2.2.1 Ensure DEV/QA Instance group does not run on node 1

First, it is important to ensure that the DEV/QA HANA Instance, which is installed on node 2, does not run on the node where the master instance of the Production Database is running, which in this example is hana-21.

Use the following command to set that:

[root]# pcs constraint location add to-avoid-node-1 rh3_dev_qa_group hana-21 -INFINITY resource-discovery=never

The negative -INFINITY score in the above command ensures that the DEV/QA SAP HANA Instance never runs on node 1, and resource-discovery=never ensures that the cluster does not probe for the DEV/QA instance resource on node 1, which could otherwise cause unnecessary failed-action errors. See the Pacemaker documentation for more information about the resource-discovery option.

The resulting constraint should look like this:

[root@hana-21 ~]# pcs constraint location
Location Constraints:
  Resource: rh3_dev_qa_group
    Disabled on:
      Node: hana-21 (score:-INFINITY) (resource-discovery=never)

4.2.2.2 Ensure DEV/QA Instance group is stopped before node 2 becomes master

In case the production master is required to move to the secondary node where the DEV/QA Instance is running, it is important to ensure that the DEV/QA Instance is stopped before the secondary node is promoted to master. This also ensures that the secondary node does not get overloaded, which could destabilize the cluster.

The following 2 commands create the constraints that ensure the production master instance and the DEV/QA Instance never run together on node 2:

[root]# pcs constraint colocation add rh3_dev_qa_group with master SAPHana_RH2_02-clone score=-INFINITY

[root]# pcs constraint order stop rh3_dev_qa_group then promote SAPHana_RH2_02-clone
Adding rh3_dev_qa_group SAPHana_RH2_02-clone (kind: Mandatory) (Options: first-action=stop then-action=promote)

4.2.3 Use the SAPInstance resource agent

The following command will install the SAPInstance resource agent on node 2:

[root@node2 ~]# yum -y install resource-agents-sap

The following command will create the resource for managing the DEV/QA instance:

[root]# pcs resource create rsc_DEVQA_RH3_HDB20 SAPInstance InstanceName="RH3_HDB20_hana-22" MONITOR_SERVICES="hdbindexserver|hdbnameserver" START_PROFILE="/usr/sap/RH3/SYS/profile/RH3_HDB20_hana-22" op start timeout=600 op stop timeout=600 op monitor interval=60 timeout=600 --group rh3_dev_qa_group

The resultant resource should look like this:

[root]# pcs resource config rh3_dev_qa_group
 Group: rh3_dev_qa_group
  Resource: rsc_DEVQA_RH3_HDB20 (class=ocf provider=heartbeat type=SAPInstance)
   Attributes: InstanceName=RH3_HDB20_hana-22 MONITOR_SERVICES=hdbindexserver|hdbnameserver START_PROFILE=/usr/sap/RH3/SYS/profile/RH3_HDB20_hana-22
   Operations: demote interval=0s timeout=320s (rsc_DEVQA_RH3_HDB20-demote-interval-0s)
           	methods interval=0s timeout=5s (rsc_DEVQA_RH3_HDB20-methods-interval-0s)
           	monitor interval=60 timeout=600 (rsc_DEVQA_RH3_HDB20-monitor-interval-60)
           	promote interval=0s timeout=320s (rsc_DEVQA_RH3_HDB20-promote-interval-0s)
           	reload interval=0s timeout=320s (rsc_DEVQA_RH3_HDB20-reload-interval-0s)
           	start interval=0s timeout=600 (rsc_DEVQA_RH3_HDB20-start-interval-0s)
           	stop interval=0s timeout=600 (rsc_DEVQA_RH3_HDB20-stop-interval-0s)
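
The InstanceName and START_PROFILE values follow SAP's <SID>_<InstanceType><InstanceNumber>_<hostname> naming pattern. As a sketch, the value used above decomposes as follows (plain shell parameter expansion, names taken from this example):

```shell
instance_name=RH3_HDB20_hana-22

sid=${instance_name%%_*}    # SID of the DEV/QA instance: RH3
rest=${instance_name#*_}
inst=${rest%%_*}            # instance type and number: HDB20
host=${rest#*_}             # hostname part: hana-22
echo "SID=$sid instance=$inst host=$host"
```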

Your fully working cost-optimized setup should resemble the output below:

[root@hana-21 ~]# pcs config
Cluster Name: hana_cop_cluster
Corosync Nodes:
 hana-21 hana-22
Pacemaker Nodes:
 hana-21 hana-22

Resources:


 Clone: SAPHanaTopology_RH2_02-clone
  Meta Attrs: clone-max=2 clone-node-max=1 interleave=true
  Resource: SAPHanaTopology_RH2_02 (class=ocf provider=heartbeat type=SAPHanaTopology)
   Attributes: InstanceNumber=02 SID=RH2
   Operations: methods interval=0s timeout=5 (SAPHanaTopology_RH2_02-methods-interval-0s)
           	monitor interval=10 timeout=600 (SAPHanaTopology_RH2_02-monitor-interval-10)
           	reload interval=0s timeout=5 (SAPHanaTopology_RH2_02-reload-interval-0s)
           	start interval=0s timeout=600 (SAPHanaTopology_RH2_02-start-interval-0s)
           	stop interval=0s timeout=300 (SAPHanaTopology_RH2_02-stop-interval-0s)
 Clone: SAPHana_RH2_02-clone
  Meta Attrs: promotable=true
  Resource: SAPHana_RH2_02 (class=ocf provider=heartbeat type=SAPHana)
   Attributes: AUTOMATED_REGISTER=true DUPLICATE_PRIMARY_TIMEOUT=7200 InstanceNumber=02 PREFER_SITE_TAKEOVER=true SID=RH2
   Meta Attrs: clone-max=2 clone-node-max=1 interleave=true notify=true
   Operations: demote interval=0s timeout=3600 (SAPHana_RH2_02-demote-interval-0s)
           	methods interval=0s timeout=5 (SAPHana_RH2_02-methods-interval-0s)
           	monitor interval=61 role=Slave timeout=700 (SAPHana_RH2_02-monitor-interval-61)
           	monitor interval=59 role=Master timeout=700 (SAPHana_RH2_02-monitor-interval-59)
           	promote interval=0s timeout=3600 (SAPHana_RH2_02-promote-interval-0s)
           	reload interval=0s timeout=5 (SAPHana_RH2_02-reload-interval-0s)
           	start interval=0s timeout=3600 (SAPHana_RH2_02-start-interval-0s)
           	stop interval=0s timeout=3600 (SAPHana_RH2_02-stop-interval-0s)
 Resource: vip_RH2_02 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.15
  Operations: monitor interval=10s timeout=20s (vip_RH2_02-monitor-interval-10s)
          	start interval=0s timeout=20s (vip_RH2_02-start-interval-0s)
          	stop interval=0s timeout=20s (vip_RH2_02-stop-interval-0s)
 Group: rh3_dev_qa_group
 Resource: vip_DEV_QA (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.16
  Operations: monitor interval=10s timeout=20s (vip_DEV_QA-monitor-interval-10s)
          	start interval=0s timeout=20s (vip_DEV_QA-start-interval-0s)
          	stop interval=0s timeout=20s (vip_DEV_QA-stop-interval-0s)
  Resource: rsc_DEVQA_RH3_HDB20 (class=ocf provider=heartbeat type=SAPInstance)
   Attributes: InstanceName=RH3_HDB20_hana-22 MONITOR_SERVICES=hdbindexserver|hdbnameserver START_PROFILE=/usr/sap/RH3/SYS/profile/RH3_HDB20_hana-22
   Operations: demote interval=0s timeout=320s (rsc_DEVQA_RH3_HDB20-demote-interval-0s)
           	methods interval=0s timeout=5s (rsc_DEVQA_RH3_HDB20-methods-interval-0s)
           	monitor interval=60 timeout=600 (rsc_DEVQA_RH3_HDB20-monitor-interval-60)
           	promote interval=0s timeout=320s (rsc_DEVQA_RH3_HDB20-promote-interval-0s)
           	reload interval=0s timeout=320s (rsc_DEVQA_RH3_HDB20-reload-interval-0s)
           	start interval=0s timeout=600 (rsc_DEVQA_RH3_HDB20-start-interval-0s)
           	stop interval=0s timeout=600 (rsc_DEVQA_RH3_HDB20-stop-interval-0s)

Stonith Devices:

 Resource: fence (class=stonith type=<fence-device-type>)
  Attributes: <fence-device-attributes>
  Operations: monitor interval=60s (fence-monitor-interval-60s)
Fencing Levels:

Location Constraints:

  Resource: rh3_dev_qa_group
	Disabled on:
  	Node: hana-21 (score:-INFINITY) (id:location-rh3_dev_qa_group-hana-21--INFINITY)
Ordering Constraints:
  start SAPHanaTopology_RH2_02-clone then start SAPHana_RH2_02-clone (kind:Mandatory) (non-symmetrical) (id:order-SAPHanaTopology_RH2_02-clone-SAPHana_RH2_02-clone-mandatory)
  promote SAPHana_RH2_02-clone then start vip_RH2_02 (kind:Mandatory) (id:order-SAPHana_RH2_02-clone-vip_RH2_02-mandatory)
  stop rh3_dev_qa_group then promote SAPHana_RH2_02-clone (kind:Mandatory) (id:order-rh3_dev_qa_group-SAPHana_RH2_02-clone-mandatory)
Colocation Constraints:
  vip_RH2_02 with SAPHana_RH2_02-clone (score:2000) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-vip_RH2_02-SAPHana_RH2_02-clone-2000)
  rh3_dev_qa_group with SAPHana_RH2_02-clone (score:-INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-rh3_dev_qa_group-SAPHana_RH2_02-clone-INFINITY)
Ticket Constraints:

Alerts:

 No alerts defined

Resources Defaults:

 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:

 cluster-infrastructure: corosync
 cluster-name: hana_cop_cluster
 dc-version: 2.0.3-5.el8_2.4-4b1f869f0f
 hana_rh2_site_srHook_DC1: PRIM
 hana_rh2_site_srHook_DC2: SOK
 have-watchdog: false
 last-lrm-refresh: 1638448790
 maintenance-mode: false

Quorum:
  Options:

5. Test and verify

5.1 Test the manual move of SAP HANA Prod Instances and verify the behavior of the DEV/QA Instance for Cost-Optimization

5.1.1 Repeat the test steps detailed in section 3.8 of this document.

[root]# crm_resource --move --resource SAPHana_RH2_02-clone

Notice that the DEV/QA Database is stopped and shut down first, before node 2 starts getting promoted to Primary/Master. Wait until the promotion process is complete, then clear the constraint that gets created:

[root]# pcs resource clear SAPHana_RH2_02-clone

Wait for the demote process to be completed on the new secondary. The status of the cluster should be similar to the output below:

[root]# pcs status
…….
  * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
	* Masters: [ hana-22 ]
	* Slaves: [ hana-21 ]
  * vip_RH2_02    (ocf::heartbeat:IPaddr2):    Started hana-22
  * Resource Group: rh3_dev_qa_group:
    * vip_DEV_QA	(ocf::heartbeat:IPaddr2):	Stopped
    * rsc_DEVQA_RH3_HDB20	(ocf::heartbeat:SAPInstance):	Stopped
…….

Notice how the DEV/QA Instance remains in the "Stopped" state while the Master is running on node 2 (hana-22).

5.1.2 Repeat the same commands listed in 5.1.1 and observe the startup of the DEV/QA HANA Instance and the failback of the Primary/Master Prod HANA instance to node 1:

[root]# crm_resource --move --resource SAPHana_RH2_02-clone

Wait for the start operation of Master and DEV/QA Instance to complete before removing the constraint:

[root]# pcs resource clear SAPHana_RH2_02-clone

Wait for the slave instance on node 2 to start up; the cluster status should then look similar to the output below:

[root@hana-21 ~]# pcs status
…….
  * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
	* Masters: [ hana-21 ]
	* Slaves: [ hana-22 ]
  * vip_RH2_02    (ocf::heartbeat:IPaddr2):    Started hana-21
  * Resource Group: rh3_dev_qa_group:
    * vip_DEV_QA        (ocf::heartbeat:IPaddr2):	Started hana-22
    * rsc_DEVQA_RH3_HDB20	(ocf::heartbeat:SAPInstance):   Started hana-22
…….

5.2 Test the failover by crashing the nodes


Ensure that fencing is working and tested before carrying out this test.
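
Triggering a kernel crash this way only works if the kernel's sysrq facility allows it. A quick pre-check (on RHEL, a value of 1 enables all sysrq functions; other non-zero values are a bitmask, where the crash command is gated by the debugging-dump bit, 0x8):

```shell
# Print the current sysrq setting: 0 = disabled, 1 = all functions
# enabled, anything else = bitmask of the enabled functions.
sysrq=$(cat /proc/sys/kernel/sysrq)
echo "kernel.sysrq = $sysrq"
```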

Crash the primary node 1 using the following command:

[root@hana-21 ~]# echo c > /proc/sysrq-trigger

Notice the promote operation on node 2. Since node 2 will now become the master, the DEV/QA HANA Instance will be in the stopped state.

[root@hana-22 ~]# pcs status
…...
  * Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02]:
	* Started: [ hana-22 ]
	* Stopped: [ hana-21 ]
  * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
	* SAPHana_RH2_02    (ocf::heartbeat:SAPHana):    Promoting hana-22
	* Stopped: [ hana-21 ]
  * vip_RH2_02    (ocf::heartbeat:IPaddr2):    Stopped
  * Resource Group: rh3_dev_qa_group:
    * vip_DEV_QA	(ocf::heartbeat:IPaddr2):	Stopped
    * rsc_DEVQA_RH3_HDB20	(ocf::heartbeat:SAPInstance):	Stopped

When node 1 comes back up, it will rejoin the cluster, and the HANA instance will behave according to the value set for the AUTOMATED_REGISTER parameter. Refer to the table in section 3.5 for more information about this parameter. In this example AUTOMATED_REGISTER is set to 'true', so node 1 rejoins the cluster and is automatically registered as the new secondary.

Check the status with the commands below:

[root@hana-22 ~]# pcs status
…...
  * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
	* Masters: [ hana-22 ]
	* Slaves: [ hana-21 ]
  * vip_RH2_02    (ocf::heartbeat:IPaddr2):    Started hana-22
  * Resource Group: rh3_dev_qa_group:
    * vip_DEV_QA	(ocf::heartbeat:IPaddr2):	Stopped
    * rsc_DEVQA_RH3_HDB20	(ocf::heartbeat:SAPInstance):	Stopped
…...

Repeat the same test for node 2 using the following command:

[root@hana-22 ~]# echo c > /proc/sysrq-trigger

Notice the promote operation on node 1 and the start operation of the DEV/QA Instance on node 2:

[root@hana-22 ~]# pcs status
…...
  * Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02]:
	* Started: [ hana-21 ]
	* Stopped: [ hana-22 ]
  * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
	* Masters: [ hana-21 ]
	* Stopped: [ hana-22 ]
  * vip_RH2_02    (ocf::heartbeat:IPaddr2):    Stopped
  * Resource Group: rh3_dev_qa_group:
    * vip_DEV_QA        (ocf::heartbeat:IPaddr2):	Started hana-22
    * rsc_DEVQA_RH3_HDB20    (ocf::heartbeat:SAPInstance):    Starting hana-22
…..

Wait for the cluster to stabilize:

[root@hana-22 ~]# pcs status
…..
  * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
	* Masters: [ hana-21 ]
	* Slaves: [ hana-22 ]
  * vip_RH2_02    (ocf::heartbeat:IPaddr2):    Started hana-21
  * Resource Group: rh3_dev_qa_group:
    * vip_DEV_QA        (ocf::heartbeat:IPaddr2):	Started hana-22
    * rsc_DEVQA_RH3_HDB20    (ocf::heartbeat:SAPInstance):    Started hana-22
…..