Why can `SAPInstance` cluster resources not start after implementing the `sap_cluster_connector`?

Solution Verified

Environment

  • Red Hat Enterprise Linux 8.5+, 9 with High-Availability Add-on
  • Pacemaker cluster running SAPInstance resources that are also managed by external tools such as SAP Management Console (MC/MMC) or SAP Landscape Management (LaMa)

Issue

  • Pacemaker cannot start the SAPInstance cluster resources (ASCS/ERS) after implementing the sap_cluster_connector.
  • Pacemaker can start the SAPInstance cluster resources (ASCS/ERS) only when the sap_cluster_connector configuration is not present.

Resolution

Red Hat Enterprise Linux 8

  • The issue (bugzilla bug: 2118745) has been resolved with the errata RHBA-2022:6451 with the following package(s): pacemaker-2.1.2-4.el8_6.3 or later for RHEL 8.6.z.
  • The issue (bugzilla bug: 2118337) has been resolved with the errata RHBA-2022:7573 with the following package(s): pacemaker-2.1.4-5.el8 or later.

Red Hat Enterprise Linux 9

  • The issue (bugzilla bug: 2118744) has been resolved with the errata RHBA-2022:6581 with the following package(s): pacemaker-2.1.2-4.el9_0.1 or later for RHEL 9.0.z.
  • The issue (bugzilla bug: 2089353) has been resolved with the errata RHBA-2022:7937 with the following package(s): pacemaker-2.1.4-5.el9 or later.
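
As a rough way to check whether a node already carries a fixed build, the installed pacemaker version-release can be compared against the minimum from the matching erratum above. The sketch below is illustrative only: `version_ge` is a hypothetical helper that relies on GNU `sort -V`, which approximates but does not exactly reproduce RPM's own version comparison, and `MIN_FIXED` is an assumed variable that must be set by hand to the value for the node's release stream.

```shell
# Hypothetical helper: succeeds when version $1 sorts at or after version $2
# under GNU sort's version ordering (an approximation of RPM's comparison).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Assumption: pick the minimum from the erratum matching this node's stream,
# for example 2.1.4-5.el8 on RHEL 8 or 2.1.4-5.el9 on RHEL 9.
MIN_FIXED="${MIN_FIXED:-2.1.4-5.el8}"

# Only query RPM when pacemaker is actually installed on this host.
if command -v rpm >/dev/null 2>&1 && rpm -q pacemaker >/dev/null 2>&1; then
    installed="$(rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' pacemaker | head -n 1)"
    if version_ge "$installed" "$MIN_FIXED"; then
        echo "pacemaker $installed is at or above $MIN_FIXED"
    else
        echo "pacemaker $installed is older than $MIN_FIXED; update required"
    fi
fi
```

Because the comparison is only approximate, the result is best treated as a hint and confirmed against the errata themselves.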

Root Cause

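Pacemaker versions shipped with RHEL 8.5 and later, and with RHEL 9.0 Beta and later, are affected by this issue until the fixed packages listed under Resolution are installed. In the affected versions, a pending command, which has an execution status of -1, is shown as complete (0).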

Diagnostic Steps

Steps to Reproduce:

1] Install sap-cluster-connector on both nodes:

[root@s4hana06 ~]# rpm -qa | grep connector
sap-cluster-connector-3.0.1-7.el9.1.noarch

[root@s4hana05 heartbeat]# rpm -qa | grep connector
sap-cluster-connector-3.0.1-7.el9.1.noarch

2] Disable SAP Instance Resources

[root@s4hana06 ~]# pcs resource disable s4h_ers29; pcs resource disable s4h_ascs20

3] Add the following lines to the end of each instance's profile file:

[root@s4hana06 ~]# vim /sapmnt/S4H/profile/S4H_ASCS20_s4ascs
service/halib = $(DIR_CT_RUN)/saphascriptco.so
service/halib_cluster_connector = /usr/bin/sap_cluster_connector

[root@s4hana06 ~]# vim /sapmnt/S4H/profile/S4H_ERS29_s4ers
service/halib = $(DIR_CT_RUN)/saphascriptco.so
service/halib_cluster_connector = /usr/bin/sap_cluster_connector
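
To confirm that an instance profile actually carries both connector parameters as active lines, a small check can be run against each profile. This is a minimal sketch: `has_connector_config` is a hypothetical helper name, and the profile paths are the ones used in this example landscape.

```shell
# Hypothetical helper: succeeds only if the profile in $1 contains both
# connector parameters as active (uncommented) lines.
has_connector_config() {
    grep -Eq '^[[:space:]]*service/halib[[:space:]]*=' "$1" &&
    grep -Eq '^[[:space:]]*service/halib_cluster_connector[[:space:]]*=' "$1"
}

# Profile paths from this example; adjust them for other SIDs and instances.
for profile in /sapmnt/S4H/profile/S4H_ASCS20_s4ascs /sapmnt/S4H/profile/S4H_ERS29_s4ers; do
    [ -f "$profile" ] || continue   # skip profiles not present on this host
    if has_connector_config "$profile"; then
        echo "$profile: connector configured"
    else
        echo "$profile: connector parameters missing or commented out"
    fi
done
```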

4] Kill the remaining processes on both nodes:

[root@s4hana06 ~]# ps aux|grep sapstartsrv|grep s4hadm
s4hadm   3128462  0.0  0.0 848668 92040 ?        Ssl  13:50   0:01 /usr/sap/S4H/ERS29/exe/sapstartsrv pf=/sapmnt/S4H/profile/S4H_ERS29_s4ers -D -u s4hadm
[root@s4hana06 ~]# kill 3128462

[root@s4hana05 heartbeat]# ps aux|grep sapstartsrv|grep s4hadm
s4hadm    985652  0.0  0.0 991344 96984 ?        Ssl  13:49   0:02 /usr/sap/S4H/ASCS20/exe/sapstartsrv pf=/sapmnt/S4H/profile/S4H_ASCS20_s4ascs -D -u s4hadm
[root@s4hana05 heartbeat]# kill 985652
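
The PID lookup in this step can be scripted so that the same filter runs on each node. The sketch below assumes the `s4hadm` user from this example; `sapstartsrv_pids` is a hypothetical helper that reads `ps aux` output on standard input, and the `kill` call is left commented out so that nothing is terminated until the list has been reviewed.

```shell
# Hypothetical helper: read `ps aux` output on stdin and print the PID of
# every sapstartsrv process owned by the user given in $1.
# Field 1 is the user, field 2 the PID, field 11 the start of the command.
sapstartsrv_pids() {
    awk -v user="$1" '$1 == user && $11 ~ /sapstartsrv$/ { print $2 }'
}

# List the leftover processes on this node for the example user s4hadm.
pids="$(ps aux 2>/dev/null | sapstartsrv_pids s4hadm)"
echo "sapstartsrv PIDs: ${pids:-none}"

# After reviewing the list, the processes could be terminated like this:
# [ -n "$pids" ] && kill $pids
```

Matching on the command field rather than the whole line keeps the `grep` process itself out of the result.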

5] Enable SAP Instance Resources

[root@s4hana06 ~]# pcs resource enable s4h_ascs20; pcs resource enable s4h_ers29

Actual results:

After step 5, the resources briefly show as starting, and then the following errors occur:

Failed Resource Actions:
  * s4h_ascs20_start_0 on s4hana05 'not running' (7): call=90, status='complete', exitreason='', last-rc-change='2022-05-23 15:08:26 +02:00', queued=0ms, exec=12798ms
  * s4h_ers29_start_0 on s4hana05 'not running' (7): call=102, status='complete', exitreason='', last-rc-change='2022-05-23 15:08:42 +02:00', queued=0ms, exec=12884ms

And the resources:

    * s4h_ascs20        (ocf:heartbeat:SAPInstance):     Stopped (disabled)
    * s4h_ers29 (ocf:heartbeat:SAPInstance):     Stopped (disabled)

Attempts to clean up and/or re-enable the resources do not help; the same issue repeats.
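
When several resources fail at once, it can help to reduce the "Failed Resource Actions" section to resource and node pairs. The following is a minimal sketch and not part of the original procedure: `failed_starts` is a hypothetical helper that assumes the line layout shown above, which may differ between Pacemaker versions.

```shell
# Hypothetical helper: read cluster status text on stdin and print
# "<resource> <node>" for every failed start action listed in it.
failed_starts() {
    awk '$1 == "*" && $2 ~ /_start_[0-9]+$/ && $3 == "on" {
        res = $2
        sub(/_start_[0-9]+$/, "", res)   # strip the "_start_0" operation suffix
        print res, $4
    }'
}

# Run against the live cluster only when pcs is available on this host.
if command -v pcs >/dev/null 2>&1; then
    pcs status 2>/dev/null | failed_starts
fi
```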

Expected results:

After the resources are enabled, they should start without such errors.

Additional info:

Error messages in /var/log/messages

May 23 15:21:07 s4hana06 SAPInstance(s4h_ascs20)[3519701]: ERROR: SAP Instance S4H-ASCS20 start failed: #01223.05.2022 15:21:07#012WaitforStarted#012FAIL: process msg_server MessageServer not running
May 23 15:21:07 s4hana06 pacemaker-controld[3117621]: notice: Result of start operation for s4h_ascs20 on s4hana06: not running
May 23 15:21:07 s4hana06 pacemaker-attrd[3117619]: notice: Setting fail-count-s4h_ascs20#start_0[s4hana06]: (unset) -> INFINITY
May 23 15:21:07 s4hana06 pacemaker-attrd[3117619]: notice: Setting last-failure-s4h_ascs20#start_0[s4hana06]: (unset) -> 1653312067
May 23 15:21:07 s4hana06 pacemaker-controld[3117621]: notice: Requesting local execution of stop operation for s4h_ascs20 on s4hana06

Error messages in /usr/sap/S4H/ERS29/work/sapstartsrv.log

Initiating start via cluster API at 2022/05/23 15:21:17
trusted unix domain socket user is stopping SAP System at 2022/05/23 15:21:28
SAP HA Trace: Mon May 23 15:21:28 2022
SAP HA Trace: === SAP_HA_FindSAPInstance ===
SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector lsr ...
SAP HA Trace: searchClusterFile: S4H:29 found
SAP HA Trace: Mon May 23 15:21:28 2022
SAP HA Trace: --- SAP_HA_FindSAPInstance Exit-Code: SAP_HA_OK ---
SAP HA Trace: Mon May 23 15:21:28 2022
SAP HA Trace: === SAP_HA_StopCluster ===
SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector cpa ...
SAP HA Trace: SAP_HA_StopCluster: DID NOT FOUND A PENDING ACTION -> SAP_HA_OK
SAP HA Trace: SAP_HA_StopCluster: calling fire_resource_action
SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector fra ...
SAP HA Trace: Mon May 23 15:21:28 2022
SAP HA Trace: --- SAP_HA_StopCluster Exit-Code: SAP_HA_OK ---

Error messages in file /usr/sap/S4H/ASCS20/work/sapstartsrv.log

Initiating start via cluster API at 2022/05/23 15:20:57
trusted unix domain socket user is stopping SAP System at 2022/05/23 15:21:08
SAP HA Trace: Mon May 23 15:21:08 2022
SAP HA Trace: === SAP_HA_FindSAPInstance ===
SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector lsr ...
SAP HA Trace: searchClusterFile: S4H:20 found
SAP HA Trace: Mon May 23 15:21:08 2022
SAP HA Trace: --- SAP_HA_FindSAPInstance Exit-Code: SAP_HA_OK ---
SAP HA Trace: Mon May 23 15:21:08 2022
SAP HA Trace: === SAP_HA_StopCluster ===
SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector cpa ...
SAP HA Trace: SAP_HA_StopCluster: DID NOT FOUND A PENDING ACTION -> SAP_HA_OK
SAP HA Trace: SAP_HA_StopCluster: calling fire_resource_action
SAP HA Trace: Fire system command /usr/bin/sap_cluster_connector fra ...
SAP HA Trace: Mon May 23 15:21:08 2022
SAP HA Trace: --- SAP_HA_StopCluster Exit-Code: SAP_HA_OK ---
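
Both traces show sapstartsrv calling the connector with `lsr`, `cpa`, and `fra` in sequence shortly after the cluster-initiated start. As a rough aid for spotting the same pattern in other logs, the sketch below lists each connector call in order; `connector_calls` is a hypothetical helper that assumes the "Fire system command" line format shown in these excerpts.

```shell
# Hypothetical helper: read a sapstartsrv.log on stdin and print, in order,
# the sap_cluster_connector subcommand from each "Fire system command" line.
connector_calls() {
    awk '/Fire system command/ {
        for (i = 1; i < NF; i++) {
            if ($i ~ /sap_cluster_connector$/) { print $(i + 1) }
        }
    }'
}

# Log paths from this example; adjust them for other SIDs and instances.
for log in /usr/sap/S4H/ASCS20/work/sapstartsrv.log /usr/sap/S4H/ERS29/work/sapstartsrv.log; do
    [ -r "$log" ] || continue   # skip logs not present on this host
    echo "== $log"
    connector_calls < "$log"
done
```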

Recovery Steps:

Remove the sap-cluster-connector configuration, kill the remaining sapstartsrv processes, then clean up and enable the resources:

1] Comment out the last two lines in each instance profile:

[root@s4hana06 ]# vim /sapmnt/S4H/profile/S4H_ERS29_s4ers 
# service/halib = $(DIR_CT_RUN)/saphascriptco.so
# service/halib_cluster_connector = /usr/bin/sap_cluster_connector

[root@s4hana06 ]# vim /sapmnt/S4H/profile/S4H_ASCS20_s4ascs
# service/halib = $(DIR_CT_RUN)/saphascriptco.so
# service/halib_cluster_connector = /usr/bin/sap_cluster_connector
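
Commenting out the two parameters can also be done non-interactively. This is a sketch under assumptions rather than the documented procedure: `disable_connector` is a hypothetical helper, it relies on GNU `sed -i`, and it keeps a `.bak` copy of each profile it edits.

```shell
# Hypothetical helper: comment out the two connector parameters in the
# profile given in $1, keeping a backup copy as "$1.bak" (GNU sed assumed).
# Lines that are already commented out do not match and are left alone.
disable_connector() {
    sed -i.bak -E 's|^([[:space:]]*service/halib(_cluster_connector)?[[:space:]]*=)|# \1|' "$1"
}

# Profile paths from this example; adjust them for other SIDs and instances.
for profile in /sapmnt/S4H/profile/S4H_ERS29_s4ers /sapmnt/S4H/profile/S4H_ASCS20_s4ascs; do
    [ -f "$profile" ] || continue   # skip profiles not present on this host
    disable_connector "$profile"
done
```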

2] Find the remaining sapstartsrv processes on both nodes and kill each PID listed:

[root@s4hana06 ]# ps aux | grep sapstart | grep s4hadm

3] Enable the resources

[root@s4hana06 ]# pcs resource enable s4h_ascs20; pcs resource enable s4h_ers29

    * s4h_ascs20        (ocf:heartbeat:SAPInstance):     Started s4hana05
    * s4h_ers29 (ocf:heartbeat:SAPInstance):     Started s4hana06
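
To confirm the recovery without reading the full status output, the state of each resource can be checked by name. This is a minimal sketch: `is_started` is a hypothetical helper that assumes the resource line layout shown above, and the resource names are the ones from this example.

```shell
# Hypothetical helper: read cluster status text on stdin and succeed only if
# the resource named in $1 is reported as Started.
is_started() {
    awk -v res="$1" '$1 == "*" && $2 == res && $4 == "Started" { found = 1 }
                     END { exit !found }'
}

# Run against the live cluster only when pcs is available on this host.
if command -v pcs >/dev/null 2>&1; then
    status="$(pcs status 2>/dev/null)"
    for res in s4h_ascs20 s4h_ers29; do
        if printf '%s\n' "$status" | is_started "$res"; then
            echo "$res: Started"
        else
            echo "$res: not started"
        fi
    done
fi
```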

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.