Pacemaker's "db2" resource fails to start due to database not being in a HADR configuration
Environment
- Red Hat Enterprise Linux 7
- Red Hat Enterprise Linux 8
- Red Hat Enterprise Linux 9
- Pacemaker Cluster
- IBM DB2 managed by ocf:heartbeat:db2
Issue
The Pacemaker db2 resource agent fails its start operation, and the following error is logged in /var/log/messages:
$ cat /var/log/messages
------------------------------------->8--------------------------------------
Apr 26 09:03:38 node01 db2(db2_resource)[####]: ERROR: DB2 database instance/db is not in a HADR configuration but I am a M/S resource
Apr 26 09:03:38 node01 crmd[####]: notice: Result of start operation for db2_resource on node01: 5 (not installed)
Apr 26 09:03:38 node01 crmd[####]: warning: Action 7 (db2_resource_start_0) on node01 failed (target: 0 vs. rc: 5): Error
The pcs status output also reports a 'not installed' error for the failed start operation:
$ pcs status --full
------------------------------------->8--------------------------------------
Failed Resource Actions:
* db2_resource_start_0 on node01 'not installed' (5): call=498, status=complete, exitreason='',
last-rc-change='Tue Apr 26 09:03:34 2022', queued=1ms, exec=4455ms
Resolution
The db2 resource agent runs the db2 command through runasdb2 to read the current HADR role (PRIMARY or STANDBY) of the database. The db2 command is not provided by Red Hat, so more information would be needed from IBM to determine why it may report HADR_ROLE incorrectly and how best to address this.
Root Cause
The resource agent code shows that this error occurs when the resource is configured as a promotable (multi-state) resource but the database reports its role as STANDARD instead of PRIMARY or STANDBY. STANDARD is the role of a database that is not in a HADR configuration, so the agent rejects it for a multi-state resource. The agent logs a separate error in the opposite case, when the database is in a HADR configuration but the resource is not configured as a multi-state resource:
From resource-agents-4.1.1-61.el7_9.13 on RHEL 7 (the RHEL 8 and 9 versions are similar):
#
# Delayed check of the compatibility of DB2 instance and pacemaker
# config.
# Logically this belongs to validate but certain parameters can only
# be retrieved once the instance is started.
#
db2_check_config_compatibility() {
    local db=$1
    local is_ms

    ocf_is_ms    # Is this a master/slave resource?
    is_ms=$?     # 0 if true, 1 if false

    # Check the HADR_ROLE of the database against the master/slave status of the resource.
    case "$HADR_ROLE/$is_ms" in
        STANDARD/0)
            # Our case: the resource is multi-state, but the database reports the non-multi-state role "STANDARD".
            ocf_log err "DB2 database $instance/$db is not in a HADR configuration but I am a M/S resource"
            exit $OCF_ERR_INSTALLED
            ;;
        STANDARD/1)
            # OK
            ;;
        */0)
            if [ -z "$HADR_PEER_WINDOW" ]
            then
                ocf_log err "DB2 database $instance: release to old, need HADR_PEER_WINDOW (>=V9)"
                exit $OCF_ERR_INSTALLED
            fi
            ;;
        */1)
            # The resource is not configured as master/slave.
            ocf_log err "DB2 database $instance/$db is in a HADR configuration but I must be a M/S resource"
    esac
}
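The four role and resource-type combinations handled above can be summarized in a small standalone sketch. This is illustrative only: classify_hadr_config is a hypothetical helper and is not part of the resource agent, and the messages are paraphrased rather than taken from the agent.

```shell
#!/bin/sh
# Hypothetical helper, not part of ocf:heartbeat:db2.
# $1: HADR role reported by DB2 (STANDARD, PRIMARY or STANDBY)
# $2: 0 if the resource is promotable (master/slave), 1 otherwise
classify_hadr_config() {
    case "$1/$2" in
        STANDARD/0) echo "mismatch: promotable resource, database not in HADR" ;;
        STANDARD/1) echo "ok: standalone database, non-promotable resource" ;;
        # The real agent additionally requires HADR_PEER_WINDOW in this case.
        */0)        echo "ok: HADR database, promotable resource" ;;
        */1)        echo "mismatch: HADR database, resource is not promotable" ;;
    esac
}

classify_hadr_config STANDARD 0
classify_hadr_config PRIMARY 0
```

The first call corresponds to the failure described in this article: a promotable resource whose database reports STANDARD.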
Diagnostic Steps
The resource agent runs runasdb2 db2 get db cfg for <database> to obtain the HADR role of the database:
From resource-agents-4.1.1-61.el7_9.13 on RHEL 7 (the RHEL 8 and 9 versions are similar):
#
# get some data from the database config
# sets HADR_ROLE HADR_TIMEOUT HADR_PEER_WINDOW
#
db2_get_cfg() {
    local db=$1

    local output hadr_vars

    # The output of this command is converted to HADR_ROLE and the other variables below.
    output=$(runasdb2 db2 get db cfg for $db)
    [ $? != 0 ] && return $OCF_ERR_GENERIC

    hadr_vars=$(echo "$output" |
        awk '/HADR database role/ {printf "HADR_ROLE='%s'; ", $NF;}
            /HADR_TIMEOUT/ {printf "HADR_TIMEOUT='%s'; ", $NF;}
            /First active log file/ {printf "FIRST_ACTIVE_LOG='%s'\n", $NF;}
            /HADR_PEER_WINDOW/ {printf "HADR_PEER_WINDOW='%s'\n", $NF;}')

    # sets HADR_ROLE HADR_TIMEOUT HADR_PEER_WINDOW
    eval $hadr_vars

    # HADR_PEER_WINDOW comes with V9 and is checked later
    if [ -z "$HADR_ROLE" -o -z "$HADR_TIMEOUT" ]
    then
        ocf_log error "DB2 cfg values invalid for $instance($db2node)/$db: $hadr_vars"
        return $OCF_ERR_GENERIC
    fi

    return $OCF_SUCCESS
}
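To see which value the agent would pick up, the same awk match can be applied by hand. The following is a minimal sketch: extract_hadr_role is a hypothetical helper, and the sample line is modeled on the "HADR database role" entry rather than captured from a real system.

```shell
#!/bin/sh
# Hypothetical helper, not part of the resource agent.
# Reads "db2 get db cfg" style output on stdin and prints the last field of
# the "HADR database role" line, matching the pattern the agent's awk uses.
extract_hadr_role() {
    awk '/HADR database role/ { print $NF }'
}

# Illustrative sample line only; real output contains many more entries.
sample=' HADR database role                                      = STANDARD'

printf '%s\n' "$sample" | extract_hadr_role
```

On a cluster node, the real output of db2 get db cfg for <database>, run as the DB2 instance owner, could be piped into the function instead of the sample. A result of STANDARD on a promotable resource matches the failure described in this article.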
The db2 command is not provided by Red Hat and more information would be needed from IBM to determine why this command may report the HADR_ROLE as STANDARD.