Configuring IBM DB2 in a Pacemaker cluster
Note: This document is under development, is subject to substantial change, and is provided only as a preview. The included information and instructions should not be considered complete, and should be used with caution. For feedback on this preview version, feel free to use the comment section at the end of this page.
1. Overview
This document describes how to configure IBM DB2 for use in a Pacemaker cluster. It does not contain detailed DB2 installation procedures, but it provides the considerations needed to run DB2 in the cluster. For DB2 installation guides, refer to the official DB2 documentation:
- IBM Knowledge Center - DB2 v11.1 - Installing
- IBM Knowledge Center - DB2 v10.5 - Installing
1.1. Environment
- Red Hat Enterprise Linux 6, 7, 8 with High-Availability or Resilient Storage Add-on
- IBM DB2 database version 10.5, 11.1 with and without HADR (High Availability Disaster Recovery)
This document assumes that you have already configured a basic Pacemaker cluster and that it contains a functional and tested fencing configuration, as required by the Support Policies for RHEL High Availability Clusters - General Requirements for Fencing/STONITH.
1.2. Architecture
There are two possible configurations for IBM DB2 in a Pacemaker cluster:
- Failover configuration
- shared storage required (HA-LVM with tagging is used in further examples)
- HADR (High Availability Disaster Recovery) configuration
- no shared storage required, IBM DB2 with HADR support is required
1.3. Support limitations
The minimally required versions of the resource-agents package are:

For DB2 (10.5, 11.1) in 'HADR configuration':
- RHEL 7: resource-agents-3.9.5-105.el7

For DB2 (10.5, 11.1) in 'failover configuration':
- RHEL 7: resource-agents-3.9.5-54.el7
- RHEL 6: resource-agents-3.9.5-12.el6
Check the Support Policy for RHEL High Availability Clusters - Management of IBM Db2 for Linux, Unix, and Windows in a Cluster for more details.
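The minimum can be checked with a version-sort comparison. A minimal sketch, assuming RHEL 7 and the HADR minimum; the installed version string below is a made-up example standing in for the output of `rpm -q --qf '%{VERSION}-%{RELEASE}\n' resource-agents`:

```shell
# Hypothetical check: "installed" stands in for the real rpm query
# output; 3.9.5-105.el7 is the HADR minimum on RHEL 7.
installed="3.9.5-124.el7"
required="3.9.5-105.el7"
# sort -V orders version strings; if the required version sorts first
# (or equals the installed one), the installed package is new enough.
lowest=$(printf '%s\n%s\n' "$installed" "$required" | sort -V | head -n1)
if [ "$lowest" = "$required" ]; then
    echo "resource-agents is new enough"
else
    echo "resource-agents is too old - update before continuing"
fi
```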
2. DB2 in Failover configuration
In this configuration, DB2 requires shared storage, which contains the DB2 instance data that will be managed by the cluster. In case of issues, the DB2 instance is first stopped on the node where it was running, along with the filesystem resources containing the instance data, and both are then started on the other node.
2.1. Environment
An example for the failover configuration will use the following values:
# hostnames and IP addresses
Node 1: node1 / 192.168.0.11
Node 2: node2 / 192.168.0.12
DB2: db2-ha-ip / 192.168.0.10
# shared storage configuration
shared VG name: vg_shared
LV containing DB2 instances: lv_db2
Filesystem for DB2 instances: /db2 (ext4)
# DB2 related information
DB2 instance name: db2inst1
Database name: SAMPLE
The cluster will use a single resource group, named db2inst1_group that will contain all the cluster resources, associated with the DB2 instance db2inst1.
2.2. Configuration
It is recommended to configure the shared storage with HA-LVM before performing the DB2 installation, to avoid issues when copying files to the shared storage. To configure HA-LVM, follow the article What is a Highly Available LVM (HA-LVM) configuration and how do I implement it?.
After following the procedure, the cluster will contain the LVM cluster resource db2inst1_lvm that will activate the vg_shared VG on the node where it is running.
[root@node1]# pcs resource show db2inst1_lvm
...
Resource: db2inst1_lvm (class=ocf provider=heartbeat type=LVM)
Attributes: exclusive=true volgrpname=vg_shared
...
- Add the LVM cluster resource db2inst1_lvm into the db2inst1_group resource group, and verify the pcs status command output.
[root@node1]# pcs resource group add db2inst1_group db2inst1_lvm
[root@node1]# pcs status
...
Resource Group: db2inst1_group
db2inst1_lvm (ocf::heartbeat:LVM): Started node1
...
- Add the Filesystem cluster resource containing the filesystem for the DB2 instance to the resource group db2inst1_group.
[root@node1]# pcs resource create db2inst1_fs Filesystem device=/dev/vg_shared/lv_db2 directory=/db2 fstype=ext4 --group db2inst1_group
[root@node1]# pcs status
...
Resource Group: db2inst1_group
db2inst1_lvm (ocf::heartbeat:LVM): Started node1
db2inst1_fs (ocf::heartbeat:Filesystem): Started node1
...
- Add the IP address that will be used by the DB2 instance to the resource group db2inst1_group.
[root@node1]# pcs resource create db2inst1_ip IPaddr2 ip=192.168.0.10 --group db2inst1_group
[root@node1]# pcs status
...
Resource Group: db2inst1_group
db2inst1_lvm (ocf::heartbeat:LVM): Started node1
db2inst1_fs (ocf::heartbeat:Filesystem): Started node1
db2inst1_ip (ocf::heartbeat:IPaddr2): Started node1
...
- Ensure that on both nodes, the DB2 virtual hostname can be resolved. You can run:
[root@node1]# ping -c 2 db2-ha-ip
PING db2-ha-ip (192.168.0.10) 56(84) bytes of data.
64 bytes from db2-ha-ip (192.168.0.10): icmp_seq=1 ttl=64 time=0.021 ms
64 bytes from db2-ha-ip (192.168.0.10): icmp_seq=2 ttl=64 time=0.030 ms
[root@node2]# ping -c 2 db2-ha-ip
PING db2-ha-ip (192.168.0.10) 56(84) bytes of data.
64 bytes from db2-ha-ip (192.168.0.10): icmp_seq=1 ttl=64 time=0.032 ms
64 bytes from db2-ha-ip (192.168.0.10): icmp_seq=2 ttl=64 time=0.028 ms
- Install DB2 on node1 and configure the home directories for all shared DB2 instances to be in the /db2 directory. A typical DB2 installation creates 2 users, which in the provided example are expected to have their home directories in /db2/db2inst1 and /db2/db2fenc1. Important: DB2 itself should be installed on non-shared disks; only the DB2 instances should be present on the shared filesystem provided by the db2inst1_fs cluster resource in this example.
- As the db2inst1 user, create the SAMPLE database that will be managed by the cluster. Note: This step assumes that db2inst1_group is started on node1.
[root@node1]# sudo -i -u db2inst1
[db2inst1@node1]$ db2sampl
- On node1, change the content of /db2/db2inst1/sqllib/db2nodes.cfg so it contains the virtual DB2 hostname db2-ha-ip. The content of the file should look like below:
0 db2-ha-ip 0
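The three fields in db2nodes.cfg are the partition number, the hostname, and the logical port. As a sketch (writing to /tmp for illustration only; the real file is /db2/db2inst1/sqllib/db2nodes.cfg):

```shell
# Fields: <partition number> <hostname> <logical port>.
# For a single-partition instance managed by the cluster, partition 0
# is bound to the virtual hostname so the file is valid on either node.
printf '0 db2-ha-ip 0\n' > /tmp/db2nodes.cfg
cat /tmp/db2nodes.cfg
```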
- As the DB2 instance user (db2inst1), create the script /db2/db2inst1/db2_local_rsh with the content below, and make it executable.
[db2inst1@node1]$ cat /db2/db2inst1/db2_local_rsh
#!/bin/sh
shift 4
eval "$@"
[db2inst1@node1]$ chmod +x /db2/db2inst1/db2_local_rsh
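To see what the wrapper does, it can be exercised outside of DB2. A minimal sketch, recreating the script in a temporary location; the argument pattern is an illustration of an rsh-style invocation, not DB2's exact call:

```shell
# Recreate the wrapper in a temp file: it drops the first four
# rsh-style arguments (remote host, -l, user, option) and runs the
# remaining command locally via eval.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#!/bin/sh
shift 4
eval "$@"
EOF
chmod +x "$tmp"
# The first four arguments are discarded, so the echo runs on the
# local machine instead of being sent to a remote host.
"$tmp" db2-ha-ip -l db2inst1 -n echo "runs locally"
rm -f "$tmp"
```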
- The /db2/db2inst1/db2_local_rsh script will be used by DB2 to start and stop the database using the virtual DB2 hostname db2-ha-ip. To make DB2 aware that this script should be used, execute the command below as the db2inst1 user.
[db2inst1@node1]$ db2set DB2RSHCMD=$INSTHOME/db2_local_rsh
- While still logged in as user db2inst1, try to start and stop the DB2 database to verify that the script works properly:
[db2inst1@node1]$ db2start
11/01/2017 00:00:00 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
[db2inst1@node1]$ db2stop
11/01/2017 00:00:00 0 0 SQL1064N DB2STOP processing was successful.
SQL1064N DB2STOP processing was successful.
- Do not continue if the database cannot be successfully started/stopped. If you encounter errors starting/stopping the database, check the DB2 logs and engage DB2 support if needed to determine the cause. A common issue is the message below, which usually indicates that the db2-ha-ip hostname cannot be properly resolved to the IP address:
SQL6031N Error in the db2nodes.cfg file at line number "1". Reason code "10".
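A quick way to rule out the resolution problem is a getent lookup of the virtual hostname (a sketch; getent consults /etc/hosts as well as DNS):

```shell
# If the lookup prints nothing and fails, the hostname is not resolvable
# on this node and /etc/hosts or DNS must be fixed before retrying.
getent hosts db2-ha-ip || echo "db2-ha-ip does not resolve - fix /etc/hosts or DNS"
```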
- If the previous steps were successful, stop the DB2 database on node1 and move the resource group db2inst1_group to node2.
[db2inst1@node1]$ db2stop
11/01/2017 00:00:00 0 0 SQL1064N DB2STOP processing was successful.
SQL1064N DB2STOP processing was successful.
[root@node1]# pcs resource move db2inst1_group
[root@node1]# pcs resource clear db2inst1_group
- Wait a few moments, and verify that all cluster resources from the group db2inst1_group were successfully started on node2.
[root@node1]# pcs status
...
Resource Group: db2inst1_group
db2inst1_lvm (ocf::heartbeat:LVM): Started node2
db2inst1_fs (ocf::heartbeat:Filesystem): Started node2
db2inst1_ip (ocf::heartbeat:IPaddr2): Started node2
...
- On node2, install the DB2 database without creating the db2inst1 instance. After the installation, check on both nodes and, if needed, synchronize the following:
  - File /etc/services: the file must contain the same DB2 definitions on both nodes; for example, db2inst1 creates the following entries:
    DB2_db2inst1      60006/tcp
    DB2_db2inst1_1    60007/tcp
    DB2_db2inst1_2    60008/tcp
    DB2_db2inst1_3    60009/tcp
    DB2_db2inst1_4    60010/tcp
    DB2_db2inst1_END  60011/tcp
    db2c_db2inst1     50000/tcp
  - Users/Groups: both nodes must contain the DB2 unix users/groups with the same UIDs/GIDs, or both nodes must resolve the same DB2-related users/groups the same way. You can use tools like vipw, vipw -s, vigr and vigr -s to edit the needed files if you are not using a centralized user/group system.
  - File /etc/hosts or DNS resolution: both nodes must be able to resolve the virtual DB2 hostname db2-ha-ip to the same IP address (ideally, this hostname is present in the /etc/hosts files for additional resilience against DNS resolution failures).
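The /etc/services comparison can be scripted. A minimal sketch; the two here-documents stand in for the DB2-related lines grepped from each node's /etc/services (in practice, collect them with ssh or scp):

```shell
# Sample excerpts standing in for the real files from node1 and node2.
cat > /tmp/services.node1 <<'EOF'
DB2_db2inst1 60006/tcp
db2c_db2inst1 50000/tcp
EOF
cat > /tmp/services.node2 <<'EOF'
DB2_db2inst1 60006/tcp
db2c_db2inst1 50000/tcp
EOF
# Any diff output means the nodes disagree and must be synchronized.
if diff -u /tmp/services.node1 /tmp/services.node2 >/dev/null; then
    echo "DB2 service entries match"
else
    echo "MISMATCH - synchronize /etc/services before continuing"
fi
```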
- On node2, as the db2inst1 user, test starting and stopping the DB2 database.
[db2inst1@node2]$ db2start
11/01/2017 00:00:00 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
[db2inst1@node2]$ db2stop
11/01/2017 00:00:00 0 0 SQL1064N DB2STOP processing was successful.
SQL1064N DB2STOP processing was successful.
- Do not continue if the database cannot be successfully started/stopped. If you encounter errors starting/stopping the database, check the DB2 logs and engage DB2 support if needed to determine the cause. A common issue is the message below, which usually indicates that the db2-ha-ip hostname cannot be properly resolved to the IP address:
SQL6031N Error in the db2nodes.cfg file at line number "1". Reason code "10".
- Stop DB2 on node2:
[db2inst1@node2]$ db2stop
11/01/2017 00:00:00 0 0 SQL1064N DB2STOP processing was successful.
SQL1064N DB2STOP processing was successful.
- Configure the db2 cluster resource to manage the SAMPLE database from the db2inst1 instance.
NOTE: If the database instance user's login shell is set to csh or tcsh, the db2 cluster resource will fail. This is detailed in the article A db2 resource fails when the database instance user's login shell is set to csh or tcsh in a Pacemaker cluster.
NOTE 2: The dblist parameter is the list of databases to be managed by the cluster ("db1 db2 dbX"). If that parameter is not specified, all databases in the instance will be managed.
[root@node2]# pcs resource create db2inst1_db2 db2 dblist=sample instance=db2inst1 --group db2inst1_group
[root@node2]# pcs status
...
Resource Group: db2inst1_group
db2inst1_lvm (ocf::heartbeat:LVM): Started node2
db2inst1_fs (ocf::heartbeat:Filesystem): Started node2
db2inst1_ip (ocf::heartbeat:IPaddr2): Started node2
db2inst1_db2 (ocf::heartbeat:db2): Started node2
...
- If the DB2 cluster resource started correctly, test the relocation of the resource group to the other node to verify that it also starts correctly there. The second command removes the constraint created by the first command, which makes the resource group relocate.
[root@node2]# pcs resource move db2inst1_group
[root@node2]# pcs resource clear db2inst1_group
3. DB2 in HADR configuration
This configuration doesn't require shared storage, but it requires the DB2 HADR technology, which replicates the database between the nodes. If there is an issue with DB2 on the node where the primary instance is running, the other node takes over the primary role and reverses the replication direction.
3.1. Environment
An example for the HADR configuration will use the following values:
# hostnames and IP addresses
Node 1: node1 / 192.168.0.11
Node 2: node2 / 192.168.0.12
DB2: db2-ha-ip / 192.168.0.10
# DB2 related information
DB2 instance name: db2inst1
Database name: SAMPLE
Service port for HADR: 52000/tcp
db2inst1 user home dir: /db2/db2inst1
DB2 will be configured for HADR replication, which ensures that the DB2 instance data is replicated between the nodes without the need for shared storage.
3.2. Configuration
- On both nodes add the TCP port 52000 used for HADR replication to the firewall configuration and reload the firewall.
[root@node1]# firewall-cmd --add-port=52000/tcp --permanent
success
[root@node1]# firewall-cmd --reload
success
[root@node2]# firewall-cmd --add-port=52000/tcp --permanent
success
[root@node2]# firewall-cmd --reload
success
- Install DB2 with the db2inst1 instance on both nodes.
- Follow the steps from the DB2 HADR documentation to set up the HADR replication. The steps below demonstrate how to set up HADR for the SAMPLE database:
- On node1, create the SAMPLE database as user db2inst1.
[root@node1]# sudo -i -u db2inst1
[db2inst1@node1]$ db2sampl
Creating database "SAMPLE"...
Connecting to database "SAMPLE"...
Creating tables and data in schema "DB2INST1"...
Creating tables with XML columns and XML data in schema "DB2INST1"...
Stopping the DB2 instance...
'db2sampl' processing complete.
- Configure 'Archive logging' for the SAMPLE database. The commands below assume an on-disk log directory that must be owned and writable by the user db2inst1. This directory must exist on all nodes.
[db2inst1@node1]$ mkdir /db2/db2inst1/archive_dir
[db2inst1@node2]$ mkdir /db2/db2inst1/archive_dir
[db2inst1@node1]$ db2 UPDATE DB CFG FOR SAMPLE USING LOGINDEXBUILD ON LOGARCHMETH1 DISK:/db2/db2inst1/archive_dir
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
- The database manager must be started to change the configuration of the SAMPLE database. The message below indicates that the database needs to be started with the db2start command before running the above configuration change:
SQL1032N No start database manager command was issued. SQLSTATE=57019
[db2inst1@node1]$ db2start
11/01/2017 00:00:00 0 0 SQL1063N DB2START processing was successful.
- Create a backup of the SAMPLE database on node1.
[db2inst1@node1]$ db2 BACKUP DB SAMPLE
Backup successful. The timestamp for this backup image is : 20171101000000
- Transfer the backup of the SAMPLE database from node1 to node2, into the db2inst1 user's HOME directory. Ensure that the backup file is readable by the user db2inst1.
[db2inst1@node1]$ scp SAMPLE.0.db2inst1.DBPART000.20171101000000.001 db2inst1@node2:
- On node2, restore the SAMPLE database transferred from node1.
[root@node2]# sudo -i -u db2inst1
[db2inst1@node2]$ db2 RESTORE DB SAMPLE
DB20000I The RESTORE DATABASE command completed successfully.
- The database manager must be started to change the configuration of the SAMPLE database. The message below indicates that the database needs to be started with the db2start command before running the above command:
SQL1032N No start database manager command was issued. SQLSTATE=57019
[db2inst1@node2]$ db2start
11/01/2017 00:00:00 0 0 SQL1063N DB2START processing was successful.
- On node1, configure the SAMPLE database replication parameters.
[db2inst1@node1]$ db2 UPDATE DB CFG FOR SAMPLE USING HADR_LOCAL_HOST node1 HADR_LOCAL_SVC 52000 HADR_SYNCMODE SYNC
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
[db2inst1@node1]$ db2 UPDATE DB CFG FOR SAMPLE USING HADR_REMOTE_HOST node2 HADR_REMOTE_SVC 52000 HADR_REMOTE_INST db2inst1
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
[db2inst1@node1]$ db2 UPDATE DB CFG FOR SAMPLE USING HADR_TARGET_LIST node2:52000
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
- On node2, configure the SAMPLE database replication parameters.
[db2inst1@node2]$ db2 UPDATE DB CFG FOR SAMPLE USING HADR_LOCAL_HOST node2 HADR_LOCAL_SVC 52000
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
[db2inst1@node2]$ db2 UPDATE DB CFG FOR SAMPLE USING HADR_REMOTE_HOST node1 HADR_REMOTE_SVC 52000 HADR_REMOTE_INST db2inst1
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
[db2inst1@node2]$ db2 UPDATE DB CFG FOR SAMPLE USING HADR_TARGET_LIST node1:52000
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
- On both nodes, check the SAMPLE database configuration values using the commands below:
[db2inst1@node1]$ db2 get db cfg for SAMPLE|egrep '(HADR_LOCAL_HOST|HADR_REMOTE_HOST|HADR_TARGET_LIST|LOGARCHMETH1)'
HADR local host name (HADR_LOCAL_HOST) = node1
HADR remote host name (HADR_REMOTE_HOST) = node2
HADR target list (HADR_TARGET_LIST) = node2:52000
First log archive method (LOGARCHMETH1) = DISK:/db2/db2inst1/archive_dir/
[db2inst1@node2]$ db2 get db cfg for SAMPLE|egrep '(HADR_LOCAL_HOST|HADR_REMOTE_HOST|HADR_TARGET_LIST|LOGARCHMETH1)'
HADR local host name (HADR_LOCAL_HOST) = node2
HADR remote host name (HADR_REMOTE_HOST) = node1
HADR target list (HADR_TARGET_LIST) = node1:52000
First log archive method (LOGARCHMETH1) = DISK:/db2/db2inst1/archive_dir/
- On node2, start the standby database with the command below:
[db2inst1@node2]$ db2 START HADR ON DB SAMPLE AS STANDBY
DB20000I The START HADR ON DATABASE command completed successfully.
- On node1, start the primary database with the command below:
[db2inst1@node1]$ db2 START HADR ON DB SAMPLE AS PRIMARY
DB20000I The START HADR ON DATABASE command completed successfully.
- Verify that HADR has started and is communicating properly. Both nodes should report CONNECTED and have the correct HADR_ROLE.
[db2inst1@node1]$ db2pd -db SAMPLE -hadr|egrep '(HADR_ROLE|HADR_CONNECT_STATUS)'
HADR_ROLE = PRIMARY
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 11/01/2017 00:00:00.184172 (1510354800)
[db2inst1@node2]$ db2pd -db SAMPLE -hadr|egrep '(HADR_ROLE|HADR_CONNECT_STATUS)'
HADR_ROLE = STANDBY
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 11/01/2017 00:00:00.189120 (1510354800)
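The same verification can be scripted, which is handy when repeating the check after failovers. A minimal sketch; the here-document stands in for the real output of db2pd -db SAMPLE -hadr:

```shell
# Captured sample standing in for live db2pd output on the primary.
cat > /tmp/hadr.out <<'EOF'
HADR_ROLE = PRIMARY
HADR_CONNECT_STATUS = CONNECTED
EOF
# The CONNECTED status means the primary and standby are exchanging logs.
if grep -q 'HADR_CONNECT_STATUS = CONNECTED' /tmp/hadr.out; then
    echo "HADR pair is connected"
else
    echo "HADR not connected - check db2diag.log on both nodes"
fi
```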
- It is strongly recommended to test switching the HADR roles manually, following the DB2 HADR documentation, before continuing with the cluster configuration.
- Configure the DB2 Master/Slave resource and the Virtual IP address for DB2 with the SAMPLE database.
NOTE: If the database instance user's login shell is set to csh or tcsh, the db2 cluster resource will fail. This is detailed in the article A db2 resource fails when the database instance user's login shell is set to csh or tcsh in a Pacemaker cluster.
For RHEL 6.x, 7.x:
[root@node1]# pcs resource create DB2_HADR db2 instance='db2inst1' dblist='sample' --master meta notify=true
[root@node1]# pcs resource create DB2_IP IPaddr2 ip='192.168.0.10'
Check that both resources were fully started:
[root@node1]# pcs status
...
Master/Slave Set: DB2_HADR-master [DB2_HADR]
    Masters: [ node1 ]
    Slaves: [ node2 ]
DB2_IP (ocf::heartbeat:IPaddr2): Started node1
...
For RHEL 8.x:
[root@node1]# pcs resource create DB2_HADR db2 instance='db2inst1' dblist='sample' promotable notify=true
[root@node1]# pcs resource create DB2_IP IPaddr2 ip='192.168.0.10'
Check that both resources were fully started:
[root@node1]# pcs status
...
* Clone Set: DB2_HADR-clone [DB2_HADR] (promotable):
    * DB2_HADR (ocf::heartbeat:db2): Slave node2
    * DB2_HADR (ocf::heartbeat:db2): Master node1
DB2_IP (ocf::heartbeat:IPaddr2): Started node1
...
- Create constraints for the Virtual IP address so that it starts only when a DB2 Master is available, and runs on the same node as the DB2 Master cluster resource.
For RHEL 6.x, 7.x:
# pcs constraint colocation add DB2_IP with master DB2_HADR-master
# pcs constraint order promote DB2_HADR-master then DB2_IP
For RHEL 8.x:
# pcs constraint colocation add DB2_IP with master DB2_HADR-clone
# pcs constraint order promote DB2_HADR-clone then DB2_IP
- Verify the cluster resources and the constraints configuration.
For RHEL 6.x, 7.x:
# pcs resource show DB2_HADR-master
 Master: DB2_HADR-master
  Meta Attrs: notify=true
  Resource: DB2_HADR (class=ocf provider=heartbeat type=db2)
   Attributes: instance=db2inst1 dblist=sample
   Operations: start interval=0s timeout=120 (DB2_HADR-start-interval-0s)
               stop interval=0s timeout=120 (DB2_HADR-stop-interval-0s)
               promote interval=0s timeout=120 (DB2_HADR-promote-interval-0s)
               demote interval=0s timeout=120 (DB2_HADR-demote-interval-0s)
               monitor interval=20 timeout=60 (DB2_HADR-monitor-interval-20)
               monitor interval=22 role=Master timeout=60 (DB2_HADR-monitor-interval-22)
# pcs constraint
...
Ordering Constraints:
promote DB2_HADR-master then start DB2_IP (kind:Mandatory)
Colocation Constraints:
DB2_IP with DB2_HADR-master (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
...
For RHEL 8.x:
# pcs resource show DB2_HADR-clone
 Clone: DB2_HADR-clone
  Meta Attrs: notify=true promotable=true
  Resource: DB2_HADR (class=ocf provider=heartbeat type=db2)
   Attributes: instance=db2inst1 dblist=sample
   Operations: start interval=0s timeout=120s (DB2_HADR-start-interval-0s)
               stop interval=0s timeout=120s (DB2_HADR-stop-interval-0s)
               promote interval=0s timeout=120s (DB2_HADR-promote-interval-0s)
               demote interval=0s timeout=120s (DB2_HADR-demote-interval-0s)
               monitor interval=20s timeout=60s (DB2_HADR-monitor-interval-20)
               monitor interval=22s role=Master timeout=60s (DB2_HADR-monitor-interval-22)
               notify interval=0s timeout=10s (DB2_HADR-notify-interval-0s)

# pcs constraint config
...
Ordering Constraints:
  promote DB2_HADR-clone then start DB2_IP (kind:Mandatory)
Colocation Constraints:
  DB2_IP with DB2_HADR-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
...
- Check that the node attributes are populated by the DB2 resource agent (at minimum, the master-DB2_HADR attribute should be present on all nodes).
# crm_mon -A1
...
Node Attributes:
* Node node1:
+ db2hadr_db2inst1_sample_fal : S0000000.LOG
+ master-DB2_HADR : 10000
* Node node2:
+ db2hadr_db2inst1_sample_fal : S0000001.LOG
+ master-DB2_HADR : 8000
...
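The master-DB2_HADR values are promotion scores: Pacemaker promotes the DB2 resource on the node with the highest score. A small sketch of that comparison, assuming a simplified "node score" listing extracted from the attributes above (the extraction itself is left out):

```shell
# Simplified "node score" pairs standing in for the master-DB2_HADR
# attribute values shown by crm_mon above.
cat > /tmp/scores.txt <<'EOF'
node1 10000
node2 8000
EOF
# The node with the highest score is the preferred promotion target.
sort -k2 -n -r /tmp/scores.txt | head -n1 | awk '{print $1}'
```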
- Test switching the HADR roles using the following commands. The second command removes the constraint created by the first command, which makes the HADR roles switch.
For RHEL 6.x, 7.x:
# pcs resource move DB2_HADR-master --master
# pcs resource clear DB2_HADR-master
For RHEL 8.x:
# pcs resource move DB2_HADR-clone --master
# pcs resource clear DB2_HADR-clone