How can I determine the maximum number of connections required for the MariaDB database in an OpenStack deployment?
Environment
- Red Hat Enterprise Linux OpenStack Platform 5
- Red Hat Enterprise Linux OpenStack Platform 6
- Red Hat Enterprise Linux OpenStack Platform 7
- Red Hat Enterprise Linux OpenStack Platform 8
- Red Hat Enterprise Linux OpenStack Platform 9
Issue
- A RHEL-OSP highly available deployment with three controllers is using MariaDB with Galera as the database backend.
- The default `max_connections` configured for the MariaDB server is 4096.
- Services connect to the database via HAProxy, and HAProxy has a default limit of 2000 connections for each proxy.
- How can I determine whether these maximum connection limits are sufficient? If they need to be increased, what is the right value to configure?
Resolution
OpenStack has a modular architecture in which various services work together to provide infrastructure for applications. These services include Keystone, Nova, Neutron, Cinder, Glance, Heat, Ceilometer, and others. Almost all of these services need access to the MariaDB database, and most of the database connections are made by their associated API services. A default deployment by Director configures some API services to spawn a number of worker processes equal to the total number of CPU cores, some twice that number, and some three times; each of these worker processes can generate many database connections. If you are not using Director, you may need to configure these settings by other means.

Each worker process can create up to five long-lived database connections lasting 90 minutes, and may also create up to 10 additional short-lived database connections if needed. A standard highly available OpenStack deployment has a minimum of three controllers, each running the same OpenStack services active/active with several worker processes. This means the maximum number of required database connections must be multiplied by the number of controllers.
Number Of Worker Processes By Services
A default deployment by RHEL-OSP Director configures the number of worker processes for each service as below, where x is the number of cores on the system.
| Service | Number of worker processes |
|---|---|
| keystone-all | x * 2 |
| nova-api | x * 3 |
| nova-conductor | x * 1 |
| glance-api | x * 1 |
| glance-registry | x * 1 |
| neutron-server | x * 2 |
| cinder-api | x * 1 |
Note that the default worker process configuration applied by RHEL-OSP Director for each service may not be optimal for all use cases. On a system with 96 cores, nova-api may not need 96 * 3 API worker processes to provide API services to users. Red Hat is working on a formula to determine the number of worker processes required by each service for optimal performance, which will be published once ready.
These worker processes can each create anywhere from 1 to 15 database connections, of which the first 5 are long-lived. In addition, the cinder-scheduler, cinder-volume, nova-consoleauth and nova-scheduler processes can also create multiple database connections.
Let us take the example of an OpenStack highly available deployment with three controllers, each having 24 CPU cores. The total number of worker processes on each controller in this deployment would be as below.
| Service | Number of worker processes | Total |
|---|---|---|
| keystone-all | 24 * 2 | 48 |
| nova-api | 24 * 3 | 72 |
| nova-conductor | 24 * 1 | 24 |
| glance-api | 24 * 1 | 24 |
| glance-registry | 24 * 1 | 24 |
| neutron-server | 24 * 2 | 48 |
| cinder-api | 24 * 1 | 24 |
This is a total of 264 worker processes for all services on a single controller, each of which can create multiple concurrent database connections; more generally, this is 11x the number of cores. Adding the cinder-scheduler, cinder-volume, nova-consoleauth and nova-scheduler processes gives 268 processes in total that are capable of creating multiple database connections.
If we assume that each of the 264 worker processes holds 5 long-lived database connections, we could theoretically see 1320 concurrent long-lived database connections from each of the three controllers, for a total of 3960 database connections.
We recommend adding 1024 to the above value to accommodate short-lived database connections and to leave room for new OpenStack services that may be added to the deployment in the future.
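The calculation above can be sketched as a small shell snippet. The controller count, core count, and the per-core worker multiplier are the example values from this article, not universal constants:

```shell
# Estimate max_connections for the example deployment above.
# 11 = sum of the per-core worker multipliers (2+3+1+1+1+2+1).
# 5  = long-lived database connections per worker process.
# 1024 = headroom for short-lived connections and future services.
cores=24
controllers=3
workers_per_controller=$((cores * 11))                     # 264
long_lived_per_controller=$((workers_per_controller * 5))  # 1320
total=$((long_lived_per_controller * controllers))         # 3960
recommended=$((total + 1024))
echo "$recommended"                                        # 4984
```

Adjust `cores` and `controllers` to match your own deployment before using the result.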
Once you determine the right maximum connection limit for your deployment, it needs to be configured in two places.
- The mysql proxy section in /etc/haproxy/haproxy.cfg of each controller. Follow the steps in Why do some openstack API commands randomly time out and fail? to achieve this.
- On each MariaDB/Galera node. Follow the steps in How do I update the maximum number of connections to MariaDB database? for more details.
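As a sketch, the two settings would look like the following. The value 4984 is the example result from the formula above, and the galera.cnf path is typical for a Director deployment; both may differ in your environment:

```
# /etc/haproxy/haproxy.cfg -- mysql proxy section on each controller
listen mysql
    bind 10.74.137.11:3306
    maxconn 4984

# /etc/my.cnf.d/galera.cnf -- on each MariaDB/Galera node
[mysqld]
max_connections = 4984
```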
Diagnostic Steps
- Note that the above is the maximum number of connections that all of the OpenStack services together can make to the database. The actual number of connections at any given time varies depending on how heavily the controller nodes are loaded with client requests to each OpenStack service. In a test where 200 virtual machines were created and immediately deleted, all while constantly polling Nova with `nova list --all-tenants`, we saw upwards of 1500 concurrent database connections. This is less than half of the limit explained above, and it does not account for the additional 10 short-lived database connections available to each worker process. This is partly because we were not putting load on each of the OpenStack services simultaneously. For example, in the test that created 200 virtual machines, we were not putting any load on Cinder. In a separate test that simply created and deleted several hundred Cinder volumes, the number of database connections used by Cinder increased to several hundred across all controllers. Clearly, the number of concurrent database connections seen at any given time depends on the load being placed on each service. If the test that created 200 virtual machines had also created and attached Cinder volumes, one could expect the number of database connections to go up.
- In another test, we saw upwards of 3200 concurrent database connections in a deployment with 3 controllers, each running on a machine with 96 cores. The test deployed 960 virtual machines at once (320 per tenant across 3 tenants). Because the default connection limit in HAProxy for any proxy is 2000, the result was refused connections to the database. In this case, both the HAProxy `maxconn` limit for the mariadb (galera) proxy and the MariaDB `max_connections` limit had to be increased.
- The command below gets the total number of database connections for a given OpenStack service, across all worker processes:
lsof -i4:3306 -sTCP:ESTABLISHED -c ^haproxy -c ^mysqld +c0 | sed '1d' | awk '{ print $1 }' | uniq -c | sort -k 2
- To see the number of database connections that each worker process has at any given time, use the same command but include the PID in the awk statement:
lsof -i4:3306 -sTCP:ESTABLISHED -c ^haproxy -c ^mysqld +c0 | sed '1d' | awk '{ print $1,$2 }' | uniq -c | sort -k 2
- To find the number of cores on a given controller:
cat /proc/cpuinfo | egrep ^processor | wc -l
- To get details of how many connections are being served by the mysql proxy at any given time, access the HAProxy statistics page on the controller where the mysql VIP is active. Follow the steps below to access the statistics page. On a controller, run the command below.
# grep -A1 haproxy.stats /etc/haproxy/haproxy.cfg
listen haproxy.stats
bind 192.0.2.6:1993
Open the ip:port shown after `bind` in a browser and see the details of current connections under Current Connection Rate.
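If you prefer the command line, the HAProxy statistics page can also be fetched in CSV form by appending `;csv` to the URL. A minimal sketch, assuming the example stats bind address shown above (in the CSV, field 1 is the proxy name and field 5, `scur`, is the current session count):

```shell
# Fetch HAProxy statistics as CSV and print the current session count (scur,
# field 5) for each entry of the mysql proxy. 192.0.2.6:1993 is the example
# stats bind address from haproxy.cfg; substitute your own.
curl -s 'http://192.0.2.6:1993/;csv' | awk -F, '$1 == "mysql" { print $2, $5 }'
```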
To find out which controller has the mysql VIP active, first find the mysql VIP.
# grep -A 1 mysql /etc/haproxy/haproxy.cfg
listen mysql
bind 10.74.137.11:3306
Then find out on which node Pacemaker has started this VIP.
# pcs status | grep 10.74.137.11
ip-10.74.137.11 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
This shows that the IP is currently active on overcloud-controller-2, so the HAProxy statistics page on that node is the one to examine.
- If you find that the active HAProxy connections for mysql, or the current mysql connections on a node, are higher than the value obtained from the formula, please contact Red Hat support. We would like to investigate where the additional connections to MariaDB are coming from.
- The `max_connections` value obtained from the above formula should be less than half of the kernel `threads-max` if the MariaDB server is also running other OpenStack services. The kernel `threads-max` on a system is configured dynamically depending on hardware specifications such as memory and CPU cores. If `max_connections` is more than half of the kernel `threads-max`, it may indicate that the system does not have enough resources to run that many worker threads. This is especially important when the controllers are virtual machines assigned a high number of vCPUs but very little memory; in that case, you may have to assign more memory to the VM.
To find threads-max for a system, run the command below.
# cat /proc/sys/kernel/threads-max
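A minimal sketch of this check, assuming the example `max_connections` value of 4984 calculated earlier:

```shell
# Warn if the planned max_connections exceeds half of the kernel thread limit.
# 4984 is the example value from this article; substitute your own result.
max_connections=4984
threads_max=$(cat /proc/sys/kernel/threads-max)
if [ "$max_connections" -gt $((threads_max / 2)) ]; then
    echo "WARNING: max_connections ($max_connections) exceeds threads-max/2 ($((threads_max / 2)))"
else
    echo "OK: max_connections fits within threads-max/2"
fi
```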
- HAProxy also has a maximum of 10000 connections configured in the `global` section, which is the combined connection limit for all proxies together. If you determine that the mysql proxy alone needs a number of connections close to that limit, consider increasing `maxconn` in the `global` section to a higher value.
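As an illustration, raising the overall limit would look like the following. The value 20000 is an arbitrary example, not a recommendation:

```
# /etc/haproxy/haproxy.cfg -- combined connection limit for all proxies
global
    maxconn 20000
```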
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.