Why do some OpenStack API commands randomly time out and fail?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux OpenStack Platform 5.
  • Red Hat Enterprise Linux OpenStack Platform 6.
  • Red Hat Enterprise Linux OpenStack Platform 7.

Issue

  • OpenStack API commands randomly time out with varying error messages. One example is shown below.
$ nova list
ERROR (ConnectionRefused): Unable to establish connection to http://192.168.1.1:35357/v2.0/tokens
  • This can cause various OpenStack tasks, such as creating networks, instances, volumes, or heat templates, to fail at random, especially when the system is overloaded. The error message varies from case to case and depends on which API is accessed.

  • This happens when HA controllers are used and the OpenStack API services access the MariaDB database through a VIP behind haproxy.

Resolution

Increase maxconn for the mysql proxy from the default of 2000 to a higher value. To find the correct value for your OpenStack deployment, follow the steps in the article "How can I determine maximum number of connections required to MariaDB database for an Openstack deployment?".

  • For example, if you have decided that 4096 is the right value for your deployment, add maxconn 4096 to the mysql proxy. The modified section in /etc/haproxy/haproxy.cfg would then look like this:
listen mysql
  bind 192.168.124.21:3306 
  maxconn 4096
  option httpchk
  stick on dst
  stick-table type ip size 1000
  timeout client 90m
  timeout server 90m
  server overcloud-controller-0 192.168.124.23:3306 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
  server overcloud-controller-1 192.168.124.24:3306 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
  server overcloud-controller-2 192.168.124.25:3306 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
  • In Red Hat OpenStack Platform 10, these values can be set with Director. Please see this solution article for further details.

Root Cause

This happens because the proxy that serves MariaDB connections has reached its maximum number of connections. Although haproxy is configured to allow maxconn 10000 for all proxies together, each proxy has a default maxconn of 2000. When the proxy used for mysql reaches the 2000 limit, it drops further connections to the database. The client does not retry, so the API call times out and the subsequent command fails.
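To see which of the two limits is actually set on a controller, the configuration file can be inspected directly. The following is a minimal sketch, not part of the original solution: section_maxconn is a hypothetical helper name, and it only reports a maxconn line written inside the named section, so a value inherited from a defaults section or from the built-in default appears as "unset".

```shell
#!/bin/sh
# Hypothetical helper: print the maxconn value set inside one section of an
# haproxy configuration file, or "unset" when that section has no maxconn line.
# Usage: section_maxconn <config-file> <section-keyword> [<section-name>]
section_maxconn() {
    awk -v kw="$2" -v name="$3" '
        # Every section starts at a line whose first word is a section keyword.
        $1 ~ /^(global|defaults|listen|frontend|backend)$/ {
            in_section = ($1 == kw && (name == "" || $2 == name))
            next
        }
        in_section && $1 == "maxconn" { value = $2 }
        END { if (value == "") print "unset"; else print value }
    ' "$1"
}

# Only runs where the configuration file is present and readable.
if [ -r /etc/haproxy/haproxy.cfg ]; then
    echo "global maxconn:       $(section_maxconn /etc/haproxy/haproxy.cfg global)"
    echo "listen mysql maxconn: $(section_maxconn /etc/haproxy/haproxy.cfg listen mysql)"
fi
```

If the second line reports "unset", the mysql proxy is running with whatever limit it inherits, which according to the explanation above is the default of 2000.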

To see how many connections a proxy is serving at any given time, open the haproxy statistics page on the controller where the mysql VIP is active. To find the address of the statistics page, run the following command on a controller.

# grep -A1 haproxy.stats /etc/haproxy/haproxy.cfg 
listen haproxy.stats
  bind 192.0.2.6:1993 

Open the ip:port shown after bind in a browser and check the details of current connections under Current Connection Rate.
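The same numbers can also be read from the command line, because haproxy serves its statistics as CSV when ";csv" is appended to the statistics URI. The sketch below rests on several assumptions: proxy_sessions is a hypothetical helper name, STATS_URL is a hypothetical variable that you set to the statistics address (for example http://192.0.2.6:1993/ if the stats URI is "/"), and the page is assumed to need no authentication.

```shell
#!/bin/sh
# Hypothetical helper: read haproxy statistics in CSV form on stdin and print
# the current, peak and limit session counts of one proxy's FRONTEND row.
# CSV columns used: 1=pxname 2=svname 5=scur 6=smax 7=slim
proxy_sessions() {
    awk -F, -v px="$1" '
        $1 == px && $2 == "FRONTEND" {
            printf "%s current=%s max=%s limit=%s\n", $1, $5, $6, $7
        }
    '
}

# STATS_URL is an assumption, e.g. STATS_URL=http://192.0.2.6:1993/
# Nothing is fetched unless it is set.
if [ -n "${STATS_URL:-}" ]; then
    curl -s "${STATS_URL};csv" | proxy_sessions mysql
fi
```

A current or max value that equals the limit suggests the mysql proxy has hit its maxconn.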

To find out which controller holds the mysql VIP, first find the mysql VIP.

# grep -A 1 mysql /etc/haproxy/haproxy.cfg 
listen mysql
  bind 10.74.137.11:3306 

Then find out on which node pacemaker has started this VIP.

# pcs status | grep 10.74.137.11
 ip-10.74.137.11	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-2

This shows that the IP is currently active on overcloud-controller-2, so the haproxy statistics page on this node is the one to examine.
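The two lookups above can be combined into one step. This is a rough sketch under stated assumptions: mysql_vip and vip_node are hypothetical helper names, the VIP is taken from the first bind line of the "listen mysql" section, and the pacemaker resource is assumed to be named ip-<address> as in the pcs output shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the IP address on the first bind line of the
# "listen mysql" section of an haproxy configuration file.
mysql_vip() {
    awk '
        $1 ~ /^(global|defaults|listen|frontend|backend)$/ {
            in_mysql = ($1 == "listen" && $2 == "mysql")
            next
        }
        in_mysql && $1 == "bind" { split($2, addr, ":"); print addr[1]; exit }
    ' "$1"
}

# Hypothetical helper: read "pcs status" output on stdin and print the node on
# which the resource named ip-<address> is reported as Started.
vip_node() {
    awk -v res="ip-$1" '$1 == res && $(NF-1) == "Started" { print $NF }'
}

# Only runs on a controller that has both the configuration file and pcs.
cfg=/etc/haproxy/haproxy.cfg
if [ -r "$cfg" ] && command -v pcs >/dev/null 2>&1; then
    vip=$(mysql_vip "$cfg")
    echo "mysql VIP $vip is active on: $(pcs status | vip_node "$vip")"
fi
```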


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.