How can I verify my OpenStack environment is deployed with Red Hat recommended configurations?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux OpenStack Platform 5
  • Red Hat Enterprise Linux OpenStack Platform 6
  • Red Hat Enterprise Linux OpenStack Platform 7
  • Red Hat OpenStack Platform 8
  • Red Hat OpenStack Platform 9
  • Red Hat OpenStack Platform 10
  • Red Hat OpenStack Platform 11
  • Red Hat OpenStack Platform 12
  • Red Hat OpenStack Platform 13
  • Red Hat OpenStack Platform 14
  • Red Hat OpenStack Platform 15
  • Red Hat OpenStack Platform 16

Issue

  • I have deployed an OpenStack environment. How can I validate this environment in order to:
    • verify my deployment conforms to Red Hat recommended configurations?
    • confirm my deployment has workarounds or solutions applied for known problems?
  • Are there any checklists I can go through to verify whether this environment will hit any known problems in the future?
  • What are the best practices for configuring OpenStack services to avoid problems?
  • What are common problems I can avoid in an OpenStack environment?

Resolution

This is not a comprehensive list, but it is intended to help you check whether your RHEL-OSP configuration follows Red Hat recommended configurations and to verify whether you need to implement any workarounds to address known issues. Most of the solutions in this checklist are applied by default when using our deployment tools -- rhel-osp-installer for RHEL-OSP5 and RHEL-OSP6, and OSP-Director for RHEL-OSP7. It is possible, however, that you deployed using a version of the installer released before these fixes were incorporated, or that you used another deployment tool, such as Ansible, that may not account for these changes. Running through this checklist is highly recommended and will help you discover any gaps in your installation based on our top Solutions for RHEL-OSP.

Determine Right Maximum Connections For MariaDB/Galera


RHEL-OSP deployments usually use MariaDB with Galera as the database back end for all required services. Depending on the number of controllers and the number of cores on each controller, the maximum number of connections to MariaDB can vary from deployment to deployment. Follow the steps in [How can I determine maximum number of connections required to MariaDB database for an OpenStack deployment?](https://access.redhat.com/solutions/1990433) to determine the correct max_connections value for MariaDB. Once the value is determined, it should be configured in two places.

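The linked article walks through deriving max_connections from per-service worker counts. As a rough illustration of the arithmetic only (the service list, worker counts, and headroom factor below are hypothetical examples, not values from the article):

```python
# Hypothetical sizing sketch: each API service opens roughly one database
# connection per worker process, on every controller. The services and
# worker counts below are illustrative, not values from the linked article.
services = {
    "nova": 8,      # example: 8 workers per controller
    "neutron": 8,
    "keystone": 8,
    "glance": 4,
    "cinder": 4,
}
controllers = 3
headroom = 1.1  # ~10% spare connections for CLI tools, monitoring, etc.

per_controller = sum(services.values())
max_connections = int(per_controller * controllers * headroom)
print(max_connections)  # 32 * 3 * 1.1 -> 105
```

Use the formula from the linked article with your actual worker counts; the point here is only that the limit scales with controllers and workers.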
1. Configure HAProxy Maximum Connections to MariaDB/Galera

It's possible for the proxy that serves MariaDB connections to hit an unstated default maximum connection limit. Although HAProxy is configured with maxconn 10000 for all proxies together, there is a default maxconn of 2000 for each individual proxy. Since each service opens connections to the database, the `mysql` proxy in particular can hit this 2000 limit. When the limit is hit, HAProxy drops further connections to the database and the client does not retry, causing API timeouts and subsequent command failures.

To address this, follow the steps in this document:
Why do some OpenStack API commands randomly time out and fail?
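As an illustration, the per-proxy limit can be raised by setting maxconn explicitly inside the mysql listen block in haproxy.cfg (the addresses, server names, and value below are examples; derive your own value as described above):

```
listen mysql
    bind 192.0.2.10:3306            # example VIP address
    mode tcp
    maxconn 4096                    # explicit per-proxy limit; the implicit default is 2000
    server controller-0 192.0.2.11:3306 check
    server controller-1 192.0.2.12:3306 check
    server controller-2 192.0.2.13:3306 check
```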

2. Configure Maximum Connections on each MariaDB/Galera server

The maximum number of connections to the database must be configured on each MariaDB/Galera node. Follow the steps in [How do I update the maximum number of connections to MariaDB database?](https://access.redhat.com/solutions/1594113) to configure this.
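On each node this is a my.cnf setting; a minimal sketch (the file path can vary by release, and the value shown is an example):

```
# /etc/my.cnf.d/galera.cnf (path may vary by release)
[mysqld]
max_connections = 4096    # example; use the value derived for your deployment
```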

MariaDB/Galera File Descriptor Limit

The default file descriptor limit for MariaDB/Galera is too low for an OpenStack deployment. When the database server hits this limit, it cannot open new file descriptors to process additional database connections and will refuse connections from various OpenStack services, causing the associated OpenStack tasks to fail. It is recommended that you increase this limit to 16384.

For instructions on how to increase this limit, see this document:
How can I increase default file descriptor limit for MariaDB/Galera service in an RHEL-OSP environment?
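On systemd-based releases, one way to raise the limit is a unit override (a sketch only; the linked document has the release-specific steps):

```
# /etc/systemd/system/mariadb.service.d/limits.conf
[Service]
LimitNOFILE=16384
```

After creating the override, reload systemd and restart the database service (for example, `systemctl daemon-reload` followed by a restart of mariadb) during a maintenance window.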

Cluster Management Networks Should Not Use DHCP for IP Assignments

DHCP is unsupported for cluster management networks that carry corosync traffic. If configured via DHCP, HA Controller nodes in the RHEL-OSP deployment may get fenced (rebooted) by the cluster software.

Red Hat recommends reconfiguring the cluster interconnect network to use static IP addressing to avoid nodes being fenced unexpectedly, per this document:
RHEL-OSP HA controllers are getting randomly fenced. How can I resolve this?
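A static configuration for the interface carrying corosync traffic looks like the following sketch (the interface name and addresses are illustrative):

```
# /etc/sysconfig/network-scripts/ifcfg-eth2
# Example interface carrying corosync/cluster interconnect traffic.
DEVICE=eth2
ONBOOT=yes
BOOTPROTO=none          # static addressing instead of dhcp
IPADDR=172.16.0.11
NETMASK=255.255.255.0
```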

Connecting OpenStack instances directly to an external network with Neutron

This document cannot be used to configure Neutron from scratch to support provider networks. It assumes you already have an OpenStack environment deployed by Packstack, rhel-osp-installer, or manual setup, and it can be used to add provider network support to such an environment.

Connecting OpenStack instances directly to an external network with Neutron
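Once the physical network mapping is in place, creating a flat external network typically looks like the following sketch (the network, subnet, physical network name, and address range are all examples; older releases use the legacy neutron CLI instead):

```
# Create a flat external network on the "physext" physical network mapping,
# then a subnet for it with DHCP disabled. All names/addresses are examples.
openstack network create external_net \
    --external \
    --provider-network-type flat \
    --provider-physical-network physext
openstack subnet create external_subnet \
    --network external_net \
    --subnet-range 203.0.113.0/24 \
    --no-dhcp \
    --gateway 203.0.113.1
```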

Network Communications From Instances to External Networks Fail When Using Flat/VLAN Provider Networks on HP Blades With Emulex NICs

Network communication from instances to the gateway of the provider network or to other instances in the same network that run on a different compute node may fail when using HP blades with Emulex NICs. Instances may also fail to get IP assignments from the Neutron DHCP server for the provider network.

To resolve this known issue, please follow the steps in this document:
Why network communication from instance to external network always fails using flat/vlan provider networks on HP blades using Emulex NIC?

Adding multiple external networks for floating ips in OpenStack


Assuming communication to the first external network is via eth0 and to the second external network via eth1, you should have two external bridges configured with those interfaces added to them up front. To read more, see [Is it possible to add multiple external networks for floating ips in OpenStack?](https://access.redhat.com/solutions/728613).
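A sketch of the two-bridge setup (bridge and physical network names are examples, and the config file path varies by release):

```
# Create the bridges and attach the uplink interfaces (run once per node):
#   ovs-vsctl add-br br-ex1 && ovs-vsctl add-port br-ex1 eth0
#   ovs-vsctl add-br br-ex2 && ovs-vsctl add-port br-ex2 eth1
#
# Then map a physical network name to each bridge in the OVS agent config:
# /etc/neutron/plugins/ml2/openvswitch_agent.ini (path varies by release)
[ovs]
bridge_mappings = physnet1:br-ex1,physnet2:br-ex2
```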

Nova Requires NoopFirewallDriver When Neutron is in Use

Neutron is responsible for managing security group rules when Neutron networking is used. In this case, firewall_driver in nova.conf must be set to NoopFirewallDriver. If this is wrongly configured, both Nova and Neutron will try to manage security group rules, which will create conflicts.

To verify your settings, follow the steps in this document:
When using OpenStack Nova and Neutron Networking, virtual machines are intermittently unable to connect to the network
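A quick way to check the current setting (using crudini if it is installed; a plain grep of nova.conf works as well):

```
# Show the configured Nova firewall driver:
crudini --get /etc/nova/nova.conf DEFAULT firewall_driver
# Expected when Neutron manages security groups:
#   nova.virt.firewall.NoopFirewallDriver
```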

Configuring HAProxy Correctly for All Services

HAProxy is used to load balance network requests to various OpenStack and non-OpenStack services. It is important to configure HAProxy properly for different services to avoid scalability limitations and other operational issues.

For more details on how this can be achieved, please refer to this document:
How can I verify my haproxy.cfg is correctly configured to load balance OpenStack services?

Expired Keystone Tokens Should Be Flushed Periodically


Keystone tokens are issued with a validity period configured in /etc/keystone/keystone.conf and expire after this period. Tokens are usually kept in the `token` table of the `keystone` database. Once a token's validity expires, it still remains in the database indefinitely. This causes multiple issues: MariaDB requires more storage, and database query time slows down significantly, which can cause API request timeouts for OpenStack services, among other things. To overcome this, it is recommended to flush expired keystone tokens periodically by following the steps in [Various OpenStack commands fail due to timeout while connecting to keystone for authentication](https://access.redhat.com/solutions/968883).
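The flush is typically scheduled with cron running `keystone-manage token_flush`; a sketch (the schedule and log path are examples; the linked solution has the recommended setup):

```
# /etc/cron.d/keystone-token-flush  (example schedule: every 4 hours)
0 */4 * * * keystone keystone-manage token_flush >> /var/log/keystone/token-flush.log 2>&1
```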

nf_conntrack: table full, dropping packet


High-traffic Red Hat OpenStack Platform networks may experience timeouts (specifically DNS timeouts) due to a low maximum for netfilter connection tracking. To resolve the timeout issues, increase the 'nf_conntrack_max' kernel parameter to 500000: [Packet drops when using ip_conntrack or nf_conntrack, logs say 'ip_conntrack: table full, dropping packet.' or 'nf_conntrack: table full, dropping packet'](https://access.redhat.com/solutions/8721)
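The change can be applied at runtime and persisted via sysctl, along these lines:

```
# Apply immediately:
sysctl -w net.netfilter.nf_conntrack_max=500000
# Persist across reboots (a drop-in under /etc/sysctl.d/ also works):
echo 'net.netfilter.nf_conntrack_max = 500000' >> /etc/sysctl.conf
```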

Setting time_to_live for ceilometer meter data in mongodb


By default, time_to_live in /etc/ceilometer/ceilometer.conf is set to -1, which means no meter data is expired. It is a good idea to tune this value so that meter data is deleted after a certain time period. If the meter data keeps growing, it can fill up /var/lib/mongodb/ on the root disk. If using the OpenStack Director installer, the value can be set at deploy time by following this kcs solution: [How to set ceilometer expirer before OpenStack director deploy](https://access.redhat.com/solutions/2219091)
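For an already-deployed environment, the setting lives in the `[database]` section (a sketch; note that the option name changed in later releases, so check your release's documentation):

```
# /etc/ceilometer/ceilometer.conf
[database]
time_to_live = 604800    # example: expire meter data after 7 days (in seconds)
```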

Ceph Deployed by OpenStack Director Has "min_size = 1" Configured in Pools

With the default templates in OSP Director, pools are configured with min_size = 1 as a default value. A replicated Ceph OSD pool with min_size = 1 allows an object to continue serving I/O when it has only one replica, which can lead to data loss, data split-brain (incomplete PGs), or unfound objects. For more details, see these documents:
Ceph deployed by OpenStack Director has "min_size = 1" configured in pools
Master KCS for Red Hat Ceph Storage Troubleshooting and Recommended configurations
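For a replicated pool with size = 3, min_size = 2 is the usual recommendation; inspecting and changing the value looks like this (the pool name is an example):

```
# Check the current min_size on a pool:
ceph osd pool get volumes min_size
# Raise it so I/O stops before dropping to a single replica:
ceph osd pool set volumes min_size 2
```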

HTTPD tuning for large OpenStack environments

With more services being moved into WSGI handlers in later versions of OpenStack, the default maximum resource limits of HTTPD can be reached. For large environments, further resource limit tuning can be applied by increasing Apache's MaxRequestWorkers.
Apache httpd Performance Issue, server reached MaxRequestWorkers
Understanding the MaxClients/MaxRequestWorkers Apache is capable of handling
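For the prefork MPM, raising the limit looks like the following sketch (the values are examples and must be sized against available memory; MaxRequestWorkers cannot exceed ServerLimit):

```
# Drop-in for httpd, e.g. under /etc/httpd/conf.d/ (example values):
<IfModule mpm_prefork_module>
    ServerLimit          512
    MaxRequestWorkers    512
</IfModule>
```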

Setting up STONITH and Fencing for reliable cluster resource and environment recovery

Having fencing devices configured and tested, along with STONITH enabled in the cluster, greatly increases cluster reliability. This gives the cluster a critical tool it uses during resource and node recovery from split-brain, power loss, and other outages. Having STONITH and fencing configured is also a support requirement for Red Hat OpenStack Platform.
How to set stonith-enabled to true in a RHEL6 or 7 Pacemaker cluster
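On a Pacemaker cluster this is a cluster property; a sketch of verifying and enabling it (the exact subcommands vary slightly between pcs versions):

```
# List configured fencing devices first -- enabling STONITH with no
# working fence devices will block recovery:
pcs stonith show
# Enable STONITH and confirm the property:
pcs property set stonith-enabled=true
pcs property show stonith-enabled
```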

Best Practices for Swift when deployed by TripleO(Director)

By default, a Director installation sets up Swift using the root disks of the controller nodes. This is usually sufficient for light Swift usage, but it can be a very limiting architecture for high-usage Swift use cases. Explore the best practices for deploying Swift via Director in this article:
Best practices for Swift deployed by TripleO

Galera database tuning for large OpenStack environments

You may want to further tune the default Galera environment when large overcloud node counts are present or a cloud sees high tenant usage. There are several options to increase the performance of Galera for these use cases:
Performance tuning the backend database for Red Hat Enterprise Linux OpenStack Platform
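Two common starting points are the InnoDB buffer pool size and the log flush policy (the values below are illustrative; the linked article covers how to size them for your hardware):

```
# /etc/my.cnf.d/galera.cnf -- example tuning knobs, not recommended values:
[mysqld]
innodb_buffer_pool_size = 4G        # example; commonly sized to a fraction of RAM
innodb_flush_log_at_trx_commit = 2  # trades some durability for write throughput
```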

RabbitMQ tuning for large OpenStack environments

You may want to further tune or monitor the default RabbitMQ environment when large overcloud node counts are present or a cloud sees high tenant usage. There are several options to increase the performance of RabbitMQ for these use cases, and also to monitor the RabbitMQ environment for trend analysis:
Performance Tuning for RabbitMQ in Red Hat Enterprise Linux OpenStack Platform

Director large number of nodes scaling

When deploying or scaling up a large number of nodes with Director, additional tuning can be done to make these deployments more reliable and quicker to execute, and to allow a larger number of nodes to be deployed at once.
How can I verify OpenStack Director's configuration for large number of nodes scaling?

Best practices for Windows instances on OpenStack

There are many settings that can be tweaked on the compute nodes and in Windows that help performance, especially when running multiple Windows instances.
Best practices for Windows instances on OpenStack

Backing up the overcloud and undercloud

It is recommended to take backups of the undercloud and the overcloud databases, especially before a fast forward upgrade or a minor update. There are different ways to do this depending on the OpenStack version. Starting with Red Hat OpenStack Platform 16.0, it is possible to schedule automatic backups with the Relax-and-Recover (ReaR) tool.
How to backup and restore OpenStack
How to install and configure Relax-and-Recover (ReaR) on the undercloud and overcloud control plane nodes

For more information on OpenStack Best Practices and Common Issues by release, please see the following links:


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.