Performance tuning the backend database for Red Hat Enterprise Linux OpenStack Platform
Red Hat Enterprise Linux OpenStack Platform uses a MariaDB database in the OpenStack control plane for data persistence. This database acts as a system back end, storing configuration and historical data for certain OpenStack services, including Compute, Identity Service, and Block Storage. This article discusses the available options for maintaining and performance tuning this database.
Sizing the back end database
An analysis of growth rates and transaction latencies in Red Hat Enterprise Linux OpenStack Platform 4 and 5 reveals that the Compute (Nova) and Block Storage (Cinder) databases grow over time. They grow as a result of soft deletes, but even a deployment theoretically managing 1,000,000 instances would keep each database under 10GB in size.
Note: The command nova-manage db archive_deleted_rows is currently deprecated and non-functional as a result of constraint issues.
Performance tuning
The RHEL OpenStack Platform Installer configures the back end database for optimal performance.
Review the database cache
- The installers (Packstack and the RHEL OpenStack Platform Installer) configure 20% of available RAM for the database cache. Direct I/O (DIO) is enabled to bypass the Linux page cache and avoid double buffering.
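The 20%-of-RAM calculation can be sketched as follows. The total RAM figure here is an illustrative assumption (a 32GB host); on a live system you could read the actual total from /proc/meminfo instead.

```shell
# Sketch: derive a 20%-of-RAM value for the database cache
# (innodb_buffer_pool_size), mirroring what the installers compute.
TOTAL_RAM_MB=32768                          # assumed example: a 32GB host
BUFFER_POOL_MB=$(( TOTAL_RAM_MB * 20 / 100 ))
echo "innodb_buffer_pool_size = ${BUFFER_POOL_MB}M"
```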
Consider SSD disk placement
- Moving the MariaDB database to an SSD could reduce disk access latency.
Galera Performance
When using Galera, consider increasing the innodb_log_file_size option. Be aware that larger log files increase the duration of crash recovery, although this matters less in clustered environments, where a failed node can resynchronize from its peers. A common recommendation for Galera deployments is 2x512M; the example below configures 2x2G.
innodb_log_file_size=2G
innodb_log_files_in_group=2
Other Galera/MariaDB parameters
The following parameters are sometimes proposed for tuning; recommendations for each are given below.
innodb_fast_shutdown=0
This is not necessary; leave it at its default of 1.
innodb_buffer_pool_instances=16
This depends on the value of innodb_buffer_pool_size. The recommended formula suggests that for 16 instances you would want innodb_buffer_pool_size to be 16G. However, a very large innodb_buffer_pool_size is not generally necessary for OpenStack-style loads, and can even be harmful in a stock Director deployment, because the database processes run on the same host as many other services. Red Hat does support raising innodb_buffer_pool_size above its default of 134M, which is very low; a value of around 1-2G is probably better.
innodb_log_buffer_size=64M
This might be useful for OpenStack in that some Director-style installer programs can create very large transactions with logging output. Normal OpenStack use, however, does not involve very large transactions: there are lots of small ones when something like Telemetry is in use, and even fewer when it is not.
innodb_read_io_threads=16
innodb_write_io_threads=16
These are unlikely to be useful on a Linux system; MariaDB's documentation specifically notes that they may need to be lowered from their default of 4 on a system that has other things going on.
Note: Galera write performance only exceeds that of single-node MariaDB when you are CPU-limited; if your deployment is limited by TCP request/response latency, you may not see a difference. However, multi-writer performance can be worse on loads with a high number of write operations. This penalty applies across nodes, so it would be roughly three times worse when using three nodes.
- Ensure that innodb_file_per_table is enabled. There is no performance benefit, but it makes long-term management easier. This setting is not recommended for databases with many thousands of small tables; however, it is suitable for the OpenStack database.
- Auto-increment - Make no changes to the auto-increment settings, as OpenStack applications rely on them for INSERT operations. In addition, the wsrep_auto_increment_control parameter is set so that Galera maintains the auto-increment number across the cluster as a whole; as a result, the value stays consistent and always increases. Note that all OpenStack applications are directed to just one node, so there should be no exposure to auto-increment synchronization issues.
- Configuring thread pools can offer a performance benefit in use cases with thousands of lightly active connections.
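A thread pool can be enabled with a my.cnf fragment along these lines. The pool size shown is illustrative only; MariaDB typically defaults thread_pool_size to the number of CPUs.

```ini
[mysqld]
# Switch from one-thread-per-connection to a thread pool.
thread_handling = pool-of-threads
# Illustrative value; the default is usually the host CPU count.
thread_pool_size = 16
```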
MariaDB configuration
Review the settings in /etc/my.cnf:
- innodb_buffer_pool_size - This should be 10-20% of available memory.
- innodb_flush_method - Confirm that this value is O_DIRECT.
For example:
max_connections = 15360
innodb_buffer_pool_size = 10G
innodb_flush_method = O_DIRECT
Also refer to https://access.redhat.com/solutions/3327611
Limits
Review the maximum number of file descriptors in /etc/systemd/system/mariadb.service.d/limits.conf. For example:
[Service]
LimitNOFILE = 16384
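To confirm the limit actually in effect, you can inspect a process's /proc entry. The sketch below reads the limit of the current shell ("self") for portability; substitute the mariadb PID (for example, /proc/$(pidof mysqld)/limits) to check the database server itself.

```shell
# Read the effective soft limit on open files for a running process.
# "self" is a placeholder; use the mariadb PID to check the server.
SOFT_NOFILE=$(awk '/Max open files/ {print $4}' /proc/self/limits)
echo "soft NOFILE limit: ${SOFT_NOFILE}"
```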
Hardware considerations
Review these options when sizing a server to host the back end database:
- Have a fast disk subsystem; at a minimum, it should outperform a single rotating HDD.
- Have at least 32GB of memory.
Database maintenance plan
Consider the following when forming a database management plan:
Backup strategy
- In a 3-node Galera deployment, you can stop one node, remove it from the cluster, and create a cold backup by copying the MariaDB files. This type of backup may not be optimal for a Galera active/passive deployment.
- Create a hot backup using mysqldump. This is an easier and better option than a cold backup, and can be performed on any node that is not in a non-Primary state. It is recommended that mysqldump be run without locking tables, using the --single-transaction flag. This is considered safe for the InnoDB tables used by Galera, and prevents the database from pausing while the backup takes place. With MariaDB, this type of backup can usually be restored to a different MariaDB version.
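A hot backup along those lines might look like the sketch below. The credentials and backup path are placeholders, and the command is printed rather than executed here, since running it requires a live node in the Primary component.

```shell
# Sketch of a mysqldump hot backup; --single-transaction takes a
# consistent snapshot of InnoDB tables without locking them.
DUMP_CMD="mysqldump --single-transaction --all-databases --user=root -p"
BACKUP_FILE="/backup/openstack-$(date +%F).sql"   # placeholder path
# Printed, not executed, as this sketch has no database to talk to:
echo "$DUMP_CMD > $BACKUP_FILE"
```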
Items to monitor
- Table, database, and tablespace growth rates.
- Long-running transactions and long lock waits. By default, MariaDB kills long operations, but in an ERP system this is commonly a manual task. The advanced lock-wait monitor usually also identifies the competing transaction and logs it.
- Active and inactive connections.
- Slow queries - Review the slow query log to identify under-performing queries. In particular, Compute (nova) separates its operations into multiple small transactions. This activity may be considered inefficient and is currently under review. For more information, refer to this article.
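Growth rates can be tracked with a query against information_schema, sketched below. The schema names are assumptions based on a default deployment, and InnoDB row counts are approximate.

```sql
-- Sketch: per-table size and approximate row counts, for tracking growth.
-- Schema names (nova, cinder, keystone) assume a default deployment.
SELECT table_schema, table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb,
       table_rows
FROM information_schema.tables
WHERE table_schema IN ('nova', 'cinder', 'keystone')
ORDER BY size_mb DESC
LIMIT 20;
```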
Purge unnecessary data
OpenStack services perform soft deletes, meaning that rows intended for removal are flagged as deleted, instead of actually being removed. As a result, the corresponding records remain in the database.
For example, after around a year of usage on a CI system, 98% of the data in the Compute database can be considered superfluous. Even so, this soft-deleted information can be useful for debugging issues in production environments.
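To gauge how much soft-deleted data has accumulated, a query like the following can be used. The schema and table name assume a default Nova deployment, where the deleted column is non-zero for soft-deleted rows.

```sql
-- Sketch: count soft-deleted instance records in the Compute database.
-- Assumes the default nova schema and its instances table.
SELECT COUNT(*) AS deleted_instances
FROM nova.instances
WHERE deleted != 0;
```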
Note: Identity Service (keystone) tokens are now flushed automatically using a scheduled cron job.
Statistics
The OpenStack services do not maintain statistics in the back end database. However, a deployment using OpenStack Telemetry (ceilometer) with a MongoDB back end can provide similar information.