Performance Tuning for RabbitMQ in Red Hat Enterprise Linux OpenStack Platform


Please note: This is not applicable from Red Hat OpenStack Platform 13 onward.

RabbitMQ Monitoring

You can monitor RabbitMQ queues and connection message rates in real time using the rabbitmq_management plugin.

  1. First, if these executables are not already present, create symlinks for the existing RabbitMQ management commands in /usr/sbin/.

     ln -sf /usr/lib/rabbitmq/lib/rabbitmq_server-3.1.5/sbin/rabbitmq-plugins /usr/sbin/rabbitmq-plugins
     ln -sf /usr/lib/rabbitmq/lib/rabbitmq_server-3.1.5/sbin/rabbitmq-env /usr/sbin/rabbitmq-env
    
  2. Then, enable the rabbitmq_management plug-in.

     rabbitmq-plugins enable rabbitmq_management
    
  3. List the plug-ins to make sure that it's enabled.

     rabbitmq-plugins list
    
  4. Restart the RabbitMQ service.

     systemctl restart rabbitmq-server.service
    

Once the rabbitmq_management plug-in is enabled, RabbitMQ messaging information is available in the web UI at http://<server-name>:15672/. The guest user can be used to authenticate; the credentials are defined in /etc/rabbitmq/rabbitmq.config.
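Besides the web UI, the management plug-in exposes the same data over an HTTP API (for example, GET /api/queues returns one JSON object per queue). As a sketch of how that output can be inspected, the snippet below parses an illustrative sample response; the queue names and numbers are made up, not captured from a real broker. In practice the JSON would be fetched with something like `curl -u guest:guest http://<server-name>:15672/api/queues`.

```python
import json

# Illustrative sample of the JSON shape returned by GET /api/queues
# (fields trimmed; a real broker returns many more per queue).
sample = json.loads("""
[
  {"name": "notifications.info", "messages": 1200,
   "message_stats": {"publish_details": {"rate": 85.4}}},
  {"name": "conductor", "messages": 3,
   "message_stats": {"publish_details": {"rate": 2.1}}}
]
""")

# Flag queues whose backlog looks unhealthy.
for q in sample:
    rate = q.get("message_stats", {}).get("publish_details", {}).get("rate", 0.0)
    flag = " <-- backlog" if q["messages"] > 1000 else ""
    print(f"{q['name']}: {q['messages']} messages, {rate}/s publish{flag}")
```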


If very detailed statistics are desired at the expense of some performance, consider setting collect_statistics_interval, stats_event_max_backlog, and rates_mode. These can be set in their respective sections of the /etc/rabbitmq/rabbitmq.config file. For example:

    [
      {rabbit, [
        {collect_statistics_interval, 30000}
      ]}
    ].

More details on monitoring RabbitMQ can be found in the upstream RabbitMQ Management Plugin documentation on www.rabbitmq.com.


RabbitMQ Performance Tuning

There are several configuration areas that can be tweaked to measurably improve messaging throughput and latency.

  • Put the RabbitMQ server on a different physical system than the controller node; if both run on the same system, they compete for resources.
  • Set the vm_memory_high_watermark parameter to 0.5, which means that the RabbitMQ system uses half of the available system memory.
  • Increase the disk free limit (disk_free_limit) to 2GB (2000000000).
  • By default, RabbitMQ uses the machine's hostname as part of its node name. Define a separate hostname for RabbitMQ and place its traffic on a private 10Gb network.
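The memory and disk thresholds above can be expressed together in /etc/rabbitmq/rabbitmq.config. The fragment below is a sketch assembled from the values in this list, not a complete configuration:

```
[
  {rabbit, [
    %% use at most half of available system memory
    {vm_memory_high_watermark, 0.5},
    %% block publishers when free disk drops below 2GB
    {disk_free_limit, 2000000000}
  ]}
].
```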

RabbitMQ Reliability Tuning

There are a few configuration areas that can be checked to ensure messaging reliability.

  • When using RabbitMQ in a Pacemaker cluster, configure STONITH and fencing. This is a must: no matter how carefully tunings are applied, the environment must be able to fence a partitioned node so service can be recovered gracefully and (mostly) automatically. Without this setup, reliability will suffer.

  • When using RabbitMQ in a Pacemaker cluster with STONITH and fencing configured, cluster partition handling can be changed from pause_minority to ignore in /usr/share/openstack-tripleo-heat-templates/puppet/services/rabbitmq.yaml if RabbitMQ's pre-emptive pause recovery is observed to add recovery delay. If a partition is brief enough that no nodes are fenced (usually about ~60s), pausing can cause more downtime than it saves; ignoring partitions tolerates brief partitions better, while longer partitions are handled by fencing. DO NOT use this setting if STONITH and fencing have not been set up. Further detail on this setting can be found in the upstream documentation on www.rabbitmq.com:

      # Default:
      rabbitmq_config_variables:
          cluster_partition_handling: 'pause_minority'

      # Changed:
      rabbitmq_config_variables:
          cluster_partition_handling: 'ignore'
    
  • Increase the default TCP timeout of 5 seconds to 15 seconds in the environment configuration. This can help resiliency against network partitions by allowing a bit more time for RabbitMQ to send a health check and recover after a network flap. In newer releases, this is replaced by net_ticktime. On the controllers, change the default:

      # cat /etc/rabbitmq/rabbitmq-env.conf 
      NODE_IP_ADDRESS=<IP>
      NODE_PORT=5672
      RABBITMQ_NODENAME=rabbit@overcloud-controller-0
      RABBITMQ_SERVER_ERL_ARGS="+K true +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<5000:64/native>>}] -kernel inet_default_listen_options [{raw,6,18,<<5000:64/native>>}]"
    

    To reflect 15 seconds instead of 5 seconds:

      RABBITMQ_SERVER_ERL_ARGS="+K true +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<15000:64/native>>}] -kernel inet_default_listen_options [{raw,6,18,<<15000:64/native>>}]"
    
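The raw Erlang socket option in these arguments, {raw,6,18,<<15000:64/native>>}, maps to the Linux socket option TCP_USER_TIMEOUT: level 6 is IPPROTO_TCP, option 18 is TCP_USER_TIMEOUT, and the value is the timeout in milliseconds. As an illustration of what the setting does at the socket level (Linux only), the same 15-second timeout can be set and read back from Python:

```python
import socket

# {raw,6,18,<<15000:64/native>>} in Erlang == TCP_USER_TIMEOUT on Linux:
# level 6 is IPPROTO_TCP and option 18 is TCP_USER_TIMEOUT, in milliseconds.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 15000)

# Read the option back to confirm the 15-second timeout took effect.
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT))  # 15000
s.close()
```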

RabbitMQ High CPU usage Tuning

On a controller where RabbitMQ is mostly idle but its CPU usage is nonetheless high, "+sbwt none" can be set to disable scheduler busy-waiting, which can reduce CPU usage by roughly a factor of three:

  • Add "+sbwt none" to the RABBITMQ_SERVER_ERL_ARGS environment variable and restart the RabbitMQ container:
    # cat /etc/rabbitmq/rabbitmq-env.conf
    NODE_IP_ADDRESS=
    NODE_PORT=5672
    RABBITMQ_NODENAME=rabbit@overcloud-controller-0
    RABBITMQ_SERVER_ERL_ARGS="+K true +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<15000:64/native>>}] -kernel inet_default_listen_options [{raw,6,18,<<15000:64/native>>}] +sbwt none"