OpenStack services slow down while RabbitMQ messages accumulate without being consumed.

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux OpenStack Platform 7.

Issue

  • In my OpenStack deployment, messages in certain RabbitMQ queues keep growing without being consumed, and some of these queues have no consumers at all. The more messages accumulate in a queue, the slower operations from OpenStack services become. For example, the output below shows queues that hold more than 1000 messages.
# rabbitmqctl list_queues |  awk 'int($2)>=1000'
cinder-scheduler_fanout_9aed9fbc3d4249289f2cb5ea04c062ab        8163
cinder-scheduler_fanout_b7a2e488f3ed4e1587b959f9ac255b93        8159
cinder-scheduler_fanout_ea9c69fb630f41b2ae6120eba3cd43e0        8163
cinder-scheduler_fanout_fa01ec50173647a7b5c031230956eea9        8148

But there are no consumers for those messages.

# rabbitmqctl list_queues name messages consumers | grep cinder-scheduler
cinder-scheduler        0       3
cinder-scheduler.ha-controller  0       3
cinder-scheduler_fanout_9aed9fbc3d4249289f2cb5ea04c062ab        8145    0
cinder-scheduler_fanout_b7a2e488f3ed4e1587b959f9ac255b93        8141    0
cinder-scheduler_fanout_ea9c69fb630f41b2ae6120eba3cd43e0        8145    0
cinder-scheduler_fanout_fa01ec50173647a7b5c031230956eea9        8130    0

Why does this happen? What should I do when this problem slows down every OpenStack operation that uses the message queue?

Resolution

This usually happens when a consumer goes down while clients keep sending messages to the queue. For example, notifications may be enabled while ceilometer is not running to consume them.

A solution to this issue is to set a TTL so that messages expire after remaining in the queue for a certain amount of time. A TTL is not appropriate for every queue; whether it is safe depends on whether the queue is declared for auto_delete.

Queues Marked For auto_delete

Setting a TTL for queues marked for `auto_delete` is fine. In fact, if `auto_delete` works as expected, there is no need to set a TTL: messages are expected to be consumed immediately, and the queue is deleted when its last consumer disconnects. If messages are growing without a consumer in a queue marked for `auto_delete`, it may indicate a bug. Please contact Red Hat support if you see this in your OpenStack deployment. Setting a TTL on such queues as a temporary workaround for the bug is acceptable. A TTL of 30 minutes is a good value to start with.

Queues Not Marked For auto_delete


We need to be careful when setting a TTL for queues that are not marked for auto_delete. These queues are not marked for auto_delete because they are expected to keep their messages while the consumer is down, on the assumption that the consumer will process them when it comes back. For example, the message queue for a compute node's nova-compute is not marked for auto_delete. If nova-compute on that node goes down, or the node is taken down for maintenance, the messages remain in the queue. When the node comes back or nova-compute is started again, it consumes all the messages in the queue. If we set a 5-minute TTL for this queue and nova-compute is down for 30 minutes, most of the messages sent during that time will be removed from the queue. This may not always be an issue, but we need to take it into account when setting a TTL for queues not marked for auto_delete.

In short, if we decide to set a TTL for queues not marked for auto_delete, we must know why we are doing it and be sure that a TTL is the right choice for that queue.

How To Identify If A Queue Is Marked For auto_delete?


To identify whether a queue is marked for auto_delete, run `rabbitmqctl list_queues name auto_delete consumers`. The second field for each queue shows whether it is marked for auto_delete. For example:
# rabbitmqctl list_queues name auto_delete consumers
cinder-scheduler_fanout_9aed9fbc3d4249289f2cb5ea04c062ab	true	 0
cinder-scheduler_fanout_b7a2e488f3ed4e1587b959f9ac255b93	true	 0
cinder-scheduler_fanout_ea9c69fb630f41b2ae6120eba3cd43e0	true	 0
cinder-scheduler_fanout_fa01ec50173647a7b5c031230956eea9	true	 0
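
The separate checks shown so far (message count, consumer count, and the auto_delete flag) can be combined into a single pass. The following is a minimal sketch, not a supported tool: `stuck_queues` is a hypothetical helper name, and the default threshold of 1000 is simply the figure used in the Issue section. It assumes queue names contain no whitespace, as is the case for the queues shown here.

```shell
#!/bin/sh
# Hypothetical helper: print queues that hold at least MIN messages and have
# no consumers. Expects whitespace-separated rows on stdin in the order:
#   name messages consumers auto_delete
# Banner lines such as "Listing queues ..." are skipped because their second
# field is not a number.
stuck_queues() {
    min="${1:-1000}"
    awk -v min="$min" '
        NF == 4 && $2 ~ /^[0-9]+$/ && $2 + 0 >= min && $3 + 0 == 0 {
            print $1 "\t" $2 "\t" $4
        }'
}

# Usage on a controller node (commented out so the file can be sourced safely):
# rabbitmqctl list_queues name messages consumers auto_delete | stuck_queues 1000
```

Each output line gives the queue name, its message count, and its auto_delete flag, which is the information needed to decide which of the two cases above applies.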

How Do I Set TTL For A Queue?


The command below sets a 30-minute TTL for all `cinder-scheduler_fanout` queues. Adjust the policy name and the pattern to match the queues that need a TTL.
# rabbitmqctl set_policy expire-cinder-fanout "^cinder-scheduler_fanout.*" '{"expires":1800000}' --apply-to queues

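Note that the timeout is configured in milliseconds (1800000 ms = 30 minutes). The `expires` key removes the whole queue, together with its messages, once the queue has been unused for that long (no consumers, no redeclaration, and no basic.get calls). The policy key that expires individual messages instead is `message-ttl`.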
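
Because the value is given in milliseconds, it is easy to be off by a factor of 1000. The sketch below derives the `expires` value from a number of minutes and shows how the policy might then be checked. The function names `minutes_to_ms` and `expires_definition` are hypothetical helpers introduced for illustration. The `rabbitmqctl` lines are left commented out because they change broker state.

```shell
#!/bin/sh
# Hypothetical helper: convert minutes to milliseconds.
minutes_to_ms() {
    echo $(( $1 * 60 * 1000 ))
}

# Hypothetical helper: build the JSON policy definition for N minutes.
expires_definition() {
    printf '{"expires":%s}\n' "$(minutes_to_ms "$1")"
}

expires_definition 30   # prints {"expires":1800000}

# On a controller node, the definition could be applied and then verified:
# rabbitmqctl set_policy expire-cinder-fanout "^cinder-scheduler_fanout.*" "$(expires_definition 30)" --apply-to queues
# rabbitmqctl list_policies
# rabbitmqctl list_queues name policy messages consumers | grep cinder-scheduler_fanout
```

In the last command, the `policy` column should show `expire-cinder-fanout` for each matching queue if the pattern matched as intended.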
