oslo.messaging holds connections when replies fail

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux OpenStack Platform 5.
  • Red Hat Enterprise Linux OpenStack Platform 6.
  • Red Hat Enterprise Linux OpenStack Platform 7.
  • Red Hat OpenStack Platform 8.

Issue

  • If clients lose connectivity to RabbitMQ while RabbitMQ itself reports as healthy, the system operator should check whether /var/log/nova/nova-conductor.log contains lines similar to the following:
2016-09-01 16:29:00.312 43667 TRACE oslo_messaging.rpc.dispatcher NotFound: Exchange.declare: (404) NOT_FOUND - no exchange 'reply_8289518ccb6f479caf7638306f7034f0' in vhost '/'
...
2016-09-01 16:29:05.113 43649 TRACE oslo_messaging.rpc.dispatcher NotFound: Exchange.declare: (404) NOT_FOUND - no exchange 'reply_908b67a0fc53426382556dc247d71cad' in vhost '/'
  • If found, the system operator should then check /var/log/rabbitmq/rabbit@<hostname>.log for lines such as:
=ERROR REPORT==== 1-Sep-2016::16:29:05 ===
connection <0.9523.0>, channel 1 - soft error:
{amqp_error,not_found,
            "no exchange 'reply_be26e71e75ad450e925360ccbebed0e3' in vhost '/'",
            'exchange.declare'}

=ERROR REPORT==== 1-Sep-2016::16:29:05 ===
connection <0.12407.0>, channel 2 - soft error:
{amqp_error,not_found,
            "no exchange 'reply_9d243e5fc03c4c229061f4ab43c52fbe' in vhost '/'",
            'exchange.declare'}

=ERROR REPORT==== 1-Sep-2016::16:29:05 ===
connection <0.1834.0>, channel 1 - soft error:
{amqp_error,not_found,
            "no exchange 'reply_6c42356adc0a4894914e2957afacde70' in vhost '/'",
            'exchange.declare'}
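When many exchanges are affected, it helps to pull the distinct stale reply exchange names out of the logs before correlating the two files. A minimal sketch (the 32-hex-character naming of reply exchanges is an assumption based on the excerpts above):

```python
import re

def stale_reply_exchanges(log_path):
    """Return the distinct reply exchanges reported as missing in an
    oslo.messaging or RabbitMQ log file."""
    pattern = re.compile(r"no exchange '(reply_[0-9a-f]{32})'")
    with open(log_path) as f:
        return sorted({m.group(1) for line in f for m in pattern.finditer(line)})
```

For example, `stale_reply_exchanges('/var/log/nova/nova-conductor.log')` run against the excerpt above would return the two `reply_...` names.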

Resolution

The root of the problem is that the client (oslo.messaging) could not gracefully handle channel errors (connectivity issues or AMQP errors), so it could remain stuck for a long time (roughly 15 minutes or more). This was fixed upstream in December 2015, and the fix was rolled out between January and April 2016 (depending on the branch) in updated python-oslo-messaging packages, as follows:

  • For Red Hat Enterprise Linux OpenStack Platform 5.0 (Icehouse) for RHEL 6: covered by BZ#1304080,
    resolved by RHBA-2016-0434.

Root Cause

Restarting an RPC client can lead to connection starvation in the connection pool on the RPC server side.

Steps that lead to this issue:

  • The RPC client sends a batch of messages (more than the connection pool size, 30 by default).
  • The RPC server receives all of these messages and processes them (but has not yet sent the replies).
  • The RPC client application is restarted.
  • The RPC server tries to reply to all messages received before the restart:
    • at this point the reply queue no longer exists;
    • for each message that needs a reply, the server waits 60 seconds (in case the queue is missing because of a RabbitMQ restart).
  • In the meantime, the new RPC client tries to send messages and expects replies,
    but the RPC server is still waiting for the old RPC client to come back.
    This produces a flood of RPC timeouts until the RPC server finishes processing its pending replies.
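The starvation described above can be sketched with a toy connection pool. Pool size and wait times are scaled down, and this is an illustration of the mechanism only, not oslo.messaging code:

```python
import queue
import threading
import time

POOL_SIZE = 2      # stands in for rpc_conn_pool_size (default 30)
REPLY_WAIT = 0.5   # stands in for the 60-second wait per undeliverable reply

pool = queue.Queue()
for i in range(POOL_SIZE):
    pool.put("conn-%d" % i)

def reply_to_restarted_client():
    # Pre-fix behaviour: the worker holds a pooled connection while it
    # waits for the vanished reply exchange to come back.
    conn = pool.get()
    try:
        time.sleep(REPLY_WAIT)
    finally:
        pool.put(conn)

def new_request_gets_connection():
    # A fresh RPC call needs a pooled connection promptly, or it times out.
    try:
        conn = pool.get(timeout=REPLY_WAIT / 5)
        pool.put(conn)
        return True
    except queue.Empty:
        return False

# Saturate the pool with stale replies, then attempt a fresh request.
workers = [threading.Thread(target=reply_to_restarted_client)
           for _ in range(POOL_SIZE)]
for w in workers:
    w.start()
time.sleep(0.1)                       # let the stale replies drain the pool
starved = not new_request_gets_connection()
for w in workers:
    w.join()
print(starved)                        # True: the fresh call could not get a connection
```

With a real 30-connection pool and 60-second waits per reply, the same mechanism keeps new callers timing out for many minutes.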
The fix was incorporated as a patch in python-oslo-messaging-1.8.3-5.el7ost and similar packages; check the Resolution section to confirm the errata for your specific OpenStack Platform version.

It covers the following upstream changes (links to review.openstack.org are not included):
  • Related upstream change (Kilo)
  • Related upstream change (Kilo)
  • Related upstream change (Liberty)

You can get the package from the errata described in the Resolution section.

After the package update, affected RabbitMQ queues may need to be deleted and recreated.
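One way to script that cleanup is to generate the delete commands from the list of stale queue names, review them, and then run them. Using rabbitmqadmin (shipped with the RabbitMQ management plugin) is one option; the exact flag syntax is an assumption to verify against your installed RabbitMQ version:

```python
def delete_queue_commands(queues, vhost="/"):
    """Build shell commands that delete the given stale reply queues via
    rabbitmqadmin. Review the output before executing it; flag syntax
    may vary between RabbitMQ versions."""
    return ["rabbitmqadmin --vhost=%s delete queue name=%s" % (vhost, q)
            for q in queues]
```

For example, feeding in one stale name yields a single reviewable `rabbitmqadmin ... delete queue ...` line.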

Diagnostic Steps

Here is the classic "NotFound: Exchange.declare: (404) NOT_FOUND - no exchange" traceback:

 42746 2016-06-30 05:59:19.437 25684 ERROR oslo_messaging._drivers.common [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
    executor_callback))
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 74, in reply
    self._send_reply(conn, reply, failure, log_failure=log_failure)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 63, in _send_reply
    conn.direct_send(self.reply_q, rpc_common.serialize_msg(msg))
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1161, in direct_send
    self.publisher_send(p, msg)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1129, in publisher_send
    self.ensure(_publish, retry=retry, error_callback=_error_callback)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 879, in ensure
    ret, channel = autoretry_method()
  File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 436, in _ensured
    return fun(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 508, in __call__
    return fun(*args, channel=channels[0], **kwargs), channels[0]
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 865, in execute_method
    method()
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1126, in _publish
    publisher.send(self, msg, timeout)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 469, in send
    timeout)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 398, in send
    routing_key=self.routing_key)
  File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 85, in __init__
    self.revive(self._channel)
  File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 218, in revive
    self.declare()
  File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 105, in declare
    self.exchange.declare()
  File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 166, in declare
    nowait=nowait, passive=passive,
  File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 620, in exchange_declare
    (40, 11),  # Channel.exchange_declare_ok
  File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 69, in wait
    return self.dispatch_method(method_sig, args, content)
  File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 87, in dispatch_method
    return amqp_method(self, args)
  File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 241, in _close
    reply_code, reply_text, (class_id, method_id), ChannelError,
NotFound: Exchange.declare: (404) NOT_FOUND - no exchange 'reply_d487caa2d6584e2399e9c31ad83d77f2' in vhost '/'

Here it retries, as the exchange has not been created yet:

  42747 2016-06-30 05:59:19.440 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
  42748 2016-06-30 05:59:20.442 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
  42749 2016-06-30 05:59:21.446 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
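This retry loop is exactly what the upstream fix bounds: instead of retrying a missing reply exchange indefinitely, the sender gives up once a deadline passes and releases the connection. A schematic version (function and parameter names here are illustrative, not the oslo.messaging API):

```python
import time

def send_reply_with_deadline(publish, exchange_exists,
                             timeout=60.0, interval=1.0):
    """Retry publishing while the reply exchange is missing, but stop
    after `timeout` seconds instead of holding a pooled connection
    forever."""
    deadline = time.monotonic() + timeout
    while True:
        if exchange_exists():
            return publish()
        if time.monotonic() >= deadline:
            raise TimeoutError("reply exchange never reappeared")
        time.sleep(interval)
```

If the exchange comes back within the deadline the reply is delivered as before; if not, the send fails fast and the connection returns to the pool for new callers.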

Here the channel errors produce a connection refused error:

  42750 2016-06-30 05:59:21.509 25683 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42751 2016-06-30 05:59:22.382 25686 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED

and the repetition continues:

  42752 2016-06-30 05:59:22.452 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
  42753 2016-06-30 05:59:22.494 25691 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42754 2016-06-30 05:59:23.427 25680 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42755 2016-06-30 05:59:23.454 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
  42756 2016-06-30 05:59:23.947 25687 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42757 2016-06-30 05:59:23.947 25687 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42758 2016-06-30 05:59:23.948 25687 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42759 2016-06-30 05:59:23.948 25687 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42760 2016-06-30 05:59:23.948 25687 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42761 2016-06-30 05:59:23.990 25694 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42762 2016-06-30 05:59:24.425 25679 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42763 2016-06-30 05:59:24.461 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
  42764 2016-06-30 05:59:24.462 25681 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42765 2016-06-30 05:59:24.492 25690 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42766 2016-06-30 05:59:24.492 25690 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42767 2016-06-30 05:59:25.466 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
  42768 2016-06-30 05:59:25.481 25698 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42769 2016-06-30 05:59:26.150 25697 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42770 2016-06-30 05:59:26.161 25689 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42771 2016-06-30 05:59:26.469 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
  42772 2016-06-30 05:59:26.750 25685 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42773 2016-06-30 05:59:27.471 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
  42774 2016-06-30 05:59:28.222 25679 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42775 2016-06-30 05:59:28.475 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
  42776 2016-06-30 05:59:29.479 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
  42777 2016-06-30 05:59:29.605 25683 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42778 2016-06-30 05:59:30.482 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
  42779 2016-06-30 05:59:31.341 25688 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42780 2016-06-30 05:59:31.342 25688 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42781 2016-06-30 05:59:31.485 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
  42782 2016-06-30 05:59:32.108 25678 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42783 2016-06-30 05:59:32.487 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
  42784 2016-06-30 05:59:33.492 25684 INFO oslo_messaging._drivers.impl_rabbit [req-6d809610-b8d2-4828-a5f0-e831da6249a7 - - - - -] The exchange Exchange reply_d487caa2d6584e2399e9c31ad83d77f2(direct) to send to reply_d487caa2d6584e2399e9c31ad83d77f2 doesn't exist yet, retrying...
  42785 2016-06-30 05:59:47.937 25693 WARNING nova.openstack.common.loopingcall [req-8c2fa09b-2fa6-4622-8fb9-935ab738f379 - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 8.93 sec
  42786 2016-06-30 05:59:52.684 25676 WARNING nova.openstack.common.loopingcall [req-781db276-714b-4465-b03a-46c776b7f7e7 - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 13.67 sec
  42787 2016-06-30 05:59:54.420 25685 WARNING nova.openstack.common.loopingcall [req-22e5b3c1-6659-4af9-87fb-a8063dd18a16 - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 15.22 sec
  42788 2016-06-30 05:59:54.421 25685 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 172.24.13.12:5672
  42789 2016-06-30 05:59:57.259 25699 WARNING nova.openstack.common.loopingcall [req-89d0b9a9-4853-4e0c-85dd-d1570fbf6066 - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 16.34 sec
  42790 2016-06-30 05:59:59.452 25681 WARNING nova.openstack.common.loopingcall [req-7c56cba4-aa29-4476-ae84-1392a68c89a0 - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 18.47 sec
  42791 2016-06-30 05:59:59.454 25681 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42792 2016-06-30 06:00:00.394 25689 WARNING nova.openstack.common.loopingcall [req-cd21e87f-b2fc-47d3-b4fd-dc975a76b0b4 - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 19.01 sec
  42793 2016-06-30 06:00:00.395 25689 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42794 2016-06-30 06:00:00.602 25697 WARNING nova.openstack.common.loopingcall [req-56c757a6-dc92-4d0f-a67b-639b02902614 - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 17.86 sec
  42795 2016-06-30 06:00:00.603 25697 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42796 2016-06-30 06:00:00.757 25677 WARNING nova.openstack.common.loopingcall [req-30f8c53e-b237-4494-840e-d48e79bbf5be - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 17.07 sec
  42797 2016-06-30 06:00:00.857 25684 WARNING nova.openstack.common.loopingcall [req-3532ed4b-9c09-4bdb-9590-d6bca09e5a30 - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 16.72 sec
  42798 2016-06-30 06:00:00.974 25690 WARNING nova.openstack.common.loopingcall [req-364bf6ad-0f3e-40bb-a123-a30d8812b775 - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 15.76 sec
  42799 2016-06-30 06:00:00.975 25690 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42800 2016-06-30 06:00:00.975 25690 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42801 2016-06-30 06:00:01.030 25694 WARNING nova.openstack.common.loopingcall [req-e9b71dde-9b4c-4d38-8225-cb6b35b3c5cb - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 15.28 sec
  42802 2016-06-30 06:00:01.038 25694 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42803 2016-06-30 06:00:01.133 25692 WARNING nova.openstack.common.loopingcall [req-ffaf284d-e2f2-447c-adba-22fbe39f20ed - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 15.34 sec
  42804 2016-06-30 06:00:01.241 25683 WARNING nova.openstack.common.loopingcall [req-3ba54a08-66b0-48bc-8fb8-008893c0e5b2 - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 15.40 sec
  42805 2016-06-30 06:00:01.242 25683 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42806 2016-06-30 06:00:01.242 25683 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42807 2016-06-30 06:00:01.349 25679 WARNING nova.openstack.common.loopingcall [req-a0bc34a2-97b5-43f0-a920-ed1929919399 - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 15.41 sec
  42808 2016-06-30 06:00:01.350 25679 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42809 2016-06-30 06:00:01.350 25679 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42810 2016-06-30 06:00:01.428 25698 WARNING nova.openstack.common.loopingcall [req-7fc3faa7-1a03-4171-ad81-efe0744b0ae4 - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 15.19 sec
  42811 2016-06-30 06:00:01.429 25698 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42812 2016-06-30 06:00:01.433 25691 WARNING nova.openstack.common.loopingcall [req-a2e74018-223b-4231-9e1a-80173719e284 - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 14.90 sec
  42813 2016-06-30 06:00:01.433 25691 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42814 2016-06-30 06:00:01.438 25695 WARNING nova.openstack.common.loopingcall [req-3f38d611-c7e8-490a-85cb-1e1d6dc2fd9f - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 14.83 sec
  42815 2016-06-30 06:00:01.458 25686 WARNING nova.openstack.common.loopingcall [req-c9a5fc19-eda1-47fc-90dd-949dc9a19feb - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 14.62 sec
  42816 2016-06-30 06:00:01.459 25686 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42817 2016-06-30 06:00:01.526 25696 WARNING nova.openstack.common.loopingcall [req-ed8bbcf2-23c1-42cd-8bd8-2fb78ad4e90b - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 14.24 sec
  42818 2016-06-30 06:00:01.545 25680 WARNING nova.openstack.common.loopingcall [req-4b8b7699-bb7c-498e-8278-43facdb5f1ba - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 13.85 sec
  42819 2016-06-30 06:00:01.545 25680 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42820 2016-06-30 06:00:01.601 25687 WARNING nova.openstack.common.loopingcall [req-e6db580e-310b-41d6-ae9f-caec5a13b611 - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 14.06 sec
  42821 2016-06-30 06:00:01.601 25687 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42822 2016-06-30 06:00:01.603 25687 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42823 2016-06-30 06:00:01.604 25687 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42824 2016-06-30 06:00:01.604 25687 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED
  42825 2016-06-30 06:00:01.605 25687 INFO oslo_messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 111] ECONNREFUSED

At this point it is also a good idea to check the default maxconn setting in HAProxy, in general for the API services involved in these _report_state calls, as in:

 42826 2016-06-30 06:00:01.666 25678 WARNING nova.openstack.common.loopingcall [req-fb4db934-b067-4b89-8f5a-8333a42e954e - - - - -] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x3d48b50>> run outlasted interval by 13.54 sec
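The relevant setting lives in haproxy.cfg; the fragment and value below are purely illustrative and must be sized for your own control-plane load:

```
# /etc/haproxy/haproxy.cfg (fragment, illustrative value)
defaults
    maxconn 4096    # raise if API frontends are hitting the connection cap
```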

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.