[Satellite6] Deleting a content host, pulp raises error "Value for replyText is too large"
Environment
- Red Hat Satellite 6
Issue
- Having over 1900 content hosts
- Trying to apply some action involving pulp
  - One known scenario is deleting a content host (while creating a new one usually works)
  - Another scenario includes restart of pulp services (see this solution)
- Such a task fails with "500 Internal Server Error" and pulp traceback with "Value for replyText is too large" error
- Another scenario is creating a new content host (under some circumstances) that fails on io_queue_init call of qpid broker returning EAGAIN
Resolution
Increase the maximum number of allowed concurrent AIO requests by raising the kernel parameter fs.aio-max-nr. Add the following line to /etc/sysctl.conf:
fs.aio-max-nr=655360
and run sysctl -p to load the new setting.
Use any value greater than 33 times the maximum number of content hosts that will ever be registered to the Satellite at one time. Be aware of the queue leak bug in Satellite described in this solution.
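The sizing rule above can be worked through as a short calculation. This is a minimal sketch, not part of the original solution: the function names and the headroom parameter are hypothetical, and only the factor of 33 AIO requests per content host comes from this article. The headroom argument is an assumption to leave room for other AIO consumers on the same system.

```python
# Sketch: derive an fs.aio-max-nr value from a planned content-host count.
# Only the factor 33 comes from this article; all names here are illustrative.
AIO_REQUESTS_PER_HOST = 33


def min_aio_max_nr(planned_hosts, headroom=0):
    """Return a value strictly greater than 33 * planned_hosts, plus headroom."""
    if planned_hosts < 0 or headroom < 0:
        raise ValueError("planned_hosts and headroom must be non-negative")
    return AIO_REQUESTS_PER_HOST * planned_hosts + headroom + 1


def sysctl_line(value):
    """Format the setting as it would appear in /etc/sysctl.conf."""
    return "fs.aio-max-nr={}".format(value)


# 5000 planned hosts is an arbitrary example figure.
print(sysctl_line(min_aio_max_nr(5000)))  # prints fs.aio-max-nr=165001
```

Any value at or above the computed minimum satisfies the rule, so the article's suggested 655360 covers this example with a wide margin.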
Learn More
See the [Red Hat Satellite Installation guide](https://access.redhat.com/documentation/en-US/Red_Hat_Satellite/6.1/html/Installation_Guide/sect-Red_Hat_Satellite-Installation_Guide-Prerequisites.html#sect-Red_Hat_Satellite-Installation_Guide-Prerequisites-Large_deployments) for more on considerations for large deployments.
Root Cause
Due to this bug, the qpid broker can handle up to approximately 1980 durable queues before the default limit of concurrent AIO requests (configurable via the fs.aio-max-nr parameter) is reached. Since Satellite 6 requires one such queue per content host, the limit is reached at approximately 1980 content hosts.
Beyond that number of hosts, any user activity that causes pulp to create a new durable queue in the qpid broker fails, including an attempt to delete a content host. On the first failure, qpidd logs the AIO error and tries to send that error text back to pulp. Since the error text is longer than 256 characters (the maximum that the AMQP 0-10 protocol allows in the replyText field), the broker does not reject the pulp request but instead raises the connection error "Value for replyText is too large", followed by an AMQP session detach. Pulp cannot recover from this error and raises an uncaught exception, which in turn triggers the "500 Internal Server Error" in Satellite.
As one durable queue requires 33 AIO requests, the default fs.aio-max-nr=65536 is exhausted by approximately 1980 content hosts. Increasing the kernel value tenfold allows up to approximately 19800 content hosts, which should meet (almost) all deployment requirements.
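The arithmetic behind these limits can be checked with a small sketch. The function names are hypothetical. The /proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr files are the standard Linux interfaces for the current system-wide AIO usage and its limit, but aio-nr counts requests from every process, not just qpidd, so the result is only a rough upper bound.

```python
# Sketch: estimate how many durable queues fit under a given fs.aio-max-nr.
# The factor 33 comes from this article; function names are illustrative.
AIO_REQUESTS_PER_QUEUE = 33


def max_durable_queues(aio_max_nr, aio_in_use=0):
    """Upper bound on durable queues that still fit under the AIO limit."""
    free = max(aio_max_nr - aio_in_use, 0)
    return free // AIO_REQUESTS_PER_QUEUE


def read_proc_int(path):
    """Read a single integer from a /proc/sys file (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())


print(max_durable_queues(65536))   # default limit: prints 1985
print(max_durable_queues(655360))  # tenfold limit: prints 19859
```

On a running system, calling max_durable_queues(read_proc_int("/proc/sys/fs/aio-max-nr"), read_proc_int("/proc/sys/fs/aio-nr")) gives a rough estimate of how many more queues can be created before the limit is hit. The computed bounds of 1985 and 19859 are consistent with the approximate figures of 1980 and 19800 quoted above.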
Diagnostic Steps
/var/log/messages sometimes contains the qpidd error pmgr::initialize() threw JERR__AIO: AIO error, and always contains the pulp error illegal-argument: Value for replyText is too large(541), followed by a long traceback (abbreviated here):
Apr 25 03:46:42 mysatellite qpidd[403]: 2015-04-25 03:46:42 [Broker] error Connection exception: framing-error: Queue pulp.agent.905b0d44-0627-4847-89c5-cff987cc9d29: create() failed: jexception 0x0103 pmgr::initialize() threw JERR__AIO: AIO error. (io_queue_init() failed: errno=11 (Resource temporarily unavailable)) (/builddir/build/BUILD/qpid-0.22/cpp/src/qpid/linearstore/MessageStoreImpl.cpp:421)
Apr 25 03:46:42 mysatellite pulp: pulp.server.webservices.middleware.exception:ERROR: illegal-argument: Value for replyText is too large(541)
Apr 25 03:46:42 mysatellite pulp: pulp.server.webservices.middleware.exception:ERROR: Traceback (most recent call last):
Apr 25 03:46:42 mysatellite pulp: pulp.server.webservices.middleware.exception:ERROR: File "/usr/lib/python2.7/site-packages/pulp/server/webservices/middleware/exception.py", line 44, in __call__
Apr 25 03:46:42 mysatellite pulp: pulp.server.webservices.middleware.exception:ERROR: return self.app(environ, start_response)
..
Apr 25 03:46:42 mysatellite pulp: pulp.server.webservices.middleware.exception:ERROR: self.check_error()
Apr 25 03:46:42 mysatellite pulp: pulp.server.webservices.middleware.exception:ERROR: File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 212, in check_error
Apr 25 03:46:42 mysatellite pulp: pulp.server.webservices.middleware.exception:ERROR: raise e
Apr 25 03:46:42 mysatellite pulp: pulp.server.webservices.middleware.exception:ERROR: ConnectionError: illegal-argument: Value for replyText is too large(541)
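To check a log for these two signatures, something like the following sketch could be used. It is an illustration only: the function names and result keys are hypothetical, the patterns are derived from the sample lines above and may need adjusting for other versions, and it assumes Python 3 and read access to the log file.

```python
# Sketch: count the two error signatures from this article in syslog lines.
import re

# Patterns modelled on the sample log lines in this article.
QPIDD_AIO = re.compile(r"qpidd\[\d+\]:.*JERR__AIO: AIO error")
PULP_REPLYTEXT = re.compile(r"pulp:.*Value for replyText is too large")


def classify(line):
    """Return a label for a matching line, or None if it matches neither."""
    if QPIDD_AIO.search(line):
        return "qpidd_aio"
    if PULP_REPLYTEXT.search(line):
        return "pulp_replytext"
    return None


def count_signatures(lines):
    """Count occurrences of each signature in an iterable of log lines."""
    counts = {"qpidd_aio": 0, "pulp_replytext": 0}
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts


def scan_file(path):
    """Scan a log file; undecodable bytes are replaced instead of raising."""
    with open(path, errors="replace") as handle:
        return count_signatures(handle)
```

Calling scan_file("/var/log/messages") returns a dictionary with both counts. A nonzero pulp_replytext count together with a host count near the limit points to the condition described in this solution.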