Repository synchronization is stuck and sync tasks are being cancelled in Red Hat Satellite 6
Environment
- Red Hat Satellite or Capsule 6.5 - 6.9
Issue
- Pulp is removing Pulp workers from Satellite and canceling their tasks:

    AttributeError("'NoneType' object has no attribute 'top'")
    WorkerLostError('Worker exited prematurely: signal 7 (SIGBUS).')
Resolution
- If Content View Publish/Promote, Repository Sync, or similar tasks are affected and the issue is observed on the Satellite server, increase the worker_timeout value on the Satellite server:

    # grep "worker_timeout" /etc/pulp/server.conf
    # satellite-installer -S satellite --foreman-proxy-content-pulp-worker-timeout 3600 --katello-pulp-worker-timeout 3600
    # grep "worker_timeout" /etc/pulp/server.conf

- If Capsule Sync tasks are affected by this problem, increase the worker_timeout value on the Capsule server:

    # grep "worker_timeout" /etc/pulp/server.conf
    # satellite-installer -S capsule --foreman-proxy-content-pulp-worker-timeout 3600
    # grep "worker_timeout" /etc/pulp/server.conf

- Restart the Pulp services on the affected Satellite/Capsule server(s):

    # satellite-maintain service restart --only pulp_resource_manager,pulp_workers,pulp_celerybeat
  OR
    # foreman-maintain service restart --only pulp_resource_manager,pulp_workers,pulp_celerybeat
  OR
    # katello-service restart --only pulp_resource_manager,pulp_workers,pulp_celerybeat
Note: the above solution applies to Pulp 2 only. It does not apply to Pulp 3, which is available since Satellite/Capsule 6.10.
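The before-and-after grep in the steps above can be scripted. The following is a minimal sketch, not part of the official procedure. The function names get_worker_timeout and check_worker_timeout are hypothetical, and the sketch assumes that the setting appears in /etc/pulp/server.conf as an uncommented "worker_timeout: N" (or "worker_timeout = N") line.

```shell
#!/bin/sh
# Hypothetical helpers for verifying the worker_timeout change (Pulp 2 only).

# Print the uncommented worker_timeout value from a Pulp server.conf.
# Prints nothing if the setting is absent or only present as a comment.
# Assumption: if the key appears more than once, the last occurrence is reported.
get_worker_timeout() {
    conf="${1:-/etc/pulp/server.conf}"
    sed -n 's/^[[:space:]]*worker_timeout[[:space:]]*[:=][[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$conf" | tail -n 1
}

# Return 0 if the configured worker_timeout is at least the wanted value.
# Usage: check_worker_timeout WANTED [CONF_PATH]
check_worker_timeout() {
    wanted="$1"
    current="$(get_worker_timeout "${2:-/etc/pulp/server.conf}")"
    if [ -z "$current" ]; then
        echo "worker_timeout is not set explicitly"
        return 1
    fi
    if [ "$current" -ge "$wanted" ]; then
        echo "worker_timeout=$current (ok)"
        return 0
    fi
    echo "worker_timeout=$current (below $wanted)"
    return 1
}
```

For example, running check_worker_timeout 3600 as root after the satellite-installer step should report whether the new value was written to the configuration file. The Pulp services still need to be restarted as described above.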
For more KB articles/solutions related to Red Hat Satellite 6.x Repository Issues, please refer to the Red Hat Satellite Consolidated Troubleshooting Article for Red Hat Satellite 6.x Repository Issues.
Root Cause
- For more information, please refer to Bug 1692885 - Pulp scheduler cancels tasks even with a high worker_timeout specified - Worker has gone missing, removing from list of workers
Diagnostic Steps
- Tasks are being cancelled:

    Sep 10 12:15:17 satellite pulp: nectar.downloaders.threaded:INFO: Download of http://satellite.example.com/streamer/var/lib/pulp/content/rpm/gnome-settings-daemon-devel/3.26.2/9.el7/i686/e2735c40e2ffde4242d24e3bb2774373261f52e6/gnome-settings-daemon-devel-3.26.2-9.el7.i686.rpm?policy=eyJleHRlbnNpb25zIjoge30sICJyZXNvdXJjZSI6ICIvc3RyZWFtZXIvdmFyL2xpYi9wdWxwL2NvbnRlbnQvcnBtL2dub21lLXNldHRpbmdzLWRhZW1vbi1kZXZlbC8zLjI2LjIvOS5lbDcvaTY4Ni9lMjczNWM0MGUyZmZkZTQyNDJkMjRlM2JiMjc3NDM3MzI2MWY1MmU2L2dub21lLXNldHRpbmdzLWRhZW1vbi1kZXZlbC0zLjI2LjItOS5lbDcuaTY4Ni5ycG0iLCAiZXhwaXJhdGlvbiI6IDE2MzEyNjM3NDN9;signature=LrjUfoPe1lhwBtiYONwGTtLarRmQwyVrT6-x3NfHWsXMkNh-vaYMwvrSfyevK_4Me61keWNdmNiQVcgeCG5uEY5ZDAqCgKK--I6qXh_PxeDg-7-i-0SbtVu4yRHafMqM48JrURCdVd6DM-9RhHpcHjRoXeIVO-ZqL2G53dyzytx1d4lwk8VGdYEK3u9K0EhPm-dGai79BE7FWrQ7OCg_62HKQE3PdK41gjtYKyJl_3rUq9P3xuUSVPb5WFBH3gtf2PHUEWdxmViv6rtukCUrHK3g14OGutNCB0-YQQyWgQe6rEG8VVyXFPuB-F0ZCuFk6JOoaZ94VeaaYT_bCoTktw%3D%3D was cancelled

- Pulp workers are being terminated:

    Sep 10 12:15:18 satellite pulp: django.request:WARNING: Not Found: /pulp/api/v2/task_groups/
    Sep 10 12:15:18 satellite pulp: kombu.transport.qpid:INFO: Connected to qpid with SASL mechanism ANONYMOUS
    Sep 10 12:15:18 satellite pulp: celery.worker.control:INFO: Terminating 40dce92a-e2dc-4ea0-8f3b-61864fa1c828 (15)
    Sep 10 12:15:18 satellite pulp: pulp.server.async.tasks:INFO: Task canceled: 40dce92a-e2dc-4ea0-8f3b-61864fa1c828.
    Sep 10 12:15:18 satellite pulp: celery.worker.request:ERROR: (1253-90464) Task handler raised error: Terminated(15,)
    Sep 10 12:15:18 satellite pulp: celery.worker.request:ERROR: (1253-90464) Traceback (most recent call last):
    Sep 10 12:15:18 satellite pulp: celery.worker.request:ERROR: (1253-90464) File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1725, in _set_terminated
    Sep 10 12:15:18 satellite pulp: celery.worker.request:ERROR: (1253-90464) raise Terminated(-(signum or 0))
    Sep 10 12:15:18 satellite pulp: celery.worker.request:ERROR: (1253-90464) Terminated: 15
    ...
    ...
    Sep 10 14:25:29 satellite pulp: celery.worker.request:ERROR: (1249-16000) Task handler raised error: WorkerLostError('Worker exited prematurely: signal 7 (SIGBUS).',)
    Sep 10 14:25:29 satellite pulp: celery.worker.request:ERROR: (1249-16000) Traceback (most recent call last):
    Sep 10 14:25:29 satellite pulp: celery.worker.request:ERROR: (1249-16000) File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1223, in mark_as_worker_lost
    Sep 10 14:25:29 satellite pulp: celery.worker.request:ERROR: (1249-16000) human_status(exitcode)),
    Sep 10 14:25:29 satellite pulp: celery.worker.request:ERROR: (1249-16000) WorkerLostError: Worker exited prematurely: signal 7 (SIGBUS).
    ...
    Sep 10 14:25:30 satellite pulp: pulp.server.async.tasks:INFO: [8466a6ef] Task failed : [6ba6866a-1bc0-430f-86d5-8152acbe0e9d] : Worker terminated abnormally while processing task 6ba6866a-1bc0-430f-86d5-8152acbe0e9d. Check the logs for details
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) Task pulp.server.async.tasks._release_resource[8466a6ef-8be9-4c91-9ee6-aaab9dd9121e] raised unexpected: AttributeError("'NoneType' object has no attribute 'top'",)
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) Traceback (most recent call last):
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) R = retval = fun(*args, **kwargs)
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 107, in __call__
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) return super(PulpTask, self).__call__(*args, **kwargs)
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 622, in __protected_call__
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) return self.run(*args, **kwargs)
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 296, in _release_resource
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) new_task.on_failure(exception, task_id, (), {}, MyEinfo)
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 602, in on_failure
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) if not self.request.called_directly:
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) File "/usr/lib/python2.7/site-packages/celery/app/task.py", line 978, in _get_request
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) req = self.request_stack.top
    Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) AttributeError: 'NoneType' object has no attribute 'top'
- Another example of what /var/log/messages captures when a Pulp-related task is cancelled abnormally:

    Oct 20 09:14:02 XXXX pulp: pulp.server.async.scheduler:ERROR: Worker 'resource_manager@XX.XX.XX' has gone missing, removing from list of workers
    Oct 20 09:14:02 XXXX pulp: pulp.server.async.tasks:ERROR: The worker named resource_manager@XX.XX.XX is missing. Canceling the tasks in its queue.
    Oct 20 09:14:35 XXXX pulp: pulp.server.async.worker_watcher:WARNING: Worker resource_manager@XX.XX.XX heartbeat time 76.243062s exceeds heartbeat interval. Consider adjusting the worker_timeout setting.
    Oct 20 09:14:35 XXXX pulp: pulp.server.async.worker_watcher:WARNING: Worker reserved_resource_worker-7@XX.XX.XX heartbeat time 27.566779s exceeds heartbeat interval. Consider adjusting the worker_timeout setting.
    Oct 20 09:14:35 XXXX pulp: pulp.server.async.worker_watcher:WARNING: Worker reserved_resource_worker-5@XX.XX.XX heartbeat time 27.630178s exceeds heartbeat interval. Consider adjusting the worker_timeout setting.
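To check quickly whether a system shows the symptoms listed above, the log can be searched for the messages quoted in this article. The following is a minimal sketch under stated assumptions: the function names count_pulp_worker_symptoms and max_heartbeat_delay are hypothetical, the search strings are taken only from the log excerpts above, and the log is assumed to be /var/log/messages unless another path is given.

```shell
#!/bin/sh
# Hypothetical helpers for spotting the symptoms described in this article.

# Print the number of log lines that contain one of the messages quoted above.
# This counts lines, not distinct events; one failure can log several lines.
count_pulp_worker_symptoms() {
    log="${1:-/var/log/messages}"
    grep -c -E "has gone missing, removing from list of workers|exceeds heartbeat interval|Worker exited prematurely: signal 7 \(SIGBUS\)|'NoneType' object has no attribute 'top'" "$log"
}

# Print the largest heartbeat delay (in seconds) reported by worker_watcher.
# Prints nothing if no heartbeat warning is present in the log.
max_heartbeat_delay() {
    log="${1:-/var/log/messages}"
    sed -n 's/.*heartbeat time \([0-9.][0-9.]*\)s exceeds heartbeat interval.*/\1/p' "$log" \
        | LC_ALL=C sort -n | tail -n 1
}
```

A non-zero count from count_pulp_worker_symptoms suggests that the log contains the messages described in this article. The value from max_heartbeat_delay only shows the longest delay that was logged; it is not a recommendation for a specific worker_timeout value.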