Repository synchronization is stuck and sync tasks are being cancelled in Red Hat Satellite 6

Solution Verified - Updated

Environment

  • Red Hat Satellite or Capsule 6.5 - 6.9

Issue

  • Pulp removes Pulp workers from the Satellite and cancels the tasks assigned to them, logging errors such as:

    AttributeError("'NoneType' object has no attribute 'top'")
    WorkerLostError('Worker exited prematurely: signal 7 (SIGBUS).')
    

Resolution

  • If Content View Publish/Promote, Repository Sync, or similar tasks are affected and the same errors appear on the Satellite server, increase the worker_timeout value on the Satellite server:

    # grep "worker_timeout" /etc/pulp/server.conf
    # satellite-installer -S satellite --foreman-proxy-content-pulp-worker-timeout 3600 --katello-pulp-worker-timeout 3600
    # grep "worker_timeout" /etc/pulp/server.conf
    
  • If Capsule Sync tasks are affected, increase the worker_timeout value on the Capsule server:

    # grep "worker_timeout" /etc/pulp/server.conf
    # satellite-installer -S capsule --foreman-proxy-content-pulp-worker-timeout 3600
    # grep "worker_timeout" /etc/pulp/server.conf
    
  • Restart the Pulp services on each affected Satellite or Capsule server, using one of the following commands:

    # satellite-maintain service restart --only pulp_resource_manager,pulp_workers,pulp_celerybeat

    OR

    # foreman-maintain service restart --only pulp_resource_manager,pulp_workers,pulp_celerybeat

    OR

    # katello-service restart --only pulp_resource_manager,pulp_workers,pulp_celerybeat
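The grep commands in the steps above read worker_timeout from /etc/pulp/server.conf before and after the installer runs. The bash sketch below wraps that check so it can be repeated on every Satellite and Capsule server (Pulp 2 only). It is a hypothetical helper, not a Satellite tool: the function names get_worker_timeout and check_worker_timeout are made up for illustration, and it assumes the setting appears as an uncommented "worker_timeout: N" or "worker_timeout = N" line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Satellite): read and check the
# worker_timeout value in a Pulp 2 server.conf.
# Assumption: the setting is an uncommented "worker_timeout: N" or
# "worker_timeout = N" line; commented-out lines ("# worker_timeout: ...")
# are ignored.

get_worker_timeout() {
    local conf="${1:-/etc/pulp/server.conf}"
    # Print the value from the last uncommented worker_timeout line, if any.
    awk -F'[:=]' '
        /^[ \t]*worker_timeout[ \t]*[:=]/ {
            value = $2
            gsub(/[ \t\r]/, "", value)
        }
        END { if (value != "") print value }
    ' "$conf"
}

check_worker_timeout() {
    local conf="${1:-/etc/pulp/server.conf}"
    local wanted="${2:-3600}"
    local current
    current="$(get_worker_timeout "$conf")"
    if [ -z "$current" ]; then
        echo "worker_timeout is not set in $conf"
        return 1
    fi
    if [ "$current" -lt "$wanted" ]; then
        echo "worker_timeout=$current is lower than $wanted"
        return 1
    fi
    echo "worker_timeout=$current"
}
```

For example, "check_worker_timeout /etc/pulp/server.conf 3600" prints the configured value and returns a non-zero status when the setting is missing or lower than 3600, the value used in the installer commands above.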
    

Note that the above solution applies only to Pulp 2. It does not apply to Pulp 3, which Satellite and Capsule use starting with version 6.10.

For more solutions related to Red Hat Satellite 6.x repository issues, refer to the Red Hat Satellite Consolidated Troubleshooting Article for Red Hat Satellite 6.x Repository Issues.

Root Cause

Diagnostic Steps

  • Tasks are being cancelled:

       Sep 10 12:15:17 satellite pulp: nectar.downloaders.threaded:INFO: Download of http://satellite.example.com/streamer/var/lib/pulp/content/rpm/gnome-settings-daemon-devel/3.26.2/9.el7/i686/e2735c40e2ffde4242d24e3bb2774373261f52e6/gnome-settings-daemon-devel-3.26.2-9.el7.i686.rpm?policy=eyJleHRlbnNpb25zIjoge30sICJyZXNvdXJjZSI6ICIvc3RyZWFtZXIvdmFyL2xpYi9wdWxwL2NvbnRlbnQvcnBtL2dub21lLXNldHRpbmdzLWRhZW1vbi1kZXZlbC8zLjI2LjIvOS5lbDcvaTY4Ni9lMjczNWM0MGUyZmZkZTQyNDJkMjRlM2JiMjc3NDM3MzI2MWY1MmU2L2dub21lLXNldHRpbmdzLWRhZW1vbi1kZXZlbC0zLjI2LjItOS5lbDcuaTY4Ni5ycG0iLCAiZXhwaXJhdGlvbiI6IDE2MzEyNjM3NDN9;signature=LrjUfoPe1lhwBtiYONwGTtLarRmQwyVrT6-x3NfHWsXMkNh-vaYMwvrSfyevK_4Me61keWNdmNiQVcgeCG5uEY5ZDAqCgKK--I6qXh_PxeDg-7-i-0SbtVu4yRHafMqM48JrURCdVd6DM-9RhHpcHjRoXeIVO-ZqL2G53dyzytx1d4lwk8VGdYEK3u9K0EhPm-dGai79BE7FWrQ7OCg_62HKQE3PdK41gjtYKyJl_3rUq9P3xuUSVPb5WFBH3gtf2PHUEWdxmViv6rtukCUrHK3g14OGutNCB0-YQQyWgQe6rEG8VVyXFPuB-F0ZCuFk6JOoaZ94VeaaYT_bCoTktw%3D%3D was cancelled
    
  • Pulp workers are being terminated:

       Sep 10 12:15:18 satellite pulp: django.request:WARNING: Not Found: /pulp/api/v2/task_groups/
       Sep 10 12:15:18 satellite pulp: kombu.transport.qpid:INFO: Connected to qpid with SASL mechanism ANONYMOUS
       Sep 10 12:15:18 satellite pulp: celery.worker.control:INFO: Terminating 40dce92a-e2dc-4ea0-8f3b-61864fa1c828 (15)
       Sep 10 12:15:18 satellite pulp: pulp.server.async.tasks:INFO: Task canceled: 40dce92a-e2dc-4ea0-8f3b-61864fa1c828.
       Sep 10 12:15:18 satellite pulp: celery.worker.request:ERROR: (1253-90464) Task handler raised error: Terminated(15,)
       Sep 10 12:15:18 satellite pulp: celery.worker.request:ERROR: (1253-90464) Traceback (most recent call last):
       Sep 10 12:15:18 satellite pulp: celery.worker.request:ERROR: (1253-90464)   File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1725, in _set_terminated
       Sep 10 12:15:18 satellite pulp: celery.worker.request:ERROR: (1253-90464)     raise Terminated(-(signum or 0))
       Sep 10 12:15:18 satellite pulp: celery.worker.request:ERROR: (1253-90464) Terminated: 15
       ...
       ...
       Sep 10 14:25:29 satellite pulp: celery.worker.request:ERROR: (1249-16000) Task handler raised error: WorkerLostError('Worker exited prematurely: signal 7 (SIGBUS).',)
       Sep 10 14:25:29 satellite pulp: celery.worker.request:ERROR: (1249-16000) Traceback (most recent call last):
       Sep 10 14:25:29 satellite pulp: celery.worker.request:ERROR: (1249-16000)   File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1223, in mark_as_worker_lost
       Sep 10 14:25:29 satellite pulp: celery.worker.request:ERROR: (1249-16000)     human_status(exitcode)),
       Sep 10 14:25:29 satellite pulp: celery.worker.request:ERROR: (1249-16000) WorkerLostError: Worker exited prematurely: signal 7 (SIGBUS).
       ...
       Sep 10 14:25:30 satellite pulp: pulp.server.async.tasks:INFO: [8466a6ef] Task failed : [6ba6866a-1bc0-430f-86d5-8152acbe0e9d] : Worker terminated abnormally while processing task 6ba6866a-1bc0-430f-86d5-8152acbe0e9d.  Check the logs for details
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) Task pulp.server.async.tasks._release_resource[8466a6ef-8be9-4c91-9ee6-aaab9dd9121e] raised unexpected: AttributeError("'NoneType' object has no attribute 'top'",)
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) Traceback (most recent call last):
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000)     R = retval = fun(*args, **kwargs)
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 107, in __call__
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000)     return super(PulpTask, self).__call__(*args, **kwargs)
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 622, in __protected_call__
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000)     return self.run(*args, **kwargs)
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 296, in _release_resource
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000)     new_task.on_failure(exception, task_id, (), {}, MyEinfo)
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 602, in on_failure
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000)     if not self.request.called_directly:
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000)   File "/usr/lib/python2.7/site-packages/celery/app/task.py", line 978, in _get_request
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000)     req = self.request_stack.top
       Sep 10 14:25:30 satellite pulp: celery.app.trace:ERROR: [8466a6ef] (24207-16000) AttributeError: 'NoneType' object has no attribute 'top'
    
  • Another example of the entries recorded in /var/log/messages when a Pulp task is cancelled abnormally:

    Oct 20 09:14:02 XXXX pulp: pulp.server.async.scheduler:ERROR: Worker 'resource_manager@XX.XX.XX' has gone missing, removing from list of workers
    Oct 20 09:14:02 XXXX pulp: pulp.server.async.tasks:ERROR: The worker named resource_manager@XX.XX.XX is missing. Canceling the tasks in its queue.
    Oct 20 09:14:35 XXXX pulp: pulp.server.async.worker_watcher:WARNING: Worker resource_manager@XX.XX.XX heartbeat time 76.243062s exceeds heartbeat interval. Consider adjusting the worker_timeout setting.
    Oct 20 09:14:35 XXXX pulp: pulp.server.async.worker_watcher:WARNING: Worker reserved_resource_worker-7@XX.XX.XX heartbeat time 27.566779s exceeds heartbeat interval. Consider adjusting the worker_timeout setting.
    Oct 20 09:14:35 XXXX pulp: pulp.server.async.worker_watcher:WARNING: Worker reserved_resource_worker-5@XX.XX.XX heartbeat time 27.630178s exceeds heartbeat interval. Consider adjusting the worker_timeout setting.
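To check whether a server shows the symptoms described above, the log signatures can be counted with grep. The bash sketch below is a hypothetical helper, not a Red Hat tool: scan_pulp_log and max_heartbeat_seconds are illustrative names, the search strings are copied from the log excerpts in this article, and /var/log/messages is only an assumed default path. The largest "heartbeat time" value gives a rough idea of how late workers are checking in; it is not a rule for choosing worker_timeout.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Satellite): look for the Pulp 2
# worker-loss signatures shown in the log excerpts above.

scan_pulp_log() {
    local log="${1:-/var/log/messages}"
    local pattern
    for pattern in \
        'has gone missing, removing from list of workers' \
        'is missing. Canceling the tasks in its queue' \
        'WorkerLostError' \
        "object has no attribute 'top'" \
        'exceeds heartbeat interval'
    do
        # grep -c prints 0 (and exits 1) when nothing matches, so every
        # pattern is still reported. -F matches the text literally.
        printf '%s\t%s\n' "$(grep -cF -- "$pattern" "$log")" "$pattern"
    done
}

max_heartbeat_seconds() {
    local log="${1:-/var/log/messages}"
    # Extract "heartbeat time N.Ns" values and print the largest one exactly
    # as logged; prints nothing when there are no such warnings.
    grep -oE 'heartbeat time [0-9]+(\.[0-9]+)?s' "$log" \
        | awk '{
              v = $3
              sub(/s$/, "", v)
              if (!found || v + 0 > best + 0) { best = v; found = 1 }
          }
          END { if (found) print best }'
}
```

For example, run "scan_pulp_log /var/log/messages" and "max_heartbeat_seconds /var/log/messages" on the affected Satellite or Capsule server after a failed task. Each scan_pulp_log output line is a match count followed by a tab and the pattern searched for.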
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.