Capsule sync stuck close to the end forever
Environment
- Red Hat Satellite 6
- External Capsule
Issue
- Capsule synchronization seems stuck at 94%, 97%, or 99% for a very long time
- Checking the dynflow steps of the task, some (or very few) repository sync/publish tasks are stuck in the "running" state
- The Capsule, however, seems idle
Resolution
For a final resolution, upgrade to the latest 6.7.z or to 6.9.0, where the root-cause Bugzilla and/or the consequence/symptoms Bugzilla is fixed.
As a workaround, cancel the hung Pulp task and re-synchronize the failed content.
Cancel hung pulp task
- First, restart the pulp workers on the Capsule, as one of them has supposedly died:
for i in pulp_celerybeat pulp_resource_manager pulp_workers; do service $i restart; done
- On the Capsule, run the commands below to cancel the hung task. Use the proper task_id of the hung pulp task (see Diagnostic Steps for how to find it in the dynflow console on the Satellite, or in /var/log/messages on the Capsule)
task_id=82d5ad8e-0bbc-403d-879c-5a268c65a38f # customize this accordingly
pulpAdminPassword=$(grep ^default_password /etc/pulp/server.conf | cut -d' ' -f2)
curl -ks -u admin:$pulpAdminPassword -X DELETE https://$(hostname -f)/pulp/api/v2/tasks/${task_id}/
- Wait a minute and check in the WebUI that the hung task completes with an error. Optionally, cancel the dynflow step manually by clicking the Cancel link
- Reload the dynflow console; if the hung step now has a "Skip" link, click it, and then near the top of the page (close to "Status") click Resume
- Reload the page again; the Capsule Sync task should now be completed
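Before returning to the WebUI, the effect of the cancellation can also be confirmed over the same Pulp 2 REST API used above. A minimal sketch, assuming the default admin credentials location; the helper name is hypothetical, and PULP_CONF and CURL are parameterized here only so the flow can be dry-run (on a real Capsule the defaults apply):

```shell
# Hypothetical helper: query the state of a Pulp 2 task on the Capsule.
# Defaults match the commands above; override PULP_CONF/CURL for dry runs.
PULP_CONF="${PULP_CONF:-/etc/pulp/server.conf}"
CURL="${CURL:-curl}"

pulp_task_state() {
  local task_id="$1" pass
  # Read the admin password the same way as in the cancel step
  pass=$(grep ^default_password "$PULP_CONF" | cut -d' ' -f2)
  # Fetch the task and pull out just its "state" field
  "$CURL" -ks -u "admin:$pass" \
    "https://$(hostname -f)/pulp/api/v2/tasks/${task_id}/" \
    | grep -oE '"state": "[a-z]+"'
}
```

After a successful cancellation the state should move from "running" to "canceled" or "error".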
Invoke new Capsule sync
As the hung task didn't succeed, some content was not properly synchronized to (or published on) the Capsule. Invoke a new Capsule sync in either way:
- Regular Capsule sync (Optimised should be sufficient)
- Find in the hung step the repo_pulp_id of the repository that was not synced properly, and invoke a sync of just that repository to the Capsule. Update the command below with the proper Capsule ID (the example uses :id => 2) and pulp_id:
foreman-rake console # wait a minute until the console loads
ForemanTasks.async_task(::Actions::Pulp::Consumer::SyncCapsule, OpenStruct.new(:pulp_id => "1-Devel-CVTEST-be4f2f16-dc8f-4518-b7d2-94d6f3b0024c"), OpenStruct.new(:id => 2), { :remove_missing => true})
- In regular foreman tasks, monitor the task status. Once it completes, the content should be properly synced and published on the Capsule
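When this has to be repeated for several repositories, the one-liner above can be generated from the Capsule ID and pulp_id instead of edited by hand. A small sketch (the helper name is hypothetical; it only prints the Ruby line, which is then pasted into the foreman-rake console on the Satellite):

```shell
# Hypothetical helper: print the ForemanTasks one-liner for re-syncing a
# single repository to a Capsule, given the Capsule ID and the pulp_id.
capsule_repo_sync_cmd() {
  local capsule_id="$1" pulp_id="$2"
  printf 'ForemanTasks.async_task(::Actions::Pulp::Consumer::SyncCapsule, OpenStruct.new(:pulp_id => "%s"), OpenStruct.new(:id => %s), { :remove_missing => true})\n' \
    "$pulp_id" "$capsule_id"
}

# Example: Capsule ID 2, pulp_id taken from the hung dynflow step
capsule_repo_sync_cmd 2 "1-Devel-CVTEST-be4f2f16-dc8f-4518-b7d2-94d6f3b0024c"
```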
For more KB articles/solutions related to Red Hat Satellite 6.x Capsule Sync Issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Capsule Sync Issues
For more KB articles/solutions related to Red Hat Satellite 6.x Pulp 2.0 Issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Pulp 2.0-related Issues
Diagnostic Steps
- A dynflow step of the hung Capsule sync is stuck in "pending", or rather in a "waiting for Pulp to finish the task" state
- Expanding the dynflow step shows a "running" pulp task (in the example below, task id 82d5ad8e-0bbc-403d-879c-5a268c65a38f) that has not been progressing for a long time
- The Capsule's /var/log/messages contains an exception like the one below for that task:
Nov 9 17:14:29 capsule007 pulp: pulp.server.async.tasks:INFO: [5673cb1f] Task failed : [82d5ad8e-0bbc-403d-879c-5a268c65a38f] : Worker terminated abnormally while processing task 82d5ad8e-0bbc-403d-879c-5a268c65a38f. Check the logs for details
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) Task pulp.server.async.tasks._release_resource[5673cb1f-735a-41af-a862-dc39f25615f4] raised unexpected: AttributeError("'NoneType' object has no attribute 'top'",)
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) Traceback (most recent call last):
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) R = retval = fun(*args, **kwargs)
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 108, in __call__
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) return super(PulpTask, self).__call__(*args, **kwargs)
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 622, in __protected_call__
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) return self.run(*args, **kwargs)
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 297, in _release_resource
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) new_task.on_failure(exception, task_id, (), {}, MyEinfo)
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 612, in on_failure
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) if not self.request.called_directly:
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) File "/usr/lib/python2.7/site-packages/celery/app/task.py", line 978, in _get_request
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) req = self.request_stack.top
Nov 9 17:14:29 capsule007 pulp: celery.app.trace:ERROR: [5673cb1f] (29742-53408) AttributeError: 'NoneType' object has no attribute 'top'
Nov 9 17:14:30 capsule007 pulp: celery.app.trace:INFO: [8a22a560] Task pulp.server.controllers.repository.download_deferred[8a22a560-2f93-422c-8e5d-3ffb91f9d80f] succeeded in 0.254975541029s: None
Nov 9 17:14:38 capsule007 pulp: celery.worker.request:ERROR: (2501-53408) Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Nov 9 17:14:38 capsule007 pulp: celery.worker.request:ERROR: (2501-53408) Traceback (most recent call last):
Nov 9 17:14:38 capsule007 pulp: celery.worker.request:ERROR: (2501-53408) File "/usr/lib64/python2.7/site-packages/billiard/pool.py", line 1223, in mark_as_worker_lost
Nov 9 17:14:38 capsule007 pulp: celery.worker.request:ERROR: (2501-53408) human_status(exitcode)),
Nov 9 17:14:38 capsule007 pulp: celery.worker.request:ERROR: (2501-53408) WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
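The steps above locate the task id by reading the log manually; on the Capsule, the same id can be pulled out of "Task failed" lines with a short helper. A sketch (the helper name is hypothetical; the log path is a parameter only so it can also be run against a copied log, and defaults to /var/log/messages):

```shell
# Hypothetical helper: list the unique pulp task IDs mentioned in
# "Task failed" lines of the given log (default: /var/log/messages).
failed_pulp_tasks() {
  local logfile="${1:-/var/log/messages}"
  grep 'Task failed' "$logfile" \
    | grep -oE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' \
    | sort -u
}
```

Each printed id can then be used as the task_id in the cancellation step of the Resolution above.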
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.