Capsule sync task stuck even though the Capsule has already completed all its sync tasks
Environment
Red Hat Satellite 6.7 or newer
Issue
- Capsule sync task is stuck for hours in the Satellite WebUI / in hammer output
- Checking the status of the Capsule, it is idle and all its pulp tasks completed some time ago
- A more detailed check reveals that even the "pending" dynflow steps are waiting for a pulp task on the Capsule that has already completed

Is the Capsule synchronized, then? How can the Satellite's task be unblocked?
Resolution
Perform steps (1) through (3) in the Diagnostic Steps section of this solution article; if the issue persists, proceed with steps (4) and (5) there before applying the resolution described here.
The procedure below helps only if the pulp task(s) referenced in the pending step(s) have really completed but Satellite has not detected this for a while - see the Diagnostic Steps for details on how to check this.
- Ensure you are on Satellite 6.8+ or have tfm-rubygem-foreman-tasks version 0.17.5.2-5 or higher. There is a hotfix for Satellite 6.7 when needed.
- In the WebUI, open Administer -> Settings -> ForemanTasks and set "Polling intervals multiplier" to a higher value, e.g. 8. Note that the default polling interval is 16s, and the value of this parameter multiplies that default.
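For scripted environments, the same setting can be changed from the command line with hammer. A minimal sketch, assuming the setting's internal name is foreman_tasks_polling_multiplier (an assumption; confirm the exact name on your installation first):

```shell
# Confirm the internal name of the "Polling intervals multiplier" setting
# (assumption: it contains "polling" in its name).
hammer settings list | grep -i polling

# Raise the multiplier; with the 16s default, a value of 8 yields an
# effective polling interval of 16 x 8 = 128 seconds.
hammer settings set --name foreman_tasks_polling_multiplier --value 8
```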
- From that moment on, any further polling probes should use the new interval.
- To unblock the stuck dynflow / Satellite task, the only remedy is to restart the dynflowd service on the Satellite server (ideally when no other task is really pending). dynflowd should recognize the real task status (completed) during its startup:
# systemctl restart dynflow-sidekiq@*.service
If there is at least 1GB of spare memory, it is worth considering adding sidekiq worker(s) to make the polling more concurrent.
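On Satellite 6.8+ the dynflow workers run as instances of the dynflow-sidekiq@ template unit, so an extra worker can be added by cloning an existing instance configuration. A sketch under that assumption (file and instance names are illustrative; compare with the worker configs already present in /etc/foreman/dynflow/):

```shell
# Clone an existing worker definition for a new instance
# (assumption: a stock worker.yml exists under /etc/foreman/dynflow/).
cp /etc/foreman/dynflow/worker.yml /etc/foreman/dynflow/worker-2.yml

# Start the new worker now and on every boot, then verify it is running.
systemctl enable --now dynflow-sidekiq@worker-2
systemctl status dynflow-sidekiq@worker-2
```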
For more KB articles/solutions related to Red Hat Satellite 6.x Capsule Sync Issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Capsule Sync Issues
Diagnostic Steps
- Check if there are any earlier Actions::Katello::CapsuleContent::Sync (i.e. capsule sync) tasks in the stopped:pending state by examining the output of the following command on the Satellite server:
# foreman-rake foreman_tasks:cleanup TASK_SEARCH='label = Actions::Katello::CapsuleContent::Sync result = pending' STATES='stopped' NOOP=true
If there are any, delete them by running the following command on the Satellite server:
# foreman-rake foreman_tasks:cleanup TASK_SEARCH='label = Actions::Katello::CapsuleContent::Sync result = pending' STATES='stopped'
- Confirm that the Actions::Katello::CapsuleContent::Sync (i.e. capsule sync) tasks deleted in step (1) above no longer exist, by running the following command on the Satellite server once more and examining its output:
# foreman-rake foreman_tasks:cleanup TASK_SEARCH='label = Actions::Katello::CapsuleContent::Sync result = pending' STATES='stopped' NOOP=true
- If a new capsule sync task does not kick in, manually launch one and wait for its completion.
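One way to launch the sync manually is via hammer; the ID below is an example, so list the Capsules first:

```shell
# Find the ID of the affected Capsule.
hammer capsule list

# Trigger a sync of that Capsule (assumption: its ID is 2) and then
# watch the resulting task in the WebUI or via hammer task list.
hammer capsule content synchronize --id 2
```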
- If the above steps did not help, find the highest number of polling requests per second (and ideally correlate this with the Capsule's /var/log/httpd/pulp-https_access_ssl.log, which contains the same requests from Capsule sync tasks).
On a Satellite 6.9 or below server, use the following command to check:
# grep "GET /pulp/api/v2/tasks/" /var/log/httpd/foreman-ssl_access_ssl.log | awk '{ print $4 }' | sort | uniq -c | sort -n | tail
whereas on a Satellite 6.10+ server, use the following command:
# grep "GET /pulp/api/v3/tasks/" /var/log/httpd/foreman-ssl_access_ssl.log | awk '{ print $4 }' | sort | uniq -c | sort -n | tail
The below example is from a Satellite 6.9 or below server:
# grep "GET /pulp/api/v2/tasks/" /var/log/httpd/foreman-ssl_access_ssl.log | awk '{ print $4 }' | sort | uniq -c | sort -n | tail
67 [02/Nov/2021:05:54:34
67 [02/Nov/2021:07:27:12
68 [02/Nov/2021:04:53:28
69 [02/Nov/2021:04:42:39
69 [02/Nov/2021:05:21:27
69 [02/Nov/2021:05:39:52
70 [02/Nov/2021:05:36:22
70 [02/Nov/2021:06:47:55
71 [02/Nov/2021:04:51:04
71 [02/Nov/2021:05:44:52
The above means there were 71 polling requests to a pulp task at 05:44:52 on 2 Nov. As a rule of thumb, any value above 10 with default tuning (5 workers in one sidekiq service) is concerning.
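To see what the pipeline actually counts, it can be run against a small fabricated log snippet (the IP addresses, task UUIDs, and timestamps below are made up); awk's $4 is the "[day/month/year:HH:MM:SS" field, so uniq -c yields requests per second:

```shell
# Build a tiny sample in Apache combined-log format.
cat > /tmp/sample_access.log <<'EOF'
10.0.0.5 - - [02/Nov/2021:05:44:52 +0000] "GET /pulp/api/v2/tasks/aaa/ HTTP/1.1" 200 512
10.0.0.5 - - [02/Nov/2021:05:44:52 +0000] "GET /pulp/api/v2/tasks/bbb/ HTTP/1.1" 200 512
10.0.0.5 - - [02/Nov/2021:05:44:53 +0000] "GET /pulp/api/v2/tasks/aaa/ HTTP/1.1" 200 498
EOF

# Count polling requests per one-second timestamp, least busy first.
grep "GET /pulp/api/v2/tasks/" /tmp/sample_access.log \
  | awk '{ print $4 }' | sort | uniq -c | sort -n | tail
# prints:
#       1 [02/Nov/2021:05:44:53
#       2 [02/Nov/2021:05:44:52
```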
- To detect the problem in the WebUI task details: a dynflow step remains pending in a state like:
7: Actions::Pulp::Consumer::SyncCapsule (waiting for Pulp to finish the task) [ 12345.94s / 23.76s ] Cancel
..
Output:
pulp_tasks:
- exception:
task_type: pulp.server.managers.repo.sync.sync
_href: "/pulp/api/v2/tasks/62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c/"
task_id: 62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c
tags:
- pulp:repository:1-cv_sos-Library-4bb9464c-25bf-4cae-9c72-6c5faa61f85d
- pulp:action:sync
finish_time:
_ns: task_status
start_time: '2020-07-16T11:35:27Z'
traceback:
spawned_tasks: []
progress_report:
..
queue: reserved_resource_worker-3@pmoravec-caps67-rhev.gsslab.brq2.redhat.com.dq2
state: running
worker_name: reserved_resource_worker-3@pmoravec-caps67-rhev.gsslab.brq2.redhat.com
result:
error:
_id:
"$oid": 5f103b7f402d8df4bd8eddf4
id: 5f103b7f402d8df4bd8eddf4
poll_attempts:
total: 15
failed: 0
for a long time. poll_attempts does not increase over time, and a check on the Capsule for the task itself shows the task is complete:
$ pulpAdminPassword=$(grep ^default_password /etc/pulp/server.conf | cut -d' ' -f2)
$ curl -u admin:$pulpAdminPassword https://$(hostname)/pulp/api/v2/tasks/62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c/ | python -m json.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2524 100 2524 0 0 9347 0 --:--:-- --:--:-- --:--:-- 9348
{
..
"error": null,
"exception": null,
"finish_time": "2020-07-16T11:35:34Z",
..
"start_time": "2020-07-16T11:35:27Z",
"state": "finished",
..
"task_id": "62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c",
"task_type": "pulp.server.managers.repo.sync.sync",
"traceback": null,
"worker_name": "reserved_resource_worker-3@pmoravec-caps67-rhev.gsslab.brq2.redhat.com"
}
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.