Capsule sync task stuck even though the Capsule has already completed all of its sync tasks


Environment

Red Hat Satellite 6.7 or newer

Issue

  • Capsule sync task is stuck for hours in the Satellite WebUI / in hammer output
  • Checking the status of the Capsule shows it is idle and all of its Pulp tasks completed some time ago
  • A more detailed check reveals that the "pending" dynflow steps are waiting on a Pulp task on the Capsule that has already completed

Is the Capsule synchronized, then? And how can the stuck Satellite task be unblocked?

Resolution

Perform steps (1) through (3) in the Diagnostic Steps section of this solution article, and if the issue still persists, proceed with steps (4) and (5) in the Diagnostic Steps section before applying the resolution described here.

The procedure below helps only if the Pulp task(s) referenced by the pending step(s) have really completed but Satellite has not detected that for a while - see the Diagnostic Steps for details on how to check this.

  • Ensure you are on Satellite 6.8+ or have tfm-rubygem-foreman-tasks version 0.17.5.2-5 or higher. A hotfix for Satellite 6.7 is available when needed.

  • In the WebUI, open Administer -> Settings -> ForemanTasks and set "Polling intervals multiplier" to some higher value, e.g. 8. Note that the default polling interval is 16s, and the value of this parameter multiplies that default.

  • From that moment on, any further polling probes use the new interval.

  • To unblock the stuck dynflow / Satellite task, the only remedy is to restart the dynflow service on the Satellite server (ideally when no other task is really pending). dynflow should recognize the real task status (completed) during its startup:

    # systemctl restart dynflow-sidekiq@*.service
    

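As a quick sanity check of the multiplier arithmetic (a sketch; 16s is the default interval mentioned above and 8 is the example multiplier):

```shell
# Effective dynflow polling interval = default interval * multiplier
DEFAULT_INTERVAL=16   # seconds, the Satellite default
MULTIPLIER=8          # example "Polling intervals multiplier" value
echo "effective polling interval: $((DEFAULT_INTERVAL * MULTIPLIER)) seconds"
```

A multiplier of 8 therefore turns the 16s poll into one probe roughly every 2 minutes, which sharply reduces the request load when many Capsule sync tasks poll in parallel.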
If there is at least 1GB of spare memory, it is worth considering adding sidekiq worker(s) to make the polling more concurrent.
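One way to add workers is via the installer's dynflow tuning options (a sketch; verify the option name against your version's `satellite-installer --full-help` output before running, as availability varies between Satellite releases):

```shell
# Run two dynflow-sidekiq worker instances instead of one
# (each default worker runs 5 concurrent polling threads)
satellite-installer --foreman-dynflow-worker-instances 2

# Confirm the additional worker service came up
systemctl list-units 'dynflow-sidekiq@*'
```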

For more KB articles/solutions related to Red Hat Satellite 6.x Capsule Sync issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Capsule Sync Issues.

Diagnostic Steps

  1. Check whether there are any earlier Actions::Katello::CapsuleContent::Sync (i.e. capsule sync) tasks in the stopped:pending state by examining the output of the following command on the Satellite server:
# foreman-rake foreman_tasks:cleanup TASK_SEARCH='label = Actions::Katello::CapsuleContent::Sync result = pending' STATES='stopped' NOOP=true

If there are any, delete them by running the following command on the Satellite server:

# foreman-rake foreman_tasks:cleanup TASK_SEARCH='label = Actions::Katello::CapsuleContent::Sync result = pending' STATES='stopped'
  2. Confirm that the Actions::Katello::CapsuleContent::Sync (i.e. capsule sync) tasks deleted in step (1) above no longer exist, by running the following command on the Satellite server once more and examining its output:
# foreman-rake foreman_tasks:cleanup TASK_SEARCH='label = Actions::Katello::CapsuleContent::Sync result = pending' STATES='stopped' NOOP=true
  3. If a new capsule sync task did not kick in automatically, launch one manually (e.g. via hammer capsule content synchronize --id <capsule_id>) and wait for its completion.

  4. If the above steps did not help, find the highest number of polling requests per second (ideally correlating it with the Capsule's /var/log/httpd/pulp-https_access_ssl.log, which contains the same requests from Capsule sync tasks).

On a Satellite 6.9 or below server, use the following command to check:

# grep "GET /pulp/api/v2/tasks/" /var/log/httpd/foreman-ssl_access_ssl.log | awk '{ print $4 }' | sort | uniq -c | sort -n | tail

whereas on a Satellite 6.10+ server, use the following command:

# grep "GET /pulp/api/v3/tasks/" /var/log/httpd/foreman-ssl_access_ssl.log | awk '{ print $4 }' | sort | uniq -c | sort -n | tail

The below example is from a Satellite 6.9 or below server:

# grep "GET /pulp/api/v2/tasks/" /var/log/httpd/foreman-ssl_access_ssl.log | awk '{ print $4 }' | sort | uniq -c | sort -n | tail
   67 [02/Nov/2021:05:54:34
   67 [02/Nov/2021:07:27:12
   68 [02/Nov/2021:04:53:28
   69 [02/Nov/2021:04:42:39
   69 [02/Nov/2021:05:21:27
   69 [02/Nov/2021:05:39:52
   70 [02/Nov/2021:05:36:22
   70 [02/Nov/2021:06:47:55
   71 [02/Nov/2021:04:51:04
   71 [02/Nov/2021:05:44:52

The above means there were 71 polling requests to a Pulp task at 05:44:52 on 2 Nov. As a rule of thumb, any value above 10 with default tuning (5 workers in one sidekiq service) is concerning.
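To illustrate what the pipeline counts, here is a self-contained run against fabricated log lines (the IPs, timestamps, and task IDs below are made up for the example):

```shell
# Build a tiny fake access log in the common httpd format
cat > /tmp/sample_access.log <<'EOF'
10.0.0.1 - - [02/Nov/2021:05:44:52 +0100] "GET /pulp/api/v2/tasks/aaa/ HTTP/1.1" 200 512
10.0.0.1 - - [02/Nov/2021:05:44:52 +0100] "GET /pulp/api/v2/tasks/bbb/ HTTP/1.1" 200 512
10.0.0.1 - - [02/Nov/2021:05:44:53 +0100] "GET /pulp/api/v2/tasks/aaa/ HTTP/1.1" 200 512
EOF

# Field 4 is the "[day/month/year:hour:min:sec" timestamp, so this counts
# task-polling GET requests per second, busiest second printed last
grep "GET /pulp/api/v2/tasks/" /tmp/sample_access.log | awk '{ print $4 }' | sort | uniq -c | sort -n | tail
```

The last line of the output identifies the busiest second (here, 2 requests at 05:44:52).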

  5. The problem can also be detected in the WebUI task details: a dynflow step stays pending in a state like:
  7: Actions::Pulp::Consumer::SyncCapsule (waiting for Pulp to finish the task) [ 12345.94s / 23.76s ] Cancel
  ..
  Output:

  pulp_tasks:
   - exception: 
    task_type: pulp.server.managers.repo.sync.sync
    _href: "/pulp/api/v2/tasks/62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c/"
    task_id: 62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c
    tags:
    - pulp:repository:1-cv_sos-Library-4bb9464c-25bf-4cae-9c72-6c5faa61f85d
    - pulp:action:sync
    finish_time: 
    _ns: task_status
    start_time: '2020-07-16T11:35:27Z'
    traceback: 
    spawned_tasks: []
    progress_report:
  ..
    queue: reserved_resource_worker-3@pmoravec-caps67-rhev.gsslab.brq2.redhat.com.dq2
    state: running
    worker_name: reserved_resource_worker-3@pmoravec-caps67-rhev.gsslab.brq2.redhat.com
    result: 
    error: 
    _id:
      "$oid": 5f103b7f402d8df4bd8eddf4
    id: 5f103b7f402d8df4bd8eddf4
  poll_attempts:
    total: 15
    failed: 0

for a long time. poll_attempts does not increase over time, and checking the task itself on the Capsule shows it has completed:

$ pulpAdminPassword=$(grep ^default_password /etc/pulp/server.conf | cut -d' ' -f2)
$ curl -u admin:$pulpAdminPassword https://$(hostname)/pulp/api/v2/tasks/62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c/ | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2524  100  2524    0     0   9347      0 --:--:-- --:--:-- --:--:--  9348
{
..
    "error": null,
    "exception": null,
    "finish_time": "2020-07-16T11:35:34Z",
..
    "start_time": "2020-07-16T11:35:27Z",
    "state": "finished",
..
    "task_id": "62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c",
    "task_type": "pulp.server.managers.repo.sync.sync",
    "traceback": null,
    "worker_name": "reserved_resource_worker-3@pmoravec-caps67-rhev.gsslab.brq2.redhat.com"
}
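If only the state field matters, the JSON from the check above can be reduced to a one-line answer with a small stdlib filter (a sketch; it assumes python3 is available on the Capsule and reuses the same task UUID from the example above):

```shell
# Extract just the "state" field from a Pulp task document;
# "finished" confirms the task completed on the Capsule
pulp_state() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["state"])'
}

curl -s -u admin:$pulpAdminPassword \
  "https://$(hostname)/pulp/api/v2/tasks/62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c/" | pulp_state
```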

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.