Capsule synchronisation seems stuck - what to check to start troubleshooting?

Solution Verified - Updated

Environment

  • Red Hat Satellite 6
  • One or more external Capsules

Issue

  • Capsule synchronisation takes a long time and we suspect it is stuck.
  • Which checks help narrow down the possible cause?

Resolution

Check the pending steps, either in the WebUI (open the task details, click Dynflow console and expand the "(pending)" steps) or in the task export. The details of such a step look like:

7: Actions::Pulp::Consumer::SyncCapsule (waiting for Pulp to finish the task) [ 123.94s / 4.76s ]
Queue: default
Started at: 2020-07-16 11:35:27 UTC
Ended at: 2020-07-16 11:37:31 UTC
Real time: 123.94s
Execution time (excluding suspended state): 4.76s
Input:
---
capsule_id: 2
repo_pulp_id: 1-cv_sos-Library-4bb9464c-25bf-4cae-9c72-6c5faa61f85d
sync_options:
..

Output:

---
pulp_tasks:
- exception: 
  task_type: pulp.server.managers.repo.sync.sync
  _href: "/pulp/api/v2/tasks/62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c/"
  task_id: 62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c
  tags:
  - pulp:repository:1-cv_sos-Library-4bb9464c-25bf-4cae-9c72-6c5faa61f85d
  - pulp:action:sync
  finish_time: 
  _ns: task_status
  start_time: '2020-07-16T11:35:27Z'
  traceback: 
  spawned_tasks: []
  progress_report:
..
  queue: reserved_resource_worker-3@pmoravec-caps67-rhev.gsslab.brq2.redhat.com.dq2
  state: running
  worker_name: reserved_resource_worker-3@pmoravec-caps67-rhev.gsslab.brq2.redhat.com
  result: 
  error: 
  _id:
    "$oid": 5f103b7f402d8df4bd8eddf4
  id: 5f103b7f402d8df4bd8eddf4
poll_attempts:
  total: 5
  failed: 0

If the step status is "pending":

Dynflow has not yet sent the sync request to Pulp on the Capsule. This should last a few minutes at most, depending on the scale of the environment.

If the status is "waiting for Pulp to start the task":

Pulp on the Capsule has received the request and created a task for it, and the task is now waiting for an idle worker. This can take an arbitrary amount of time, depending on how many other tasks, queued before this one, are pending on the Capsule. The task backlog (but not the position of a given task in it) can be monitored on the Capsule, e.g. via:

$ qpid-stat --ssl-certificate=/etc/pki/pulp/qpid/client.crt  -b amqps://localhost:5671 -q | grep resource
  reserved_resource_worker-0@pmoravec-caps67-rhev.gsslab.brq2.redhat.com.celery.pidbox       Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-0@pmoravec-caps67-rhev.gsslab.brq2.redhat.com.dq2            Y                      1    49     48       0   64.3k    63.3k        1     2
  reserved_resource_worker-1@pmoravec-caps67-rhev.gsslab.brq2.redhat.com.celery.pidbox       Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-1@pmoravec-caps67-rhev.gsslab.brq2.redhat.com.dq2            Y                      1     7      6       0   8.76k    7.76k        1     2
  reserved_resource_worker-2@pmoravec-caps67-rhev.gsslab.brq2.redhat.com.celery.pidbox       Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-2@pmoravec-caps67-rhev.gsslab.brq2.redhat.com.dq2            Y                      1    19     18       0   24.5k    23.5k        1     2
  reserved_resource_worker-3@pmoravec-caps67-rhev.gsslab.brq2.redhat.com.celery.pidbox       Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-3@pmoravec-caps67-rhev.gsslab.brq2.redhat.com.dq2            Y                      1    89     88       0    116k     115k        1     2
  resource_manager                                                                      Y                      5    85     80       0    149k     147k        1     2
  resource_manager@pmoravec-caps67-rhev.gsslab.brq2.redhat.com.celery.pidbox                 Y                 0     0      0       0      0        0         1     2
  resource_manager@pmoravec-caps67-rhev.gsslab.brq2.redhat.com.dq2                      Y                      0     0      0       0      0        0         1     2
$

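Here the left-most numerical column is the queue depth (the number of messages currently in the queue). The resource_manager queue holds the backlog of tasks waiting for a worker, while a reserved_resource_worker-*.dq2 queue holds a message whenever that particular Pulp worker is executing a task. In the output above, 5 tasks are waiting and all four workers are busy.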
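The queue listing above can be reduced to the two numbers that matter: the backlog and the busy workers. This is a minimal Python 3 sketch, assuming the column layout shown above (queue name, one or more flag columns, then the eight numeric columns msg, msgIn, msgOut, bytes, bytesIn, bytesOut, cons, bind as printed by qpid-stat -q). The script name qpid_backlog.py and its command-line handling are hypothetical.

```python
import sys


def parse_queue_depths(qpid_stat_output):
    """Map queue name -> current message count (the 'msg' column).

    Assumes each data line is: name, one or more flag columns (e.g. 'Y'),
    then exactly 8 numeric columns: msg msgIn msgOut bytes bytesIn bytesOut cons bind.
    """
    depths = {}
    for line in qpid_stat_output.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # too short to be a queue line (e.g. the "$" prompt)
        try:
            depths[fields[0]] = int(fields[-8])  # 8th column from the right = msg
        except ValueError:
            continue  # e.g. a header line, where this column reads "msg"
    return depths


def summarize(depths):
    """Return (backlog, busy_workers, idle_workers) from the queue depths."""
    backlog = depths.get("resource_manager", 0)
    workers = {name: d for name, d in depths.items()
               if name.startswith("reserved_resource_worker") and name.endswith(".dq2")}
    busy = sorted(name for name, d in workers.items() if d > 0)
    idle = sorted(name for name, d in workers.items() if d == 0)
    return backlog, busy, idle


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: qpid_backlog.py <saved qpid-stat output>, or "-" for stdin
    if sys.argv[1] == "-":
        text = sys.stdin.read()
    else:
        with open(sys.argv[1]) as handle:
            text = handle.read()
    backlog, busy, idle = summarize(parse_queue_depths(text))
    print("tasks waiting in resource_manager: {}".format(backlog))
    print("busy workers ({}): {}".format(len(busy), ", ".join(busy) or "none"))
    print("idle workers ({}): {}".format(len(idle), ", ".join(idle) or "none"))
```

For example, the qpid-stat command above can be piped into it with "| python3 qpid_backlog.py -". Re-running it a few times shows whether the backlog is shrinking; a backlog that never moves while all workers stay busy suggests looking at what those workers are executing.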

If the status is "waiting for Pulp to finish the task":

Pulp on the Capsule has started executing this particular task. It should complete within a few minutes (or a few tens of minutes), depending mainly on the size of the underlying repository.

In both "waiting for Pulp to .." cases, check the status of that particular task by running the following on the Capsule, using the task_id (or _href) from the step output:

$ pulpAdminPassword=$(grep ^default_password /etc/pulp/server.conf | cut -d' ' -f2)
$ curl -u admin:$pulpAdminPassword https://$(hostname)/pulp/api/v2/tasks/62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c/ | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2524  100  2524    0     0   9347      0 --:--:-- --:--:-- --:--:--  9348
{
    "_href": "/pulp/api/v2/tasks/62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c/",
    "_id": {
        "$oid": "5f103b7f402d8df4bd8eddf4"
    },
    "_ns": "task_status",
    "error": null,
    "exception": null,
    "finish_time": "2020-07-16T11:35:34Z",
    "id": "5f103b7f402d8df4bd8eddf4",
    "progress_report": {
..
    },
    "queue": "reserved_resource_worker-3@pmoravec-caps67-rhev.gsslab.brq2.redhat.com.dq2",
    "result": {
        "_ns": "repo_sync_results",
        "added_count": 0,
        "completed": "2020-07-16T11:35:34Z",
        "details": {
..
        },
        "error_message": null,
        "exception": null,
        "id": "5f103b865fda751e4f53d724",
        "importer_id": "yum_importer",
        "importer_type_id": "yum_importer",
        "removed_count": 0,
        "repo_id": "1-cv_sos-Library-4bb9464c-25bf-4cae-9c72-6c5faa61f85d",
        "result": "success",
        "started": "2020-07-16T11:35:27Z",
        "summary": {
..
        },
        "traceback": null,
        "updated_count": 0
    },
    "spawned_tasks": [
        {
            "_href": "/pulp/api/v2/tasks/0f2da86f-0182-4ea9-b780-4ba05e0a09f8/",
            "task_id": "0f2da86f-0182-4ea9-b780-4ba05e0a09f8"
        }
    ],
    "start_time": "2020-07-16T11:35:27Z",
    "state": "finished",
    "tags": [
        "pulp:repository:1-cv_sos-Library-4bb9464c-25bf-4cae-9c72-6c5faa61f85d",
        "pulp:action:sync"
    ],
    "task_id": "62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c",
    "task_type": "pulp.server.managers.repo.sync.sync",
    "traceback": null,
    "worker_name": "reserved_resource_worker-3@pmoravec-caps67-rhev.gsslab.brq2.redhat.com"
}
$
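When the check has to be repeated, the JSON can be saved (for example with curl -s ... > task.json) and reduced to the fields relevant here. This is a minimal Python 3 sketch, assuming the Pulp 2 task document layout shown above (state, start_time, finish_time, spawned_tasks) and assuming that finished, error, canceled and skipped are the terminal Pulp 2 task states. The script name pulp_task_summary.py and the file name task.json are hypothetical.

```python
import json
import sys
from datetime import datetime

# Assumed Pulp 2 terminal states; any other state means the task is still queued or running.
TERMINAL_STATES = {"finished", "error", "canceled", "skipped"}
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # format of start_time / finish_time in the output above


def summarize_task(task):
    """Return the fields relevant for a possibly stuck sync task."""
    start, finish = task.get("start_time"), task.get("finish_time")
    duration = None
    if start and finish:
        delta = datetime.strptime(finish, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
        duration = int(delta.total_seconds())
    return {
        "task_id": task.get("task_id"),
        "state": task.get("state"),
        "complete": task.get("state") in TERMINAL_STATES,
        "duration_s": duration,  # None while the task has not finished (or not started)
        "spawned": [t["task_id"] for t in (task.get("spawned_tasks") or [])],
        "error": task.get("error"),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pulp_task_summary.py task.json
    with open(sys.argv[1]) as handle:
        print(json.dumps(summarize_task(json.load(handle)), indent=2))
```

A task still in "waiting" points back to the worker backlog. A task that Pulp reports as "finished" while the Dynflow step still shows "waiting for Pulp to finish the task" points at Dynflow polling rather than at Pulp. Spawned task IDs can be queried with the same curl call.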

More importantly, dynflowd should keep polling Pulp for that task status; by default, once the step has been running for a while, it polls once every 16 seconds. So the poll counter at the end of the Dynflow step output:

..
poll_attempts:
  total: 5
  failed: 0

should grow roughly in proportion to the step's real time. Each poll is visible as a GET request for the task in /var/log/httpd/pulp-https_access_ssl.log* on the Capsule:

1.2.3.4 - admin [16/Jul/2020:13:38:17 +0200] "GET /pulp/api/v2/tasks/62bd7d6e-3f2f-4cfa-a6fe-ba97c8ee947c/ HTTP/1.1" 200 754 "-" "rest-client/2.0.2 (linux-gnu x86_64) ruby/2.5.5p157"
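Whether polling is still happening can be checked by finding the last GET request for the task in that log and comparing its age with the 16-second interval. This is a minimal Python 3 sketch, assuming the log line format shown above (Apache combined format with a [day/Mon/year:H:M:S zone] timestamp). The script name last_poll.py, its argument order and the "3 intervals" threshold are hypothetical choices; rotated logs have to be passed in as plain-text files.

```python
import re
import sys
from datetime import datetime, timezone

# Matches the timestamp and request path of a combined-format access log line.
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "GET (?P<path>\S+) HTTP/[\d.]+"')
POLL_INTERVAL_S = 16  # default polling interval quoted in this article


def poll_times(lines, task_id):
    """Yield timestamps of GET requests for /pulp/api/v2/tasks/<task_id>/."""
    wanted = "/pulp/api/v2/tasks/{}/".format(task_id)
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("path") == wanted:
            yield datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")


def seconds_since_last_poll(lines, task_id, now):
    """Seconds between the last poll of task_id and `now` (timezone-aware), or None."""
    times = list(poll_times(lines, task_id))
    if not times:
        return None
    return (now - max(times)).total_seconds()


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: last_poll.py <task_id> /var/log/httpd/pulp-https_access_ssl.log
    with open(sys.argv[2]) as handle:
        gap = seconds_since_last_poll(handle, sys.argv[1], datetime.now(timezone.utc))
    if gap is None:
        print("no polls for this task found in the log")
    elif gap > 3 * POLL_INTERVAL_S:
        print("last poll {:.0f}s ago: polling looks stalled".format(gap))
    else:
        print("last poll {:.0f}s ago: polling looks active".format(gap))
```

The check only makes sense while the Dynflow step is still pending; once the step has finished, polling stops by design.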

If, for example, the step shows only 10 poll attempts, the corresponding access log entries stop after a few minutes, but the Dynflow step has been pending for an hour, then dynflowd has mistakenly stopped polling the task. This can indicate a scalability problem, and this solution should be followed.
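The comparison above is simple arithmetic: at one poll per 16 seconds, a step pending for an hour should show at least about 3600 / 16 = 225 poll attempts, so a total of 10 is far too low. This is a minimal Python sketch of that check, using the step's "Real time" and poll_attempts total as inputs. It ignores the different polling interval at the start of the step, assuming those early intervals are shorter (so the estimate stays a conservative lower bound); the 0.5 tolerance factor is a hypothetical choice.

```python
POLL_INTERVAL_S = 16  # polling interval quoted in this article


def expected_min_polls(real_time_s, interval_s=POLL_INTERVAL_S):
    """Rough lower bound on poll attempts for a step pending real_time_s seconds.

    Assumes at least one poll per interval_s; faster early polling
    would only increase the real count.
    """
    return int(real_time_s // interval_s)


def polling_looks_stalled(real_time_s, total_attempts, tolerance=0.5):
    """True if far fewer polls were recorded than the elapsed time implies."""
    return total_attempts < tolerance * expected_min_polls(real_time_s)
```

With the numbers from the scenario above, polling_looks_stalled(3600, 10) returns True, while the example step at the top of this article (123.94 s real time, 5 attempts) does not trip the check.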

For more KB articles/solutions related to Red Hat Satellite 6.x Capsule Sync Issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Capsule Sync Issues.


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.