Capsule sync or Satellite repository sync gets stuck forever after an upgrade to 6.16
Environment
- Red Hat Satellite 6.16 (after an upgrade to 6.16)
Issue
- After an upgrade of Satellite/Capsule to 6.16, a synchronization problem appears on either the Satellite or a Capsule.
- Capsule sync gets stuck forever in Actions::Pulp3::Orchestration::Repository::RefreshRepos. Repeatedly killing the step does not help; a new Capsule sync gets stuck again on the same repository.
- An alternative scenario: the Satellite's own repository sync gets stuck forever. Cancelling the task and starting a new one does not help either.
Resolution
- Delete the blocking task(s) via pulpcore-manager:

sudo -u pulp PULP_SETTINGS='/etc/pulp/settings.py' DJANGO_SETTINGS_MODULE='pulpcore.app.settings' pulpcore-manager shell << EOF
from pulpcore.app.models import Task
from pulpcore.tasking.tasks import cancel_task
for task in Task.objects.filter(state='running', unblocked_at__isnull=True):
    cancel_task(task.pk)
EOF

- The underlying bug in the migration is tracked in JIRA.
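The selection logic of the cancellation loop above can be previewed before cancelling anything. A minimal self-contained sketch of the same filter, using plain dictionaries as stand-ins for the real pulpcore Task model (the task pks here are made up; the real query runs inside pulpcore-manager shell):

```python
# Stand-in records; the real query is
# Task.objects.filter(state='running', unblocked_at__isnull=True).
tasks = [
    {"pk": "aaaa", "state": "running",   "unblocked_at": None},          # stuck
    {"pk": "bbbb", "state": "running",   "unblocked_at": "2024-11-19"},  # healthy
    {"pk": "cccc", "state": "completed", "unblocked_at": "2024-11-19"},
]

def blocking_tasks(tasks):
    """Mirror the filter used by the cancellation loop in the Resolution."""
    return [t for t in tasks
            if t["state"] == "running" and t["unblocked_at"] is None]

for t in blocking_tasks(tasks):
    print(t["pk"])  # the pks that cancel_task() would be called with
```

Only the first record matches: it is "running" yet has no unblocked_at timestamp, which is exactly the state left behind by the upgrade.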
For more KB articles/solutions related to Red Hat Satellite 6.x Capsule Sync Issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Capsule Sync Issues.
For more KB articles/solutions related to Red Hat Satellite 6.x Repository Issues, please refer to the Red Hat Satellite Consolidated Troubleshooting Article for Red Hat Satellite 6.x Repository Issues.
Root Cause
The upgrade to 6.16 added an unblocked_at field to every pulp task. Since 6.16, the pulp code assumes that every running task has a valid timestamp assigned to unblocked_at; running tasks with an empty unblocked_at field are ignored and hang forever, blocking other tasks pending on the same resource.
If the upgrade was run while a pulp task was running, it is possible that the task kept its running state through the upgrade and turned into such a hung task.
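The failure mode can be illustrated with a short self-contained sketch (a heavy simplification of the real pulp worker loop, with made-up resource paths): a "running" task with an empty unblocked_at is never picked up again, yet its resource locks are still honored, so every new sync of the same repository queues behind it forever.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    name: str
    state: str
    unblocked_at: Optional[str]  # post-6.16 code expects this to be set
    resources: frozenset

def held_resources(tasks):
    # Resources held by tasks the tasking system still considers "running".
    # A pre-upgrade task with unblocked_at=None is skipped by the workers,
    # so it never finishes -- but its locks are still in force.
    held = set()
    for t in tasks:
        if t.state == "running":
            held |= t.resources
    return held

def can_unblock(task, tasks):
    return not (task.resources & held_resources(tasks))

repo = frozenset({"/pulp/api/v3/repositories/rpm/rpm/<id>/"})  # placeholder id
stuck = Task("old-sync", "running", None, repo)   # survived the upgrade
queued = Task("new-sync", "waiting", None, repo)  # every retry of the sync

print(can_unblock(queued, [stuck, queued]))  # stays blocked while stuck "runs"
```

Once the hung task is cancelled (its state leaves "running"), the lock is released and the queued sync can proceed; that is what the Resolution's cancel_task() loop achieves.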
Diagnostic Steps
On the Satellite or Capsule with stuck pulp tasks, check if there are "running" tasks like:
su - postgres -c "psql pulpcore -c \"SELECT pulp_id,pulp_created,started_at,finished_at,state,name,error,worker_id,reserved_resources_record,unblocked_at FROM core_task WHERE state='running' AND worker_id IS NULL AND unblocked_at IS NULL;\""
pulp_id | pulp_created | started_at | finished_at | state | name | error | worker_id | reserved_resources_record | unblocked_at
--------+--------------+------------+-------------+-------+------+-------+-----------+---------------------------+-------------
019344f6-0f23-7da5-8f8e-681d2a746f0c | 2024-11-19 16:06:36.708392+01 | 2024-11-19 16:06:41.608031+01 | | running | pulp_rpm.app.tasks.synchronizing.synchronize | | | {/pulp/api/v3/repositories/rpm/rpm/5c933f6f-4962-48ce-b475-b8e38753f644/,shared:/pulp/api/v3/remotes/rpm/rpm/3319f818-c0fc-4142-9f7a-847b13cabb90/,shared:/pulp/api/v3/domains/018f3dcf-74ea-735e-b0a8-7c91d8e07630/} |
019344f6-1a05-78cb-b61b-1aaee90c2f17 | 2024-11-19 16:06:39.49482+01 | 2024-11-19 16:07:53.867227+01 | | running | pulp_rpm.app.tasks.synchronizing.synchronize | | | {/pulp/api/v3/repositories/rpm/rpm/5a40931e-55bd-4fe7-b260-7d30a3d3afce/,shared:/pulp/api/v3/remotes/rpm/rpm/771ac1ef-b66e-4319-9b64-f76a75a8bd3c/,shared:/pulp/api/v3/domains/018f3dcf-74ea-735e-b0a8-7c91d8e07630/} |
(2 rows)
The tasks should:
- have state='running'
- have an empty worker_id (no worker is assigned to the running task)
- have an empty unblocked_at timestamp
- have been started before the upgrade to 6.16 (if unsure when the upgrade was run, then

su - postgres -c "psql pulpcore -c \"SELECT applied FROM django_migrations WHERE name = '0117_task_unblocked_at';\""

will show the precise timestamp; the tasks must have been started prior to this timestamp)
If there is such a "running" task, you have hit the problem: the pulp tasking system will hold locks for a Remote / Repository, preventing you from running any synchronization of the given repo(s).
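The four diagnostic criteria above can be combined into one predicate. A minimal sketch with hypothetical timestamps (in practice, take worker_id, unblocked_at, and started_at from the first psql query and the upgrade time from the django_migrations query):

```python
from datetime import datetime, timezone

# Hypothetical value; use the 'applied' timestamp of migration
# 0117_task_unblocked_at from the django_migrations query.
upgrade_applied = datetime(2024, 11, 20, 9, 0, tzinfo=timezone.utc)

def is_stuck_pre_upgrade_task(state, worker_id, unblocked_at, started_at):
    """All four diagnostic criteria from the list above."""
    return (
        state == "running"
        and worker_id is None              # no worker assigned
        and unblocked_at is None           # field added by the 6.16 migration
        and started_at < upgrade_applied   # task predates the upgrade
    )

# Example: the first task from the sample output above (times shifted to UTC
# for illustration only).
print(is_stuck_pre_upgrade_task(
    "running", None, None,
    datetime(2024, 11, 19, 15, 6, 41, tzinfo=timezone.utc)))
```

A task started after the migration was applied, or one with a worker and an unblocked_at timestamp, is not affected by this bug and should be investigated separately.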
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.