Capsule sync or Satellite repository sync gets stuck forever after an upgrade to 6.16

Solution Verified - Updated

Environment

  • Red Hat Satellite 6.16 (after an upgrade to 6.16)

Issue

  • After an upgrade to Satellite/Capsule 6.16, synchronization problems appear on either the Satellite or a Capsule.

  • Capsule sync gets stuck forever in Actions::Pulp3::Orchestration::Repository::RefreshRepos. Repeatedly killing the step does not help; a new Capsule sync gets stuck again on the same repository.

  • An alternative scenario: the Satellite's own repository sync gets stuck forever. Cancelling the task and starting a new one does not help either.

Resolution

  • Delete the blocking task(s) via pulpcore-manager:

      sudo -u pulp PULP_SETTINGS='/etc/pulp/settings.py' DJANGO_SETTINGS_MODULE='pulpcore.app.settings' pulpcore-manager shell << EOF
      from pulpcore.app.models import Task
      from pulpcore.tasking.tasks import cancel_task

      # Cancel every task that is marked 'running' but was never unblocked
      for task in Task.objects.filter(state='running', unblocked_at__isnull=True):
          cancel_task(task.pk)
      EOF
    
  • The underlying migration bug is tracked in this JIRA.
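The selection logic applied by the pulpcore-manager snippet above can be illustrated standalone. This is a minimal sketch using hypothetical task records (plain dicts, not the real Django Task model): only tasks that are "running" but were never stamped with unblocked_at are picked for cancellation.

```python
# Standalone illustration of the filter state='running', unblocked_at IS NULL.
# Task records here are hypothetical dicts, not the actual pulpcore model.
tasks = [
    {"pk": "aaa", "state": "running", "unblocked_at": None},          # stuck
    {"pk": "bbb", "state": "running", "unblocked_at": "2024-11-20"},  # healthy
    {"pk": "ccc", "state": "completed", "unblocked_at": None},        # finished
]

# Mirror of Task.objects.filter(state='running', unblocked_at__isnull=True)
to_cancel = [t["pk"] for t in tasks
             if t["state"] == "running" and t["unblocked_at"] is None]
print(to_cancel)  # ['aaa']
```

Healthy running tasks and already-finished tasks are untouched; only the pre-upgrade stragglers are cancelled.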

For more KB articles/solutions related to Red Hat Satellite 6.x Capsule Sync Issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Capsule Sync Issues.

For more KB articles/solutions related to Red Hat Satellite 6.x Repository Issues, please refer to the Red Hat Satellite Consolidated Troubleshooting Article for Red Hat Satellite 6.x Repository Issues.

Root Cause

An upgrade to 6.16 adds an unblocked_at field to every pulp task. Since 6.16, the pulp code assumes that every running task has a valid timestamp assigned to unblocked_at; running tasks with an empty unblocked_at field are ignored and hang forever, blocking other tasks pending on the same resources.

If the upgrade was run while a pulp task was running, the task may have kept its "running" state through the upgrade and turned into a hung task.
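The dispatch assumption can be sketched as follows. This is a simplified model, not the actual Pulp tasking code: a worker only picks up a task once unblocked_at has been stamped, so a pre-upgrade task with a NULL value is never dispatched, yet it still holds its resource locks and blocks any new sync on the same repository.

```python
# Simplified model of the post-6.16 dispatch rule (illustrative only;
# not the real Pulp dispatcher): a task is eligible for a worker only
# once unblocked_at has been set.
def eligible(task):
    return task["state"] == "running" and task["unblocked_at"] is not None

# A task left 'running' across the upgrade never got unblocked_at stamped.
pre_upgrade_task = {"state": "running", "unblocked_at": None,
                    "resources": {"/pulp/api/v3/repositories/rpm/rpm/abc/"}}
# A freshly started sync wants the same repository resource.
new_sync = {"resources": {"/pulp/api/v3/repositories/rpm/rpm/abc/"}}

print(eligible(pre_upgrade_task))  # False - never picked up by any worker

# ...yet the old task still holds the repository lock, so the new sync
# can never be unblocked either:
blocked = bool(pre_upgrade_task["resources"] & new_sync["resources"])
print(blocked)  # True
```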

Diagnostic Steps

On the Satellite or Capsule with stuck pulp tasks, check if there are "running" tasks like:

su - postgres -c "psql pulpcore -c \"SELECT pulp_id,pulp_created,started_at,finished_at,state,name,error,worker_id,reserved_resources_record,unblocked_at FROM core_task WHERE state='running' AND worker_id IS NULL AND unblocked_at IS NULL;\""
               pulp_id                |         pulp_created          |          started_at           | finished_at |  state  |                     name                     | error | worker_
id |                                                                                               reserved_resources_record                                                                  
                             | unblocked_at 
--------------------------------------+-------------------------------+-------------------------------+-------------+---------+----------------------------------------------+-------+--------
---+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------+--------------
 019344f6-0f23-7da5-8f8e-681d2a746f0c | 2024-11-19 16:06:36.708392+01 | 2024-11-19 16:06:41.608031+01 |             | running | pulp_rpm.app.tasks.synchronizing.synchronize |       |        
   | {/pulp/api/v3/repositories/rpm/rpm/5c933f6f-4962-48ce-b475-b8e38753f644/,shared:/pulp/api/v3/remotes/rpm/rpm/3319f818-c0fc-4142-9f7a-847b13cabb90/,shared:/pulp/api/v3/domains/018f3dcf-7
4ea-735e-b0a8-7c91d8e07630/} | 
 019344f6-1a05-78cb-b61b-1aaee90c2f17 | 2024-11-19 16:06:39.49482+01  | 2024-11-19 16:07:53.867227+01 |             | running | pulp_rpm.app.tasks.synchronizing.synchronize |       |        
   | {/pulp/api/v3/repositories/rpm/rpm/5a40931e-55bd-4fe7-b260-7d30a3d3afce/,shared:/pulp/api/v3/remotes/rpm/rpm/771ac1ef-b66e-4319-9b64-f76a75a8bd3c/,shared:/pulp/api/v3/domains/018f3dcf-7
4ea-735e-b0a8-7c91d8e07630/} | 
(2 rows)


The tasks should:

  • have state='running'
  • have empty worker_id (no worker is assigned to the running task)
  • have empty unblocked_at timestamp
  • be started before the upgrade to 6.16 (if unsure when the upgrade was run, then su - postgres -c "psql pulpcore -c \"SELECT applied FROM django_migrations WHERE name = '0117_task_unblocked_at';\"" will show the precise timestamp; tasks must have been started prior to this timestamp)

If there is such a "running" task, you have hit the problem: the pulp tasking system holds locks for the Remote / Repository, preventing you from running any synchronization of the affected repo(s).
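The "started before the upgrade" check from the list above is a plain timestamp comparison. A minimal sketch, with hypothetical values standing in for the django_migrations applied timestamp and a suspect task's started_at:

```python
from datetime import datetime, timezone

# Hypothetical values: 'applied' for migration 0117_task_unblocked_at,
# and started_at of a suspect task (both as reported by psql, in UTC here).
migration_applied = datetime(2024, 11, 20, 9, 0, tzinfo=timezone.utc)
task_started_at = datetime(2024, 11, 19, 15, 6, 41, tzinfo=timezone.utc)

# The task can only be a victim of this bug if it started before the
# unblocked_at column was introduced by the 6.16 upgrade.
predates_upgrade = task_started_at < migration_applied
print(predates_upgrade)  # True
```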


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.