Error: Task is stuck on Actions::Pulp::Repository::DistributorPublish (suspended)
Environment
- Red Hat Satellite 6
Issue
- After trying to synchronise a channel, the channel remains in pending status and does not progress.
Running the following command confirmed that no data was being written to /var/lib/pulp:
# watch df /var/lib/pulp
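Instead of watching the df output by eye, the same check can be scripted. The following is a minimal sketch, not part of the original solution: the function names used_kb and usage_changed and the default 30-second interval are assumptions chosen for illustration. It samples the used space twice and reports whether the figure changed.

```shell
# Print the used space (in KB) of the filesystem that holds the given path.
# df -P forces the POSIX single-line output format, so field 3 is "Used".
used_kb() {
    df -Pk "$1" | awk 'NR==2 {print $3}'
}

# Take two samples, $2 seconds apart (default 30), and print both.
# Returns 0 if the used space changed, 1 if it stayed the same.
usage_changed() {
    path=$1
    interval=${2:-30}
    before=$(used_kb "$path")
    sleep "$interval"
    after=$(used_kb "$path")
    echo "before=${before}KB after=${after}KB"
    [ "$before" != "$after" ]
}
```

For example, `usage_changed /var/lib/pulp 60 || echo "no data written in the last 60 seconds"`. An unchanged figure only suggests that the synchronisation is idle; other activity on the same filesystem can also change the value.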
Resolution
- The Qpid AMQP daemon (qpidd) was not running and, in addition, one Pulp worker process was stuck. Both problems are visible in the output of a service restart:
# for s in {qpidd,pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do sudo service $s restart; done
Stopping Qpid AMQP daemon: [FAILED]
Starting Qpid AMQP daemon: [ OK ]
celery init v10.0.
Using configuration: /etc/default/pulp_workers, /etc/default/pulp_celerybeat
Restarting celery periodic task scheduler
Stopping pulp_celerybeat... OK
Starting pulp_celerybeat...
celery init v10.0.
Using config script: /etc/default/pulp_resource_manager
celery multi v3.1.11 (Cipater)
> Stopping nodes...
> resource_manager@yourHostname.Satellite.com: TERM -> 2365
> Waiting for 1 node -> 2365............
> resource_manager@yourHostname.Satellite.com: OK
> Restarting node resource_manager@yourHostname.Satellite.com: OK
celery init v10.0.
Using config script: /etc/default/pulp_workers
celery multi v3.1.11 (Cipater)
> Stopping nodes...
> reserved_resource_worker-7@yourHostname.Satellite.com: TERM -> 2584
> reserved_resource_worker-2@yourHostname.Satellite.com: TERM -> 2469
> reserved_resource_worker-1@yourHostname.Satellite.com: TERM -> 2445
> reserved_resource_worker-4@yourHostname.Satellite.com: TERM -> 2516
> reserved_resource_worker-6@yourHostname.Satellite.com: TERM -> 2566
> reserved_resource_worker-0@yourHostname.Satellite.com: TERM -> 2424
> reserved_resource_worker-3@yourHostname.Satellite.com: TERM -> 2493
> reserved_resource_worker-5@yourHostname.Satellite.com: TERM -> 2538
> Waiting for 8 nodes -> 2584, 2469, 2445, 2516, 2566, 2424, 2493, 2538..........................................
> reserved_resource_worker-3@yourHostname.Satellite.com: OK
> Restarting node reserved_resource_worker-3@yourHostname.Satellite.com: OK
> Waiting for 7 nodes -> 2584, 2469, 2445, 2516, 2566, 2424, 2538....
> reserved_resource_worker-7@yourHostname.Satellite.com: OK
> Restarting node reserved_resource_worker-7@yourHostname.Satellite.com: OK
> Waiting for 6 nodes -> 2469, 2445, 2516, 2566, 2424, 2538....
> reserved_resource_worker-2@yourHostname.Satellite.com: OK
> Restarting node reserved_resource_worker-2@yourHostname.Satellite.com: OK
> Waiting for 5 nodes -> 2445, 2516, 2566, 2424, 2538....
> reserved_resource_worker-1@yourHostname.Satellite.com: OK
> Restarting node reserved_resource_worker-1@yourHostname.Satellite.com: OK
> Waiting for 4 nodes -> 2516, 2566, 2424, 2538....
> reserved_resource_worker-4@yourHostname.Satellite.com: OK
> Restarting node reserved_resource_worker-4@yourHostname.Satellite.com: OK
> Waiting for 3 nodes -> 2566, 2424, 2538....
> reserved_resource_worker-6@yourHostname.Satellite.com: OK
> Restarting node reserved_resource_worker-6@yourHostname.Satellite.com: OK
> Waiting for 2 nodes -> 2424, 2538.....
> reserved_resource_worker-5@yourHostname.Satellite.com: OK
> Restarting node reserved_resource_worker-5@yourHostname.Satellite.com: OK
> Waiting for 1 node -> 2424............................................................................... < ------------- This PID is stuck
..................................................................................................................................
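If the restart output was saved to a file, the PIDs that the restart is still waiting for can be pulled out of the last "Waiting for" line. This is a rough sketch under assumptions: the function name stuck_pids and the file name restart.log are hypothetical, and the pattern relies on the celery multi output format shown above.

```shell
# Read restart output on stdin and print the PIDs listed on the last
# "Waiting for N node(s) -> ..." line, separated by spaces.
stuck_pids() {
    grep -E 'Waiting for [0-9]+ nodes? ->' | tail -n 1 |
        sed -E 's/.*Waiting for [0-9]+ nodes? -> *//; s/\.+.*$//; s/,//g'
}
```

For example, `stuck_pids < restart.log` prints the PID or PIDs that had not exited when the output was captured.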
- To clear the hung process, log in on another console and kill the stuck PID (2424 in the output above):
# kill -9 2424
Then restart the services again with:
# for s in {qpidd,pulp_celerybeat,pulp_resource_manager,pulp_workers,httpd}; do sudo service $s restart; done
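Before restarting the services, it can help to confirm that the killed process has actually exited. The following is a minimal sketch, not part of the original solution: the function name kill_and_wait and the default 30-second timeout are assumptions. It sends SIGKILL, as in the step above, and then polls until the PID disappears.

```shell
# Force-kill a PID and wait until it is gone. Run as root, as in the
# steps above; without permission to signal the process, kill -0 fails
# and the PID would wrongly be reported as gone.
# Returns 0 once the process has exited, 1 if it is still present after
# the timeout in $2 (default 30 seconds).
kill_and_wait() {
    pid=$1
    timeout=${2:-30}
    kill -9 "$pid" 2>/dev/null
    elapsed=0
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "PID $pid still present after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "PID $pid is gone"
}
```

For example, `kill_and_wait 2424 && echo "safe to restart services"`. A process that survives SIGKILL is usually blocked in the kernel, and restarting the services is unlikely to help until that is resolved.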
- Once these steps are completed, if the details still show as pending, resynchronize the channel.
For more KB articles/solutions related to Red Hat Satellite 6.x Repository Issues, please refer to the Red Hat Satellite Consolidated Troubleshooting Article for Red Hat Satellite 6.x Repository Issues.
Root Cause
- Pulp could not resume the task because a worker process was hung and the Qpid AMQP daemon was not running.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.