[Satellite6] Task "waiting for Pulp to start the task" is hung forever
Environment
Red Hat Satellite or Capsule 6.9 or older
Issue
- having a foreman / dynflow task hung forever in state "waiting for Pulp to start the task"
- no relevant pulp task exists
- no service restart or machine reboot helps
- an attempt to cancel or skip the task doesn't help either
Resolution
First, confirm that the pulp task will never finish or is missing (see Diagnostic Steps).
If the pulp task exists and is waiting, cancel it first. Install pulp-admin and then cancel the task:
pulp-admin tasks cancel --task-id 1054afac-a329-4028-bc47-f0ec63bcb50c
If the foreman/dynflow task still has not changed, artificially create a new pulp task for it. Run the following in a terminal (replace the UUID at the end with the UUID of the stuck foreman task):
cat <<EOF | foreman-rake console
@world = ForemanTasks.dynflow.world
@persistence = @world.persistence

def reset_pulp_task(foreman_uuid)
  # the foreman task's external_id is the UUID of its dynflow execution plan
  uuid = ForemanTasks::Task.find(foreman_uuid).external_id
  execution_plan = @persistence.load_execution_plan(uuid)
  raise "execution plan #{execution_plan.id} is not paused" unless execution_plan.state == :paused

  active_steps = execution_plan.steps_in_state(:running, :suspended, :error)
  active_steps.each do |step|
    action = step.action(execution_plan)
    if action.output['pulp_tasks']
      # delete a record about previous pulp tasks
      action.output.delete('pulp_tasks')
      puts "updating execution plan #{uuid} step #{step.id} action #{action.id}"
      @persistence.save_action(execution_plan.id, action)
    end
  end

  puts "resuming execution plan #{execution_plan.id}"
  @world.execute(execution_plan.id)
end

reset_pulp_task('e038e0ae-4f3e-43c6-9d3d-3e40d3df8c15')
EOF
The task is now resumed and should continue executing.
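To see which steps the script above touches, its selection logic can be mimicked outside of Satellite. The following is a minimal sketch in plain Ruby. It assumes stand-in Struct types (Action, Step) and a hypothetical helper clear_pulp_tasks; none of these are Dynflow or Katello classes, and the sample data is invented for illustration.

```ruby
# Stand-ins for illustration only; these are NOT the Dynflow classes.
Action = Struct.new(:id, :output)
Step   = Struct.new(:id, :state, :action)

# Same step states the console script asks Dynflow for.
ACTIVE_STATES = %i[running suspended error].freeze

# Hypothetical helper: drops the 'pulp_tasks' record from the output of every
# action that belongs to an active step, and returns the actions it changed.
def clear_pulp_tasks(steps)
  steps.select { |step| ACTIVE_STATES.include?(step.state) }
       .map(&:action)
       .select { |action| action.output['pulp_tasks'] }
       .each { |action| action.output.delete('pulp_tasks') }
end

steps = [
  Step.new(1, :success,   Action.new(1, { 'pulp_tasks' => [{ 'task_id' => 'aaa' }] })),
  Step.new(2, :suspended, Action.new(2, { 'pulp_tasks' => [{ 'task_id' => 'bbb' }] })),
  Step.new(3, :suspended, Action.new(3, {}))
]

changed = clear_pulp_tasks(steps)
puts "actions changed: #{changed.map(&:id).inspect}"
```

Only the action of the suspended step that still carries a 'pulp_tasks' record is modified. Finished steps and actions without such a record are left alone, which is why the real script can be run against a plan that has already completed part of its work.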
For more KB articles/solutions related to Red Hat Satellite 6.x Pulp 2.0 Issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Pulp 2.0-related Issues
Root Cause
For some reason, the relevant pulp task is gone or is waiting to be scheduled forever, while the dynflow task system in foreman keeps waiting on it. The resolution optionally cancels the stuck pulp task, then re-creates the pulp task and resumes the dynflow task.
One potential reason: if the resource_manager queue in the qpid broker is unexpectedly purged, the request to run the pulp task is removed and the task can remain in the waiting state forever.
Diagnostic Steps
- Check the details of a dynflow sub-task in the "waiting for Pulp to start the task" state, and take from its "Output:" a line like "task_id: 1054afac-a329-4028-bc47-f0ec63bcb50c". That is the pulp task UUID.
- Find the pulp task status:
  - grep for the UUID in the pulp-running_tasks file of a foreman-debug tarball (where it might be missing)
  - or find it in the output of:
mongo pulp_database --eval "DBQuery.shellBatchSize = 10000000; db.task_status.find({state:{\$ne: \"finished\"}}).pretty().shellPrint()"
If there is no such task, katello/dynflow will wait forever for a response from a non-existent task.
If the task is waiting but qpid-stat-resource_manager in foreman-debug shows a queue depth of zero, then the task will never be executed either.
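The checks above can be sketched as a small standalone Ruby script. This is only an illustration under stated assumptions: the helper names pulp_task_ids and diagnose, the result symbols, and the sample text are hypothetical, and the running-tasks dump is treated as opaque text that is searched for the UUID, much like grep would do.

```ruby
# Hypothetical helpers for illustration; not part of Satellite, Katello or Pulp.
UUID_PATTERN = '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'

# Extract pulp task UUIDs from the "Output:" text of a dynflow sub-task.
def pulp_task_ids(dynflow_output)
  dynflow_output.scan(/task_id:\s*(#{UUID_PATTERN})\b/).flatten.uniq
end

# Classify a pulp task according to the rules in the Diagnostic Steps.
#   listed      - true if the UUID appears among the unfinished pulp tasks
#   state       - the pulp task state, if listed (for example 'waiting')
#   queue_depth - depth of the resource_manager queue reported by qpid-stat
def diagnose(listed, state, queue_depth)
  return :missing unless listed
  return :never_scheduled if state == 'waiting' && queue_depth.zero?

  :may_still_run
end

# Sample "Output:" text, using the UUID from this article.
output_text = <<~TEXT
  Output:
    pulp_tasks:
    - task_id: 1054afac-a329-4028-bc47-f0ec63bcb50c
TEXT

# Pretend the dump of unfinished pulp tasks does not contain the UUID.
running_tasks_dump = ''

pulp_task_ids(output_text).each do |id|
  listed = running_tasks_dump.include?(id)
  puts "#{id}: #{diagnose(listed, nil, 0)}"
end
```

Both :missing and :never_scheduled correspond to the situations in which the Resolution applies. Any other result suggests the pulp task may still be picked up, so waiting or investigating the pulp workers is the safer first step.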