[Satellite6] Task "waiting for Pulp to start the task" is hung forever

Solution Unverified - Updated 2 Aug 2024

Environment

Red Hat Satellite or Capsule 6.9 or older

Issue

a Foreman/Dynflow task hangs forever in the state "waiting for Pulp to start the task"
no relevant pulp task exists
no service restart or machine reboot helps
an attempt to cancel or skip the task doesn't help either

Resolution

First, make sure that the Pulp task is missing or will never finish; see Diagnostic Steps.

If the Pulp task exists and is waiting, cancel it first. Install pulp-admin, then cancel the task (replace the example UUID with the UUID of the stuck Pulp task):

pulp-admin tasks cancel --task-id 1054afac-a329-4028-bc47-f0ec63bcb50c

If the Foreman/Dynflow task has not changed, artificially create a new Pulp task for it. Run the following in a terminal, replacing the UUID in the reset_pulp_task('...') call at the end with the UUID of the stuck Foreman task:

cat <<'EOF' | foreman-rake console
@world = ForemanTasks.dynflow.world
@persistence = @world.persistence

def reset_pulp_task(foreman_uuid)
  # the Foreman task's external_id is the UUID of its Dynflow execution plan
  uuid = ForemanTasks::Task.find(foreman_uuid).external_id
  execution_plan = @persistence.load_execution_plan(uuid)
  # double quotes so that the plan UUID is interpolated into the error message
  raise "execution plan #{uuid} is not paused" unless execution_plan.state == :paused
  active_steps = execution_plan.steps_in_state(:running, :suspended, :error)
  active_steps.each do |step|
    action = step.action(execution_plan)
    if action.output['pulp_tasks']
      # delete the record of the previous Pulp tasks so the action no longer waits on them
      action.output.delete('pulp_tasks')
      puts "updating execution plan #{uuid} step #{step.id} action #{action.id}"
      @persistence.save_action(execution_plan.id, action)
    end
  end
  puts "resuming execution plan #{execution_plan.id}"
  @world.execute(execution_plan.id)
end

reset_pulp_task('e038e0ae-4f3e-43c6-9d3d-3e40d3df8c15')
EOF

The task is now resumed and should continue executing.
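To make the console script's logic easier to follow, here is a minimal plain-Ruby sketch of what reset_pulp_task does to a paused execution plan. It does not touch Dynflow at all: the Step struct, its fields, and the clear_pulp_tasks helper are hypothetical stand-ins for Dynflow steps and their action output, used only to show which steps lose their pulp_tasks record.

```ruby
# Hypothetical stand-in for a Dynflow step: its id, state and action output hash.
Step = Struct.new(:id, :state, :output)

# States the console script treats as active (mirrors steps_in_state above).
ACTIVE_STATES = %i[running suspended error].freeze

# Removes the 'pulp_tasks' record from every active step that has one and
# returns the ids of the changed steps (the ones the script would save).
def clear_pulp_tasks(plan_state, steps)
  raise "execution plan is not paused (state: #{plan_state})" unless plan_state == :paused

  steps.select { |step| ACTIVE_STATES.include?(step.state) && step.output['pulp_tasks'] }
       .each { |step| step.output.delete('pulp_tasks') }
       .map(&:id)
end

# Usage: only the suspended step that still references a Pulp task is reset.
steps = [
  Step.new(1, :success,   { 'pulp_tasks' => [{ 'task_id' => 'already-finished' }] }),
  Step.new(2, :suspended, { 'pulp_tasks' => [{ 'task_id' => '1054afac-a329-4028-bc47-f0ec63bcb50c' }] }),
  Step.new(3, :running,   {})
]
changed = clear_pulp_tasks(:paused, steps)
puts changed.inspect # => [2]
```

Finished steps are left alone on purpose: only steps that are still running, suspended or in error can be waiting on the lost Pulp task.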

For more KB articles/solutions related to Red Hat Satellite 6.x Pulp 2.0 Issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Pulp 2.0-related Issues

Root Cause

For some reason, the relevant Pulp task is gone or waits forever to be scheduled, while the Dynflow task system in Foreman keeps waiting on it. The resolution optionally clears out the stuck Pulp task, then re-creates the Pulp task and resumes the Dynflow task.

One potential reason: if the resource_manager queue in the qpid broker is unexpectedly purged, the request to run the Pulp task is removed with it, and the task can remain in the waiting state forever.

Diagnostic Steps

Check the details of a Dynflow sub-task in the "waiting for Pulp to start the task" state, and take from its "Output:" field the line like "task_id: 1054afac-a329-4028-bc47-f0ec63bcb50c". That is the Pulp task UUID.
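If the task details are copied into a file or the clipboard, the UUID can be pulled out with a regular expression. This is a minimal sketch assuming the ID appears as "task_id:" followed by a standard 8-4-4-4-12 hex UUID; the helper name pulp_task_ids is hypothetical, and the sample text is a made-up excerpt, not the verbatim layout of the Output field.

```ruby
# Matches a standard 8-4-4-4-12 hex UUID following "task_id:" (assumed format).
TASK_ID_RE = /task_id:\s*"?([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/i

# Returns every distinct Pulp task UUID mentioned in the given text.
def pulp_task_ids(output_text)
  output_text.scan(TASK_ID_RE).flatten.uniq
end

# Illustrative excerpt only; the real Output field may be laid out differently.
sample = <<~OUT
  Output:
  ---
  pulp_tasks:
  - task_id: 1054afac-a329-4028-bc47-f0ec63bcb50c
    state: waiting
OUT

puts pulp_task_ids(sample).inspect # => ["1054afac-a329-4028-bc47-f0ec63bcb50c"]
```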
Find the Pulp task status:
- grep for the UUID in the pulp-running_tasks file of the foreman-debug tarball (the task might be missing there)
- or find it in the output of:

mongo pulp_database --eval "DBQuery.shellBatchSize = 10000000; db.task_status.find({state:{\$ne: \"finished\"}}).pretty().shellPrint()"
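The grep check can also be done from Ruby, for example when scanning several extracted foreman-debug tarballs. This is a minimal sketch; the helper name task_listed? is hypothetical, and the file path you pass (such as the pulp-running_tasks file inside an extracted tarball) is an assumption about where you unpacked it. The usage below writes a throwaway temporary file instead of reading a real one.

```ruby
require 'tempfile'

# Returns true if the Pulp task UUID appears on any line of the given file,
# e.g. the pulp-running_tasks file from an extracted foreman-debug tarball.
def task_listed?(path, uuid)
  File.foreach(path).any? { |line| line.include?(uuid) }
end

# Usage with a temporary file standing in for pulp-running_tasks.
Tempfile.create('pulp-running_tasks') do |f|
  f.puts 'other task aaaaaaaa-0000-0000-0000-000000000000'
  f.puts 'stuck task 1054afac-a329-4028-bc47-f0ec63bcb50c'
  f.flush
  puts task_listed?(f.path, '1054afac-a329-4028-bc47-f0ec63bcb50c') # => true
end
```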

If there is no such task, Katello/Dynflow will wait forever for a response from a non-existent task.

If the task is waiting but qpid-stat-resource_manager in foreman-debug shows a queue-depth of zero, the task won't be executed either.
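The two checks above, combined with the Resolution, amount to a small decision table. This is a minimal sketch assuming you have already read the Pulp task state from task_status (nil when no such task exists) and the resource_manager queue depth from qpid-stat-resource_manager as an integer. The method name next_action and its return symbols are hypothetical labels for the Resolution steps, and treating every other combination as "keep waiting" is an assumption, not a rule from this article.

```ruby
# Maps the diagnostic findings to the matching Resolution step (hypothetical labels).
#   task_state  - Pulp task state from task_status, or nil if the task does not exist
#   queue_depth - resource_manager queue depth reported by qpid-stat (Integer)
def next_action(task_state, queue_depth)
  # No Pulp task at all: only the Dynflow side needs resetting.
  return :reset_dynflow_task if task_state.nil?
  # Waiting task whose request is no longer queued: cancel it, then reset.
  return :cancel_then_reset if task_state == 'waiting' && queue_depth.zero?

  # Anything else is assumed not to be the stuck case described here (yet).
  :keep_waiting
end

puts next_action(nil, 0)       # => reset_dynflow_task
puts next_action('waiting', 0) # => cancel_then_reset
```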

SBR

SysMgmt

Product(s)

Red Hat Satellite

Components

foreman

Category

Troubleshoot


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.