[Satellite6] pulp remove orphans task runs many hours

Solution Verified - Updated

Environment

Red Hat Satellite 6

Issue

  • We were asked to remove Pulp orphans, e.g. via /etc/cron.weekly/katello-remove-orphans, via foreman-rake katello:delete_orphaned_content, or directly via pulp-admin orphan remove --all.
  • That task has been running for hours.

How long can the task take?

Resolution

NOTE: When foreman-rake katello:delete_orphaned_content is run from the Satellite CLI, a task named Remove orphans is launched; it is visible in the Satellite WebUI under Monitor --> Tasks.

If no remove orphans task has completed successfully for a long time, the next run can take many hours to complete. To identify past tasks that finished successfully:

mongo pulp_database --eval "DBQuery.shellBatchSize = 100000000; db.task_status.find({ \$and: [ {'task_type': 'pulp.server.managers.content.orphan.delete_all_orphans' }, {'state': 'finished'}, {'error': null} ] }).sort({'finish_time': 1}).shellPrint()" | tail
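
The query prints each matching task as a full document, which is hard to scan. As a minimal sketch, the hypothetical helper below (finish_times is not part of Pulp or Satellite) keeps only the finish_time values from that output. It assumes the field is printed in the form "finish_time" : "<timestamp>", like start_time in the sample output later in this article.

```shell
# Hypothetical helper: print only the finish_time values found in mongo shell
# output read from stdin. Assumes each value is printed as
#   "finish_time" : "<timestamp>"
# Lines where finish_time is null (unfinished tasks) produce no output.
finish_times() {
    grep -o '"finish_time" : "[^"]*"' | cut -d'"' -f4
}
```

Piping the mongo command above into finish_times | tail -n 1 should then show when orphan removal last finished successfully, because the query sorts by finish_time in ascending order.
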
  • Ensure there is just one such task running, since multiple instances can affect each other and cause the task to run arbitrarily long. The query below should return just one task:
mongo pulp_database --eval "DBQuery.shellBatchSize = 100000000; db.task_status.find({ \$and: [ {'task_type': /orphan/ }, {'state': 'running'} ] }).shellPrint()"
MongoDB shell version: 2.6.11
connecting to: pulp_database
{ "_id" : ObjectId("5a8575da3979a905e09eb80f"), "task_id" : "9e8af5ee-cef4-4379-a4a4-eb7be88ec9b5", "exception" : null, "task_type" : "pulp.server.managers.content.orphan.delete_all_orphans", "tags" : [ "pulp:content_unit:orphans" ], "finish_time" : null, "_ns" : "task_status", "traceback" : null, "spawned_tasks" : [ ], "progress_report" : {  }, "worker_name" : "celery", "result" : null, "error" : null, "group_id" : null, "id" : null, "state" : "running", "start_time" : "2018-02-15T11:58:18Z" }
  • If multiple tasks are running, then cancel them all:
pulpAdminPassword=$(grep ^default_password /etc/pulp/server.conf | cut -d' ' -f2)
for task_id in $(mongo pulp_database --eval "DBQuery.shellBatchSize = 100000000; db.task_status.find({ \$and: [ {'task_type': /orphan/ }, {'state': 'running'} ] }).shellPrint()" | grep task_id | awk -F "task_id" '{ print $2 }' | cut -d\" -f3); do
    pulp-admin -u admin -p "$pulpAdminPassword" tasks cancel --task-id "$task_id"
done
  • Verify, e.g. with the mongo command above, that no orphan removal task is still running. If one is, repeat the previous step.

  • Once there is no orphan remove task running, launch a new one:

pulp-admin -u admin -p "$pulpAdminPassword" orphan remove --all --bg
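
The checks in the steps above can be sketched as shell helpers. This is a sketch under stated assumptions, not a supported tool: the function names are hypothetical, the query is the one used in this article, and the parsing assumes the mongo shell prints "task_id" : "<id>" as in the sample output.

```shell
# Hypothetical helper: extract task_id values from mongo shell output on stdin.
running_task_ids() {
    grep -o '"task_id" : "[^"]*"' | cut -d'"' -f4
}

# Hypothetical wrapper around this article's "running orphan tasks" query.
query_running_orphans() {
    mongo pulp_database --eval "DBQuery.shellBatchSize = 100000000; db.task_status.find({ \$and: [ {'task_type': /orphan/ }, {'state': 'running'} ] }).shellPrint()"
}

# Poll until the query reports no running orphan task.
# Optional first argument: seconds to sleep between checks (default 30).
wait_for_no_orphan_tasks() {
    while [ -n "$(query_running_orphans | running_task_ids)" ]; do
        sleep "${1:-30}"
    done
}
```

Calling wait_for_no_orphan_tasks 60 after the cancel loop would block until the query reports no running orphan task, checking once a minute. The new removal can be launched once it returns.
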

For more KB articles/solutions related to Red Hat Satellite 6.x Pulp 2.0 Issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Pulp 2.0-related Issues.

Root Cause

There are two possible causes:

  • The remove orphans task has not run (successfully) for a long time. Identifying all units that need to be deleted, and deleting them, can then take many hours. The time grows with the number of repositories, the number of packages, the size of the repositories, and the time elapsed since orphan removal last ran. Note that orphan removal is invoked automatically by the weekly cron job /etc/cron.weekly/katello-remove-orphans, so finding only old executions of the task indicates a problem that should be fixed.

  • Multiple remove orphans tasks were invoked concurrently. There are several types of such tasks (remove all orphans, remove all orphans of one type, or remove an individual orphaned unit), and a task of any type can affect other remove orphans tasks running concurrently.
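
For the first cause, one thing worth ruling out is that the weekly cron script is missing or not executable, since run-parts skips files without the execute bit. The sketch below is only an illustration: check_cron_script is a hypothetical name, and a passing check does not prove that the job itself completes successfully.

```shell
# Hypothetical helper: report whether a cron script exists and is executable.
# The path defaults to the weekly orphan removal script named in this article.
check_cron_script() {
    script=${1:-/etc/cron.weekly/katello-remove-orphans}
    if [ ! -f "$script" ]; then
        echo "missing: $script"
        return 1
    elif [ ! -x "$script" ]; then
        echo "not executable: $script"
        return 1
    fi
    echo "ok: $script"
}
```

Running check_cron_script with no argument checks the default path; a non-zero exit status indicates that the weekly job cannot run as installed.
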

