[Satellite6] Some pulp workers are missing after Satellite reboot

Solution Verified - Updated

Environment

Red Hat Satellite 6.9 and below

Issue

  • The Satellite machine had to be rebooted.
  • After the reboot, some pulp workers are missing, e.g. according to ps output.

Resolution

For a final resolution, wait until the underlying bugzilla is fixed.

As a workaround, do not reboot Satellite while a pulp task is pending. In general, a pulp task is running whenever any of the following (Foreman) tasks is running:

  • repo sync
  • capsule sync
  • Content View promote/publish
  • registering a client or updating its profile

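To check that there is no pending pulp task:

# Read the pulp admin password from the server configuration
pulpAdminPassword=$(grep ^default_password /etc/pulp/server.conf | cut -d' ' -f2)
# List pulp tasks that are currently running
pulp-admin -u admin -p "$pulpAdminPassword" tasks list --state=running

The second command should report no running tasks.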

  • Alternatively, check that no message is waiting in the qpid broker for the pulp tasking system:
qpid-stat --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b "amqps://localhost:5671" -q | grep resource

This command should print something like:

  reserved_resource_worker-0@satellite.example.com.celery.pidbox       Y                 0     1      1       0    449      449         1     2
  reserved_resource_worker-0@satellite.example.com.dq             Y    Y                 0     0      0       0      0        0         1     2
  reserved_resource_worker-1@satellite.example.com.celery.pidbox       Y                 0     1      1       0    449      449         1     2
  reserved_resource_worker-1@satellite.example.com.dq             Y    Y                 0     0      0       0      0        0         1     2
  resource_manager                                                                     Y                      0     0      0       0      0        0         1     2
  resource_manager@satellite.example.com.celery.pidbox                 Y                 0     1      1       0    449      449         1     2
  resource_manager@satellite.example.com.dq                       Y    Y                 0     0      0       0      0        0         1     2

The first numerical column (which may be the 3rd or 4th column, depending on how many "Y" flags precede it) must contain only zeros. Otherwise, some task is still pending.
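
The zero check described above can be scripted so the shifting column position does not have to be judged by eye. The following is a minimal sketch, not an official tool: the function name pending_pulp_queues is hypothetical, and it assumes the qpid-stat -q output has the layout shown in the sample above (queue name first, optional "Y" flags, then the message count).

```shell
#!/bin/sh
# Read `qpid-stat -q` output on stdin and print every pulp tasking queue
# whose first numeric column (the queued-message count) is not zero.
# Prints nothing and returns 0 when all queues are empty; returns 1 otherwise.
# NOTE: pending_pulp_queues is a hypothetical helper name, not part of Satellite.
pending_pulp_queues() {
    awk '
        /resource/ {
            # Field 1 is the queue name; skip the optional "Y" flag columns
            # and stop at the first field that starts with a digit.
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^[0-9]/) {
                    if ($i != "0") { print $1, $i; pending = 1 }
                    break
                }
            }
        }
        END { exit (pending ? 1 : 0) }
    '
}
```

To use it, pipe the qpid-stat command from this article into the function, for example: qpid-stat --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b "amqps://localhost:5671" -q | pending_pulp_queues. Any queue it prints still holds messages, so a task is likely still pending.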

If either check passes, it is safe to reboot the system with respect to this issue.

For more KB articles and solutions related to Red Hat Satellite 6.x Pulp 2.0 issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Pulp 2.0-related Issues.

Diagnostic Steps

  • ps aux | grep pulp | grep celery shows fewer pulp worker processes than expected. Usually, there should be one celery beat process, two resource manager processes, and a pair of processes for each worker.

  • /var/log/messages contains a backtrace (logged just after the reboot, even after the AMQP broker has started):

Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232) Unrecoverable error: ConnectError('[Errno 111] Connection refused',)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232) Traceback (most recent call last):
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/celery/worker/__init__.py", line 206, in start
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     self.blueprint.start(self)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     self.on_start()
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/celery/apps/worker.py", line 157, in on_start
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     sender=self.hostname, instance=self, conf=self.app.conf,
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/celery/utils/dispatch/signal.py", line 166, in send
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     response = receiver(signal=self, sender=sender, **named)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/pulp/server/async/app.py", line 52, in initialize_worker
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     tasks._delete_worker(sender, normal_shutdown=True)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 258, in _delete_worker
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     cancel(task_status['task_id'])
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 591, in cancel
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     controller.revoke(task_id, terminate=True)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/celery/app/control.py", line 171, in revoke
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     'signal': signal}, **kwargs)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/celery/app/control.py", line 306, in broadcast
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     limit, callback, channel=channel,
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/kombu/pidbox.py", line 283, in _broadcast
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     chan = channel or self.connection.default_channel
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 756, in default_channel
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     self.connection
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 741, in connection
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     self._connection = self._establish_connection()
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 696, in _establish_connection
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     conn = self.transport.establish_connection()
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1600, in establish_connection
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     conn = self.Connection(**opts)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 1261, in __init__
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     self._qpid_conn = establish(**self.connection_options)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 112, in establish
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     conn.open(timeout=timeout)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "<string>", line 6, in open
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 323, in open
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     self.attach(timeout=timeout)
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "<string>", line 6, in attach
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 341, in attach
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     if not self._ewait(lambda: self._transport_connected and not self._unlinked(), timeout=timeout):
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 274, in _ewait
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     self.check_error()
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 267, in check_error
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232)     raise e
Feb 05 00:02:12 satellite pulp[1337]: celery.worker:ERROR: (1337-35232) ConnectError: [Errno 111] Connection refused
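
Both diagnostic steps above can be turned into quick counts. This is a rough sketch under stated assumptions: the three function names are hypothetical helpers, the expected process count follows the rule of thumb given above (one celery beat, two resource manager processes, two per worker), and the number of configured workers must be supplied by the reader.

```shell
#!/bin/sh
# NOTE: all three function names below are hypothetical helpers.

# Expected number of pulp celery processes for a given worker count:
# 1 celery beat + 2 resource manager + 2 per worker.
expected_pulp_procs() {
    echo $(( 1 + 2 + 2 * $1 ))
}

# Count pulp celery processes in `ps aux` output read from stdin.
count_pulp_procs() {
    grep pulp | grep -c celery
}

# Count unrecoverable broker connection errors in syslog lines read from stdin.
count_connect_errors() {
    grep -c 'celery.worker:ERROR.*Unrecoverable error: ConnectError'
}
```

For example, compare the output of ps aux | count_pulp_procs with expected_pulp_procs 8 on a system configured with eight workers, and run count_connect_errors < /var/log/messages to see whether the backtrace was logged. A lower process count together with a non-zero error count is consistent with this issue.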
