[Satellite 6] Remote jobs fail with 'The task has been cancelled. Is katello-agent installed and goferd running on the Host?'

Environment

  • Satellite 6
  • katello-agent/goferd

Issue

  • Execution of a remote job on a content host fails with: Host did not respond within 20 seconds. The task has been cancelled. Is katello-agent installed and goferd running on the Host?

Resolution

  • Re-register one of the two content hosts that share the consumer UUID, so that it receives a new, unique UUID.
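
A minimal sketch of the re-registration, to be run on only one of the two hosts. It assumes the host registers with an activation key and uses systemd. The `ORG` and `ACTIVATION_KEY` values and the `run` and `reregister_host` helper names are hypothetical. The sketch uses `subscription-manager clean` rather than `unregister`, because `clean` only removes the local identity, whereas `unregister` is expected to delete the consumer record that the other host still uses.

```shell
# Hypothetical values: replace with your organization label and activation key.
ORG="example"
ACTIVATION_KEY="my-activation-key"

# Helper (hypothetical): print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Give this host a new consumer UUID; stops at the first failing step.
reregister_host() {
    # Remove only the local identity; the shared record on the Satellite is untouched.
    run subscription-manager clean &&
    # Register again, which creates a new consumer with its own UUID.
    run subscription-manager register --org="$ORG" --activationkey="$ACTIVATION_KEY" &&
    # Restart goferd so that katello-agent reconnects under the new identity.
    run systemctl restart goferd &&
    # Show the new UUID; it should now differ from the other host's.
    run subscription-manager identity
}
```

Running `DRY_RUN=1 reregister_host` prints the commands without executing them, which allows a review before running `reregister_host` as root on the chosen host.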

For more KB articles and solutions on Red Hat Satellite 6.x remote execution issues, refer to the Red Hat Satellite Consolidated Troubleshooting Article for Red Hat Satellite 6.x Remote Execution Issues.

Root Cause

  • Two content hosts share the same consumer UUID, so both connect to the Satellite as the same consumer.

Diagnostic Steps

  • When trying to apply errata with hammer, qdrouterd on the Satellite logs the following messages:
Jun 22 09:44:54 satellite qdrouterd: 2020-06-22 09:44:54.142017 +0200 SERVER (info) [1529]: Connection from <host_ip_address>:40450 (to :5647) failed: amqp:connection:framing-error SSL Failure: Unknown error
Jun 22 09:44:54 satellite qdrouterd: 2020-06-22 09:44:54.797305 +0200 SERVER (info) [2596]: Accepted connection to :5647 from <host_ip_address>:12886
Jun 22 09:44:54 satellite qdrouterd: 2020-06-22 09:44:54.861825 +0200 SERVER (info) [2597]: Accepted connection to :5647 from <host_ip_address>:12888
  • netstat on the affected host shows only one established connection to the Satellite on port 5647:
tcp        0      0 <host_ip>:9835     <satellite_ip>:5647       ESTABLISHED 0          3219928    33676/python         off (0.00/0/0)
  • netstat on the Satellite shows two established connections from the affected host:
tcp        0      0 <satellite_ip>:5647       <host_ip>:20459    ESTABLISHED 497        40382      2578/qdrouterd       off (0.00/0/0)
tcp        0      0 <satellite_ip>:5647       <host_ip>:9835     ESTABLISHED 497        58291453   2578/qdrouterd       off (0.00/0/0)
  • The duplicate connection is also visible in qpid-stat, where the 'cons' column shows 2 consumers on the host's pulp.agent queue:
# ./sos_commands/katello/qpid-stat_-q_--ssl-certificate_.etc.pki.pulp.qpid.client.crt_-b_amqps_..localhost_5671
queue                                               dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
...
...
pulp.agent.dfcf8505-1ea7-4c6d-80ad-22d4e2ef6a1a     Y                      0     1      1       0    674      674         2     1
...
  • foreman-ssl_access_ssl.log shows two systems with different IP addresses reporting with the same UUID:
<host1_ip> - - [13/Jun/2020:23:50:33 +0200] "GET /rhsm/consumers/dfcf8505-1ea7-4c6d-80ad-22d4e2ef6a1a/compliance HTTP/1.1" 200 3775 "-" "RHSM/1.0 (cmd=rhsmcertd-worker)"
<host2_ip> - - [14/Jun/2020:03:25:27 +0200] "GET /rhsm/consumers/dfcf8505-1ea7-4c6d-80ad-22d4e2ef6a1a/compliance HTTP/1.1" 200 3775 "-" "RHSM/1.0 (cmd=rhsmd)"
  • Running subscription-manager identity on both content hosts confirms that they report the same system identity:
[root@host1 ~]# subscription-manager identity
system identity: dfcf8505-1ea7-4c6d-80ad-22d4e2ef6a1a
name: host1.example.com
org name: example
org ID: example
environment name: example/example
[root@host2 ~]# subscription-manager identity
system identity: dfcf8505-1ea7-4c6d-80ad-22d4e2ef6a1a
name: host2.example.com
org name: example
org ID: example
environment name: example/example
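
The access log check above can be automated. The following is a rough sketch, not a supported tool: it reads access log lines on standard input and prints every consumer UUID that was requested from more than one client IP address. The function name `find_shared_uuids` is hypothetical, and the log path in the usage note is an assumption about a default Satellite 6 installation. A host whose IP address changed, or several hosts behind NAT, can produce misleading results, so confirm any hit with `subscription-manager identity` on the hosts.

```shell
# Print "uuid: ip1 ip2 ..." for each consumer UUID seen from more than one client IP.
# Expects Apache access log lines on stdin, with the client IP as the first field.
find_shared_uuids() {
    awk '
        match($0, /\/rhsm\/consumers\/[0-9a-f-]+/) {
            # Skip the 16-character "/rhsm/consumers/" prefix to get the UUID.
            uuid = substr($0, RSTART + 16, RLENGTH - 16)
            if (length(uuid) != 36) next
            key = uuid SUBSEP $1
            # Count each (UUID, client IP) pair only once.
            if (!(key in seen)) {
                seen[key] = 1
                count[uuid]++
                ips[uuid] = ips[uuid] " " $1
            }
        }
        END {
            for (u in count)
                if (count[u] > 1)
                    print u ":" ips[u]
        }
    '
}
```

On the Satellite, a possible invocation is `find_shared_uuids < /var/log/httpd/foreman-ssl_access_ssl.log`, assuming the log is in that location.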