[Satellite 6] Remote jobs fail with 'The task has been cancelled. Is katello-agent installed and goferd running on the Host?'
Environment
- Satellite 6
- katello-agent/goferd
Issue
- Execution of a remote job on a content host fails with:
Host did not respond within 20 seconds. The task has been cancelled. Is katello-agent installed and goferd running on the Host?
Resolution
- Re-register one of the two affected content hosts so that it receives its own consumer UUID.
For more KB articles/solutions related to Red Hat Satellite 6.x Remote Execution Issues, please refer to the Red Hat Satellite Consolidated Troubleshooting Article for Red Hat Satellite 6.x Remote Execution Issues
Root Cause
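- Two content hosts share the same consumer UUID, so both connect to the Satellite as the same consumer (note the two consumers on the single pulp.agent.<uuid> queue in the Diagnostic Steps).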
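One way to look for this condition from the Satellite side is to scan foreman-ssl_access_ssl.log for consumer UUIDs that are requested from more than one client address. The following Python sketch is only an illustration: the function names, the regular expression and the sample lines (including the 192.0.2.x addresses and the all-zero UUID) are assumptions made for the example, not part of any Satellite tooling.

```python
import re
from collections import defaultdict

# Captures the client address at the start of an access log line and the
# consumer UUID in a request path of the form /rhsm/consumers/<uuid>...
LINE_RE = re.compile(
    r'^(?P<client>\S+) .*?"[A-Z]+ /rhsm/consumers/'
    r"(?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def clients_by_uuid(lines):
    """Map each consumer UUID to the set of client addresses that requested it."""
    seen = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            seen[match.group("uuid")].add(match.group("client"))
    return seen


def shared_uuids(lines):
    """Return only the UUIDs that were requested from more than one address."""
    return {
        uuid: sorted(clients)
        for uuid, clients in clients_by_uuid(lines).items()
        if len(clients) > 1
    }


# Hypothetical sample lines in the same format as the access log excerpt.
SAMPLE = [
    '192.0.2.10 - - [13/Jun/2020:23:50:33 +0200] "GET /rhsm/consumers/'
    'dfcf8505-1ea7-4c6d-80ad-22d4e2ef6a1a/compliance HTTP/1.1" 200 3775',
    '192.0.2.11 - - [14/Jun/2020:03:25:27 +0200] "GET /rhsm/consumers/'
    'dfcf8505-1ea7-4c6d-80ad-22d4e2ef6a1a/compliance HTTP/1.1" 200 3775',
    '192.0.2.12 - - [14/Jun/2020:03:30:00 +0200] "GET /rhsm/consumers/'
    '00000000-0000-0000-0000-000000000000/compliance HTTP/1.1" 200 3775',
]

for uuid, clients in shared_uuids(SAMPLE).items():
    print(uuid, *clients)
```

A hit is only a hint. Hosts behind NAT or a proxy can share one address, and a single host whose address changed will appear under several, so the result should be confirmed with subscription-manager identity on the hosts themselves.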
Diagnostic Steps
- When trying to update errata with hammer, qdrouterd on the Satellite logs the following messages:
Jun 22 09:44:54 satellite qdrouterd: 2020-06-22 09:44:54.142017 +0200 SERVER (info) [1529]: Connection from <host_ip_address>:40450 (to :5647) failed: amqp:connection:framing-error SSL Failure: Unknown error
Jun 22 09:44:54 satellite qdrouterd: 2020-06-22 09:44:54.797305 +0200 SERVER (info) [2596]: Accepted connection to :5647 from <host_ip_address>:12886
Jun 22 09:44:54 satellite qdrouterd: 2020-06-22 09:44:54.861825 +0200 SERVER (info) [2597]: Accepted connection to :5647 from <host_ip_address>:12888
- We can see that netstat from the affected host shows just one established connection to Satellite on port 5647:
tcp 0 0 <host_ip>:9835 <satellite_ip>:5647 ESTABLISHED 0 3219928 33676/python off (0.00/0/0)
- We can see that netstat from Satellite shows 2 established connections from the affected host:
tcp 0 0 <satellite_ip>:5647 <host_ip>:20459 ESTABLISHED 497 40382 2578/qdrouterd off (0.00/0/0)
tcp 0 0 <satellite_ip>:5647 <host_ip>:9835 ESTABLISHED 497 58291453 2578/qdrouterd off (0.00/0/0)
- This duplicate connection can also be seen in qpid-stat (see the 'cons' column):
# ./sos_commands/katello/qpid-stat_-q_--ssl-certificate_.etc.pki.pulp.qpid.client.crt_-b_amqps_..localhost_5671
queue                                            dur  autoDel  excl  msg  msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
...
...
pulp.agent.dfcf8505-1ea7-4c6d-80ad-22d4e2ef6a1a  Y                   0    1      1       0      674      674       2     1
...
- Looking at foreman-ssl_access_ssl.log, we can see that there are 2 systems reporting with the same UUID:
<host1_ip> - - [13/Jun/2020:23:50:33 +0200] "GET /rhsm/consumers/dfcf8505-1ea7-4c6d-80ad-22d4e2ef6a1a/compliance HTTP/1.1" 200 3775 "-" "RHSM/1.0 (cmd=rhsmcertd-worker)"
<host2_ip> - - [14/Jun/2020:03:25:27 +0200] "GET /rhsm/consumers/dfcf8505-1ea7-4c6d-80ad-22d4e2ef6a1a/compliance HTTP/1.1" 200 3775 "-" "RHSM/1.0 (cmd=rhsmd)"
- Now let's run subscription-manager identity on both content hosts:
[root@host1 ~]# subscription-manager identity
system identity: dfcf8505-1ea7-4c6d-80ad-22d4e2ef6a1a
name: host1.example.com
org name: example
org ID: example
environment name: example/example
[root@host2 ~]# subscription-manager identity
system identity: dfcf8505-1ea7-4c6d-80ad-22d4e2ef6a1a
name: host2.example.com
org name: example
org ID: example
environment name: example/example
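When more than two hosts are suspected, the same comparison can be done on collected output rather than by eye. The sketch below assumes that the output of subscription-manager identity has already been gathered from each host as plain text; the function names parse_identity and hosts_sharing_identity are hypothetical and exist only for this example.

```python
from collections import defaultdict


def parse_identity(text):
    """Parse the 'key: value' lines of `subscription-manager identity` output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as a shell prompt
            fields[key.strip()] = value.strip()
    return fields


def hosts_sharing_identity(outputs):
    """Group host names by 'system identity' and keep groups of two or more."""
    groups = defaultdict(list)
    for text in outputs:
        fields = parse_identity(text)
        if "system identity" in fields:
            groups[fields["system identity"]].append(
                fields.get("name", "<unknown>")
            )
    return {uuid: names for uuid, names in groups.items() if len(names) > 1}


# Output as shown above for the two affected hosts.
HOST1 = """system identity: dfcf8505-1ea7-4c6d-80ad-22d4e2ef6a1a
name: host1.example.com
org name: example
org ID: example
environment name: example/example
"""
HOST2 = HOST1.replace("host1.example.com", "host2.example.com")

for uuid, names in hosts_sharing_identity([HOST1, HOST2]).items():
    print(uuid, "is shared by:", ", ".join(names))
```

Any UUID that is reported with more than one host name points to hosts that need the Resolution applied.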