Remote Execution jobs invoked around 3am get stuck forever

Solution Verified - Updated

Environment

Red Hat Satellite 6.11 or older

Issue

We invoke jobs regularly throughout the day. Most of them finish gracefully, but jobs invoked around 3am or 3:30am get stuck forever in a running/pending state, with the Actions::ProxyAction dynflow step suspended indefinitely.

Resolution

As a workaround, edit /etc/logrotate.d/foreman-proxy and add --kill-who=main to the systemctl kill command in the postrotate script, as follows:

{
..
  postrotate
    /bin/systemctl kill --kill-who=main --signal=SIGUSR1 foreman-proxy >/dev/null 2>&1 || true
  endscript
}
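To confirm that the edit is in place on a given Satellite or Capsule, a check along the following lines can help. This is a minimal sketch: has_kill_who_main is a hypothetical helper name, and the pattern assumes the postrotate line has the form shown above.

```shell
#!/bin/sh
# Succeed (exit 0) if the logrotate config passes --kill-who=main to "systemctl kill".
# has_kill_who_main is an illustrative helper, not part of Satellite.
# The default path is the file named in this article; pass another path to override.
has_kill_who_main() {
    conf="${1:-/etc/logrotate.d/foreman-proxy}"
    grep -Eq -e 'systemctl[[:space:]]+kill[[:space:]].*--kill-who=main' "$conf"
}

# Example use:
#   has_kill_who_main && echo "workaround present" || echo "workaround missing"
```

Running logrotate -d /etc/logrotate.d/foreman-proxy afterwards parses the file in debug mode without rotating anything, which is a cheap way to catch a syntax slip in the edited file.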

The change must be made on the Satellite and on every external Capsule that has the REX feature enabled.

Be aware that updating the foreman-proxy package (which happens with most Satellite upgrades or updates) reverts the change, so you must apply it again.
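Because the edit has to be repeated after package updates, it may be convenient to script it so that it can be run any number of times. The sketch below is one possible approach, not a Red Hat supplied tool: ensure_kill_who_main is a hypothetical helper name, it relies on GNU sed, and it assumes the stock file contains a line of the form "systemctl kill --signal=SIGUSR1 foreman-proxy". Review the file after running it.

```shell
#!/bin/sh
# Add --kill-who=main to the "systemctl kill" line of a logrotate config,
# unless a --kill-who option is already present (so repeated runs are harmless).
# ensure_kill_who_main is an illustrative helper, not part of Satellite.
ensure_kill_who_main() {
    conf="${1:-/etc/logrotate.d/foreman-proxy}"
    # GNU sed, in place: only lines that lack "--kill-who=" are rewritten.
    sed -i '/--kill-who=/!s/systemctl kill /systemctl kill --kill-who=main /' "$conf"
}

# Example use (as root, on the Satellite and on each Capsule with REX enabled):
#   ensure_kill_who_main
```

Keeping a backup copy of the original file before the first run makes it easy to compare against what a later package update installs.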

For a permanent fix, upgrade to Satellite 6.13 or later.

For more KB articles/solutions related to Red Hat Satellite 6.x Remote Execution Issues, please refer to the Red Hat Satellite Consolidated Troubleshooting Article for Red Hat Satellite 6.x Remote Execution Issues.

Root Cause

  • When a REX job is in progress, with an open ssh connection to the target Host, and the daily logrotate run kicks off, logrotate sends the SIGUSR1 signal to the foreman-proxy service and to all of its child processes. The ssh process reacts by terminating without notifying its parent. foreman-proxy then waits forever for an update from the already terminated child.
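The effect on the child process can be illustrated in isolation. The default action for SIGUSR1 is to terminate any process that has not installed a handler for it. In the sketch below, sleep merely stands in for the ssh child; it shows the default signal behaviour only and does not reproduce foreman-proxy internals.

```shell
#!/bin/sh
# Start a child that installs no SIGUSR1 handler (a stand-in for the ssh process).
sleep 30 &
child=$!

# Deliver the same signal that the logrotate postrotate step sends.
kill -USR1 "$child"

# For a child killed by a signal, wait reports a status above 128
# (128 plus the signal number in common shells).
wait "$child"
status=$?
echo "child exit status: $status (signal $(kill -l "$status"))"
```

With --kill-who=main, systemctl delivers the signal only to the main foreman-proxy process, so children such as ssh never receive it.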

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.