Remote Execution jobs invoked around 3am get stuck forever
Environment
Red Hat Satellite 6.11 or older
Issue
We invoke jobs regularly throughout the day. Most of them finish gracefully, but jobs invoked around 3am or 3:30am get stuck forever in running/pending state, with the Actions::ProxyAction dynflow step suspended forever.
Resolution
As a workaround, edit /etc/logrotate.d/foreman-proxy and add --kill-who=main to the postrotate command, as follows:
{
..
postrotate
/bin/systemctl kill --kill-who=main --signal=SIGUSR1 foreman-proxy >/dev/null 2>&1 || true
endscript
}
The change must be made on the Satellite and on every external Capsule that has the REX feature enabled.
Be aware that updating the foreman-proxy package (which happens with most Satellite upgrades or updates) reverts the change, so you must apply it again.
For a final fix, upgrade to Satellite 6.13 or later.
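Because package updates revert the change, re-applying the workaround can be scripted. The following is only a minimal sketch, not a supported tool: the function name add_kill_who_main is hypothetical, it assumes GNU sed (for -i), and it assumes the postrotate line contains "systemctl kill" and "foreman-proxy" as in the snippet above. Review the file after running it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Satellite): add --kill-who=main
# to the foreman-proxy postrotate command in a logrotate config file.
# Assumption: the target line contains "systemctl kill" and "foreman-proxy".
add_kill_who_main() {
    file="$1"

    # Already patched: leave the file untouched so the helper is idempotent.
    if grep -q -- '--kill-who=main' "$file"; then
        return 0
    fi

    # Insert the option right after "systemctl kill", only on lines
    # that mention foreman-proxy. Requires GNU sed for in-place editing.
    sed -i '/foreman-proxy/ s/systemctl kill /systemctl kill --kill-who=main /' "$file"
}

# Example invocation (run as root on the Satellite and on each Capsule):
#   add_kill_who_main /etc/logrotate.d/foreman-proxy
```

Running the helper a second time changes nothing, so it should be safe to call after every foreman-proxy package update.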
- REX task running during logrotate to foreman-proxy goes to suspended state forever
- RHBA-2020:3255 - Bug Fix Advisory
For more KB articles/solutions related to Red Hat Satellite 6.x Remote Execution Issues, please refer to the Red Hat Satellite Consolidated Troubleshooting Article for Red Hat Satellite 6.x Remote Execution Issues.
Root Cause
- When a REX job is in progress, with an ssh connection open to the target Host, and the daily logrotate is kicked off, it sends the SIGUSR1 signal to the foreman-proxy service and all its child processes as well. Sadly, the ssh process reacts by terminating itself without properly letting its parent know what happened. foreman-proxy is then waiting forever for an update from the already terminated child.
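The underlying signal behavior can be illustrated in isolation: by default, SIGUSR1 terminates any process that has not installed a handler for it. The sketch below uses sleep as a stand-in for a child process; it does not involve foreman-proxy or ssh, and whether ssh dies for exactly this reason is an assumption, not something the article confirms.

```shell
#!/bin/sh
# Stand-in child process; sleep installs no SIGUSR1 handler.
sleep 30 &
child=$!

# Send the same signal that logrotate's postrotate command delivers.
kill -USR1 "$child"

# wait reports 128 + the signal number when the child was killed by a signal.
status=0
wait "$child" || status=$?
echo "child exit status: $status"
```

A status above 128 shows that the child was killed by the signal rather than exiting normally. This is why limiting delivery to the main process with --kill-who=main leaves the child processes alone.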