Why does rsyslog stop logging locally when the remote log server cannot be reached?


Environment

  • Red Hat Enterprise Linux 6, 7, 8, and 9
  • Rsyslog with Queuing Enabled (Queued Mode)
  • Rsyslog forwarding to a single remote log host

Issue

  • Rsyslog stops logging locally when the remote log server cannot be reached
  • Rsyslog exhibits performance degradation when the remote log server is unreachable
  • Rsyslog clients behave erratically when they cannot establish a TCP connection to the remote log server (despite using queues)
  • The server rebooted automatically and a vmcore was generated; however, no logs from around the time of the reboot were found.

Resolution

  • Create a disk-assisted in-memory queue for the rsyslog action queues; this provides the best balance of reliability and speed
  • Below is an example configuration with queuing parameters that should resolve the local logging latency seen in rsyslog when a remote log server cannot be reached:
/etc/rsyslog.conf

# stripped down configuration sample
# client rsyslog system

# we want to use an additional disk queue as a buffer for
# the fast in-memory queue if an action cannot be executed for an
# extended period of time
# reference:
# http://www.rsyslog.com/doc/rsyslog_conf_global.html
# http://www.rsyslog.com/doc/queues.html

# unrelated common configuration follows
$ModLoad imuxsock
$ModLoad imklog

$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$IncludeConfig /etc/rsyslog.d/*.conf

*.info;mail.none;authpriv.none;cron.none                /var/log/messages
authpriv.*                                              /var/log/secure
mail.*                                                  -/var/log/maillog
*.emerg                                                 *
uucp,news.crit                                          /var/log/spooler
local7.*                                                /var/log/boot.log

# a work directory is needed for the disk queue; any writable directory will do
$WorkDirectory /var/lib/rsyslog
# the (in-memory) action queue itself
$ActionQueueType LinkedList
# the disk queue needs a file name prefix for spool files
# this also states that the in-memory queue will have a disk queue associated with it
$ActionQueueFileName fwd1
#$ActionQueueSaveOnShutdown on    # optional
# this prevents messages from being dropped when the action fails
$ActionResumeRetryCount -1
$ActionQueueSize 1000 # the default
# these two directives determine when the disk queue comes into play
$ActionQueueLowWaterMark 2000 # the default
$ActionQueueHighWaterMark 8000 # the default
# discard messages when there's not enough room for them; there's a
# bug that makes the default value too high
$ActionQueueDiscardMark 9750 # the default
# there are various timeouts that can be tweaked, e.g.:
# drop messages that can't be enqueued in time
$ActionQueueTimeoutEnqueue 2000 # timeout in ms  [1000ms is 1sec, default 2000, 0 means indefinite]
# and the action itself
*.* @@xxx.xxx.xxx.xxx:514
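
On rsyslog v7 and later (the versions shipped with RHEL 7, 8 and 9), roughly the same disk-assisted forwarding queue can also be written in the RainerScript action() syntax. The following is a minimal sketch under that assumption, not a drop-in replacement: it would take the place of the legacy $WorkDirectory, $ActionQueue* and $ActionResumeRetryCount lines plus the @@ forwarding line, the target keeps the same xxx.xxx.xxx.xxx placeholder, and the queue sizing values are deliberately left out so that rsyslog's own defaults apply.

```conf
# RainerScript sketch of the legacy forwarding example (assumes rsyslog v7+)
# directory where the disk queue spool files are written
global(workDirectory="/var/lib/rsyslog")

# forward all messages over TCP (the @@ prefix in legacy syntax)
# queue.type LinkedList      -> in-memory action queue
# queue.filename             -> gives the queue disk assistance
# queue.saveOnShutdown       -> optional, as in the legacy example
# action.resumeRetryCount -1 -> retry the failed action indefinitely
*.* action(type="omfwd"
           target="xxx.xxx.xxx.xxx"
           port="514"
           protocol="tcp"
           queue.type="LinkedList"
           queue.filename="fwd1"
           queue.saveOnShutdown="on"
           action.resumeRetryCount="-1")

# the remaining legacy settings map to queue.size, queue.lowWatermark,
# queue.highWatermark, queue.discardMark and queue.timeoutEnqueue
```

Keeping every queue setting on the action itself makes it explicit which action the disk queue belongs to, which is harder to see with the legacy $ActionQueue* directives that silently apply to the next action line.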

Root Cause

  • Local logging can be affected when rsyslog is configured to send logs to a remote host without any on-disk queuing. Without such a queue, the daemon handles all outputs together, so if the remote connection has problems, all local logging hangs as well.
  • $ActionResumeRetryCount -1 makes rsyslog retry the failed action an infinite number of times, which eventually fills rsyslog's in-memory queue and causes local logging performance to degrade
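
For comparison, the sketch below shows the kind of configuration the root cause describes: infinite retries on a forwarding action that has no action queue of its own. When no $ActionQueueType is set, an action runs in rsyslog's default Direct mode, so while it keeps retrying the unreachable server, messages back up behind it and the local file actions stall too. The target is the same placeholder used in the Resolution example.

```conf
# problematic pattern (sketch): forwarding without an action queue
# no $ActionQueueType / $ActionQueueFileName here, so the action uses
# the default Direct mode and runs synchronously in the main queue
$ActionResumeRetryCount -1
*.* @@xxx.xxx.xxx.xxx:514
```

Placing $ActionQueueType LinkedList and $ActionQueueFileName before the forwarding line, as in the Resolution example, should give this action its own disk-assisted queue so that local logging is no longer held up by the remote connection.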

Diagnostic Steps

  • Timestamps for local logs on the rsyslog client are incorrect

  • Local logging on the rsyslog client is delayed or stops altogether

  • /var/log/messages activity is seen locally while a connection to the remote log server is present:

# tail -f /var/log/messages

Mar  7 16:38:50 localhost root: ~!@#$%^&*()_+`1234567890-=QWERTYUIOP{}|qwertyuiop[]\ASDFGHJKL:"asdfghjkl;'ZXCVBNM<>?zxcvbnm,./
Mar  7 16:38:50 localhost root: ~!@#$%^&*()_+`1234567890-=QWERTYUIOP{}|qwertyuiop[]\ASDFGHJKL:"asdfghjkl;'ZXCVBNM<>?zxcvbnm,./
Mar  7 16:38:50 localhost root: ~!@#$%^&*()_+`1234567890-=QWERTYUIOP{}|qwertyuiop[]\ASDFGHJKL:"asdfghjkl;'ZXCVBNM<>?zxcvbnm,./
Mar  7 16:38:50 localhost root: ~!@#$%^&*()_+`1234567890-=QWERTYUIOP{}|qwertyuiop[]\ASDFGHJKL:"asdfghjkl;'ZXCVBNM<>?zxcvbnm,./
Mar  7 16:38:50 localhost root: ~!@#$%^&*()_+`1234567890-=QWERTYUIOP{}|qwertyuiop[]\ASDFGHJKL:"asdfghjkl;'ZXCVBNM<>?zxcvbnm,./
Mar  7 16:38:50 localhost root: ~!@#$%^&*()_+`1234567890-=QWERTYUIOP{}|qwertyuiop[]\ASDFGHJKL:"asdfghjkl;'ZXCVBNM<>?zxcvbnm,./
  • /var/log/messages activity stops both locally and remotely when the connection to the remote log server is dropped:
# tail -f /var/log/messages

Thu Mar  7 16:28:50 JST 2013




Thu Mar  7 16:28:51 JST 2013




Thu Mar  7 16:28:52 JST 2013
  • strace of rsyslogd during the remote log server failure:
<0.000008>
15913 16:48:56 <... futex resumed> )    = 0 <0.001214>
15914 16:48:56 futex(0x7f1919a225b0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
15913 16:48:56 futex(0x7f1919a225b0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
15914 16:48:56 <... futex resumed> )    = 0 <0.000018>
15913 16:48:56 <... futex resumed> )    = -1 EAGAIN (Resource temporarily unavailable) <0.000018>
15914 16:48:56 select(1, [0], NULL, NULL, NULL <unfinished ...>
15913 16:48:56 futex(0x7f1919a225b0, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000008>
15913 16:48:56 gettimeofday({1362642536, 687797}, NULL) = 0 <0.000006>
15913 16:48:56 write(1, "Mar  7 16:48:56 localhost root: "..., 127) = 127 <0.000010>
15913 16:48:56 gettimeofday({1362642536, 687860}, NULL) = 0 <0.000006>
15913 16:48:56 futex(0x7f1919a2a4c4, FUTEX_WAIT_PRIVATE, 3833, NULL <unfinished ...>
15914 16:48:56 <... select resumed> )   = 1 (in [0]) <0.001203>
15914 16:48:56 recvmsg(0, {msg_name(0)=NULL, msg_iov(1)=[{"<13>Mar  7 16:48:56 root: ~!@#$%"..., 2048}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=19900, uid=0, gid=0}}, msg_flags=0}, MSG_DONTWAIT) = 120 <0.000008>
15914 16:48:56 gettimeofday({1362642536, 689034}, NULL) = 0 <0.000006>
15914 16:48:56 futex(0x7f1919a2a4c4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f1919a2a4c0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 
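
To check whether an action queue is actually filling up while the remote server is unreachable, rsyslog's impstats module can periodically log internal counters, including per-queue statistics such as the current size, the maximum size reached and how often the queue was full. A minimal sketch, assuming rsyslog v7 or later for the module() syntax; the 60-second interval and the output file /var/log/rsyslog-stats are arbitrary choices for illustration.

```conf
# periodically emit rsyslog's internal statistics (assumes rsyslog v7+)
# severity 7 makes the statistics lines debug-level messages
module(load="impstats" interval="60" severity="7")

# impstats logs with the syslog facility; debug-level lines are not
# matched by the *.info rule for /var/log/messages, so write them
# to a separate file
syslog.=debug /var/log/rsyslog-stats
```

While the remote server is down, the statistics lines for the forwarding action's queue should show its size growing towards the configured limits if the queue is the bottleneck.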

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.