After reboot, new indices are not created in Elasticsearch, but Elasticsearch reports as healthy
Environment
- Red Hat OpenShift Container Platform 3.9
Issue
- Elasticsearch indices have not been generated for customer projects since the servers were rebooted
- Fluentd is reporting:
2019-xx-30 14:06:24 +0100 [warn]: temporarily failed to flush the buffer. next_retry=2019-03-30 14:11:24 +0100 error_class="Fluent::ElasticsearchErrorHandler::ElasticsearchError" error="Elasticsearch returned errors, retrying. Add '@log_level debug' to your config to see the full response" plugin_id="object:3fd1f41a29fc"
- Many stale buffer files are found in the Fluentd pods:
sh-4.2# find . |grep -i fluent
./pods/1aff02ec-xxxx-xxxx-xxxx-005056bf561e/fluentd-elasticsearch_0.log
./containers/logging-fluentd-rc5b8_logging_fluentd-elasticsearch-442d4e3a44b187759799b2a7c895ca675736a0dfe9e48609dc85b4564822bd51.log
sh-4.2# tail -f ./pods/1aff02ec-xxxx-xxxx-xxxx-005056bf561e/fluentd-elasticsearch_0.log
{"log":"2019-04-02 12:30:43 +0200 [debug]: buffer queue is full. Wait 1 second to re-emit events\n","stream":"stdout","time":"2019-04-02T10:30:43.0055154Z"}
{"log":"2019-04-02 12:30:44 +0200 [debug]: buffer queue is full. Wait 1 second to re-emit events\n","stream":"stdout","time":"2019-04-02T10:30:44.005773785Z"}
{"log":"2019-04-02 12:30:45 +0200 [debug]: buffer queue is full. Wait 1 second to re-emit events\n","stream":"stdout","time":"2019-04-02T10:30:45.006026623Z"}
{"log":"2019-04-02 12:30:46 +0200 [debug]: buffer queue is full. Wait 1 second to re-emit events\n","stream":"stdout","time":"2019-04-02T10:30:46.006271458Z"}
{"log":"2019-04-02 12:30:47 +0200 [debug]: buffer queue is full. Wait 1 second to re-emit events\n","stream":"stdout","time":"2019-04-02T10:30:47.006527686Z"}
- However, Elasticsearch itself is healthy: it is still receiving new logs and creating indices from other nodes and components
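When other projects are still indexing, Elasticsearch health can be confirmed from inside an ES pod. A minimal sketch, assuming the admin client certificates that the OCP 3.x aggregated-logging stack mounts under /etc/elasticsearch/secret (verify the pod name and secret paths in your own deployment):

```shell
# Hypothetical helper: query the index list from inside an
# Elasticsearch pod using the mounted admin certificates.
check_es_indices() {
  es_pod=$1   # e.g. an ES pod name from: oc get pods -l component=es
  oc exec "$es_pod" -- curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    'https://localhost:9200/_cat/indices?v'
}
# Usage: check_es_indices <elasticsearch-pod-name>
```

If project indices for the affected nodes are missing from the output while others carry recent dates, the problem is on the Fluentd side rather than in Elasticsearch.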
Resolution
- Remove the stale buffer files:
# oc rsh logging-fluentd-zmh92
sh-4.2# rm /var/lib/fluentd/buffer-output-es-config.output_tag.*.log
sh-4.2# exit
- Restart the Fluentd pods:
# oc delete pods -l component=fluentd
pod "logging-fluentd-2rn9g" deleted
pod "logging-fluentd-8dvkp" deleted
pod "logging-fluentd-jddtw" deleted
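The two resolution steps above can be combined into a single loop over every Fluentd pod. A sketch, assuming the default `component=fluentd` label and the buffer path shown above:

```shell
# Sketch: remove stale Elasticsearch output buffers from every
# Fluentd pod, then delete the pods so the daemonset recreates them.
clear_fluentd_buffers() {
  for pod in $(oc get pods -l component=fluentd -o name); do
    # strip the "pod/" prefix that -o name adds
    oc exec "${pod#pod/}" -- sh -c \
      'rm -f /var/lib/fluentd/buffer-output-es-config.output_tag.*.log'
  done
  oc delete pods -l component=fluentd
}
# Usage: clear_fluentd_buffers
# (run as a user with access to the logging project)
```

Note that deleting buffer files discards any log records still queued in them, so expect a gap in the affected projects' logs.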
Diagnostic Steps
- Check for stale buffer files:
# oc rsh logging-fluentd-xxxxx
sh-4.2# cd /var/lib/fluentd/
sh-4.2# ls -lrt
total 266428
-rw-r--r--. 1 root root 4383412 Jan 31 15:08 buffer-output-es-config.output_tag.q580c189613c8fdce.log
-rw-r--r--. 1 root root 8387921 Jan 31 18:01 buffer-output-es-config.output_tag.q580c18f1e6f9be52.log
-rw-r--r--. 1 root root 8387706 Jan 31 20:53 buffer-output-es-config.output_tag.q580c3fb4a60d3269.log
. . .
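Before deleting anything, it helps to confirm the buffer files are genuinely stale rather than in active use; a healthy pipeline flushes buffers within seconds. `find` can filter on modification time. The sketch below demonstrates the check against a scratch directory with made-up file names (inside the pod you would point it at /var/lib/fluentd instead; the one-day threshold is an assumption):

```shell
# Demo of the staleness check in a scratch directory; in the pod,
# run the same find against /var/lib/fluentd instead.
dir=$(mktemp -d)
# one stale buffer file (fake name) and one fresh one
touch -d '2 days ago' "$dir/buffer-output-es-config.output_tag.qstale.log"
touch "$dir/buffer-output-es-config.output_tag.qfresh.log"
# Report only buffer files untouched for more than a day:
find "$dir" -name 'buffer-output-es-config.output_tag.*.log' -mtime +1
```

Only the file last modified two days ago is reported; the freshly written buffer is skipped.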
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.