After reboot, new indices are not created in Elasticsearch, but Elasticsearch reports as healthy
Environment
- Red Hat OpenShift Container Platform 3.9
Issue
- Elasticsearch indices have not been generated for customer projects since the servers were rebooted
- Fluentd is reporting:
2019-xx-30 14:06:24 +0100 [warn]: temporarily failed to flush the buffer. next_retry=2019-03-30 14:11:24 +0100 error_class="Fluent::ElasticsearchErrorHandler::ElasticsearchError" error="Elasticsearch returned errors, retrying. Add '@log_level debug' to your config to see the full response" plugin_id="object:3fd1f41a29fc"
- Many stale buffer files are found in the Fluentd pods:
sh-4.2# find . |grep -i fluent
./pods/1aff02ec-xxxx-xxxx-xxxx-005056bf561e/fluentd-elasticsearch_0.log
./containers/logging-fluentd-rc5b8_logging_fluentd-elasticsearch-442d4e3a44b187759799b2a7c895ca675736a0dfe9e48609dc85b4564822bd51.log
sh-4.2# tail -f ./pods/1aff02ec-xxxx-xxxx-xxxx-005056bf561e/fluentd-elasticsearch_0.log
{"log":"2019-04-02 12:30:43 +0200 [debug]: buffer queue is full. Wait 1 second to re-emit events\n","stream":"stdout","time":"2019-04-02T10:30:43.0055154Z"}
{"log":"2019-04-02 12:30:44 +0200 [debug]: buffer queue is full. Wait 1 second to re-emit events\n","stream":"stdout","time":"2019-04-02T10:30:44.005773785Z"}
{"log":"2019-04-02 12:30:45 +0200 [debug]: buffer queue is full. Wait 1 second to re-emit events\n","stream":"stdout","time":"2019-04-02T10:30:45.006026623Z"}
{"log":"2019-04-02 12:30:46 +0200 [debug]: buffer queue is full. Wait 1 second to re-emit events\n","stream":"stdout","time":"2019-04-02T10:30:46.006271458Z"}
{"log":"2019-04-02 12:30:47 +0200 [debug]: buffer queue is full. Wait 1 second to re-emit events\n","stream":"stdout","time":"2019-04-02T10:30:47.006527686Z"}
- However, Elasticsearch itself is healthy: it is still receiving new logs and creating indices from other nodes and components
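When other projects are still indexing, Elasticsearch health can be confirmed from inside an ES pod. A minimal sketch, assuming the admin client certificates that the OCP 3.x aggregated-logging stack mounts under /etc/elasticsearch/secret (verify the pod name and secret paths in your own deployment):

```shell
# Hypothetical helper: query the index list from inside an
# Elasticsearch pod using the mounted admin certificates.
check_es_indices() {
  es_pod=$1   # e.g. an ES pod name from: oc get pods -l component=es
  oc exec "$es_pod" -- curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    'https://localhost:9200/_cat/indices?v'
}
# Usage: check_es_indices <elasticsearch-pod-name>
```

If project indices for the affected nodes are missing from the output while others carry recent dates, the problem is on the Fluentd side rather than in Elasticsearch.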
Resolution
- Remove the stale buffer files:
# oc rsh logging-fluentd-zmh92
sh-4.2# rm /var/lib/fluentd/buffer-output-es-config.output_tag.*.log
sh-4.2# exit
- Restart the Fluentd pods:
# oc delete pods -l component=fluentd
pod "logging-fluentd-2rn9g" deleted
pod "logging-fluentd-8dvkp" deleted
pod "logging-fluentd-jddtw" deleted
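The two resolution steps above can be combined into a single loop over every Fluentd pod. A sketch, assuming the default `component=fluentd` label and the buffer path shown above:

```shell
# Sketch: remove stale Elasticsearch output buffers from every
# Fluentd pod, then delete the pods so the daemonset recreates them.
clear_fluentd_buffers() {
  for pod in $(oc get pods -l component=fluentd -o name); do
    # strip the "pod/" prefix that -o name adds
    oc exec "${pod#pod/}" -- sh -c \
      'rm -f /var/lib/fluentd/buffer-output-es-config.output_tag.*.log'
  done
  oc delete pods -l component=fluentd
}
# Usage: clear_fluentd_buffers
# (run as a user with access to the logging project)
```

Note that deleting buffer files discards any log records still queued in them, so expect a gap in the affected projects' logs.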
Diagnostic Steps
- Check for stale buffer files:
# oc rsh logging-fluentd-xxxxx
sh-4.2# cd /var/lib/fluentd/
sh-4.2# ls -lrt
total 266428
-rw-r--r--. 1 root root 4383412 Jan 31 15:08 buffer-output-es-config.output_tag.q580c189613c8fdce.log
-rw-r--r--. 1 root root 8387921 Jan 31 18:01 buffer-output-es-config.output_tag.q580c18f1e6f9be52.log
-rw-r--r--. 1 root root 8387706 Jan 31 20:53 buffer-output-es-config.output_tag.q580c3fb4a60d3269.log
. . .
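Before deleting anything, it helps to confirm the buffer files are genuinely stale rather than in active use; a healthy pipeline flushes buffers within seconds. `find` can filter on modification time. The sketch below demonstrates the check against a scratch directory with made-up file names (inside the pod you would point it at /var/lib/fluentd instead; the one-day threshold is an assumption):

```shell
# Demo of the staleness check in a scratch directory; in the pod,
# run the same find against /var/lib/fluentd instead.
dir=$(mktemp -d)
# one stale buffer file (fake name) and one fresh one
touch -d '2 days ago' "$dir/buffer-output-es-config.output_tag.qstale.log"
touch "$dir/buffer-output-es-config.output_tag.qfresh.log"
# Report only buffer files untouched for more than a day:
find "$dir" -name 'buffer-output-es-config.output_tag.*.log' -mtime +1
```

Only the file last modified two days ago is reported; the freshly written buffer is skipped.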
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.