Fluentd fails to send logs with unknown property


Environment

  • Red Hat OpenShift Container Platform
    • 3.X

Issue

  • The Fluentd logs fill up with illegal argument exceptions caused by the "id" property or other custom properties, for example:
2018-06-14 14:33:21 +0200 [debug]: Elasticsearch errors returned, retrying:  {"took"=>2, "errors"=>true, "items"=>[{"create"=>{"_index"=>"project.test-prod.286e429c-8efd-11e6-b4e5-0050569a3c52.2018.06.14", "_type"=>"com.redhat.viaq.common", "_id"=>"ZmU3YTE5NzYtMzIwZi00Njg1LTlmZTMtOTViN2Q1YjU4NzVi", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [res.body]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"unknown property [id]"}}}}, . . . "status"=>409, "error"=>{"type"=>"document_already_exists_exception", "reason"=>"[com.redhat.viaq.common][NTdhMGVhY2QtYTVmYy00MzQyLTk3YTQtZjJjMWU2YzQxYjBi]: document already exists", "shard"=>"0", . . .

2018-06-14 14:33:21 +0200 [warn]: temporarily failed to flush the buffer. next_retry=2018-06-14 14:38:21 +0200 error_class="Fluent::ElasticsearchErrorHandler::ElasticsearchError" error="Elasticsearch returned errors, retrying. Add '@log_level debug' to your config to see the full response" plugin_id="object:3ff3d4816f0c"
2018-06-14 14:33:21 +0200 [warn]: suppressed same stacktrace

Resolution

In older versions of the logging images, this error can block Fluentd entirely: Fluentd does not separate the rejected ("broken") records from the valid ones, so it keeps retrying the whole buffer. Update the logging images to v3.9.30 or later so that the other logs can still be sent.

To fix the root cause, investigate the logs produced by the applications. Any key name that appears in the records of one namespace with values of different types (string, int, boolean, etc.) causes this error.
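One way to carry out that investigation is to scan the JSON log lines of each namespace and list every key that appears with more than one value type. The following is a minimal Python sketch under stated assumptions: the function names (find_conflicts, flatten, json_type) are hypothetical, the input is assumed to be raw JSON log lines already collected per namespace (for example, from the application pods' logs), and nested keys are flattened into dotted paths such as res.body to match the field names Elasticsearch reports. It is a heuristic: it flags every mixed combination, although Elasticsearch may coerce some of them (for example, a number sent to a field already mapped as a string) instead of rejecting them.

```python
import json
from collections import defaultdict


def json_type(value):
    """Return a coarse JSON type name for a decoded scalar value."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"  # int and float are treated as one type here
    return "string"  # the only remaining scalar type after json.loads


def flatten(value, path=""):
    """Yield (dotted.path, type) pairs for every field in a decoded JSON value."""
    if isinstance(value, dict):
        if path:
            yield path, "object"
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        # Elasticsearch maps an array by the type of its elements.
        for item in value:
            yield from flatten(item, path)
    elif value is not None:  # null values do not set a field type
        yield path, json_type(value)


def find_conflicts(lines_by_namespace):
    """Map namespace -> {dotted.path: [types]} for keys seen with more than one type."""
    conflicts = {}
    for namespace, lines in lines_by_namespace.items():
        types_seen = defaultdict(set)
        for line in lines:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # not JSON, so there are no keys to merge
            if not isinstance(record, dict):
                continue
            for path, kind in flatten(record):
                types_seen[path].add(kind)
        mixed = {p: sorted(k) for p, k in types_seen.items() if len(k) > 1}
        if mixed:
            conflicts[namespace] = mixed
    return conflicts


if __name__ == "__main__":
    sample = {
        "test-prod": [
            '{"res": {"body": "ok"}}',
            '{"res": {"body": {"id": 42}}}',
            "plain text line",
        ]
    }
    print(find_conflicts(sample))
    # {'test-prod': {'res.body': ['object', 'string']}}
```

In the sample, res.body is a string in one record and an object in another, so it is reported; that is the same field named in the "failed to parse [res.body]" error in the Issue section.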

It is also suggested to disable MERGE_JSON_LOG. This option parses JSON-formatted log messages and indexes their key:value pairs as metadata fields in Elasticsearch. Later versions disable it by default, because it causes problems when key types conflict, among other situations. In 3.9 and later, disable it with:

# oc set env ds/logging-fluentd MERGE_JSON_LOG=false --overwrite=true
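To see why merging exposes these conflicts, the following Python sketch models the effect of MERGE_JSON_LOG on a single record. It is a simplified, hypothetical model (build_record is not part of the OpenShift logging images), assuming the merge simply copies the top-level keys of a parsed JSON message into the record; the real Fluentd filter may handle the original message field and edge cases differently.

```python
import json


def build_record(message, merge_json_log):
    """Hypothetical, simplified model of MERGE_JSON_LOG for one log line.

    Assumption: merging copies the top-level keys of a JSON message into the
    record; the real OpenShift Fluentd filter may differ in detail.
    """
    record = {"message": message}
    if not merge_json_log:
        return record  # payload stays inside a single string field
    try:
        parsed = json.loads(message)
    except json.JSONDecodeError:
        return record  # not JSON: nothing to merge
    if isinstance(parsed, dict):
        record.update(parsed)  # each key becomes its own indexed field
    return record


if __name__ == "__main__":
    line = '{"res": {"body": {"id": 42}}}'
    print(build_record(line, merge_json_log=True))
    # {'message': '{"res": {"body": {"id": 42}}}', 'res': {'body': {'id': 42}}}
    print(build_record(line, merge_json_log=False))
    # {'message': '{"res": {"body": {"id": 42}}}'}
```

With merging enabled, res.body becomes its own field, and under Elasticsearch dynamic mapping its type is fixed by the first record that introduces it into the index. With merging disabled, the payload stays inside the message string, so different value types from different applications cannot collide.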

In 3.7, a different procedure is required. See this related solution.

Root Cause

This occurs when Fluentd sends records that share a key name but carry values of different types (string, int, bool, etc.). This was reported in BZ#1591468.

Diagnostic Steps

Enable Fluentd debug logging. First, edit the configmap:

# oc edit cm logging-fluentd

Comment out the system.conf include line and add a <system> block that sets log_level to debug:

    #@include configs.d/openshift/system.conf
    <system>
      log_level debug
    </system>

Then delete the existing Fluentd pods so that the daemonset recreates them with the new configuration:

# oc delete pod -l logging-infra=fluentd

Wait until the new pods hit the same error. The Fluentd pod logs should then show the full Elasticsearch error response, including the reason each record was rejected.
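When the debug output is long, it can help to summarize which items Elasticsearch rejected and why. The Python sketch below assumes the Ruby-hash style response shown in the Issue section ("status"=>400, "error"=>{...}) with the fields in that order; summarize_rejections and ITEM_ERROR are hypothetical names, and the regular expression may need adjusting for other image versions. It counts items by status, error type, and root-cause reason (the caused_by reason when present, otherwise the top-level reason).

```python
import re
from collections import Counter

# One rejected item in the Ruby-hash style debug response, e.g.
#   "status"=>400, "error"=>{"type"=>"mapper_parsing_exception",
#   "reason"=>"failed to parse [res.body]", "caused_by"=>{"type"=>..., "reason"=>...}}
# Quoted values may contain backslash-escaped quotes, hence (?:[^"\\]|\\.)*.
ITEM_ERROR = re.compile(
    r'"status"=>(?P<status>\d+), "error"=>\{"type"=>"(?P<type>[^"]+)", '
    r'"reason"=>"(?P<reason>(?:[^"\\]|\\.)*)"'
    r'(?:, "caused_by"=>\{"type"=>"(?P<cause_type>[^"]+)", '
    r'"reason"=>"(?P<cause>(?:[^"\\]|\\.)*)")?'
)


def summarize_rejections(log_text):
    """Count rejected items by (status, error type, root-cause reason)."""
    counts = Counter()
    for match in ITEM_ERROR.finditer(log_text):
        # Prefer the nested caused_by reason, which is usually more specific.
        reason = match["cause"] if match["cause"] is not None else match["reason"]
        counts[(int(match["status"]), match["type"], reason)] += 1
    return counts
```

For the response in the Issue section, the 400 item is counted under (400, "mapper_parsing_exception", "unknown property [id]"), separately from the 409 document_already_exists_exception items. Assuming the pod log has been saved to a local file, call it as summarize_rejections(open("fluentd.log").read()).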

This issue is one of a family of solutions related to the "Elasticsearch errors returned" message. See more here.


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.