Aggregated Logging - Data is not replicated
Environment
- OpenShift 3.5 and later
Issue
After installing the Aggregated Logging framework on OpenShift with more than one Elasticsearch node, each index has only a single copy of its data (no replicas).
Resolution
Edit the inventory file and set the number of shards and replicas you want Elasticsearch to configure for your indices.
Example:
openshift_logging_es_number_of_shards=1
openshift_logging_es_number_of_replicas=2
After that, run the Ansible playbook to update the configuration.
ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml
This operation involves downtime of log aggregation but requires no other action. Existing indices will be configured with the new values.
NOTE: For OpenShift 3.3 and 3.4, the default values are the same, but the auto-expand feature is configured, which automatically adjusts the number of replicas based on the number of available nodes:
index:
number_of_shards: 1
number_of_replicas: 0
auto_expand_replicas: 0-3
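The `auto_expand_replicas: 0-3` range can be read as "keep the replica count between 0 and 3, based on the number of data nodes". A minimal sketch of that resolution logic, assuming Elasticsearch's documented behavior of replicas = data nodes - 1, clamped to the configured bounds:

```shell
# Sketch of how auto_expand_replicas "0-3" resolves to a replica count.
data_nodes=3
lower=0
upper=3

replicas=$(( data_nodes - 1 ))
if [ "$replicas" -gt "$upper" ]; then replicas=$upper; fi
if [ "$replicas" -lt "$lower" ]; then replicas=$lower; fi

echo "$replicas"   # prints 2 with 3 data nodes
```

With a single node this resolves to 0 replicas, which is why a one-node cluster stays green despite having no replication.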
Changing the number of replicas to existing indices
Changing the number of replicas in either the inventory or the ConfigMap only affects newly created indices. For example:
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open project.logging.a439b3dc-7acc-11e7-81e0-fa163e44925b.2017.08.07 1 0 926 0 405.3kb 405.3kb
green open project.logging.a439b3dc-7acc-11e7-81e0-fa163e44925b.2017.08.08 1 2 12 0 25.9kb 25.9kb
To align the existing indices with the new number of replicas, run the following request:
$ oc exec $anypod -- curl -s --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca -XPUT https://localhost:9200/*/_settings -d '{ "index" : { "number_of_replicas" : 2 } }'
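To confirm the change was applied, you can list the indices again and check the rep column. This is a sketch that reuses the $anypod variable and the same secret paths as the request above:

```shell
# List all indices; the "rep" column should now show the new replica count
oc exec $anypod -- curl -s \
  --key /etc/elasticsearch/secret/admin-key \
  --cert /etc/elasticsearch/secret/admin-cert \
  --cacert /etc/elasticsearch/secret/admin-ca \
  "https://localhost:9200/_cat/indices?v"
```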
Root Cause
The default values for these attributes are the following:
openshift_logging_es_number_of_shards=1
openshift_logging_es_number_of_replicas=0
This means that indices are not split and have only one primary shard, which is acceptable for these indices because each one contains only a single day's data.
On the other hand, the default number of replicas is 0, meaning there are no replica shards for the primary shards and therefore no data replication. Replication is important for two main reasons:
- It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
- It allows you to scale out your search volume/throughput since searches can be executed on all replicas in parallel.
Depending on your requirements and storage availability, you may not be able to afford, or may not need, any replication, since each replica requires additional storage. If you want to enable data replication, set the number of replicas to 1 or 2.
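As a rough sizing sketch (the 400 MB daily primary size is a made-up figure), total storage per index grows linearly with the replica count, since each replica is a full copy of the primary shard:

```shell
# Hypothetical sizing: total storage = primary size * (1 + replicas)
primary_mb=400   # assumed size of one day's primary shard, in MB
replicas=2
total_mb=$(( primary_mb * (1 + replicas) ))
echo "${total_mb} MB"   # prints "1200 MB": two replicas triple the footprint
```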
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.