Aggregated Logging - Data is not replicated
Environment
- OpenShift 3.5 and later
Issue
After installing the Aggregated Logging framework on OpenShift with more than one Elasticsearch node, each index has only a single copy of its data (no replicas).
Resolution
Edit the inventory file and set the number of shards and replicas you want Elasticsearch to configure for your indices.
Example:
openshift_logging_es_number_of_shards=1
openshift_logging_es_number_of_replicas=2
After that, run the Ansible playbook to update the configuration.
ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml
This operation involves downtime of log aggregation but requires no other action. Existing indices will be configured with the new values.
NOTE: For OpenShift 3.3 and 3.4, the default values are the same, but the auto-expand feature is configured, which automatically adjusts the number of replicas based on the number of available nodes:
index:
number_of_shards: 1
number_of_replicas: 0
auto_expand_replicas: 0-3
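The `auto_expand_replicas: 0-3` range can be read as "keep the replica count between 0 and 3, based on the number of data nodes". A minimal sketch of that resolution logic, assuming Elasticsearch's documented behavior of replicas = data nodes - 1, clamped to the configured bounds:

```shell
# Sketch of how auto_expand_replicas "0-3" resolves to a replica count.
data_nodes=3
lower=0
upper=3

replicas=$(( data_nodes - 1 ))
if [ "$replicas" -gt "$upper" ]; then replicas=$upper; fi
if [ "$replicas" -lt "$lower" ]; then replicas=$lower; fi

echo "$replicas"   # prints 2 with 3 data nodes
```

With a single node this resolves to 0 replicas, which is why a one-node cluster stays green despite having no replication.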
Changing the number of replicas to existing indices
Changing the number of replicas in either the inventory or the ConfigMap only affects newly created indices. For example:
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open project.logging.a439b3dc-7acc-11e7-81e0-fa163e44925b.2017.08.07 1 0 926 0 405.3kb 405.3kb
green open project.logging.a439b3dc-7acc-11e7-81e0-fa163e44925b.2017.08.08 1 2 12 0 25.9kb 25.9kb
To align the existing indices with the new number of replicas, run the following request:
$ oc exec $anypod -- curl -s --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca -XPUT https://localhost:9200/*/_settings -d '{ "index" : { "number_of_replicas" : 2 } }'
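To confirm the change was applied, you can list the indices again and check the rep column. This is a sketch that reuses the $anypod variable and the same secret paths as the request above:

```shell
# List all indices; the "rep" column should now show the new replica count
oc exec $anypod -- curl -s \
  --key /etc/elasticsearch/secret/admin-key \
  --cert /etc/elasticsearch/secret/admin-cert \
  --cacert /etc/elasticsearch/secret/admin-ca \
  "https://localhost:9200/_cat/indices?v"
```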
Root Cause
The default values for these attributes are the following:
openshift_logging_es_number_of_shards=1
openshift_logging_es_number_of_replicas=0
This means that indices are not split and have only one primary shard, which is acceptable for these indices because each one contains only a single day's data.
On the other hand, the default number of replicas is 0, meaning there are no replica shards for the primary shards and therefore no data replication. Replication is important for two main reasons:
- It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
- It allows you to scale out your search volume/throughput since searches can be executed on all replicas in parallel.
Depending on your requirements and storage availability, you may not be able to afford, or may not need, any replication, since each replica requires additional storage. If you want to enable data replication, set the number of replicas to 1 or 2.
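As a rough sizing sketch (the 400 MB daily primary size is a made-up figure), total storage per index grows linearly with the replica count, since each replica is a full copy of the primary shard:

```shell
# Hypothetical sizing: total storage = primary size * (1 + replicas)
primary_mb=400   # assumed size of one day's primary shard, in MB
replicas=2
total_mb=$(( primary_mb * (1 + replicas) ))
echo "${total_mb} MB"   # prints "1200 MB": two replicas triple the footprint
```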
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.