Change number of shards for OpenShift Elasticsearch index template

Intro

These instructions are written primarily for OpenShift logging, but should apply to any Elasticsearch installation once the OpenShift-specific bits are removed. They target Elasticsearch 2.x, which ships with OpenShift 3.4 through 3.10, so they may require some tweaking to work with ES 5.x, which ships starting with OpenShift 3.11; some steps may not work on OpenShift 3.11. The instructions assume your logging namespace is logging; use openshift-logging with OpenShift 3.10 and later.

The default number of shards per index for OpenShift logging is 1. This is by design: it avoids breaking very large deployments that have a large number of indices, where the problem is having too many shards. However, for deployments with a small number of very large indices, a single shard per index can be problematic. Elasticsearch recommends keeping shard size under 50GB, so increasing the number of shards per index helps keep individual shards below that limit.
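
To check how close your existing shards are to that guideline, you can query the _cat/shards endpoint. This is a sketch, assuming $espod already holds the name of one of your Elasticsearch pods (the Steps below show how to find one); the store column shows the on-disk size of each shard:

# oc exec -n logging -c elasticsearch $espod -- \
    curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    'https://localhost:9200/_cat/shards?v&h=index,shard,prirep,store'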

This guide discusses how to change the primary shard count for a class of indices. To instead change the default primary shard count for ALL indices, refer to the OpenShift logging documentation for the Ansible variable openshift_logging_es_number_of_shards.
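
For reference, a minimal sketch of setting that variable in the openshift-ansible inventory, assuming the standard [OSEv3:vars] section of your inventory file:

[OSEv3:vars]
# default primary shard count for ALL new logging indices
openshift_logging_es_number_of_shards=3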

Steps

Identify the index pattern you want to increase sharding for. For OpenShift logging this will be .operations.* or project.*. If specific projects typically generate much more data than others, and you need to keep the overall number of shards down, you can target very specific patterns, e.g. project.this-project-generates-too-many-logs.*. If you don't anticipate having many namespaces/projects/indices, you can just use project.*.
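
If you are unsure which projects generate the most data, you can list indices sorted by size. This is a sketch, again assuming $espod is set as described below; bytes=b makes Elasticsearch print plain byte counts so sort can order them numerically:

# oc exec -n logging -c elasticsearch $espod -- \
    curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    'https://localhost:9200/_cat/indices?h=index,store.size&bytes=b' | \
    sort -k2 -n -r | head -20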

Create a JSON file for each index pattern. The order value controls precedence when multiple templates match an index: templates with a higher order override those with a lower order, so 20 here takes precedence over the default template settings.

Call this one more-shards-for-operations-indices.json.

{
    "order": 20,
    "settings": {
        "index": {
            "number_of_shards": 3
        }
    },
    "template": ".operations.*"
}

Call this one more-shards-for-project-indices.json.

{
    "order": 20,
    "settings": {
        "index": {
            "number_of_shards": 3
        }
    },
    "template": "project.*"
}
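
Before loading them, it can be worth validating that both files are well-formed JSON, e.g. with the same python -mjson.tool used elsewhere in this article:

# python -mjson.tool more-shards-for-operations-indices.json
# python -mjson.tool more-shards-for-project-indices.json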

Load these into Elasticsearch. You’ll need the name of one of the Elasticsearch pods:

# oc get -n logging pods -l component=es

Pick one and call it $espod.

# espod=logging-es-xxxxxxxxx

If you have a separate OPS cluster, you’ll need to identify one of the es-ops Elasticsearch pods too, for the .operations.* indices:

# oc get -n logging pods -l component=es-ops

Pick one and call it $esopspod.

# esopspod=logging-es-ops-xxxxxxxx
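
Rather than copying pod names by hand, you can select the first pod of each kind with a jsonpath query; a sketch:

# espod=$(oc get -n logging pods -l component=es \
    -o jsonpath='{.items[0].metadata.name}')
# esopspod=$(oc get -n logging pods -l component=es-ops \
    -o jsonpath='{.items[0].metadata.name}')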

Load the file more-shards-for-project-indices.json into $espod:

# file=more-shards-for-project-indices.json
# cat $file | \
    oc exec -n logging -i -c elasticsearch $espod -- \
    curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    https://localhost:9200/_template/$file -XPUT -d@- | \
    python -mjson.tool

Load the file more-shards-for-operations-indices.json into $esopspod, or $espod if you do not have a separate OPS cluster:

# file=more-shards-for-operations-indices.json
# cat $file | \
    oc exec -n logging -i -c elasticsearch $esopspod -- \
    curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    https://localhost:9200/_template/$file -XPUT -d@- | \
    python -mjson.tool
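
To confirm a template was stored, you can fetch it back with a GET against the same endpoint. A sketch for the project template on $espod; the same pattern works for the operations template on $esopspod:

# oc exec -n logging -c elasticsearch $espod -- \
    curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    https://localhost:9200/_template/more-shards-for-project-indices.json | \
    python -mjson.tool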

NOTE: These settings will not apply to existing indices; you would need to reindex for that. This is usually not a problem, since the settings will apply to all new indices, and curator will eventually delete the old ones.
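
If you do need to reshard an existing index, and your Elasticsearch version supports the _reindex API (introduced in ES 2.3), a minimal sketch looks like the following. The index names here are hypothetical examples; the destination index must match the new template (or be created with the new settings) so that it gets the higher shard count:

# oc exec -n logging -c elasticsearch $espod -- \
    curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    https://localhost:9200/_reindex -XPOST \
    -d '{"source": {"index": "project.myproject.2018.07.26"}, "dest": {"index": "project.myproject.2018.07.26.resharded"}}' | \
    python -mjson.tool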

Results

To see if this is working, wait until new indices are created, and use the _cat endpoints to view the new indices/shards:

# oc exec -n logging -c elasticsearch $espod -- \
    curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    https://localhost:9200/_cat/indices?v
health status index                                                                        pri rep docs.count docs.deleted store.size pri.store.size 
green  open   project.kube-service-catalog.d5dbe052-903c-11e8-8c22-fa163e6e12b8.2018.07.26   3   0       1395            0      2.2mb          2.2mb

The pri value is now 3 instead of the default 1. This means there are 3 shards for this index. You can also check the shards endpoint:

# oc exec -n logging -c elasticsearch $espod -- \
    curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    https://localhost:9200/_cat/shards?v
index                                                                         shard prirep state     docs   store ip         node
project.kube-service-catalog.d5dbe052-903c-11e8-8c22-fa163e6e12b8.2018.07.26 1     p      STARTED    596 683.3kb 10.131.0.8 logging-es-data-master-vksc2fwe
project.kube-service-catalog.d5dbe052-903c-11e8-8c22-fa163e6e12b8.2018.07.26 2     p      STARTED    590 652.6kb 10.131.0.8 logging-es-data-master-vksc2fwe
project.kube-service-catalog.d5dbe052-903c-11e8-8c22-fa163e6e12b8.2018.07.26 0     p      STARTED    602 628.1kb 10.131.0.8 logging-es-data-master-vksc2fwe

This lists the 3 shards for the index. If you have multiple Elasticsearch nodes, you should see more than one node listed in the node column of the _cat/shards output.
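
For a per-node summary of shard counts and disk usage on a multi-node cluster, the _cat/allocation endpoint is also useful (same credentials as above):

# oc exec -n logging -c elasticsearch $espod -- \
    curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    https://localhost:9200/_cat/allocation?v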
