SearchGuard Not Yet Initialized in OpenShift Logging Stack
Environment
- Red Hat OpenShift Container Platform 3.x
- EFK (Elasticsearch, Fluentd, Kibana) stack 3.4.1 and 3.5.1+
Issue
- Kibana reports "Unable to connect to elasticsearch at https://logging-es:9200"
- SearchGuard "Not yet initialized" messages in the logging-es pod logs:
[INFO ][index.store ] [Jens Meilleur Slap Shot] [project.example.bfc38003-c309-11e6-b098-0050568f5a25.2017.05.28][0] Failed to open / find files while reading metadata snapshot
[ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[WARN ][index.store ] [Jens Meilleur Slap Shot] [project.example.bfc38003-c309-11e6-b098-0050568f5a25.2017.05.26][0] failed to build store metadata. checking segment info integrity (with commit [no])
...
[ERROR][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [Trapper] Timeout java.util.concurrent.TimeoutException: Timeout after 30SECONDS while retrieving configuration for [config, roles, rolesmapping, internalusers, actiongroups](index=.searchguard.logging-es-06coa0un-6-4znyf)
java.util.concurrent.TimeoutException: Timeout after 30SECONDS while retrieving configuration for [config, roles, rolesmapping, internalusers, actiongroups](index=.searchguard.logging-es-06coa0un-6-4znyf)
...
[ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
- Messages like
Timeout after 30SECONDS
[2017-06-28 15:03:17,711][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized (you may need to run sgadmin)
- Messages like
Failure No shard available
java.util.concurrent.TimeoutException: Timeout after 30SECONDS while retrieving configuration for [roles, rolesmapping](index=.searchguard)
[ERROR][i.f.e.p.a.ConfigurationLoader] Failure No shard available for [org.elasticsearch.action.get.MultiGetShardRequest@4c256fc] retrieving configuration for [roles, rolesmapping] (index=.searchguard)
Resolution
- Before executing the commands below, set the OpenShift logging namespace:
# Before 3.10
export OC_LOGGING=logging
# since 3.10
export OC_LOGGING=openshift-logging
- Delete the SearchGuard indices belonging to the running Elasticsearch pods. Run the following loop against each Elasticsearch pod (use component=es-ops for the ops cluster):
for pod in `oc -n ${OC_LOGGING} get pods --selector=component=es -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'`
do
echo Deleting SG index for $pod
oc -n ${OC_LOGGING} exec $pod -- curl --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca -XDELETE https://localhost:9200/.searchguard.$pod
done
This deletes each pod's SearchGuard index, including stale indices left over from already-terminated pods, allowing the cluster to initialize properly. To list all SearchGuard indices:
# anypod=$(oc -n ${OC_LOGGING} get pods --selector=component=es -o jsonpath='{.items[0].metadata.name}')
# oc -n ${OC_LOGGING} exec $anypod -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://logging-es:9200/_cat/indices?v | grep searchguard > sg_indices.out
- Scale down ES pods
- Scale up ES pods
- Delete FluentD pods
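The three restart steps above can be scripted. The following is a minimal sketch, assuming the standard component=es and component=fluentd labels and DeploymentConfig-based Elasticsearch deployments (verify the actual DC names with `oc get dc` in your cluster):

```shell
#!/bin/sh
# Sketch only: restart the ES cluster by scaling its DeploymentConfigs
# down and back up, then delete the Fluentd pods so the DaemonSet
# recreates them. Assumes component=es / component=fluentd labels.
command -v oc >/dev/null 2>&1 || { echo "oc not found; run on a host with cluster access"; exit 0; }
OC_LOGGING=${OC_LOGGING:-openshift-logging}

# Scale every ES DeploymentConfig to 0 (use component=es-ops for the ops cluster)
for dc in $(oc -n "${OC_LOGGING}" get dc --selector=component=es -o name); do
  oc -n "${OC_LOGGING}" scale "$dc" --replicas=0
done

# Wait until the ES pods have terminated before scaling back up
while oc -n "${OC_LOGGING}" get pods --selector=component=es 2>/dev/null | grep -q Running; do
  sleep 5
done

for dc in $(oc -n "${OC_LOGGING}" get dc --selector=component=es -o name); do
  oc -n "${OC_LOGGING}" scale "$dc" --replicas=1
done

# Delete the Fluentd pods; the DaemonSet recreates them automatically
oc -n "${OC_LOGGING}" delete pods --selector=component=fluentd
```

Scaling via the DeploymentConfigs (rather than deleting pods) keeps the replica counts declared in the cluster consistent with what is actually running.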
If the above does not work, restarting the SearchGuard initialization can help the ES cluster re-sync.
# oc -n openshift-logging rsh <elasticsearch pod>
# es_seed_acl
In older versions, the above command is not available. In that case, run the following in every Elasticsearch pod simultaneously:
# oc -n openshift-logging rsh <elasticsearch pod>
# /usr/share/java/elasticsearch/plugins/search-guard-2/tools/sgadmin.sh \
-cd ${HOME}/sgconfig \
-i .searchguard.${HOSTNAME} \
-ks /etc/elasticsearch/secret/searchguard.key \
-kst JKS \
-kspass kspass \
-ts /etc/elasticsearch/secret/searchguard.truststore \
-tst JKS \
-tspass tspass \
-nhnv \
-icl
- In version 3.9, sgadmin.sh is located at /usr/share/elasticsearch/plugins/openshift-elasticsearch/sgadmin.sh, so it may be necessary to run 'find' to locate it on your particular OCP version.
- It may be necessary to run this script multiple times.
- The details mentioned above also work in later versions.
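Running the re-initialization in every pod at once can also be scripted. The following is a minimal sketch for versions that ship the es_seed_acl helper, assuming the component=es label (adapt the exec'd command to the sgadmin.sh invocation above on older versions):

```shell
#!/bin/sh
# Sketch only: re-run the SearchGuard seeding in every ES pod in
# parallel, since the re-initialization should happen cluster-wide.
command -v oc >/dev/null 2>&1 || { echo "oc not found; run on a host with cluster access"; exit 0; }
OC_LOGGING=${OC_LOGGING:-openshift-logging}

for pod in $(oc -n "${OC_LOGGING}" get pods --selector=component=es \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'); do
  # '&' backgrounds each exec so all pods re-seed simultaneously
  oc -n "${OC_LOGGING}" exec "$pod" -- es_seed_acl &
done
wait  # block until every background re-seed has finished
```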
Root Cause
The SearchGuard index is part of the infrastructure of the logging cluster; these indices get thrown away and rebuilt when the pod respawns, and do not contain any application data. Although Elasticsearch views this as blocking the cluster from starting up, in reality this is likely just an index that should have been deleted when the pod terminated, but was not.
This is being investigated for ways to prevent this desyncing.
Errors of type Timeout after 30SECONDS mean that the query took too long to complete. When the Kibana Discovery page loads, a query retrieving the user's information is run against the SearchGuard index. A timeout can have many causes, for example:
- Heavy load, for example due to data recovery
- Lack of resources (memory, CPU)
- Network problem
Diagnostic Steps
- Before executing the commands below, set the OpenShift logging namespace:
# Before 3.10
export OC_LOGGING=logging
# since 3.10
export OC_LOGGING=openshift-logging
- Run Elasticsearch cluster information queries:
# oc -n ${OC_LOGGING} exec <ES PODNAME> -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/health?v
# oc -n ${OC_LOGGING} exec <ES PODNAME> -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://logging-es:9200/_cat/indices?v
- Check for any red indices
green open .searchguard.logging-es-m50k78tj-6-9oydw 1 2 3 0 42kb 14kb
green open .kibana.968528fe24c0a4b52766850041d6b0722917f563 1 2 9 0 105.5kb 35.1kb
red open .searchguard.logging-es-06coa0un-6-7kc0b 1 2
green open project.management-infra.54281e56-1c5a-11e6-a509-0050568f4c2d.2017.06.05 1 2 53 0 342.3kb 114.1kb
green open project.example.678ce11f-c309-11e6-b098-0050568f5a25.2017.06.09 1 2 184 0 413.5kb 137.3kb
- Note the red index. If the red index is a SearchGuard index, proceed to the Resolution section. If the red index is an application index, deleting it can cause loss of application logs.
- Another possibility is that the SearchGuard indices do not contain all the information required for user authentication. A healthy SearchGuard index usually contains 5 documents. See this example of a healthy, populated index:
green open .searchguard.logging-es-smeuexjr-2-qmc16 1 2 5 0 66kb 66kb
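The document counts can be checked directly, without scanning the full index listing, by restricting both the index pattern and the columns in the _cat API. A minimal sketch (the expected count of 5 corresponds to the config, roles, rolesmapping, internalusers, and actiongroups documents):

```shell
#!/bin/sh
# Sketch only: list health and document count for the SearchGuard
# indices only. A healthy index reports docs.count = 5.
command -v oc >/dev/null 2>&1 || { echo "oc not found; run on a host with cluster access"; exit 0; }
OC_LOGGING=${OC_LOGGING:-openshift-logging}

# Pick any ES pod to issue the query from
anypod=$(oc -n "${OC_LOGGING}" get pods --selector=component=es \
  -o jsonpath='{.items[0].metadata.name}')

# _cat/indices accepts an index pattern and an h= column list
oc -n "${OC_LOGGING}" exec "$anypod" -- curl -s \
  --key /etc/elasticsearch/secret/admin-key \
  --cert /etc/elasticsearch/secret/admin-cert \
  --cacert /etc/elasticsearch/secret/admin-ca \
  'https://localhost:9200/_cat/indices/.searchguard*?v&h=health,index,docs.count'
```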
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.