DG 8 operation in case of OCP nodes crashing/upgrade


Environment

  • Red Hat OpenShift Container Platform (OCP)
    • 4.x
  • Red Hat Data Grid (RHDG)
    • 8.x

Issue

  • If an OCP worker node hosting a DG 8 pod crashes, what happens to the cluster?
  • If OCP worker nodes hosting DG 8 pods are upgraded, what happens to the cluster?

Resolution

This solution covers OCP nodes crashing; for DG pods crashing, see this solution.

Config listener pod crash
The config listener pod reports the cache configurations as they are applied.
After a restart, the config listener checks for outdated caches.
See Config Listener Pod.
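
To verify the listener after a restart, the commands below are a minimal sketch. The cluster name `example-infinispan` is hypothetical; the Operator names the listener Deployment `<cluster-name>-config-listener`, and the snippet is guarded so it is a no-op without a logged-in `oc` session:

```shell
# Hypothetical Infinispan CR name; adjust to your cluster.
CLUSTER=example-infinispan
# The Operator creates the config listener as a Deployment named <cluster>-config-listener.
LISTENER="${CLUSTER}-config-listener"

# Guard: only run against a live cluster when oc is available.
if command -v oc >/dev/null 2>&1; then
  # Confirm the listener Deployment is running and where it was scheduled:
  oc get deployment "$LISTENER" -o wide
  # The logs show the cache configurations being (re)applied:
  oc logs "deployment/$LISTENER" --tail=50
fi
```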

Operator pod crash
If the Operator pod crashes because its host node becomes unavailable, the semantics of a Kubernetes Deployment ensure that an Operator pod is rescheduled on an available node. A running DG cluster does not depend on the Operator to function once the Infinispan CR/Cache CR has been reconciled, so the Operator being down should not affect the availability of the Infinispan cluster service.
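
To confirm the Operator pod was rescheduled, the following is a sketch. The namespace is an assumption (it depends on the namespace chosen at install time), and the `grep` pattern simply narrows the listing:

```shell
# Assumed Operator namespace; adjust to where the Operator was installed.
NS=openshift-operators

# Guard: only run against a live cluster when oc is available.
if command -v oc >/dev/null 2>&1; then
  # -o wide shows which node each pod landed on after rescheduling:
  oc get pods -n "$NS" -o wide | grep infinispan-operator
  # The Deployment keeps the desired/ready replica counts in sync:
  oc get deployment -n "$NS" | grep infinispan-operator
fi
```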

OCP Upgrade impact on DG cluster vs DG Operator pod

  • DG cluster (DG pods) impact: OCP nodes are upgraded in a rolling manner, so provided the cluster has enough pods (redundancy), it will not be impacted. Not all pods go down at once, and the ones that do are respawned on a node that is either not upgrading or already upgraded.
  • DG Operator pod impact: the DG Operator pod can be impacted by its node upgrading, but it will be respawned. The operation of the Infinispan cluster is not affected while the DG Operator pod is down.
  • DG Operator installation/upgrade impact: because OCP nodes are upgraded in a rolling manner, the installation/upgrade can be done without major issues.
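
The rolling behavior above can be observed while an upgrade is in progress. The `app=infinispan-pod` label comes from the Operator's default anti-affinity selector shown later in this solution; the namespace is a placeholder:

```shell
# Placeholder namespace where the DG cluster runs; adjust to your environment.
NS=dg-namespace

# Guard: only run against a live cluster when oc is available.
if command -v oc >/dev/null 2>&1; then
  # Node status shows which nodes are cordoned or upgrading:
  oc get nodes
  # -o wide shows DG pods being respawned onto nodes that are not upgrading:
  oc get pods -n "$NS" -l app=infinispan-pod -o wide
fi
```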

If the crash was on the worker node that hosts the operator pod?
The Deployment controller respawns the Operator pod on another node. A running DG cluster does not depend on the Operator to function once the Infinispan CR has been reconciled, so the Operator being down should not affect service availability.

Is RelayNodes the same as GossipRouter pods?
No. There is only a single GossipRouter pod per site. What maxRelayNodes does is promote more Infinispan pods to relay nodes. The relay nodes are the only ones that connect to the GossipRouter; they act as the bridge.
So if a crash happens on a relay node (also known as the site master), another pod is promoted to site master.

If the crash was on the worker node that hosts the router pod - for Cross-Site configuration/operation?
Another router pod is spawned (the GossipRouter runs as a Deployment) and operation should resume.
JGroups only needs a single GossipRouter, and if one site's router crashes, it uses the other site's GossipRouter.
There is also a reconnect interval (a couple of seconds) during which JGroups tries to re-establish the connection.

What if a crash happens on the GossipRouter pod?
JGroups only needs a single GossipRouter, and if one site's router crashes, it uses the other site's GossipRouter.
The following diagram shows TUNNEL protocol usage:

cluster1Node -> cluster1Master -> GossipRouter -> cluster2Master -> cluster2OtherNode
which is the same as:
cluster1Node -> cluster1Relay1 -> GossipRouter -> cluster2Relay1 -> cluster2OtherNode

Whereas without TUNNEL, just remove the step in the middle:

cluster1Node -> cluster1Master -> cluster2Master -> cluster2OtherNode

What if a crash happens on a relay node (cross-site master)?
Relay nodes (cross-site masters) are the nodes that can send cross-site messages; if the current relay node crashes, another pod is promoted. You can specify the number of pods that can send RELAY messages with the service.sites.local.maxRelayNodes field. Note, however, that a higher maxRelayNodes value does not prevent inconsistency during a crash:
if an entry has been sent to a relay node and that node crashes at that point, the entry is not sent to the backup site. The only reason to have more relay nodes is to split the traffic if one node cannot handle it.
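
The maxRelayNodes field sits under the service configuration of the Infinispan CR. The following is a minimal sketch; the cluster name and value are illustrative, and other required cross-site fields (site names, expose configuration) are omitted for brevity:

```yaml
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: example-infinispan
spec:
  replicas: 3
  service:
    type: DataGrid
    sites:
      local:
        # Number of pods allowed to send RELAY (cross-site) messages:
        maxRelayNodes: 2
```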


Set anti-affinity rules in Infinispan's CR
The following is the anti-affinity strategy that Infinispan Operator uses if you do not configure the spec.affinity field in your Infinispan CR:

spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: infinispan-pod
              clusterName: <cluster_name>
              infinispan_cr: <cluster_name>
          topologyKey: "kubernetes.io/hostname"
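
If you need a hard guarantee that no two DG pods share a node, rather than the default preference above, a strict variant can be set in the Infinispan CR. This is a sketch based on the same labels; note that with required anti-affinity, pods stay Pending if no eligible node is free:

```yaml
spec:
  affinity:
    podAntiAffinity:
      # "required" makes anti-affinity a hard scheduling constraint:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: infinispan-pod
            clusterName: <cluster_name>
            infinispan_cr: <cluster_name>
        topologyKey: "kubernetes.io/hostname"
```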

Scenarios that could explain pods spawning on the same node:

  1. That is the only (worker) node available.
  2. All other nodes were full enough (typically memory, but possibly also CPU) that the pod could not be spun up elsewhere.
    This must be investigated via node-level utilization.

In case OCP nodes crash, what are the troubleshooting steps
This is outside DG operation, but look at the MCD/kubelet logging to see what happened before the crash:

  • machine-config-daemon logs come from the container logging: oc logs -f
  • kubelet logging comes from the systemd unit logging on the node
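
A sketch of collecting both logs follows. The node name is a placeholder; machine-config-daemon pods run in the openshift-machine-config-operator namespace, and `oc adm node-logs -u kubelet` reads the kubelet systemd unit journal on the node:

```shell
# Placeholder node name; replace with the node that crashed.
NODE=worker-node-example

# Guard: only run against a live cluster when oc is available.
if command -v oc >/dev/null 2>&1; then
  # Find the machine-config-daemon pod running on the affected node:
  oc get pods -n openshift-machine-config-operator -o wide | grep machine-config-daemon
  # kubelet logs from the node's systemd unit:
  oc adm node-logs "$NODE" -u kubelet
fi
```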

For more information on the machine-config-daemon, see the OCP documentation 4.9.

In case OCP nodes crash, how long will it take to reschedule pods/bring them back up?
If an OCP node crashes, the pod should be scheduled on another available OCP node almost immediately. However, factors such as affinity/anti-affinity rules influence how quickly the pods come back up, because they constrain which OCP nodes the pods can be scheduled on.
If the pods are not rescheduled when the node becomes ready, see the solution Pods are not rescheduled when node becomes NotReady.
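
If pods stay Pending after a node crash, the scheduler's events usually say why (for example, anti-affinity conflicts or insufficient resources). A sketch, with a placeholder namespace:

```shell
# Placeholder namespace where the DG cluster runs; adjust to your environment.
NS=dg-namespace

# Guard: only run against a live cluster when oc is available.
if command -v oc >/dev/null 2>&1; then
  # FailedScheduling events explain why a pod could not be placed:
  oc get events -n "$NS" --field-selector reason=FailedScheduling
fi
```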

In case DG controller pod crashes, what are the troubleshooting steps

  1. kubectl describe pod
  2. Check the event log

Diagnostic Steps

  1. Check the pod logs.
  2. oc describe pod dg-cluster-nyc-0 > describe.pod
  3. Check the event log: oc get events -n <namespace> > events.namespace
  4. See the affinity set on the Infinispan CR:
affinity:
    podAntiAffinity:
  5. See OCP node utilization (via Prometheus) or standard tools:
oc adm top nodes
NAME                 CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ci-ln-master-0       829m         23%    5511Mi          39%
ci-ln-master-1       777m         22%    5515Mi          39%
ci-ln-worker-a-abc   777m         22%    5515Mi          60%

For OCP node crashes, see:

  1. Look at the MCD/kubelet logging to see what happened before the crash.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.