Data Grid 8 Operator exposure: Route vs NodePort vs LoadBalancer in OpenShift 4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (OCP)
    • 4.x
  • Red Hat Data Grid (RHDG)
    • 8.x

Issue

What is the difference between exposing Data Grid via Route vs NodePort vs LoadBalancer in OpenShift?
Which type of Route does the DG Operator create automatically?

Resolution

| Exposure method | Purpose/Details | External service |
| --- | --- | --- |
| LoadBalancer | No HAProxy; a custom URL can be used | Creates an external service (service type LoadBalancer) |
| NodePort | NodePort services use a port in the range 30000-32767 | Creates an external service (service type NodePort) |
| Route | Passes through the ingress HAProxy (bounded by HAProxy rules, i.e. ports 80/443): routes use either port 80 (unencrypted) or 443 (encrypted) | An HAProxy route pointing to the internal service; no new service type is created, the already-defined internal service is used, and the client is limited to BASIC intelligence |
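As a sketch, the exposure method is chosen in the Infinispan CR through spec.expose.type; the CR name and values below are illustrative, not taken from this article:

```yaml
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: example-infinispan   # illustrative name
spec:
  replicas: 2
  expose:
    type: Route              # or NodePort, or LoadBalancer
```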

NodePort

A NodePort service is the simplest exposure form. It has similar limitations to the methods below.

$ oc get svc
NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)           AGE
example-infinispan            ClusterIP   127.1.1.1    <none>        11222/TCP         105m
example-infinispan-admin      ClusterIP   None         <none>        11223/TCP         105m
example-infinispan-external   NodePort    127.1.1.2    <none>        11222:30234/TCP   90m

The service example-infinispan-external of type NodePort provides the external access to DG.
In this case the node port 30234 is explicitly opened on every node, i.e. 30234 is exposed on all available running nodes in the cluster, so the IP can be any of the nodes' IPs.
Port 11222 is the service port for the ClusterIP of the service.
If not set, the node port is randomly selected from the 30000-32767 range when the service is converted to NodePort type. Even after deleting the service, the Infinispan CR will re-create another one with the same spec, meaning the same port.
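If relying on the random selection is undesirable, the node port can be pinned in the CR. A minimal sketch, assuming the spec.expose.nodePort field is available in your operator version:

```yaml
spec:
  expose:
    type: NodePort
    nodePort: 30234   # must fall within the 30000-32767 NodePort range
```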

LoadBalancer

LoadBalancer is a service type with spec.type: LoadBalancer, so it is listed among the services:

$ oc get svc
NAME                     TYPE           CLUSTER-IP   EXTERNAL-IP             PORT(S)           AGE
dg-cluster-lb            ClusterIP      127.0.0.1    <none>                  11222/TCP         30m
dg-cluster-lb-admin      ClusterIP      127.0.0.1    <none>                  11223/TCP         30m
dg-cluster-lb-external   LoadBalancer   127.0.0.1    example.amazonaws.com   11222:30436/TCP   30m
dg-cluster-lb-ping       ClusterIP      None         <none>                  8888/TCP          30m

Type LoadBalancer allows session affinity via ClientIP; see OCP 4.12 - Service - sessionAffinity.
Consequently, the LoadBalancer type will have one more service than the Route type, but no routes.
If you run OpenShift on bare metal, it is common not to have a LoadBalancer implementation available; in that case the following warning appears:

LoadBalancer expose type is not supported on the target platform

This is because the LoadBalancer service type points to an external load balancer that is outside the OCP layer and relies on the target platform to provision it. If the platform doesn't have one, or doesn't allow its usage, it returns the warning above.
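For completeness, a minimal sketch of requesting the LoadBalancer exposure in the Infinispan CR; on a platform without a load-balancer implementation this is what triggers the warning above:

```yaml
spec:
  expose:
    type: LoadBalancer
```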

Route

The route forwards requests to the internal DG service (created automatically), which in turn forwards them to the DG server. A route can be secure or non-secure (usually DG creates a secure route with passthrough termination):

Route -> internal service -> DG server

This means the route declaration (route.yaml) carries a forwarding rule: it takes traffic arriving at the host and sends it to the named service on target port 11222:

  spec:
    host: somewhere.somewhere <---- host
    port:
      targetPort: 11222
    to:
      kind: Service
      name: example-svc <------------------ service
      weight: 100

The service attached to it is an internal service (svc) of type ClusterIP; it has no external IP, and this is expected.

  metadata:
    name: example-svc    <----------- example-svc service
    ownerReferences:
    - kind: Infinispan
  spec:
    clusterIP: 127.0.0.1 <----------- 127.0.0.1 
    clusterIPs:
    - 127.0.0.1
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: infinispan
      port: 11222
      protocol: TCP
      targetPort: 11222
    selector:
      app: infinispan-pod
      clusterName: example-app-cluster
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}

Exposing via Route results in 3 services plus a route (no external service is created):

$ oc get route
NAME                        HOST/PORT               PATH   SERVICES           PORT    TERMINATION   WILDCARD
dg-cluster-route-external   example.openshift.org          dg-cluster-route   11222                 None
$ oc get svc
NAME                     TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
dg-cluster-route         ClusterIP   127.0.0.1    <none>        11222/TCP   41m
dg-cluster-route-admin   ClusterIP   127.0.0.1    <none>        11223/TCP   41m
dg-cluster-route-ping    ClusterIP   None         <none>        8888/TCP    41m

While a service has endpoints (pod IPs in the headless case), a route has no endpoints; it is only a means to accept traffic on ports 80/443 on the router pods. The service can therefore be healthy while HAProxy is having issues.
All OCP routes that go through the ingress controller share a default wildcard FQDN: *.apps. This can be changed on the default ingress controller, or another ingress controller can be used together with a custom certificate.
For wildcards, see the problem with wildcards + HTTP/2 connection coalescing in "Different routes send traffic to wrong application".

DG Operator has a fallback mechanism to Ingress
If the Route cannot be created, the DG Operator falls back to an Ingress, not only on Kubernetes but also on OCP 4. A route is still created regardless, because the moment the Ingress is created the ingress controller spawns a Route bound to it; deleting the Ingress removes the Route.
The user cannot use Ingress directly on OCP 4 anyway: an Ingress on OCP has no function of its own and only serves, pro forma, as a blueprint from which the ingress controller creates a Route.
So it is not just a matter of "should not" but "cannot": on OCP 4, a Route is always used.

TLS vs no TLS

Using the spec field spec.security.endpointEncryption.certSecretName, the user can set an encryption certificate, which is added to the service/route via the annotation service.beta.openshift.io/serving-cert-secret-name: $infinispan_cr-cert-secret.

No TLS (not secure route):

  security:
    endpointAuthentication: true
    endpointEncryption:
      clientCert: None <---------------
      type: None       <---------------
    endpointSecretName: $infinispan_cr-dev-generated-secret
$ oc get route
NAME                        HOST/PORT               PATH   SERVICES           PORT    TERMINATION   WILDCARD
dg-cluster-route-external   example.openshift.org          dg-cluster-route   11222                 None

Above is an insecure route (dg-cluster-route-external), i.e. without TLS termination.

With TLS (secure route):

spec:
  security:
    endpointAuthentication: true
    endpointEncryption:
      certSecretName: example-infinispan-cert-secret <----------- will be  service.beta.openshift.io/serving-cert-secret-name annotation

And then the route:

$ oc get route
NAME                        HOST/PORT               PATH   SERVICES           PORT    TERMINATION   WILDCARD
dg-cluster-route-external   example.openshift.org          dg-cluster-route   11222   passthrough   None

Above is a secure route (dg-cluster-route-external) with passthrough TLS termination. The Data Grid Operator uses passthrough termination (with no wildcard), so the pod handles the encryption.

Generally speaking, in OCP networking there are three types of secure routes; the DG Operator only provides passthrough:

| Type of secure route | Who handles TLS | Explanation |
| --- | --- | --- |
| passthrough | the pod | the pod sets up TLS and validates it |
| edge | HAProxy | TLS termination occurs at the router, before traffic is routed to the pods; connections from the router to the endpoints over the internal network are not encrypted |
| re-encrypt | both the pod and HAProxy | the router terminates TLS with a certificate and re-encrypts the connection to the endpoint, which presents its own certificate |

Route + Hot Rod client

Only BASIC client intelligence is supported when connecting via a Route, because the Hot Rod client cannot reach the individual Infinispan pods to connect to them directly. A further consequence is that when encryption is disabled the Hot Rod client cannot connect at all: Hot Rod connections via a Route are only possible over TLS with SNI, so with encryption disabled this will not work.
The TLS will be on the route as passthrough termination:

kind: Infinispan
...
spec:
...
  expose:
    type: Route
  service:
    type: DataGrid
  replicas: 1

This will create a route with TLS:

$ oc get route
NAME                        HOST/PORT               PATH   SERVICES           PORT    TERMINATION   WILDCARD
dg-cluster-route-external   example.openshift.org          dg-cluster-route   11222   passthrough   None
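A hedged sketch of a Hot Rod client hotrod.properties for connecting through such a route: port 443 on the route host, TLS with SNI matching that host, and BASIC intelligence. The host name and trust store path are illustrative; check the property names against your client version:

```properties
# route host on the router's TLS port 443
infinispan.client.hotrod.server_list = example.openshift.org:443
# passthrough termination: the client must present the route host as SNI
infinispan.client.hotrod.use_ssl = true
infinispan.client.hotrod.sni_host_name = example.openshift.org
# trust store containing the server certificate (illustrative path)
infinispan.client.hotrod.trust_store_path = /path/to/tls.crt
# only BASIC intelligence works through a route
infinispan.client.hotrod.client_intelligence = BASIC
```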

Ports:

| Port | Purpose |
| --- | --- |
| 11222 | default connector port, used by Hot Rod clients |
| 11223 | default administration port |

Root Cause

A user can access DG via a service if the client application is deployed in the same cluster (same namespace or not), but not if it is external to the cluster.

| Application location | Method of access |
| --- | --- |
| External to the cluster | Use the external exposure (via spec.expose) |
| Inside the cluster | Use the ClusterIP service already provided |

Web Console

Exposing via any of the types above makes it possible to access the web console. This is required because the internal service is only accessible from within the OpenShift cluster, so an external browser cannot connect to it.
The default generated credentials should work for logging in to the console.

Cert

The certificates can be set on the services via annotations (that's done automatically by the operator):

$ oc describe svc example-cluster-service
Name:              example-cluster-service
Namespace:         test-dg-1
Labels:            app=infinispan-service
                   clusterName=example-cluster
                   infinispan_cr=example-cluster
Annotations:       service.beta.openshift.io/serving-cert-secret-name: example-cluster-cert-secret <---------------
Selector:          app=infinispan-pod,clusterName=example-cluster
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                127.0.0.1
IPs:               127.0.0.1
Port:              infinispan  11222/TCP
TargetPort:        11222/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>

Internal Service

The internal service created by the DG Operator deployment is a regular, non-headless ClusterIP service, which provides basic load-balancing capabilities.
Like a headless service, it provides one IP at a time, as explained here and here.
The difference is that a headless service returns multiple address records (A records) when queried with dig/nslookup/etc. (not curl).

Internal Service usage with FQDN vs PQDN

The fully qualified domain name (FQDN) of the service would be:

{{cr_name}}.{{namespace}}.svc.cluster.local

The command $ oc get svc will not return the FQDN, only the name/ClusterIP/ExternalIP/ports.
The FQDN is usually preferred over the PQDN: if there is another similarly named service (for example exposed by the same Data Grid operator in another namespace), the partially qualified name can fail to resolve correctly. In short, the FQDN is the more reliable choice.
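For an in-cluster client, a sketch of pointing hotrod.properties at the FQDN (CR name and namespace are illustrative):

```properties
# {{cr_name}}.{{namespace}}.svc.cluster.local:11222
infinispan.client.hotrod.server_list = example-infinispan.my-namespace.svc.cluster.local:11222
```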

Client intelligence

Distribution-aware is the default and recommended intelligence. However, when a firewall or load balancer blocks direct access to the pods, as is the case with a Route, distribution-aware cannot be used and BASIC client intelligence is required.
Hot Rod connections should be distributed fairly evenly among the nodes, based on the key hashes but also on the number of owners.
When a client makes a call it determines which node owns that entry; if no existing idle connection to that node is available, it creates a new one (up to the configured limits). Since key hashes are generally spread fairly evenly among nodes, so are each client's connections.
Even with the hash-aware setting, a particular node can receive more connections, and the keys can explain why. For example, if most client requests target a single key "foo", then most of the connections will go to "foo"'s owner.
During issues, for example when a node is having problems and responding slowly, requests to that node take longer. With the same number of requests going to each node, that node will have a higher average number of active requests and therefore require more connections.
Low socket timeouts combined with a node having network issues can result in significantly more connections.

Diagnostic Steps

  1. Route with Hot Rod:

R:somewhere.somewhere/1.127.1.1:80   example-svc   ClusterIP   127.1.111.69   <none>   11222/TCP   9d <-----------

The port 80 above gives the answer; the traffic flow is:

Route (somewhere.somewhere/1.127.1.1:80) -> ClusterIP (example-svc - 127.1.111.69) -> DG:11222
  2. Get the route (which must be TLS/SNI) for Hot Rod via Route - note the passthrough termination:
$ oc get route
NAME                        HOST/PORT               PATH   SERVICES           PORT    TERMINATION   WILDCARD
dg-cluster-route-external   example.openshift.org          dg-cluster-route   11222   passthrough   None
  3. Get the cert attached to the service:
    Doesn't have cert:
$ oc describe svc example-cluster | grep Annotations
Annotations:       <none> <-------------------------------- no cert

Has cert:

$ oc describe svc example-cluster | grep Annotations
Annotations:       service.beta.openshift.io/serving-cert-secret-name: example-cluster-cert-secret

LoadBalancer type warning:

Warning  LoadBalancerUnsupported  16s (x60 over 10m)  controller-infinispan  LoadBalancer expose type is not supported on the target platform
...
LoadBalancer expose type is not supported on the target platform

This is common when running OpenShift on bare metal without a LoadBalancer implementation available; in that case the warning above appears.

Client intelligence

By default it will use distribution aware:

12:50:20,174 TRACE (non-blocking-thread--p2-t2) [org.infinispan.server.hotrod.BaseDecoder] Parsed header: HotRodHeader{op=PING, version=29, messageId=45605, cacheName='', flag=0, clientIntel=3, topologyId=-1, keyType=application/unknown, valueType=application/unknown}

The clientIntel=3 above means it is using the default INTELLIGENCE_HASH_DISTRIBUTION_AWARE; BASIC would be 1, and TOPOLOGY_AWARE would be 2.


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.