Features, limitations, and known issues with DG Operator 8.3.x

Solution Verified - Updated

Environment

  • Red Hat Data Grid (RHDG)
    • 8.x
    • DG Operator 8.3.x vs 8.2.x

Issue

What are the known issues with DG Operator 8.3.x?

Resolution

As the main recommendation, do not install an old version of the DG Operator; install the latest one. The following issues were diagnosed on DG Operator 8.3.x:

DG Operator 8.3.6:

| Issue | Comment | Action | Version | Jira |
|---|---|---|---|---|
| Xsite migration | There are known issues with xsite migration | For those cases, start a new namespace with the latest DG Operator version | 8.3.x | GitHub Issue 1558; GitHub Issue 1554 |
| Upgrade will be required when installing a DG Operator | Harmless issue | Accept the upgrade for the same version (or accept via oc edit ip $ipname; for an example see script_approve_ip) | 8.3.x | - |
| Cache doesn't attach and doesn't get ready | The cache objects are created even if the cache has issues | Check the DG Operator logs for the cache creation; if the cache template is wrong, fix the template | 8.x | - |
| Upgrade didn't appear on DG Operator 8.3.5 | The upgrade to DG Operator 8.3.6 might not be shown as available | Refresh the console; the Operator might need to be re-installed | 8.3.5 | - |
| Cache created via CM (briefly) disappears | When a cache is set via CM, it briefly disappears during the upgrade | No action; the cache will re-appear | 8.3.6 | - |
| Console stuck with a certain number of pods | The console gets stuck at a certain number of pods | Upgrade | 8.3.x (affected) - 8.3.7 (fixed) | JDG-5405 |
| Subsequent Cache CR reconciliations fail with Cache Service | The cache service keeps looping | Avoid the deprecated spec.service.type: Cache | 8.3.x (affected) - 8.3.7 (to be fixed) | JDG-5427 |
| DG Operator 8.2.x does not have UFC protocol | Can be an issue on Xsite async | Upgrade to DG Operator 8.3.x | - | - |
| Batch creation fails | Template creation fails when applied with oc apply | template == oc process; yaml declaration == oc apply | all | - |
| Cache creation with memory max-size="1GB" and global state | Infinispan issue: configuring memory max-size as non-byte value causes failure on CacheManager restart | Add the B at the end (binary max size) | DG 8.3.x | JDG-5442; ISPN-13997 |
| Xsite replication configuration doesn't work even though it is right | All configuration is right but replication still doesn't work | Set pods to 0 and then increase again | - | - |
| DG Operator 8.x does not have gc log | No gc logs are set by default on the image | Set via extraJvmOpts flags: extraJvmOpts: '-Xlog:gc*=info:file=/tmp/gc.log:time,level,tags,uptimemillis:filecount=10,filesize=1m' | DG 8.x | JDG-5461 |
| DG Operator 8.3.x does not have server.log in /server/log dir | DG Operator 8.2.x had server/log/server.log | To set it, provide a custom log4j.xml in a ConfigMap specified via spec.configMapName | - | - |
| DG Operator restarts | {"error": "leader election lost"} | Increase resources for the Operator pod | Affected DG 8.3.x | - |
| DG Xsite GR pod crashes and xsite is affected | After the GR pod goes down, Xsite stops communicating (issue not consistent) | Increase the limit-range of the GR pod to avoid restarts | Affected DG 8.3.x | JGRP-2634 and GitHub issues/1665 (to be addressed in DG 8.4.x) |
| IllegalArgumentException: No enum constant org.infinispan.security.AuthorizationPermission.READ,WRITE | The Operator has the "admin" role, but this might not be enough for it to work | Use the admin role on the cache definition | (fixed) DG Operator 8.3.7 | JDG-5458 |
| Persistence definition using the tag <queries> breaks the config listener | Having persistence defined with the queries tag will break the configuration | Define the Cache CR {{spec.template}} in the DG console | DG 8.3.x | ISPN-14226 / JDG-5650 |
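
The GC logging workaround from the table can be sketched as an Infinispan CR. This is a minimal sketch rather than a verified configuration: the cluster name and replica count are hypothetical, and only the flag string comes from the table above.

```yaml
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: example-dg            # hypothetical cluster name
spec:
  replicas: 2                 # illustrative value
  container:
    # GC logging flags from the table above; the log is written to /tmp/gc.log inside the pod
    extraJvmOpts: "-Xlog:gc*=info:file=/tmp/gc.log:time,level,tags,uptimemillis:filecount=10,filesize=1m"
```

Since the log lands in /tmp inside the container, it does not survive a pod restart unless it is copied out first.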

DG Operator 8.3.7:

| Issue | Comment | Action | Version | Jira |
|---|---|---|---|---|
| Load Balancer not properly exposed | Web console won't work | Use NodePort or Route | Operator 8.3.7 | JDG-5527 - Loadbalancer not exposed correctly |
| Valid configuration throws NPE | Valid config will throw NPE | Workaround: disable config-listener pod | Operator 8.3.7/6 | JDG-5531 - YamlConfigurationReader throws NPE for valid cache configuration |
| CacheConfigurationException: java.security.KeyStoreException: ELY02035: KeyStore type could not be detected | Corrupt/invalid keystore | Check whether the keystore is corrupted | Operator 8.3.x | N/A |

DG Operator 8.3.8:

| Issue | Comment | Action | Version | Jira |
|---|---|---|---|---|
| Upgrade to 8.3.8 fails | Upgrading from DG 8.3.7 to DG 8.3.8 gets stuck | Skip the upgrade | DG Operator 8.3.7 to 8.3.8 | OLM upgrade issue |
| Default Anti-affinity strategy configuration with the Operator is not valid | Default configuration will contain a r. before the kubernetes node label | No action | DG 8.3.8 | Under investigation |

For more issues, see solution Issues with Data Grid Operator 8.3.8.

DG Operator 8.3.x vs 8.2.x: comparative

| Issue | DG Operator version | Fix/Description |
|---|---|---|
| server.log | Data Grid Operator 8.3.x | server.log is not in /server/log because file-based logging was disabled by default by Operator 1363 (example template: cluster-dg-01-log.txt) |
| heap dump tool | Data Grid Operator 8.3.x | There is no jcmd on DG 8.3.x because ubi:openjdk-11-runtimes does not include it. Alternatives: Generating JFR, and Alternatives for creating heap dump in a DG 8 even without the JDK |
| thread dumps | Data Grid Operator 8.3.x | There is no jcmd on DG 8.3.x. Alternative: kill -3 |
| gc log absence | DG 8.3.x | Bug - Server Image overwrites JVM defaults from server distribution |
| UFC absence | Data Grid Operator 8.2.x | Use DG Operator 8.3.x |
| ConfigMap doesn't work on DG Operator 8.2.x | Data Grid Operator 8.2.x | Use DG Operator 8.3.x (feature added in DG Operator 8.3.x) |
| Permissions change: additional ServiceAccount permissions were added for the config listener | DG Operator 8.3.x | N/A |
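
To restore file-based logging on DG Operator 8.3.x, a custom log4j.xml can be supplied through a ConfigMap referenced by spec.configMapName. The following is only a sketch under stated assumptions: the ConfigMap and cluster names are hypothetical, the appender layout is illustrative, and the `infinispan.server.log.path` system property is assumed to resolve to the server log directory as it does in the stock server logging configuration.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dg-custom-logging       # hypothetical ConfigMap name
data:
  log4j.xml: |
    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration name="ServerConfig" monitorInterval="60" shutdownHook="disable">
      <Appenders>
        <!-- Keep console output so that oc logs still works -->
        <Console name="STDOUT">
          <PatternLayout pattern="%d{HH:mm:ss,SSS} %-5p (%t) [%c] %m%throwable%n"/>
        </Console>
        <!-- Assumption: this property points at the server log directory -->
        <File name="FILE" fileName="${sys:infinispan.server.log.path}/server.log">
          <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p (%t) [%c] %m%throwable%n"/>
        </File>
      </Appenders>
      <Loggers>
        <Root level="INFO">
          <AppenderRef ref="STDOUT"/>
          <AppenderRef ref="FILE"/>
        </Root>
      </Loggers>
    </Configuration>
---
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: example-dg              # hypothetical cluster name
spec:
  replicas: 2                   # illustrative value
  configMapName: dg-custom-logging
```

The File appender here does not rotate, so it is better suited to a short diagnostic session than to permanent use.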

Using a different version of DG 8 and DG Operator

By setting spec.image in the Custom Resource, one can override the default version of DG 8; the DG 8 image is then fetched directly from the given location. However, this is unsupported, given that the DG Operator already brings a specific version.
It can still be useful, for instance to add jcmd on top of the default image, where the DG 8 image itself is still the default one.
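
As an illustration, overriding the image could look like the fragment below. The registry path and tag are hypothetical placeholders, not real image coordinates, and the same caveat about this being unsupported applies.

```yaml
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: example-dg              # hypothetical cluster name
spec:
  replicas: 2                   # illustrative value
  # Hypothetical custom image built on top of the default DG 8 image (for example, with jcmd added)
  image: registry.example.com/custom/datagrid-8:with-jcmd
```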

Cache Creation [warning]

At this point, there is no warning for a wrongly configured cache: the Cache object is created and its status is present, but the misconfigured cache never turns "Ready". Therefore, verify the cache template, the namespace, the Infinispan cluster name, and the configuration set there.
Also, when creating caches and batches, make sure to process templates (oc process) and to apply plain YAML declarations (oc apply).
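
The difference between a template and a plain YAML declaration can be sketched as follows. This is a hypothetical OpenShift Template wrapping a Cache CR: the template name, object names, and cache settings are illustrative, and only the CLUSTER_NAME parameter mirrors the snippet used later in this article. A file like this has to go through oc process first, because oc apply alone would leave ${CLUSTER_NAME} unresolved.

```yaml
apiVersion: template.openshift.io/v1
kind: Template
metadata:
  name: dg-cache-template         # hypothetical template name
parameters:
  - name: CLUSTER_NAME
    description: Name of the Infinispan CR that owns the cache
    required: true
objects:
  - apiVersion: infinispan.org/v2alpha1
    kind: Cache
    metadata:
      name: example-cache-cr      # hypothetical Cache CR name
    spec:
      clusterName: ${CLUSTER_NAME}
      name: example-cache         # hypothetical cache name
      template: |
        distributedCache:
          mode: "SYNC"
          owners: "2"
```

If the cache still does not turn "Ready" after processing and applying, the clusterName value and the namespace are the first things to compare against the running Infinispan CR.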

Invalidation cache will fail

Although the cache creation succeeds in the DG CLI, creating an invalidation cache through the Operator (via a Cache YAML) will fail, since there is no valid use case for an invalidation cache on a server inside an OCP environment. For more details, see Data Grid Invalidation cache on server/client mode. The cache creation below fails, and it is expected to fail:

  spec:
    clusterName: ${CLUSTER_NAME}
    name:  operator-cache-98
    template: |
      invalidationCache:
        locking:
          acquireTimeout: "15000"
          concurrencyLevel: "1000"
          striping: "false"

Cross-site change

As explained in Gossip Router pod in DG 8 OCP 4, DG Operator 8.3.x uses a Gossip Router pod for cross-site communication. Only one Gossip Router pod is needed to operate nominally; if one of the sites crashes, the other site's Gossip Router is used via the Tunnel protocol. The following diagram shows the Tunnel protocol usage:

cluster1Node -> cluster1Master -> GossipRouter -> cluster2Master -> cluster2OtherNode
which is the same as:
cluster1Node -> cluster1Relay1 -> GossipRouter -> cluster1Relay2 -> cluster2OtherNode

Memory requirement changes:

The default memory requirement was increased from 512Mi to 1Gi, because the old value left only a very small amount of memory usable as a cache once the overhead of the server and the JVM was taken into account. See Operator 1386.
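
When the default is not enough for the expected data set, the memory can be set explicitly on the Infinispan CR. A minimal sketch, assuming a hypothetical cluster name and an illustrative value of 2Gi:

```yaml
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: example-dg              # hypothetical cluster name
spec:
  replicas: 2                   # illustrative value
  container:
    memory: 2Gi                 # illustrative; the 8.3.x default is 1Gi
```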

[Figure: Cache not ready]

Deprecated type Cache

As explained in Deprecated service type Cache in DG 8 in OCP 4, the Cache service type (spec.service.type: Cache) is deprecated and should be avoided; use type DataGrid instead. However, Cache is still the default value of spec.service.type, in order to maintain backwards compatibility on upgrades from other versions. Also note that the Cache type uses SerialGC as its garbage collector, same as the Gossip Router pod.
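
Because Cache is the default, the DataGrid type has to be requested explicitly. A minimal sketch, with a hypothetical cluster name and replica count:

```yaml
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: example-dg              # hypothetical cluster name
spec:
  replicas: 2                   # illustrative value
  service:
    type: DataGrid              # set explicitly; omitting it falls back to the deprecated Cache type
```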

Console Web

For the console to be accessible, one needs to explicitly expose the service via a Route or LoadBalancer, regardless of which service type is used; NodePort leads to continuous loading:

    expose:
      type: LoadBalancer
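
The Route alternative can be sketched in the same way. This fragment is illustrative only: the cluster name and replica count are hypothetical, and it may be the safer choice on Operator 8.3.7, where the known-issues table reports the LoadBalancer as not exposed correctly.

```yaml
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: example-dg              # hypothetical cluster name
spec:
  replicas: 2                   # illustrative value
  expose:
    type: Route                 # exposes the service through an OpenShift Route
```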

Diagnostic Steps

For diagnosing issues:

| Scenario | Solution |
|---|---|
| For OCP node crash - and its impact | DG 8 operation in case of OCP nodes crashing |
| For pod crashes/oome | Troubleshoot options for Data Grid pod crash |
| For JFR investigations on DG 8.3 | Generating and analyzing JFR in Data Grid 8 in OCP |
| About jcmd usage | Alternatives for creating heap dump in a DG 8 even without the JDK |
| Using custom DG configuration | Using custom configuration in DG 8 via Operator |
| Project cannot be deleted/cache cr still hang | How to delete all DG 8's objects in OCP 4? |
| How to interpret DG statistics | Interpreting Data Access statistics in DG 8 |
| How to describe CR fields | DG 8 Operator explain or describe fields |
| Migrate/Upgrade to DG 8.3 | How to migrate from DG 8.1/8.2 to DG 8.3 Operator |
| Config Listener pod in DG 8 OCP 4 | Config Listener pod in DG 8 OCP 4 |
| Multiple versions of DG Operator | Data Grid 8 Operators coexistence in a same Openshift cluster |

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.