Interpreting Data Access statistics in DG 8

Solution Verified - Updated

Environment

  • Red Hat Data Grid (RHDG)
    • 8.x

Issue

How to interpret Data Access statistics in DG 8?

Resolution

The circle graphs give an idea of the relative number of operations, so one can see, for example, whether a cache has many puts and no gets. There is no hidden meaning beyond that.

Computing the accurate number of cache entries is disabled by default on DG 8.3.1+: the accurate counters show -1 and an approximate value is shown instead. To enable accurate values, set <metrics accurate-size="true"/> below the cache container tag. This applies to both the CLI and the Console. Note that enabling accurate-size="true" can cause performance degradation when multiple caches with many entries exist, as reported in JDG-4270. This happens because of the cost of calling the Cache#size() API internally, and the API doc also states the following caveat:

This method should only be used for debugging purposes such as to verify that the cache contains all the keys entered. Any other use involving execution of this method on a production system is not recommended.

To display the statistics on the Console, go to the cache > Metrics and see each statistic for the cache:

Data statistics access

The accurate values are disabled by default, meaning they show -1, unless <metrics accurate-size="true"/> is set in infinispan.xml.

| Cache access tag | Purpose | Comment |
|---|---|---|
| hits | number of hits on the cache | -1 means the value is not tracked, i.e. disabled: the metric is exposed but not computed |
| misses | number of misses on the cache | e.g. DG CLI command remove key, where key is absent |
| stores | number of store (put) operations on the cache | |
| retrievals | number of retrievals | cache store reads |
| remove hits | number of removal hits | a removal or expiration that removes an existing cache entry, e.g. DG CLI command remove key |
| remove misses | number of removal misses | a removal or expiration that finds no cache entry to remove, e.g. RHSSO removing an entry |
| evictions | number of entries evicted (not expired, evicted) | |

Cache container statistics

Go to the global statistics at the left corner of the DG Console:

Cluster-wide statistics

| Cache entry tag | Purpose |
|---|---|
| Number of entries | number of cache entries |
| Current number of entries in memory | number of entries currently in memory |
| Total number of entries | total across memory and storage |
| Data memory used | actual data memory used |
| Off-heap memory used | off-heap memory usage |

Root Cause

One can use ./cli.sh to display the statistics described above via the stats command.

Graph usage

The usage graphs can help for a quick view of operations and entries; however, they only show operations in general.

OCP Console

In OCP, the Kubelet/cAdvisor extracts memory-related metrics from the cgroup files and feeds them to crio/kubelet, so that they appear on the console and in oc adm top pods. These values are given by cgroups (and might differ from Prometheus values), as below:

| Metric | Calculation |
|---|---|
| container_memory_usage_bytes | value in the /sys/fs/cgroup/memory/memory.usage_in_bytes file (memory usage) |
| container_memory_working_set_bytes | container_memory_usage_bytes - total_inactive_file (from /sys/fs/cgroup/memory/memory.stat) |
| container_memory_rss | total_rss value from /sys/fs/cgroup/memory/memory.stat |
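
The calculation in the table can be sketched as follows. This is a minimal illustration, assuming the cgroup v1 file contents have already been read into strings; the function names and the sample values are hypothetical and do not come from a real pod. Clamping the working set at zero is also an assumption of this sketch.

```python
def parse_memory_stat(text):
    """Parse the contents of memory.stat into a dict of name -> int."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def container_memory_metrics(usage_in_bytes_text, memory_stat_text):
    """Derive the three metrics of the table from raw cgroup v1 file contents."""
    usage = int(usage_in_bytes_text.strip())
    stat = parse_memory_stat(memory_stat_text)
    # Working set = usage minus inactive file cache (clamped at 0: assumption).
    working_set = max(usage - stat["total_inactive_file"], 0)
    return {
        "container_memory_usage_bytes": usage,
        "container_memory_working_set_bytes": working_set,
        "container_memory_rss": stat["total_rss"],
    }


# Hypothetical sample values, for illustration only.
sample_usage = "900000000\n"
sample_stat = "total_rss 442236928\ntotal_inactive_file 300000000\n"
metrics = container_memory_metrics(sample_usage, sample_stat)
print(metrics)
```

Inside a pod, the two strings would come from reading memory.usage_in_bytes and memory.stat under /sys/fs/cgroup/memory/.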

However, for the actual pod usage, check the output of htop/top (RSS) inside the pod:

$ oc get pod
$ oc rsh dg-cluster-nyc-0
$ top
top - 21:23:23 up  2:12,  0 users,  load average: 1.78, 0.92, 0.61
Tasks:   6 total,   1 running,   5 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.5 us,  1.3 sy,  0.0 ni, 95.6 id,  0.0 wa,  0.3 hi,  0.3 si,  0.0 st
MiB Mem :  16021.6 total,   6295.2 free,   3669.4 used,   6057.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  11954.7 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                     
      1 1000650+  20   0   19220   3360   2932 S   0.0   0.0   0:00.03 server.sh                                                                                                                                   
    160 1000650+  20   0 2495516 431872  29780 S   0.0   2.6   0:32.19 java                                                                                                                                        
    376 1000650+  20   0   19220   3680   3160 S   0.0   0.0   0:00.00 sh       
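
As a worked reading of the java row above, the following sketch converts the RES column into MiB and relates it to the total memory line. It assumes top's default unit for RES (KiB); the variable names are illustrative only.

```python
# Values copied from the java row and the "MiB Mem" line of the top output.
res_kib = 431872          # RES column (KiB, top's default unit: assumption)
total_mem_mib = 16021.6   # "MiB Mem : 16021.6 total"

res_mib = res_kib / 1024
mem_percent = 100 * res_mib / total_mem_mib

print(f"java RES: {res_mib:.2f} MiB, {mem_percent:.1f}% of total memory")
```

The rounded percentage agrees with the 2.6 shown in the %MEM column for the java process.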

Eviction vs Expiration

A cache.remove(...) call or an expiration removes the entry from the stores as well.
Eviction, however, does not remove the entry from the persistence store, and is therefore recorded in the statistics as an eviction, not a removal.

Operations

Connecting via $RHDG/bin/cli.sh, one can run operations such as put, get, remove, and clearcache. Those operations are registered in the statistics above, for example:

[user@cluster//containers/default/caches/ccpa]> put banana1 1
[user@cluster//containers/default/caches/ccpa]> ls
banana1
apple2
[user@cluster//containers/default/caches/ccpa]> remove banana1
[fdemeloj-3335@cluster//containers/default/caches/ccpa]> remove banana2
Not Found: 
[fdemeloj-3335@cluster//containers/default/caches/ccpa]> clearcache 
[fdemeloj-3335@cluster//containers/default/caches/ccpa]> ls

| Command | Statistics result |
|---|---|
| put | if (successful) stores++ / entries++ |
| get | if (successful) hits++; else misses++ |
| ls | lists the entries inside the cache |
| remove | if (successful) removeHits++; else misses++ |
| clearcache | if (successful) removeHits++; else misses++ |

Note that the remove operation removes the entry completely, from both memory and the stores, like the expiration operation.

Cli stats command:

DG 8.3.x does not display an accurate number of entries unless it is configured to do so. The number of cache entries shown is therefore approximate, provided that statistics = true. Also, when accurate metrics are not set, the stats should not include DB entries (in case persistence is configured).

| Statistics | Purpose |
|---|---|
| approximateEntries | approximate number of entries in the cache |
| approximateInMemoryEntries | approximate number of entries in memory |
| approximateUniqueEntries | approximate number of unique entries in a clustered cache |
| total_number_of_entries | number of cache.put calls, not the number of entries |
| evictions | number of evicted entries |

Example:

{
"time_since_start" : 3421,
"time_since_reset" : 3421,
"approximate_entries" : 1000,
"approximate_entries_in_memory" : 1000, <--------------------------------------------
"approximate_entries_unique" : 502,
"current_number_of_entries" : -1,
"current_number_of_entries_in_memory" : -1,
"total_number_of_entries" : 56429, <-------------------------------------------------
"off_heap_memory_used" : 0,
"data_memory_used" : -1,
"stores" : 56429, <------------------------------------------------------------------
"retrievals" : 346528,
"hits" : 292106,
"misses" : 54422,
"remove_hits" : 713,
"remove_misses" : 268,
"evictions" : 54448, <--------------------------------------------------------------- evictions
"average_read_time" : 0,
"average_read_time_nanos" : 6188,
"average_write_time" : 0,
"average_write_time_nanos" : 493023,
"average_remove_time" : 0,
"average_remove_time_nanos" : 493023,
"required_minimum_number_of_nodes" : 1
}

Where:
approximateEntries and approximateInMemoryEntries make sense both in the local stats and in the clustered stats (possibly counting each entry numOwners times). approximateUniqueEntries only makes sense in the clustered stats.
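
To read such an output programmatically, a minimal sketch is shown below. It uses a subset of the example values above (with the annotation arrows removed), lists the fields reported as -1 (not computed), and derives a hit ratio. The summarize function is a hypothetical helper, not part of any DG tooling. In this sample, hits plus misses happens to equal retrievals; the sketch only checks that relation and does not assume it holds in general.

```python
import json

# Subset of the example output above (annotation arrows removed).
stats_json = """
{
  "approximate_entries": 1000,
  "approximate_entries_in_memory": 1000,
  "approximate_entries_unique": 502,
  "current_number_of_entries": -1,
  "current_number_of_entries_in_memory": -1,
  "total_number_of_entries": 56429,
  "data_memory_used": -1,
  "stores": 56429,
  "retrievals": 346528,
  "hits": 292106,
  "misses": 54422,
  "remove_hits": 713,
  "remove_misses": 268,
  "evictions": 54448
}
"""


def summarize(stats):
    """Return the disabled (-1) fields, the hit ratio, and a consistency check."""
    disabled = sorted(name for name, value in stats.items() if value == -1)
    retrievals = stats["retrievals"]
    hit_ratio = stats["hits"] / retrievals if retrievals > 0 else None
    return {
        "disabled": disabled,
        "hit_ratio": hit_ratio,
        "retrievals_match": stats["hits"] + stats["misses"] == retrievals,
    }


summary = summarize(json.loads(stats_json))
print(summary)
```

The fields listed under "disabled" are the ones that require <metrics accurate-size="true"/> to be computed, as described earlier in this article.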

Issues

Check if MetricsRegistry is enabled before trying to register metrics: a NullPointerException is caused by MetricsRegistry not being enabled because no MeterRegistry is on the classpath. Upgrade to DG 8.4.7 to prevent that from happening.

Diagnostic Steps

Statistics:
$//containers/default/caches/cache]> stats


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.