Interpreting Data Access statistics in DG 8

Solution Verified - Updated 13 Jun 2024

Environment

Red Hat Data Grid (RHDG)
- 8.x

Issue

How to interpret Data Access statistics in DG 8?

Resolution

The circle graphs give an idea of the relative number of operations. For example, one can see whether a cache has many puts and no gets. There is no hidden meaning beyond that.

From DG 8.3.1 onward, the accurate count of cache entries is disabled: the accurate fields show -1 and an approximate value is shown instead. To enable accurate counts, set <metrics accurate-size="true"/> under the cache container tag. This applies to both the CLI and the Console. Note that accurate-size="true" can degrade performance when there are multiple caches with many entries, as reported in JDG-4270. The cost comes from calling the Cache#size() API internally, and the API doc states the following caveat:

This method should only be used for debugging purposes such as to verify that the cache contains all the keys entered. Any other use involving execution of this method on a production system is not recommended.

To display on the Console:
Go to the cache > Metrics to see each statistic for the cache:

Data access statistics

The accurate values are disabled, meaning they show -1, unless <metrics accurate-size="true"/> is set in infinispan.xml.

Cache access tag	Purpose	Comment
hits	number of hits on the cache	-1 means the statistic is not tracked: the metric is exposed but not computed
misses	number of misses on the cache	e.g. a get for a key that is absent
stores	number of store (put) operations on the cache
retrievals	number of retrievals	cache store reads
remove hits	number of removals that found the entry	e.g. expiration removing cache entries, or DG CLI command remove key where key is present
remove misses	number of removals that found no entry	e.g. DG CLI command remove key where key is absent, or RHSSO removing
evictions	number of entries evicted (not expired, evicted)

Go to the global statistics in the left corner of the DG console:

Cluster-wide statistics

Cache entry tag	Purpose
Number of entries	number of cache entries
Current number of entries in memory	number of current entries in memory
Total number of entries	total usage memory and storage
Data memory used	actual data memory used
Off-heap memory used	off heap usage

Root Cause

One can use ./cli.sh to display the statistics via the stats command (see Diagnostic Steps).

Graph usage

The graphs give a quick view of operations and entries; however, they only show operations in general.

OCP Console

In OCP, the Kubelet/cAdvisor reads memory-related metrics from the cgroup files and feeds them to CRI-O/kubelet, so they appear on the console and in oc adm top pod. These values come from cgroups (and might differ from Prometheus values), as below:

Metric	Calculation
container_memory_usage_bytes	value in the /sys/fs/cgroup/memory/memory.usage_in_bytes file (memory usage)
container_memory_working_set_bytes	container_memory_usage_bytes - total_inactive_file (from /sys/fs/cgroup/memory/memory.stat)
container_memory_rss	total_rss value from /sys/fs/cgroup/memory/memory.stat
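
The working-set calculation in the table can be sketched in a few lines of Python. This is a minimal illustration, assuming cgroup v1 files as named above; the function names and the sample values are hypothetical, and the floor at zero is a defensive assumption rather than something stated in this article.

```python
def parse_memory_stat(text):
    """Parse the 'key value' lines of a cgroup v1 memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage_in_bytes, memory_stat_text):
    """Return usage minus total_inactive_file, never below zero (assumed guard)."""
    stat = parse_memory_stat(memory_stat_text)
    inactive_file = stat.get("total_inactive_file", 0)
    return max(usage_in_bytes - inactive_file, 0)


# Hypothetical sample values, not taken from a real pod.
SAMPLE_USAGE = 900_000_000
SAMPLE_STAT = """total_rss 431872000
total_inactive_file 250000000
total_active_file 120000000
"""

if __name__ == "__main__":
    print("working set:", working_set_bytes(SAMPLE_USAGE, SAMPLE_STAT))
    print("rss:", parse_memory_stat(SAMPLE_STAT)["total_rss"])
```

Inside a pod, the two inputs would come from reading memory.usage_in_bytes and memory.stat under /sys/fs/cgroup/memory/.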

However, for the actual usage of the pod, run top or htop inside the pod and check the resident memory (RES):

$ oc get pod
$ oc rsh dg-cluster-nyc-0
$ top
top - 21:23:23 up  2:12,  0 users,  load average: 1.78, 0.92, 0.61
Tasks:   6 total,   1 running,   5 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.5 us,  1.3 sy,  0.0 ni, 95.6 id,  0.0 wa,  0.3 hi,  0.3 si,  0.0 st
MiB Mem :  16021.6 total,   6295.2 free,   3669.4 used,   6057.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  11954.7 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1 1000650+  20   0   19220   3360   2932 S   0.0   0.0   0:00.03 server.sh
    160 1000650+  20   0 2495516 431872  29780 S   0.0   2.6   0:32.19 java
    376 1000650+  20   0   19220   3680   3160 S   0.0   0.0   0:00.00 sh

Eviction vs Expiration

A cache.remove(...) call or an expiration removes the entry from memory and from the stores.
However, eviction doesn't remove the entry from the persistence store, and is therefore recorded in the statistics as an "eviction", not a removal.
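
The difference can be illustrated with a toy model. The sketch below is not how Data Grid is implemented; the ToyCache class, its write-through behavior, and its counter names are assumptions made only to show that eviction frees memory while the store copy survives, whereas a removal deletes both.

```python
class ToyCache:
    """Toy model: an in-memory map in front of a persistent store map."""

    def __init__(self):
        self.memory = {}
        self.store = {}
        self.evictions = 0
        self.remove_hits = 0

    def put(self, key, value):
        # Assumed write-through: the entry goes to memory and to the store.
        self.memory[key] = value
        self.store[key] = value

    def evict(self, key):
        # Eviction frees memory only; the copy in the store survives.
        if key in self.memory:
            del self.memory[key]
            self.evictions += 1

    def remove(self, key):
        # Removal deletes the entry from memory and from the store.
        found = key in self.memory or key in self.store
        self.memory.pop(key, None)
        self.store.pop(key, None)
        if found:
            self.remove_hits += 1


if __name__ == "__main__":
    cache = ToyCache()
    cache.put("a", 1)
    cache.put("b", 2)
    cache.evict("a")   # "a" leaves memory but stays in the store
    cache.remove("b")  # "b" leaves memory and the store
    print(cache.memory, cache.store, cache.evictions, cache.remove_hits)
```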

Operations

Connecting via $RHDG/bin/cli.sh, one can run operations such as put, get, remove, and clearcache; those operations are registered in the statistics above. Example:

[user@cluster//containers/default/caches/ccpa]> put banana1 1
[user@cluster//containers/default/caches/ccpa]> ls
banana1
apple2
[user@cluster//containers/default/caches/ccpa]> remove banana1
[user@cluster//containers/default/caches/ccpa]> remove banana2
Not Found:
[user@cluster//containers/default/caches/ccpa]> clearcache
[user@cluster//containers/default/caches/ccpa]> ls

Command	Statistics result
put	if(successful) stores++/entries++
get	if(successful) hits++; else misses++;
ls	lists the entries inside the cache
remove	if(successful) removeHits++; else removeMisses++;
clearcache	if(successful) removeHits++; else misses++;

Note that the remove operation removes the entry completely, from memory and from the stores, like the expiration operation.
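
As a rough illustration of how such operations move the counters, the sketch below replays a list of commands against a plain dictionary. It is a toy model, not the Data Grid implementation: the replay function is hypothetical, the counter names follow the fields of the stats output (hits, misses, remove_hits, remove_misses), and the value stored for apple2 is made up.

```python
from collections import Counter


def replay(commands, initial=None):
    """Apply (operation, *args) tuples to a dict and count the outcomes."""
    cache = dict(initial or {})
    stats = Counter()
    for op, *args in commands:
        if op == "put":
            key, value = args
            cache[key] = value
            stats["stores"] += 1
        elif op == "get":
            (key,) = args
            stats["hits" if key in cache else "misses"] += 1
        elif op == "remove":
            (key,) = args
            if key in cache:
                del cache[key]
                stats["remove_hits"] += 1
            else:
                stats["remove_misses"] += 1
        else:
            raise ValueError(f"unsupported operation: {op}")
    return cache, stats


# Mirrors the CLI session above up to the clearcache command.
SESSION = [
    ("put", "banana1", "1"),
    ("remove", "banana1"),
    ("remove", "banana2"),  # absent key, answered with "Not Found" in the CLI
]

if __name__ == "__main__":
    remaining, counters = replay(SESSION, initial={"apple2": "2"})
    print(remaining, dict(counters))
```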

CLI stats command:

DG 8.3.x won't display an accurate number of entries unless it is configured to do so, and the number of cache entries is only reported when statistics = true. Also, when accurate metrics are not set, the stats should not show DB entries (in case persistence is set).

Statistics	Purpose
approximateEntries	number of entries in the cache
approximateInMemoryEntries	number of entries in memory
approximateUniqueEntries	number of unique entries in a clustered cache
total_number_of_entries	number of cache.put calls, not the number of entries
evictions	number of evicted entries

Example:

{
"time_since_start" : 3421,
"time_since_reset" : 3421,
"approximate_entries" : 1000,
"approximate_entries_in_memory" : 1000, <--------------------------------------------
"approximate_entries_unique" : 502,
"current_number_of_entries" : -1,
"current_number_of_entries_in_memory" : -1,
"total_number_of_entries" : 56429, <-------------------------------------------------
"off_heap_memory_used" : 0,
"data_memory_used" : -1,
"stores" : 56429, <------------------------------------------------------------------
"retrievals" : 346528,
"hits" : 292106,
"misses" : 54422,
"remove_hits" : 713,
"remove_misses" : 268,
"evictions" : 54448, <--------------------------------------------------------------- evictions
"average_read_time" : 0,
"average_read_time_nanos" : 6188,
"average_write_time" : 0,
"average_write_time_nanos" : 493023,
"average_remove_time" : 0,
"average_remove_time_nanos" : 493023,
"required_minimum_number_of_nodes" : 1
}

Where:
approximateEntries and approximateInMemoryEntries make sense both in the local stats and in the clustered stats (possibly counting each entry numOwners times). approximateUniqueEntries only makes sense in the clustered stats.
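
A short Python sketch can help when reading such an output. It uses a subset of the values from the example above; the summarize function is hypothetical, and treating the ratio of approximate_entries to approximate_entries_unique as an estimate of the number of copies per entry is an assumption based on the note about numOwners.

```python
# Subset of the example output above.
EXAMPLE_STATS = {
    "approximate_entries": 1000,
    "approximate_entries_unique": 502,
    "current_number_of_entries": -1,
    "current_number_of_entries_in_memory": -1,
    "data_memory_used": -1,
    "stores": 56429,
    "retrievals": 346528,
    "hits": 292106,
    "misses": 54422,
    "remove_hits": 713,
    "remove_misses": 268,
    "evictions": 54448,
}


def summarize(stats):
    """Derive a few ratios and list the fields reported as -1 (not computed)."""
    retrievals = stats["retrievals"]
    unique = stats.get("approximate_entries_unique", 0)
    return {
        "hit_ratio": stats["hits"] / retrievals if retrievals > 0 else None,
        "retrievals_match": stats["hits"] + stats["misses"] == retrievals,
        "not_computed": sorted(k for k, v in stats.items() if v == -1),
        # Assumed reading: total copies divided by unique entries.
        "copies_per_entry": stats["approximate_entries"] / unique if unique > 0 else None,
    }


if __name__ == "__main__":
    for name, value in summarize(EXAMPLE_STATS).items():
        print(name, value)
```

In this example, retrievals equals hits plus misses, and the three fields at -1 are the ones that depend on accurate or memory-based tracking being enabled.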

Issues

Check if MetricsRegistry is enabled before trying to register metrics: a NullPointerException is caused by MetricsRegistry not being enabled because no MeterRegistry is in the classpath. Upgrade to DG 8.4.7 to prevent that from happening.

Diagnostic Steps

Statistics:
[user@cluster//containers/default/caches/cache]> stats
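
Outside the CLI, the same statistics can usually be read over the REST interface. The sketch below is only an illustration under assumptions not stated in this article: that the server exposes the REST v2 endpoint /rest/v2/caches/{cacheName}?action=stats, that it accepts HTTP Basic or Digest authentication, and that the host, port, cache name, and credentials shown are placeholders.

```python
import json
import urllib.request
from urllib.parse import quote


def stats_url(base_url, cache_name):
    """Build the statistics URL for one cache (assumed REST v2 path)."""
    return f"{base_url.rstrip('/')}/rest/v2/caches/{quote(cache_name, safe='')}?action=stats"


def fetch_stats(base_url, cache_name, user, password):
    """Fetch the statistics of a cache and return them as a dict."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, base_url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(passwords),
        urllib.request.HTTPDigestAuthHandler(passwords),
    )
    with opener.open(stats_url(base_url, cache_name), timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder endpoint; no request is sent here.
    print(stats_url("http://localhost:11222", "ccpa"))
```

Calling fetch_stats with real credentials would return a dictionary with the same fields as the stats output, such as hits, misses, and evictions.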

SBR

Product(s)

Red Hat Data Grid

Components

infinispan

Category

Learn more

Tags

clustering

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.