Interpreting Data Access statistics in DG 8
Environment
- Red Hat Data Grid (RHDG)
- 8.x
Issue
How to interpret Data Access statistics in DG 8?
Resolution
The circle graphs give an idea of the relative number of each type of operation, so one can see at a glance whether a cache has, for example, many puts and no gets. The graphs carry no meaning beyond that.
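As a rough illustration of what such a graph conveys, each operation's share is its counter divided by the sum of the counters shown. This is a sketch: it assumes the graph is a plain proportion of the raw counters (the exact grouping the Console uses is not described here), `operation_shares` is a hypothetical helper, and the counter values are made up.

```python
def operation_shares(counters):
    """Return each tracked operation's fraction of the total count.

    Counters reported as -1 (exposed but not computed) are left out.
    Hypothetical helper for illustration only.
    """
    tracked = {name: value for name, value in counters.items() if value >= 0}
    total = sum(tracked.values())
    if total == 0:
        return {name: 0.0 for name in tracked}
    return {name: value / total for name, value in tracked.items()}


# Made-up counter values; "evictions" is -1, i.e. not computed.
shares = operation_shares({"hits": 300, "misses": 100, "stores": 100, "evictions": -1})
for name, share in shares.items():
    print(f"{name:<9} {share:.0%}")
```

A cache dominated by stores with almost no hits would show up here the same way it does in the circle graph: one large slice and very small ones.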
From DG 8.3.1 onward, the accurate number of cache entries is disabled by default: the accurate metric shows -1 and an approximate value is shown instead. To enable the accurate count, set <metrics accurate-size="true"/> inside the cache-container element. This applies to both the CLI and the Console. Note that enabling accurate-size="true" can degrade performance when there are multiple caches with many entries, as reported in JDG-4270. The degradation comes from the cost of the internal calls to the Cache#size() API, whose API documentation states the following caveat:
This method should only be used for debugging purposes such as to verify that the cache contains all the keys entered. Any other use involving execution of this method on a production system is not recommended.
To display the statistics on the Console, go to the cache > Metrics, which shows each statistic for the cache:
Data access statistics
Accurate values are disabled, and therefore show -1, unless <metrics accurate-size="true"/> is set in infinispan.xml.
| Cache access statistic | Purpose | Comment |
|---|---|---|
| Hits | Number of reads that found the key in the cache | A value of -1 means the metric is not being tracked, i.e. it is exposed but not computed |
| Misses | Number of reads for keys that are not in the cache | |
| Stores | Number of write (put) operations | |
| Retrievals | Number of read operations | Equal to hits + misses |
| Remove hits | Number of removals of existing entries | Also covers entries removed by expiration; e.g. the DG CLI remove command on an existing key |
| Remove misses | Number of removals where the key was not found | e.g. the DG CLI remove command on an absent key, or RH-SSO removing entries that expiration has already removed |
| Evictions | Number of entries evicted from memory | Evicted, not expired |
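These counters are often read together as ratios. A minimal sketch, assuming the raw values have already been read from the Console or from the CLI stats output; `access_ratios` is a hypothetical helper, and a -1 value (metric not computed) makes the corresponding ratio unavailable rather than producing a misleading number.

```python
def access_ratios(hits, misses, remove_hits, remove_misses):
    """Return (read hit ratio, remove hit ratio); None where a ratio is unavailable.

    Hypothetical helper. A value of -1 means the metric is exposed but not
    computed, so any ratio depending on it is reported as None.
    """

    def ratio(ok, failed):
        if ok < 0 or failed < 0 or ok + failed == 0:
            return None
        return ok / (ok + failed)

    return ratio(hits, misses), ratio(remove_hits, remove_misses)


# Values taken from the CLI stats example in this article.
read_ratio, remove_ratio = access_ratios(292106, 54422, 713, 268)
print(f"read hit ratio:   {read_ratio:.1%}")
print(f"remove hit ratio: {remove_ratio:.1%}")
```

Because retrievals equals hits + misses, the read hit ratio is the same as hits divided by retrievals.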
For global statistics, go to the left corner of the DG Console:
Cluster-wide statistics
| Cache entry statistic | Purpose |
|---|---|
| Number of entries | Number of cache entries |
| Current number of entries in memory | Number of entries currently held in memory |
| Total number of entries | Total across memory and storage; note that the CLI counterpart, total_number_of_entries, counts put operations rather than current entries |
| Data memory used | Memory actually used by cache data |
| Off-heap memory used | Off-heap memory usage |
Root Cause
The same statistics can be displayed from the CLI with ./cli.sh and the stats command (see Diagnostic Steps).
Graph usage
The graphs give a quick view of operations and entries; however, they only show operations in aggregate.
OCP Console
In OCP, the Kubelet/cAdvisor extracts memory-related metrics from the container's cgroup files; these are the values displayed on the console and by oc adm top pods. Because they come from cgroups, they might differ from the Prometheus values. They are calculated as follows:
| Metric | Calculation |
|---|---|
| container_memory_usage_bytes | Value in the /sys/fs/cgroup/memory/memory.usage_in_bytes file (memory usage) |
| container_memory_working_set_bytes | container_memory_usage_bytes - total_inactive_file (from /sys/fs/cgroup/memory/memory.stat) |
| container_memory_rss | total_rss value from /sys/fs/cgroup/memory/memory.stat |
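The working-set formula in the table above can be reproduced by hand. A minimal sketch, assuming the cgroup v1 file layout listed in the table (cgroup v2 nodes expose different files, such as memory.current); the helper names are hypothetical, and the sample values are made up so the calculation can be run outside a container.

```python
from pathlib import Path


def parse_memory_stat(text):
    """Parse cgroup memory.stat content ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage_in_bytes, memory_stat_text):
    """Working set = usage - total_inactive_file, clamped at 0 so it never goes negative."""
    inactive = parse_memory_stat(memory_stat_text).get("total_inactive_file", 0)
    return max(usage_in_bytes - inactive, 0)


def read_cgroup_v1(root="/sys/fs/cgroup/memory"):
    """Read the two cgroup v1 files from the table (works only inside a cgroup v1 container)."""
    base = Path(root)
    usage = int((base / "memory.usage_in_bytes").read_text())
    return usage, (base / "memory.stat").read_text()


# Made-up sample values for illustration.
sample_stat = "total_rss 400000000\ntotal_inactive_file 150000000\n"
print(working_set_bytes(900000000, sample_stat))
```

Inside a pod running on a cgroup v1 node, `working_set_bytes(*read_cgroup_v1())` would apply the same formula to the live files.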
However, for the actual memory usage of the processes inside the pod, run top (or htop) inside the pod and check the RES (resident memory) column:
$ oc get pod
$ oc rsh dg-cluster-nyc-0
$ top
top - 21:23:23 up 2:12, 0 users, load average: 1.78, 0.92, 0.61
Tasks: 6 total, 1 running, 5 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.5 us, 1.3 sy, 0.0 ni, 95.6 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
MiB Mem : 16021.6 total, 6295.2 free, 3669.4 used, 6057.0 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 11954.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 1000650+ 20 0 19220 3360 2932 S 0.0 0.0 0:00.03 server.sh
160 1000650+ 20 0 2495516 431872 29780 S 0.0 2.6 0:32.19 java
376 1000650+ 20 0 19220 3680 3160 S 0.0 0.0 0:00.00 sh
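top reports RES in KiB by default, so the resident memory of the java process above can be converted and compared with the total on the MiB Mem line. A small worked sketch; `res_summary` is a hypothetical helper and the inputs are copied from the top output.

```python
def res_summary(res_kib, total_mib):
    """Convert top's RES (KiB) to MiB and compute its share of total memory (%MEM)."""
    res_mib = res_kib / 1024
    return res_mib, 100 * res_mib / total_mib


# java process (PID 160) and the MiB Mem total from the top output above.
res_mib, pct = res_summary(431872, 16021.6)
print(f"java RES: {res_mib:.2f} MiB ({pct:.1f}% of memory)")
```

The result, 421.75 MiB and 2.6%, matches top's %MEM column. Note that the MiB Mem total shown inside a container typically reflects the node's memory rather than the pod's limit, so %MEM is relative to the node.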
Eviction vs Expiration
Both cache.remove(...) and expiration remove the entry from memory and from the persistent stores.
However, eviction does not remove the entry from the persistent store (it only removes it from memory), and it is therefore recorded in the statistics as an eviction, not as a removal.
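A toy model makes the difference visible. This is a hypothetical sketch, not Data Grid code: it assumes a write-through store, so every put lands in both tiers, and it only tracks which tier each operation touches plus an eviction counter.

```python
class TieredCacheModel:
    """Hypothetical memory + store model contrasting eviction with removal."""

    def __init__(self):
        self.memory = {}
        self.store = {}
        self.evictions = 0

    def put(self, key, value):
        # Assumes a write-through store: every put also lands in the store.
        self.memory[key] = value
        self.store[key] = value

    def evict(self, key):
        # Eviction frees memory only; the persisted copy stays in the store.
        if key in self.memory:
            del self.memory[key]
            self.evictions += 1

    def remove(self, key):
        # remove() (and, per this article, expiration) deletes from both tiers.
        self.memory.pop(key, None)
        self.store.pop(key, None)

    def get(self, key):
        # A read after eviction can still be served from the store.
        if key in self.memory:
            return self.memory[key]
        if key in self.store:
            self.memory[key] = self.store[key]  # reload into memory
            return self.store[key]
        return None


model = TieredCacheModel()
model.put("k1", "v1")
model.evict("k1")
print(model.get("k1"), model.evictions)  # entry still readable after eviction
model.remove("k1")
print(model.get("k1"))
```

After the eviction the entry is still readable from the store, which is why it counts as an eviction rather than a removal; after remove() it is gone from both tiers.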
Operations
After connecting with $RHDG/bin/cli.sh, one can perform operations such as put, get, remove, and clearcache, and those operations are registered in the statistics described above. For example:
[user@cluster//containers/default/caches/ccpa]> put banana1 1
[user@cluster//containers/default/caches/ccpa]> ls
banana1
apple2
[user@cluster//containers/default/caches/ccpa]> remove banana1
[user@cluster//containers/default/caches/ccpa]> remove banana2
Not Found:
[user@cluster//containers/default/caches/ccpa]> clearcache
[user@cluster//containers/default/caches/ccpa]> ls
| Command | Statistics result |
|---|---|
| put | if (successful) stores++ / entries++ |
| get | if (found) hits++; else misses++; |
| ls | Lists the entries in the cache |
| remove | if (found) removeHits++; else removeMisses++; |
| clearcache | if (successful) removeHits++; else misses++; |
Note that the remove operation removes the entry completely, from memory and from the stores, just like expiration.
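The per-command rules can be replayed as a small model. This is a sketch, not Data Grid code: a plain dict stands in for the cache, the counter names mirror those in the CLI stats output (stores, hits, misses, remove_hits, remove_misses), a remove on an absent key is counted as remove_misses, and clearcache is left out. `replay` is a hypothetical helper, and the session mirrors the CLI example above, starting from a cache that already holds apple2 (its value is made up).

```python
from collections import Counter


def replay(commands, initial=None):
    """Apply CLI-like commands to a dict and count statistics per command.

    Hypothetical model: put -> stores, get -> hits/misses,
    remove -> remove_hits/remove_misses. Not Data Grid code.
    """
    cache = dict(initial or {})
    stats = Counter()
    for op, key, *value in commands:
        if op == "put":
            cache[key] = value[0] if value else None
            stats["stores"] += 1
        elif op == "get":
            stats["hits" if key in cache else "misses"] += 1
        elif op == "remove":
            if key in cache:
                del cache[key]
                stats["remove_hits"] += 1
            else:
                stats["remove_misses"] += 1
        else:
            raise ValueError(f"unsupported command: {op}")
    return cache, stats


session = [("put", "banana1", "1"), ("remove", "banana1"), ("remove", "banana2")]
cache, stats = replay(session, initial={"apple2": "existing"})
print(sorted(cache), dict(stats))
```

The "Not Found" answer to remove banana2 in the CLI session corresponds to the remove_misses increment in this model.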
CLI stats command:
DG 8.3.x does not display an accurate number of entries unless configured to do so (see accurate-size above), and the entry counts below also require statistics="true". In addition, when accurate metrics are not enabled, the stats should not show entries from the database (when persistence is configured).
| Statistic | Purpose |
|---|---|
| approximateEntries | Approximate number of entries in the cache |
| approximateInMemoryEntries | Approximate number of entries in memory |
| approximateUniqueEntries | Approximate number of unique entries in a clustered cache |
| total_number_of_entries | Number of cache.put calls, not the number of entries |
| evictions | Number of evicted entries |
Example:
{
"time_since_start" : 3421,
"time_since_reset" : 3421,
"approximate_entries" : 1000,
"approximate_entries_in_memory" : 1000, <--------------------------------------------
"approximate_entries_unique" : 502,
"current_number_of_entries" : -1,
"current_number_of_entries_in_memory" : -1,
"total_number_of_entries" : 56429, <-------------------------------------------------
"off_heap_memory_used" : 0,
"data_memory_used" : -1,
"stores" : 56429, <------------------------------------------------------------------
"retrievals" : 346528,
"hits" : 292106,
"misses" : 54422,
"remove_hits" : 713,
"remove_misses" : 268,
"evictions" : 54448, <--------------------------------------------------------------- evictions
"average_read_time" : 0,
"average_read_time_nanos" : 6188,
"average_write_time" : 0,
"average_write_time_nanos" : 493023,
"average_remove_time" : 0,
"average_remove_time_nanos" : 493023,
"required_minimum_number_of_nodes" : 1
}
Where:
approximateEntries and approximateInMemoryEntries make sense both in the local stats and in the clustered stats (possibly counting each entry numOwners times). approximateUniqueEntries only makes sense in the clustered stats.
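The relationships in this output can be checked programmatically. A minimal sketch, assuming the stats output was saved as plain JSON (without the arrow annotations shown above); `check_stats` is a hypothetical helper, and only a subset of the fields above is used. The copies-per-entry figure relies on the note that clustered stats may count each entry numOwners times; a value near 2 is consistent with two owners, although the output itself does not state numOwners.

```python
import json


def check_stats(stats):
    """Return a list of observations about a DG stats document (dict). Hypothetical helper."""
    notes = []
    disabled = sorted(k for k, v in stats.items() if v == -1)
    if disabled:
        notes.append("not computed (-1): " + ", ".join(disabled))
    if stats["retrievals"] == stats["hits"] + stats["misses"]:
        notes.append("retrievals == hits + misses")
    if stats["total_number_of_entries"] == stats["stores"]:
        notes.append("total_number_of_entries counts puts (== stores)")
    unique = stats.get("approximate_entries_unique", 0)
    if unique > 0:
        copies = stats["approximate_entries"] / unique
        notes.append(f"approximate copies per entry: {copies:.2f}")
    return notes


# Subset of the example output above.
raw = """{"approximate_entries": 1000, "approximate_entries_unique": 502,
          "current_number_of_entries": -1, "data_memory_used": -1,
          "total_number_of_entries": 56429, "stores": 56429,
          "retrievals": 346528, "hits": 292106, "misses": 54422}"""
for note in check_stats(json.loads(raw)):
    print(note)
```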
Issues
Check if MetricsRegistry is enabled before trying to register metrics: a NullPointerException occurs when MetricsRegistry is not enabled because no MeterRegistry is on the classpath. Upgrade to DG 8.4.7 to prevent this.
Diagnostic Steps
Statistics:
$//containers/default/caches/cache]> stats
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.