Java application periodic high latency / processing times due to NUMA page reclaim on RHEL

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux 5.4                      
    • kernel 2.6.18-164.11.1.el5.x86_64
  • CPU / memory          
    • 24 CPUs total, 6 cores
    • 16 GB RAM, 8 GB swap
    • 2-node NUMA system with 8 GB RAM on each NUMA node
  • JBoss (running in its own JVM), jbossas, jboss-messaging
    • JBoss interfaces with Oracle via local TCP (port 1521)
  • Web application (running in its own JVM)             
    • JSF-based web application (TCP / HTTP 1.1) using RichFaces and a4j components.
  • Oracle Version: 11gR1 11.1.0.7          
    • Running with AMM (Automatic Memory Management), which precludes the use of HugePages
  • Veritas VCS, VxVM, VxDMP

Issue

  • JBoss server periodically consumes high CPU and experiences pauses.
    • Periodic (1 out of 100) garbage collections take an excessive amount of system time.
  • Java-based web application experiences periodic (approximately 5 out of 100) slow application response times.
    • Application response is < 100ms 95% of the time; the other 5% of the time, a response may take up to 100 seconds.
    • Unresponsiveness is seen across several processes (JBoss, Oracle, etc.), and slowness appears to be system-wide.
  • Periodically, processes such as 'uname', 'grep', and 'perl' take an exceptionally long time to execute, and all seem to consume an exceptional amount of system time.
  • Oracle responds to JBoss calls in less than 1s 90% of the time, but occasionally takes 30-40s and may exceed the 60s query timeout, resulting in Oracle error ORA-01013.

Resolution

  • Adding vm.zone_reclaim_mode = 0 to /etc/sysctl.conf and running "sysctl -p" disabled zone_reclaim.
    • This resolved the periodic high system CPU in various processes, and application response times became much more predictable.
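As a minimal sketch, the change above can be applied to a running system without a reboot (run as root); the commands below only assume the standard sysctl interface:

```shell
# Show the current mode; '1' means zone reclaim is enabled
cat /proc/sys/vm/zone_reclaim_mode

# Disable zone reclaim immediately on the running system
sysctl -w vm.zone_reclaim_mode=0

# Persist the setting across reboots, then re-load /etc/sysctl.conf
echo 'vm.zone_reclaim_mode = 0' >> /etc/sysctl.conf
sysctl -p
```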

Root Cause

  • vm.zone_reclaim_mode was automatically set to '1' at boot because a 2-node NUMA topology was detected.
  • Unfortunately, this caused processes to stall in page reclaim on the local NUMA node instead of allocating memory from the other node.
  • For a file-based workload such as a database, file server, or web server, zone_reclaim_mode should be set to 0.

Diagnostic Steps

  • Run "numactl --hardware" and observe at least 2 nodes with an internode distance > 20, and one node with much less free memory than the other:
    available: 2 nodes (0-1)
    node 0 size: 8035 MB
    node 0 free: 408 MB
    node 1 size: 8080 MB
    node 1 free: 3606 MB
    node distances:
    node    0    1
      0:   10   21
      1:   21   10
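When capturing this output over time, a short parsing sketch can flag a node running low on free memory; the 'numactl.out' capture file and the 512 MB threshold below are illustrative assumptions, not part of the original diagnosis:

```shell
# numactl.out stands in for a saved 'numactl --hardware' capture
# (hypothetical file; in practice redirect the real command's output)
cat > numactl.out <<'EOF'
node 0 free: 408 MB
node 1 free: 3606 MB
EOF

# Flag any node whose free memory falls below an illustrative 512 MB threshold;
# on 'node 0 free: 408 MB' lines, $2 is the node ID and $4 the free MB
awk '/free:/ { if ($4 + 0 < 512) print "node " $2 " low on free memory: " $4 " MB" }' numactl.out
```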
    
  • Set up sysrq via https://access.redhat.com/kb/docs/DOC-2024
  • Run the following simple script, which will send 'sysrq-t' output to /var/log/messages, approximately every 5s:
    # while (true); do sleep 5; echo 't' > /proc/sysrq-trigger ;done
    
    • Analyze the backtraces of 'D' state and 'R' state processes.
    • 'R' state process analysis shows most running processes in a 'zone_reclaim() ... isolate_lru_pages()' backtrace, similar to:
      Jul 22 03:43:37 linux-s1 kernel: monitor       R  running task       0 31787  31709                     (NOTLB)
      Jul 22 03:43:37 linux-s1 kernel:  ffff81035fff1a58 0000000000000020 0000000000000020 0000000000000000
      Jul 22 03:43:37 linux-s1 kernel:  0000000000000020 0000000000000000 0000000000000000 0000000000000020
      Jul 22 03:43:37 linux-s1 kernel:  0000000000000000 0000000000000001 ffff8103c52d9a50 0000000000000020
      Jul 22 03:43:37 linux-s1 kernel: Call Trace:
      Jul 22 03:43:37 linux-s1 kernel:  [] isolate_lru_pages+0x98/0xbf
      Jul 22 03:43:37 linux-s1 kernel:  [] __pagevec_release+0x19/0x22
      Jul 22 03:43:37 linux-s1 kernel:  [] shrink_active_list+0x4b4/0x4c4
      Jul 22 03:43:37 linux-s1 kernel:  [] shrink_zone+0xf7/0x15d
      Jul 22 03:43:37 linux-s1 kernel:  [] zone_reclaim+0x1cc/0x292
      Jul 22 03:43:37 linux-s1 kernel:  [] zone_reclaim+0x1cc/0x292
      Jul 22 03:43:37 linux-s1 kernel:  [] get_page_from_freelist+0xbf/0x43a
      Jul 22 03:43:37 linux-s1 kernel:  [] __alloc_pages+0x65/0x2ce
      Jul 22 03:43:37 linux-s1 kernel:  [] do_wp_page+0x4b7/0x8dc
      Jul 22 03:43:37 linux-s1 kernel:  [] filemap_nopage+0x193/0x360
      Jul 22 03:43:37 linux-s1 kernel:  [] __handle_mm_fault+0xed4/0xf99
      Jul 22 03:43:37 linux-s1 kernel:  [] math_state_restore+0x23/0x4c
      Jul 22 03:43:37 linux-s1 kernel:  [] error_exit+0x0/0x84
      Jul 22 03:43:37 linux-s1 kernel:  [] do_page_fault+0x4cb/0x830
      Jul 22 03:43:37 linux-s1 kernel:  [] sys_rt_sigreturn+0x283/0x356
      Jul 22 03:43:37 linux-s1 kernel:  [] sys_rt_sigreturn+0x323/0x356
      Jul 22 03:43:37 linux-s1 kernel:  [] error_exit+0x0/0x84
      
  • Run the zone_reclaim.stp SystemTap script: it records and prints the frequency of processes calling zone_reclaim(), and prints any process that remains in zone_reclaim() longer than a specified threshold (1s by default).
  • Look for a high rate of processes calling zone_reclaim(), some (such as 'grep') calling zone_reclaim() thousands of times in a 5s period.
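To quantify that rate, the zone_reclaim() frames in the captured sysrq-t output can be tallied per task. A minimal sketch, assuming 'messages.sample' stands in for an extract of /var/log/messages (in practice, point the pipeline at the real log):

```shell
# messages.sample is a hypothetical two-dump extract of /var/log/messages
cat > messages.sample <<'EOF'
Jul 22 03:43:37 linux-s1 kernel: grep          R  running task       0 31787  31709 (NOTLB)
Jul 22 03:43:37 linux-s1 kernel:  [] zone_reclaim+0x1cc/0x292
Jul 22 03:43:42 linux-s1 kernel: grep          R  running task       0 31787  31709 (NOTLB)
Jul 22 03:43:42 linux-s1 kernel:  [] zone_reclaim+0x1cc/0x292
EOF

# Remember the task name ($6) from each 'running task' header line;
# when a zone_reclaim frame follows, credit that task, then print totals
awk '/running task/ { task = $6 }
     /zone_reclaim\+/ { count[task]++ }
     END { for (t in count) print count[t], t }' messages.sample
```

A process such as 'grep' appearing with a high count across many 5s dumps is the signature described above.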
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.