Java runs out of memory due to Solr memory leak

Solution Unverified - Updated

Environment

  • Red Hat Enterprise Linux 5

Issue

  • A Java process (Solr, an open-source search engine) has a memory leak

Root Cause

A Java process (Solr, an open-source search engine) had a memory leak.

Diagnostic Steps

The console showed:

Free pages:       22212kB (0kB HighMem)

Active:488168
inactive:450931 dirty:0 writeback:0 unstable:0 free:5553 slab:12894
mapped-file:943 mapped-anon:947213 pagetables:36668

Node 0 DMA
free:11156kB min:20kB low:24kB high:28kB active:0kB inactive:0kB
present:10764kB pages_scanned:0 all_unreclaimable? yes

lowmem_reserve[]: 0 3254 4012 4012

Node
0 DMA32 free:9564kB min:6564kB low:8204kB high:9844kB active:1568724kB
inactive:1541480kB present:3332636kB pages_scanned:48593765
all_unreclaimable? yes

lowmem_reserve[]: 0 0 757 757

Node 0
Normal free:1492kB min:1528kB low:1908kB high:2292kB active:384112kB
inactive:262016kB present:775680kB pages_scanned:9809898
all_unreclaimable? yes

lowmem_reserve[]: 0 0 0 0

Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no

lowmem_reserve[]: 0 0 0 0

Node 0 DMA: 3*4kB 3*8kB 5*16kB 3*32kB 1*64kB 3*128kB 1*256kB 0*512kB 2*1024kB 0*2048kB 2*4096kB = 11156kB

Node 0 DMA32: 1*4kB 7*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 6*512kB 0*1024kB 1*2048kB 1*4096kB = 9564kB

Node 0 Normal: 31*4kB 7*8kB 22*16kB 0*32kB 1*64kB 5*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1492kB

Node 0 HighMem: empty
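The buddy-allocator lines above can be sanity-checked: the per-order free-block counts should multiply out to the zone's reported free total. A minimal check (not from the original log) using the "Node 0 Normal" counts:

```shell
# Sanity check: the per-order buddy counts on the "Node 0 Normal" line
# (count:block-size-kB pairs) should sum to the reported free total.
total=0
for pair in 31:4 7:8 22:16 0:32 1:64 5:128 1:256 0:512 0:1024 0:2048 0:4096; do
    count=${pair%%:*}
    size=${pair##*:}
    total=$(( total + count * size ))
done
echo "${total}kB"   # → 1492kB, matching the reported value
```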

1367 pagecache pages

Swap cache: add 15198615, delete 15198360, find 44373119/46598232, race 34+1604

Free swap  = 0kB

Total swap = 4194296kB
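Two things stand out in the report: swap is completely exhausted (Free swap = 0kB out of ~4GB), and the Normal zone's free memory (1492kB) has fallen below its min watermark (1528kB), the point at which ordinary allocations fail and the OOM killer can fire. A minimal sketch of that comparison, using values transcribed from the console log above:

```shell
# Minimal sketch: compare a zone's free memory against its min watermark.
# Values are transcribed from the console log above.
check_zone() {   # args: zone-name free-kB min-kB
    if [ "$2" -le "$3" ]; then
        echo "$1: free ${2}kB <= min ${3}kB -- zone exhausted"
    else
        echo "$1: free ${2}kB > min ${3}kB -- ok"
    fi
}
check_zone "Node 0 Normal" 1492 1528
check_zone "Node 0 DMA32" 9564 6564
```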

java invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0

Call Trace:
 [<ffffffff800c10ab>] out_of_memory+0x8e/0x2f5
 [<ffffffff8000f263>] __alloc_pages+0x245/0x2ce
 [<ffffffff80032353>] read_swap_cache_async+0x45/0xd8
 [<ffffffff800c697e>] swapin_readahead+0x60/0xd3
 [<ffffffff80008fd3>] __handle_mm_fault+0x983/0xe23
 [<ffffffff880337ac>] :jbd:journal_stop+0x1f3/0x1ff
 [<ffffffff8006686f>] do_page_fault+0x4cb/0x830
 [<ffffffff801434f7>] __next_cpu+0x19/0x28
 [<ffffffff800756b4>] smp_send_reschedule+0x4e/0x53
 [<ffffffff8005dde9>] error_exit+0x0/0x84
 [<ffffffff80061f9c>] __get_user_8+0x20/0x2c
 [<ffffffff800a17a0>] exit_robust_list+0x20/0x115
 [<ffffffff80015194>] do_exit+0x251/0x8f1
 [<ffffffff80048a89>] cpuset_exit+0x0/0x6c
 [<ffffffff8002b094>] get_signal_to_deliver+0x42c/0x45a
 [<ffffffff8005aae6>] do_notify_resume+0x9c/0x7a9
 [<ffffffff8008ad7d>] default_wake_function+0x0/0xe
 [<ffffffff8009df7c>] autoremove_wake_function+0x0/0x2e
 [<ffffffff800b45f9>] audit_syscall_exit+0x31b/0x336
 [<ffffffff8005d32e>] int_signal+0x12/0x17


Node 0 DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
cpu 1 hot: high 0, batch 1 used:0
cpu 1 cold: high 0, batch 1 used:0
cpu 2 hot: high 0, batch 1 used:0
cpu 2 cold: high 0, batch 1 used:0
cpu 3 hot: high 0, batch 1 used:0
cpu 3 cold: high 0, batch 1 used:0
Node 0 DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:10
cpu 0 cold: high 62, batch 15 used:51
cpu 1 hot: high 186, batch 31 used:29
cpu 1 cold: high 62, batch 15 used:52
cpu 2 hot: high 186, batch 31 used:29
cpu 2 cold: high 62, batch 15 used:19
cpu 3 hot: high 186, batch 31 used:25
cpu 3 cold: high 62, batch 15 used:60
Node 0 Normal per-cpu:
cpu 0 hot: high 186, batch 31 used:134
cpu 0 cold: high 62, batch 15 used:54
cpu 1 hot: high 186, batch 31 used:20
cpu 1 cold: high 62, batch 15 used:52
cpu 2 hot: high 186, batch 31 used:29
cpu 2 cold: high 62, batch 15 used:36
cpu 3 hot: high 186, batch 31 used:16
cpu 3 cold: high 62, batch 15 used:44
Node 0 HighMem per-cpu: empty
Free pages:       22212kB (0kB HighMem)
Active:488233
inactive:450850 dirty:0 writeback:0 unstable:0 free:5553 slab:12894
mapped-file:943 mapped-anon:947213 pagetables:36668
Node 0 DMA
free:11156kB min:20kB low:24kB high:28kB active:0kB inactive:0kB
present:10764kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3254 4012 4012
Node
0 DMA32 free:9564kB min:6564kB low:8204kB high:9844kB active:1569800kB
inactive:1541172kB present:3332636kB pages_scanned:50121790
all_unreclaimable? yes
lowmem_reserve[]: 0 0 757 757
Node 0
Normal free:1492kB min:1528kB low:1908kB high:2292kB active:383708kB
inactive:262036kB present:775680kB pages_scanned:10174634
all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 3*4kB 3*8kB 5*16kB 3*32kB 1*64kB 3*128kB 1*256kB 0*512kB 2*1024kB 0*2048kB 2*4096kB = 11156kB
Node 0 DMA32: 1*4kB 7*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 6*512kB 0*1024kB 1*2048kB 1*4096kB = 9564kB
Node 0 Normal: 31*4kB 7*8kB 22*16kB 0*32kB 1*64kB 5*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1492kB
Node 0 HighMem: empty
1367 pagecache pages
Swap cache: add 15198615, delete 15198360, find 44373119/46598232, race 34+1604
Free swap  = 0kB
Total swap = 4194296kB
httpd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

1. We suspected that a Java process was steadily accumulating memory without freeing it, but could not confirm this at first because nothing was logged during the period of the hang. The console messages the customer has since sent clearly confirm it.

The trigger was likely a single write request that blocked all others (hence no log output), combined with the application continuing to allocate memory. Memory was therefore consumed until it was completely exhausted, all without anything being logged.

We suggest configuring kdump so that we can study this properly if it happens again.
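A sketch of enabling kdump on RHEL 5 so a vmcore is captured if the hang recurs (run as root on the affected host; the crashkernel reservation size is an example and depends on installed RAM):

```shell
# Sketch: enable kdump on RHEL 5 (configuration steps, run as root).
yum install -y kexec-tools
# 1. Reserve memory for the capture kernel: append e.g. "crashkernel=128M@16M"
#    to the kernel line in /boot/grub/grub.conf, then reboot.
# 2. Point /etc/kdump.conf at a local or remote dump target
#    (the default is /var/crash).
chkconfig kdump on
service kdump start
```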

2. The console messages point to an out-of-memory problem:

java invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0

Call Trace:
[<ffffffff800c10ab>] out_of_memory+0x8e/0x2f5
[<ffffffff8000f263>] __alloc_pages+0x245/0x2ce
[<ffffffff80032353>] read_swap_cache_async+0x45/0xd8

[/snip]
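The gfp_mask on the oom-killer line encodes what kind of allocation failed. A hedged sketch decoding it with 2.6.18-era flag values (these constants are assumptions; verify them against the running kernel's include/linux/gfp.h) shows 0x200d2 is GFP_HIGHUSER, i.e. an ordinary userspace page, here being allocated for a swap-in:

```shell
# Decode a gfp_mask using flag values assumed from 2.6.18-era
# include/linux/gfp.h -- verify against the kernel actually in use.
decode_gfp() {
    mask=$(( $1 ))
    [ $(( mask & 0x02 ))    -ne 0 ] && echo __GFP_HIGHMEM
    [ $(( mask & 0x10 ))    -ne 0 ] && echo __GFP_WAIT
    [ $(( mask & 0x40 ))    -ne 0 ] && echo __GFP_IO
    [ $(( mask & 0x80 ))    -ne 0 ] && echo __GFP_FS
    [ $(( mask & 0x20000 )) -ne 0 ] && echo __GFP_HARDWALL
    true
}
decode_gfp 0x200d2   # WAIT|IO|FS|HIGHMEM|HARDWALL = GFP_HIGHUSER
```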

3. There appear to be two Java processes running on the server, as shown in the ps output:

[foo@bar]$ grep -i java aaa.307274-232535/ps
tomcat 2430  0.6 14.9 1645940 604860 ?      Sl   Jun15  10:28
    /usr/java/default/bin/java
    -Djava.util.logging.config.file=/usr/local/path/apache-tomcat/conf/logging.properties
    -XX:MaxPermSize=256m -Xms768m -Xmx768m -server -Xincgc -verbose:gc
    -XX:+PrintGCDetails -Xminf0.1 -Xmaxf0.3
    -Djava.library.path=/usr/local/path/apache-tomcat/lib
    -Dcatalina.ext.dirs=/usr/local/path/apache-tomcat/shared/lib:/usr/local/path/apache-tomcat/common/lib
    -Djava.awt.headless=true
    -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true
    -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
    -Duser.timezone=PST8PDT
    -Dcom.sun.management.config.file=/data01/path/apache-tomcat/conf/management.properties
    -Djava.endorsed.dirs=/usr/local/path/apache-tomcat/common/endorsed
    -classpath :/usr/local/path/apache-tomcat/bin/bootstrap.jar:/usr/local/path/apache-tomcat/bin/commons-logging-api.jar
    -Dcatalina.base=/usr/local/path/apache-tomcat
    -Dcatalina.home=/usr/local/path/apache-tomcat
    -Djava.io.tmpdir=/usr/local/path/apache-tomcat/temp
    org.apache.catalina.startup.Bootstrap start

solr     32450  0.0  2.5 1409680 104088 ?      Sl   Jun15   0:04 java -jar start.jar
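To confirm which of the two JVMs is leaking, the resident set size of each can be sampled over time; a leaking process grows steadily without ever shrinking. A hypothetical sketch (the PID and interval are examples taken from the listing above, not a tool from the sosreport):

```shell
# Hypothetical sketch: sample a process's resident set size (kB) so
# steady growth without release can be spotted.
sample_rss() {
    ps -o rss= -p "$1" | tr -d ' '
}
# Example usage against the solr JVM (PID 32450 in the ps listing above):
#   while sleep 60; do echo "$(date '+%F %T') $(sample_rss 32450) kB"; done
sample_rss $$   # prints the current shell's own RSS in kB
```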

4. The customer has installed a third-party Java:

[foo@bar]$ grep java aaa.307274-232535/installed-rpms
sun-javadb-demo-10.4.1-3.1-i386
sun-javadb-common-10.4.1-3.1-i386
sun-javadb-client-10.4.1-3.1-i386
sun-javadb-docs-10.4.1-3.1-i386
sun-javadb-core-10.4.1-3.1-i386
sun-javadb-javadoc-10.4.1-3.1-i386

[foo@bar]$ grep jdk aaa.307274-232535/installed-rpms
jdk-1.6.0_10-fcs-x86_64

Our packages are named java-1.6.0-openjdk-*.

5. The customer has installed a third-party Tomcat; we distribute tomcat5.
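If the leak is inside the third-party JVM, a heap dump at the moment of failure is the most direct evidence. The Sun/HotSpot JDK 1.6 shown above supports -XX:+HeapDumpOnOutOfMemoryError; a hedged example of adding it to the Tomcat launch options (the JAVA_OPTS variable and dump path are examples, not taken from the customer's setup):

```shell
# Example only: HotSpot options that capture a heap dump when the JVM
# itself runs out of heap. JAVA_OPTS and the dump path are assumptions;
# adapt to however this Tomcat instance is actually launched.
JAVA_OPTS="${JAVA_OPTS:-} -XX:+HeapDumpOnOutOfMemoryError"
JAVA_OPTS="$JAVA_OPTS -XX:HeapDumpPath=/var/tmp"
export JAVA_OPTS
echo "$JAVA_OPTS"
```

Note that this captures Java-heap exhaustion; a native-memory leak (the kind that drives the kernel's oom-killer, as seen here) would instead show up as growing RSS alongside a stable Java heap.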

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.