G1 Collector Tuning
Environment
- OpenJDK
- Oracle JDK
Issue
- GC logging shows long pause times using the G1 collector.
- GC logging shows poor throughput using the G1 collector.
Resolution
Disclaimer: Links contained herein to an external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.
The G1 collector is a regionalized and generational garbage collector (it correlates the generations with certain memory regions). It was first supported in JDK7u4 and became the default collector in JDK 9.
Following are general strategies for improving garbage collection performance (throughput and/or max pause time):
- Increase the heap size.
- Increase the number of collector threads for concurrent collectors.
- Move object collection to more efficient collectors (e.g. Young vs. Old, Concurrent vs. Serial).
G1 Full GC
The following are tuning options for avoiding G1 Full GC serial collections:
- Increase PermGen/Metaspace size. For example: -XX:MaxMetaspaceSize=512m. Better still, do not set it at all and let the JVM calculate it; see JDK 8 Metaspace tuning for JBoss EAP.
- Increase G1ReservePercent (default 10%) and the total heap by the same percentage to increase the to-space size and avoid to-space exhaustion. This is the amount of heap space reserved to avoid promotion failure. If you have a 10GB heap and -XX:G1ReservePercent=10, the effective max heap space is 9GB. For example: -XX:G1ReservePercent=15.
- Decrease InitiatingHeapOccupancyPercent to start the marking cycle earlier and avoid the G1 collector being swamped by more allocations than it can concurrently collect. For example: -XX:InitiatingHeapOccupancyPercent=40.
- Increase ConcGCThreads to increase the number of parallel marking threads (again, to avoid the G1 collector being swamped by more allocations than it can concurrently collect) if you have sufficient CPUs/cores. For example: -XX:ConcGCThreads=4. Note: the default is ConcGCThreads = 1/4 * ParallelGCThreads. Setting ConcGCThreads and/or ParallelGCThreads too high will cause thread contention and result in very poor performance, so be sure your settings make sense for the available CPUs/cores.
- Increase G1MixedGCLiveThresholdPercent to allow the G1 Young collector to calculate reclaimable space from more regions and allow the G1 Mixed collector to collect from those regions. For example: -XX:+UnlockExperimentalVMOptions -XX:G1MixedGCLiveThresholdPercent=85. Note: the default was changed from 65 to 85 in JDK8 update 40 to reduce the need for G1 Full GCs.
- Lower G1HeapWastePercent to increase the candidate regions for Mixed GC. For example: -XX:G1HeapWastePercent=5. Note: the default was changed from 10 to 5 in JDK8 update 40 to reduce the need for G1 Full GCs.
- If using JDK7, upgrade to the latest release to avoid issues such as the following: G1 Full GC for no apparent reason.
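The G1ReservePercent arithmetic described above can be sketched in a few lines (an illustration only; the class and method names are invented, not JVM internals):

```java
// Illustrates how G1ReservePercent shrinks the effectively usable heap:
// the reserve is kept free to absorb promotions and avoid to-space exhaustion.
public class G1Reserve {
    // Effective usable heap (in MB) once the reserve is subtracted.
    public static long usableHeapMB(long maxHeapMB, int reservePercent) {
        long reserveMB = maxHeapMB * reservePercent / 100;
        return maxHeapMB - reserveMB;
    }

    public static void main(String[] args) {
        // A 10GB heap with the default -XX:G1ReservePercent=10
        // leaves 9GB effectively usable, as described above.
        System.out.println(usableHeapMB(10 * 1024, 10) + " MB usable");
    }
}
```

Raising the reserve trades usable heap for headroom, which is why the total heap should grow by the same percentage.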
G1 mixed collections
Root Cause
The G1 collector reclaims heap space in multiple ways. Performance goals (pause time or throughput) are generally met by moving collection from a more expensive to a less expensive collection type. The following is a list of collection types from least to most expensive:
- Reclaim from Eden space with the G1 Young collector
- Reclaim from Survivor space with the G1 Young collector
- Reclaim from Old space with the G1 Mixed collector
- Reclaim from Young and Old spaces with the G1 Full GC collector
G1GC Overview
First of all, G1GC is an adaptive garbage collector whose defaults are designed to work efficiently without modification, and in most scenarios it should work as expected. Bottom line: the G1GC default settings prioritize low latency (defined by the pause time goal) over higher throughput.
G1 divides the heap into heap regions (2048 regions by default, tuned by G1HeapRegionSize). The region size depends on the minimum heap size and is a power of 2 between 1MB and 32MB (so regions of 1MB, 2MB, 4MB, 8MB, 16MB, or 32MB). Each region corresponds to a generation, so a generation's allocation is not contiguous. Each phase then cleans a certain set of regions, called the collection set (Metaspace is off-heap). A region will be either: Free, Eden, Survivor, or Tenured.
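When -XX:G1HeapRegionSize is not set, the sizing heuristic above can be approximated as follows (a sketch under the assumptions of a 2048-region target and rounding down to a power of 2, not the exact HotSpot algorithm; names are invented):

```java
public class G1RegionSize {
    static final int TARGET_REGION_COUNT = 2048;
    static final int MIN_REGION_MB = 1, MAX_REGION_MB = 32;

    // Approximate default region size for a given heap, in MB:
    // heap / 2048, clamped to [1, 32], rounded down to a power of two.
    public static int approxRegionSizeMB(long heapMB) {
        long target = Math.max(heapMB / TARGET_REGION_COUNT, MIN_REGION_MB);
        return Integer.highestOneBit((int) Math.min(target, MAX_REGION_MB));
    }

    public static void main(String[] args) {
        System.out.println(approxRegionSizeMB(4 * 1024));  // small heap -> small regions
        System.out.println(approxRegionSizeMB(64 * 1024)); // large heap -> clamped at 32MB
    }
}
```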
Phases
The G1 Full GC collector is a serial (single-threaded) collector, which means it will take a very long time to collect a large heap. If the G1 collector is running optimally, there will not be any G1 Full GC collections. G1 Full GCs happen when the PermGen/Metaspace fills up, when the old space fills up with humongous objects (which are allocated directly in the old space, not in the young space), or when there are more allocations than the G1 collector can concurrently collect (the heap fills up before marking can complete). See the Resolution section above.
Every G1 pause performs an evacuation, copying live objects from the Eden space to Survivor and Old regions. This is considered a young collection. -XX:MaxGCPauseMillis (the pause time target) determines how many Eden and Survivor regions are included in each Young collection (the Collection Set, or CSet).
Concurrent collections
Concurrent collections of the old regions can also be scheduled alongside a Young GC; this is referred to as a Mixed GC. Concurrent collection is determined at the end of a Young GC cycle by InitiatingHeapOccupancyPercent. If heap occupancy is > InitiatingHeapOccupancyPercent (default 45%), G1 will initiate a Young GC initial-mark, which begins the concurrent marking/collection cycle (concurrent-mark-start). This will find and mark all live objects in the old regions. In addition, it will also perform concurrent-cleanup, which will concurrently collect any old regions that are 100% empty. Depending on how long marking takes, it is normal to see multiple Young GC collections happen while it is in progress. Once concurrent-mark-end is reached, G1 will order the old regions by size and eliminate any region whose live occupancy is larger than G1MixedGCLiveThresholdPercent (default 65% in JDK7, raised to 85% in JDK8 update 40). All remaining old regions are now added to the list of candidate regions for collection. Now, the total amount of reclaimable heap is calculated based on the old regions in the Collection Set. If the total percentage of reclaimable heap is > G1HeapWastePercent (default 10%, lowered to 5% in JDK8 update 40), a G1 Mixed collection is initiated. The G1 collector will continue Mixed collections until the reclaimable percentage is < G1HeapWastePercent, up to a maximum of G1MixedGCCountTarget collections.
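The two thresholds that gate mixed collections can be modeled as simple predicates (an illustrative sketch of the logic described above; the class and method names are invented):

```java
public class MixedGcDecision {
    // An old region is a mixed-GC candidate only if its live occupancy is at
    // or below G1MixedGCLiveThresholdPercent (default 65 in JDK7, 85 in JDK8u40+).
    public static boolean isCandidate(double livePercent, double liveThresholdPercent) {
        return livePercent <= liveThresholdPercent;
    }

    // Mixed GCs start (or continue) only while the reclaimable fraction of the
    // heap exceeds G1HeapWastePercent (default 10, lowered to 5 in JDK8u40+).
    public static boolean startMixedGc(long reclaimableBytes, long heapBytes,
                                       double heapWastePercent) {
        double reclaimablePercent = 100.0 * reclaimableBytes / heapBytes;
        return reclaimablePercent > heapWastePercent;
    }

    public static void main(String[] args) {
        // Mirrors Example 5 below: 5.95% reclaimable vs a 10% threshold,
        // so no mixed collection starts.
        System.out.println(startMixedGc(1_661_935_960L, 27_917_287_424L, 10.0));
    }
}
```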
Humongous allocation
A humongous allocation is an allocation larger than one-half G1HeapRegionSize (default = heap/2048). Humongous allocations are allocated directly in the Old space in regions that can only be collected by a G1 Full GC serial collection. For additional information, see Why does G1GC log "humongous allocation request failed"?
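The humongous threshold can be expressed directly (a minimal sketch; the class and method names are invented for illustration):

```java
public class HumongousCheck {
    // An allocation is humongous when it exceeds half the region size.
    public static boolean isHumongous(long allocationBytes, long regionSizeBytes) {
        return allocationBytes > regionSizeBytes / 2;
    }

    public static void main(String[] args) {
        long fiveMB = 5L * 1024 * 1024;
        // A 5MB allocation is humongous with 8MB regions...
        System.out.println(isHumongous(fiveMB, 8L * 1024 * 1024));
        // ...but not with -XX:G1HeapRegionSize=16m.
        System.out.println(isHumongous(fiveMB, 16L * 1024 * 1024));
    }
}
```

This is why increasing G1HeapRegionSize is the usual remedy: a larger region halves fewer allocations into the humongous path.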
Diagnostic Steps
Analyze the GC logging to determine overall throughput, longest pause time, and any localized bottlenecks (periods when more time is spent doing garbage collection than running application threads). Ensure that trigger information is included in the GC logging. This is the default with JDK8+, but with JDK7 you will need to enable it with the -XX:+PrintGCCause option.
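As a starting point for that analysis, total stop-the-world time can be estimated by summing the real= values from the [Times: ...] lines shown in the examples below (a minimal sketch; the class name is invented):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PauseTimeSummer {
    private static final Pattern REAL = Pattern.compile("real=(\\d+\\.\\d+) secs");

    // Sums the wall-clock ("real") time of every pause in the given log text.
    public static double totalPauseSeconds(String log) {
        double total = 0.0;
        Matcher m = REAL.matcher(log);
        while (m.find()) {
            total += Double.parseDouble(m.group(1));
        }
        return total;
    }

    public static void main(String[] args) {
        String log = "[Times: user=27.99 sys=0.04, real=20.09 secs]\n"
                   + "[Times: user=0.73 sys=0.02, real=0.29 secs]";
        System.out.printf("%.2f secs paused%n", totalPauseSeconds(log));
    }
}
```

Dividing total pause time by the wall-clock interval of the log gives a rough GC overhead percentage.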
Below are six examples with complete logs and analysis:
- G1 Full GC when the heap is not full
- G1 Young collection(s) triggered by to-space exhaustion
- Humongous allocations
- G1 Young collection when the heap is full that does not trigger a mixed collection
- High percentage of Mixed GCs not starting or not continuing
- Long period of time to complete concurrent marking
Long Pause Time
Check the type of collection of the longest pause.
Example 1
A G1 Full GC when the heap is not full:
2016-02-10T06:54:49.207-0500: 29947.711: [Full GC 19G->8199M(26G), 20.0926030 secs]
[Eden: 0.0B(1328.0M)->0.0B(14.8G) Survivors: 0.0B->0.0B Heap: 19.0G(26.0G)->8199.1M(26.0G)]
[Times: user=27.99 sys=0.04, real=20.09 secs]
Analysis:
- Old space is not full when the G1 Full GC is initiated.
- The full collection reclaims almost the entire heap (19.0G --> 8199.1M).
- The collection could have been caused by PermGen/Metaspace exhaustion. Check the -XX:MaxMetaspaceSize setting.
- Increase PermGen/Metaspace size. For example: -XX:MaxMetaspaceSize=512m.
Example 2
G1 Young collection(s) triggered by to-space-exhausted followed by a G1 Full GC collection:
2016-02-18T01:24:41.704-0500: 42413.202: [GC pause (young) (to-space exhausted), 0.2899510 secs]
[Parallel Time: 251.3 ms, GC Workers: 4]
[GC Worker Start (ms): Min: 42413202.5, Avg: 42413202.5, Max: 42413202.5, Diff: 0.0]
[Ext Root Scanning (ms): Min: 59.5, Avg: 64.4, Max: 76.2, Diff: 16.7, Sum: 257.7]
[SATB Filtering (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Update RS (ms): Min: 39.0, Avg: 45.7, Max: 56.3, Diff: 17.3, Sum: 182.9]
[Processed Buffers: Min: 62, Avg: 70.5, Max: 87, Diff: 25, Sum: 282]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 132.4, Avg: 140.9, Max: 148.5, Diff: 16.2, Sum: 563.4]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Other (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.5]
[GC Worker Total (ms): Min: 251.1, Avg: 251.2, Max: 251.2, Diff: 0.0, Sum: 1004.6]
[GC Worker End (ms): Min: 42413453.6, Avg: 42413453.7, Max: 42413453.7, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Migration: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 38.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 10.7 ms]
[Ref Enq: 0.0 ms]
[Free CSet: 0.1 ms]
[Eden: 432.0M(1296.0M)->0.0B(1328.0M) Survivors: 32.0M->0.0B Heap: 25.7G(26.0G)->25.6G(26.0G)]
[Times: user=0.73 sys=0.02, real=0.29 secs]
2016-02-18T01:24:42.009-0500: 42413.507: [GC pause (young) (to-space exhausted), 0.1166990 secs]
[Parallel Time: 103.2 ms, GC Workers: 4]
[GC Worker Start (ms): Min: 42413507.3, Avg: 42413507.3, Max: 42413507.3, Diff: 0.1]
[Ext Root Scanning (ms): Min: 60.6, Avg: 65.6, Max: 79.9, Diff: 19.3, Sum: 262.3]
[SATB Filtering (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Update RS (ms): Min: 17.7, Avg: 30.1, Max: 34.4, Diff: 16.7, Sum: 120.3]
[Processed Buffers: Min: 55, Avg: 64.2, Max: 76, Diff: 21, Sum: 257]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 5.3, Avg: 7.3, Max: 8.0, Diff: 2.7, Sum: 29.1]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Other (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.4]
[GC Worker Total (ms): Min: 103.0, Avg: 103.1, Max: 103.1, Diff: 0.1, Sum: 412.3]
[GC Worker End (ms): Min: 42413610.3, Avg: 42413610.4, Max: 42413610.4, Diff: 0.0]
[Code Root Fixup: 0.2 ms]
[Code Root Migration: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 13.3 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 10.5 ms]
[Ref Enq: 0.0 ms]
[Free CSet: 0.0 ms]
[Eden: 80.0M(1328.0M)->0.0B(1328.0M) Survivors: 0.0B->0.0B Heap: 25.7G(26.0G)->25.7G(26.0G)]
[Times: user=0.45 sys=0.00, real=0.11 secs]
2016-02-18T01:24:42.126-0500: 42413.624: [GC pause (young), 0.0987600 secs]
[Parallel Time: 86.1 ms, GC Workers: 4]
[GC Worker Start (ms): Min: 42413624.3, Avg: 42413624.3, Max: 42413624.3, Diff: 0.0]
[Ext Root Scanning (ms): Min: 62.2, Avg: 69.4, Max: 81.0, Diff: 18.8, Sum: 277.5]
[SATB Filtering (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Update RS (ms): Min: 0.0, Avg: 0.7, Max: 1.6, Diff: 1.6, Sum: 2.8]
[Processed Buffers: Min: 0, Avg: 4.2, Max: 10, Diff: 10, Sum: 17]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 2.8, Avg: 3.5, Max: 4.9, Diff: 2.0, Sum: 14.0]
[Termination (ms): Min: 0.0, Avg: 12.3, Max: 19.3, Diff: 19.3, Sum: 49.1]
[GC Worker Other (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.4]
[GC Worker Total (ms): Min: 85.9, Avg: 86.0, Max: 86.0, Diff: 0.1, Sum: 343.9]
[GC Worker End (ms): Min: 42413710.2, Avg: 42413710.2, Max: 42413710.3, Diff: 0.0]
[Code Root Fixup: 0.3 ms]
[Code Root Migration: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 12.3 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 12.2 ms]
[Ref Enq: 0.0 ms]
[Free CSet: 0.0 ms]
[Eden: 0.0B(1328.0M)->0.0B(1328.0M) Survivors: 0.0B->0.0B Heap: 25.7G(26.0G)->25.7G(26.0G)]
[Times: user=0.39 sys=0.00, real=0.10 secs]
2016-02-18T01:24:42.225-0500: 42413.724: [Full GC 25G->7547M(26G), 26.5290110 secs]
[Eden: 0.0B(1328.0M)->0.0B(15.6G) Survivors: 0.0B->0.0B Heap: 25.7G(26.0G)->7547.3M(26.0G)]
[Times: user=33.99 sys=0.05, real=26.53 secs]
Analysis:
- All young space is being reclaimed.
- Old space is full.
- The full collection reclaims almost the entire heap (25.7G --> 7547.3M).
- to-space exhausted indicates evacuation failure: the heap is too full to complete the young collection, so the JVM invokes a G1 Full GC collection as a drastic measure to reclaim space.
- Increase G1ReservePercent (default 10%) and the total heap by the same percentage to increase the to-space size. For example: -XX:G1ReservePercent=15.
- Decrease InitiatingHeapOccupancyPercent to start the marking cycle earlier. For example: -XX:InitiatingHeapOccupancyPercent=40.
- Increase ConcGCThreads to increase the number of parallel marking threads if you have sufficient CPUs/cores. For example: -XX:ConcGCThreads=4.
Example 3
Logging with -XX:+PrintAdaptiveSizePolicy showing humongous allocations:
16251.713: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 12557746176 bytes, allocation request: 5255432 bytes, threshold: 12562779330 bytes (45.00 %), source: concurrent humongous allocation]
17341.743: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 22598909952 bytes, allocation request: 16749032 bytes, threshold: 12562779330 bytes (45.00 %), source: concurrent humongous allocation]
Analysis:
- A humongous allocation is being made, which can only be collected by a G1 Full GC collection.
- Increase G1HeapRegionSize to avoid humongous allocations. Use settings that are powers of 2. For example: -XX:G1HeapRegionSize=16m. Note the max G1HeapRegionSize value is 32m.
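A suitable region size for a given allocation can be computed as the smallest power-of-two region whose half exceeds the request (a sketch; the class and method names are invented):

```java
public class RegionSizeForAllocation {
    // Smallest power-of-two region size (1..32 MB) under which the allocation
    // is no longer humongous (i.e. allocation <= regionSize / 2). Returns -1
    // when even the 32MB maximum cannot avoid a humongous allocation.
    public static int regionSizeToAvoidHumongousMB(long allocationBytes) {
        for (int mb = 1; mb <= 32; mb *= 2) {
            long halfRegion = mb * 1024L * 1024L / 2;
            if (allocationBytes <= halfRegion) return mb;
        }
        return -1;
    }

    public static void main(String[] args) {
        // The 5255432-byte request from the Example 3 log fits under half of
        // a 16MB region, matching the -XX:G1HeapRegionSize=16m suggestion.
        System.out.println(regionSizeToAvoidHumongousMB(5_255_432L));
    }
}
```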
Example 4
G1 Young collection when heap is full that does not trigger a mixed collection:
2016-02-18T01:24:35.204-0500: 42406.702: [GC pause (young), 0.1535920 secs]
[Parallel Time: 141.4 ms, GC Workers: 4]
[GC Worker Start (ms): Min: 42406702.2, Avg: 42406702.3, Max: 42406702.3, Diff: 0.1]
[Ext Root Scanning (ms): Min: 61.9, Avg: 74.0, Max: 90.1, Diff: 28.3, Sum: 295.8]
[SATB Filtering (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Update RS (ms): Min: 28.3, Avg: 45.1, Max: 58.6, Diff: 30.3, Sum: 180.3]
[Processed Buffers: Min: 12, Avg: 23.5, Max: 32, Diff: 20, Sum: 94]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.2]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 20.6, Avg: 22.1, Max: 23.4, Diff: 2.8, Sum: 88.3]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Other (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.5]
[GC Worker Total (ms): Min: 141.3, Avg: 141.3, Max: 141.3, Diff: 0.0, Sum: 565.2]
[GC Worker End (ms): Min: 42406843.5, Avg: 42406843.6, Max: 42406843.6, Diff: 0.0]
[Code Root Fixup: 0.2 ms]
[Code Root Migration: 0.0 ms]
[Clear CT: 0.2 ms]
[Other: 11.7 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 10.8 ms]
[Ref Enq: 0.2 ms]
[Free CSet: 0.2 ms]
[Eden: 1280.0M(1280.0M)->0.0B(1280.0M) Survivors: 48.0M->48.0M Heap: 25.6G(26.0G)->24.4G(26.0G)]
[Times: user=0.61 sys=0.00, real=0.16 secs]
Analysis:
- A G1 Mixed collection can only happen at the end of a marking cycle.
- Decrease InitiatingHeapOccupancyPercent to start the marking cycle earlier. For example: -XX:InitiatingHeapOccupancyPercent=40.
Example 5
Logging with -XX:+PrintAdaptiveSizePolicy showing a high percentage of Mixed GCs not starting or not continuing due to -XX:G1HeapWastePercent=10:
62683.639: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 332 regions, reclaimable: 1661935960 bytes (5.95 %), threshold: 10.00 %]
72265.494: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 574 regions, reclaimable: 2790845424 bytes (10.00 %), threshold: 10.00 %]
Analysis:
- After GC concurrent-mark-end, you see a significant number of: do not start mixed GCs, reason: reclaimable percentage not over threshold.
- After start mixed GCs, reason: candidate old regions available, you see a significant number of: do not continue mixed GCs, reason: reclaimable percentage not over threshold.
- Decrease G1HeapWastePercent to reduce the waste percentage during mixed collections. For example: -XX:G1HeapWastePercent=5.
Example 6
Logging with -XX:+PrintAdaptiveSizePolicy showing a long period of time to complete concurrent marking:
2016-02-18T01:19:19.963-0500: 42091.462: [GC concurrent-mark-start]
2016-02-18T01:20:51.118-0500: 42182.617: [GC concurrent-mark-end, 91.1547470 secs]
Analysis:
- You are not seeing a significant number of Mixed Garbage Collections because concurrent marking is taking too long.
- You can see GC concurrent-mark-end taking dozens of seconds to complete over an extended period of time.
- Concurrent marking is performed by -XX:ConcGCThreads threads, which default to 1/4 (minimum one) of -XX:ParallelGCThreads. The number of -XX:ParallelGCThreads defaults to one thread per logical processor, up to eight. With four logical processors, there are four ParallelGCThreads and one ConcGCThread. With eight logical processors, there are eight ParallelGCThreads and two ConcGCThreads.
- Increase the number of concurrent GC threads. For example: -XX:ConcGCThreads=2.
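The thread-count defaults described above can be sketched as follows (an approximation that ignores the different ParallelGCThreads formula above eight processors; names are invented):

```java
public class GcThreadDefaults {
    // ParallelGCThreads defaults to one thread per logical processor,
    // up to eight, per the description above.
    public static int defaultParallelGcThreads(int logicalProcessors) {
        return Math.min(logicalProcessors, 8);
    }

    // ConcGCThreads defaults to a quarter of ParallelGCThreads, minimum one.
    public static int defaultConcGcThreads(int parallelGcThreads) {
        return Math.max(parallelGcThreads / 4, 1);
    }

    public static void main(String[] args) {
        int p4 = defaultParallelGcThreads(4);
        System.out.println(p4 + " parallel, " + defaultConcGcThreads(p4) + " concurrent");
        int p8 = defaultParallelGcThreads(8);
        System.out.println(p8 + " parallel, " + defaultConcGcThreads(p8) + " concurrent");
    }
}
```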
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.