High CPU due to heavy use of String.replaceAll


Environment

  • JBoss Enterprise Application Platform (EAP)

Issue

  • CPU usage on our JBoss server is high, and the CPU and thread dump data show that several threads are consuming the CPU in calls to String.replaceAll(). For example:
"ajp-127.0.0.1-8009-1" daemon prio=10 tid=0x00000000476fd000 nid=0xa73 runnable [0x000000004c1ab000]
   java.lang.Thread.State: RUNNABLE
	at java.util.regex.Pattern.atom(Pattern.java:1980)
	at java.util.regex.Pattern.sequence(Pattern.java:1885)
	at java.util.regex.Pattern.expr(Pattern.java:1752)
	at java.util.regex.Pattern.compile(Pattern.java:1460)
	at java.util.regex.Pattern.<init>(Pattern.java:1133)
	at java.util.regex.Pattern.compile(Pattern.java:823)
	at java.lang.String.replaceAll(String.java:2189)
        ...

"ajp-10.133.72.102-8009-2" daemon prio=10 tid=0x0000000046406000 nid=0xa74 runnable [0x000000004c2ac000]
   java.lang.Thread.State: RUNNABLE
	at java.lang.String.length(String.java:651)
	at java.util.regex.Matcher.getTextLength(Matcher.java:1140)
	at java.util.regex.Matcher.reset(Matcher.java:291)
	at java.util.regex.Matcher.<init>(Matcher.java:211)
	at java.util.regex.Pattern.matcher(Pattern.java:888)
	at java.lang.String.replaceAll(String.java:2189)
        ...

Resolution

  • If you use the same regex repeatedly through replaceAll(), you are likely better off building the Pattern and Matcher objects yourself and reusing them. That way they do not have to be created and compiled again on every call to String.replaceAll().
  • Matchers are not thread-safe, however. If you reuse a Matcher across threads, synchronize access to it and call Matcher.reset() when a thread is finished with it. A compiled Pattern, by contrast, is immutable and can be shared between threads without synchronization.
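
The two points above can be sketched as follows. This is a minimal illustration, not code from the affected application: the class name, method names, and the whitespace regex are hypothetical. It shows a Pattern compiled once and shared, with either a new Matcher per call or a single Matcher guarded by a lock.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names and the regex are assumptions for illustration.
class WhitespaceNormalizer {

    // Compiled once when the class is initialized. Pattern is immutable,
    // so this instance can be shared by all request threads.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Shared Matcher for the second variant. Matcher is NOT thread-safe,
    // so every use of it below happens inside a synchronized block.
    private static final Matcher SHARED_MATCHER = WHITESPACE.matcher("");

    // Variant 1: reuse the Pattern, create a Matcher per call.
    // Same result as input.replaceAll("\\s+", " "), without recompiling the regex.
    static String normalize(String input) {
        return WHITESPACE.matcher(input).replaceAll(" ");
    }

    // Variant 2: reuse both the Pattern and the Matcher across threads.
    static String normalizeShared(String input) {
        synchronized (SHARED_MATCHER) {
            try {
                // reset(CharSequence) points the matcher at the new input.
                return SHARED_MATCHER.reset(input).replaceAll(" ");
            } finally {
                // Reset when finished so the matcher no longer references the input.
                SHARED_MATCHER.reset("");
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("a  b\t\nc"));      // prints "a b c"
        System.out.println(normalizeShared("x   y"));    // prints "x y"
    }
}
```

Variant 1 is usually the simpler choice, because creating a Matcher is cheap compared with compiling the regex and no locking is needed. Variant 2 avoids the Matcher allocation as well, at the cost of serializing callers on the lock.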

Root Cause

  • String.replaceAll() is inefficient when called repeatedly with the same pattern. Each call creates and compiles a new regex Pattern and then creates a new Matcher from it, which adds significant overhead every time.
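
To make the overhead concrete, the sketch below shows what a single replaceAll() call amounts to according to the String.replaceAll() Javadoc. The class name, method names, and regex are hypothetical and chosen only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demonstration class; names and regex are assumptions for illustration.
class ReplaceAllEquivalence {

    // Convenience form: the regex string is compiled on every call.
    static String viaString(String s) {
        return s.replaceAll("[0-9]+", "#");
    }

    // The documented equivalent of the call above, with the
    // Pattern.compile() and matcher() steps written out explicitly.
    static String viaPattern(String s) {
        return Pattern.compile("[0-9]+").matcher(s).replaceAll("#");
    }

    public static void main(String[] args) {
        String input = "order 123, item 45";
        System.out.println(viaString(input));   // prints "order #, item #"
        System.out.println(viaPattern(input));  // prints "order #, item #"
    }
}
```

Both methods pay the compile cost on each call, which corresponds to the Pattern.compile() and Matcher.<init> frames in the thread dumps above.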

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.