
Is the entire Xss (stack space) used for each Java thread?

I am considering increasing the stack size to work around the StackOverflowError thrown by the regex library, which does not appear to be scheduled for a fix.

Edit: Solution

  • Stephen C's answer is probably the best answer to the problem, even if it is not an answer to the question. Although my string size was already more than 4k, I was still likely to run into the problem again at some point during the lifetime of the product.
  • aioobe's answer is the best answer to the actual question, perhaps not the actual problem.
  • Chris's answer is a very good idea. Edit: JRegex worked great!


Is the entire Xss (stack space) used for each Java thread?

According to this page, yes:

  • increase the stack size for all threads in your application, by including -Xssnnm in the Java command line (where nn is the number of megabytes of stack space per thread);

You can however choose a larger stack size for a specific thread using the Thread(ThreadGroup group, Runnable target, String name, long stackSize) constructor.

Allocates a new Thread object so that it has target as its run object, has the specified name as its name, belongs to the thread group referred to by group, and has the specified stack size.

Note, however, that according to the documentation, the effect of the stackSize parameter, if any, is highly platform dependent, and on some platforms its value may have no effect whatsoever.
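As a rough illustration of that constructor, here is a minimal sketch that runs a deeply recursive task on a dedicated thread with a larger requested stack. The class name BigStackRunner, the helper callWithStack, the toy depth method and the 64 MB figure are all made up for this example, and the JVM is free to round or ignore the requested size.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper: runs a task on its own thread with a requested stack size.
class BigStackRunner {

    static <T> T callWithStack(Supplier<T> task, long stackSizeBytes) {
        AtomicReference<T> result = new AtomicReference<>();
        AtomicReference<Throwable> failure = new AtomicReference<>();

        // Thread(ThreadGroup, Runnable, String, long): a null group means
        // "use the default group"; stackSizeBytes is only a hint to the JVM.
        Thread worker = new Thread(null, () -> {
            try {
                result.set(task.get());
            } catch (Throwable t) {
                failure.set(t); // includes StackOverflowError
            }
        }, "big-stack-worker", stackSizeBytes);

        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }

        // Rethrow whatever the task threw, on the calling thread.
        Throwable t = failure.get();
        if (t instanceof RuntimeException) throw (RuntimeException) t;
        if (t instanceof Error) throw (Error) t;
        return result.get();
    }

    // Toy stand-in for a stack-hungry computation such as a recursive regex match.
    static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }

    public static void main(String[] args) {
        Integer d = callWithStack(() -> depth(10_000), 64L * 1024 * 1024);
        System.out.println("Recursion depth reached: " + d);
    }
}
```

Only the worker thread gets the larger stack, so the rest of the application keeps the default size. In the regex case, the Supplier would wrap the matcher call instead of depth.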


I think a better solution would be to rewrite the regex to avoid the problem. Or better still, replace it with some plain Java parsing code. Or maybe just reject strings larger than a certain length.

Bumping the stack size only puts off the problem. Now you can cope with 2000 or 4000 character input strings instead of 1000. But sooner or later you are likely to run into one that causes your expanded stacks to overflow.
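To make the "plain Java parsing code" and "reject long strings" suggestions concrete, here is a minimal sketch. It assumes the problematic pattern has the shape (x|ab)*, the form discussed elsewhere in this thread; the real pattern may differ. The class name PlainMatcher, the method name and the maxLength parameter are invented for illustration.

```java
// Hypothetical hand-written replacement for a full match against (x|ab)*.
class PlainMatcher {

    // Returns true if the whole string is a sequence of "x" and "ab" tokens.
    // Inputs longer than maxLength are rejected up front.
    static boolean matchesXOrAbRepeated(String s, int maxLength) {
        if (s.length() > maxLength) {
            throw new IllegalArgumentException(
                "input too long: " + s.length() + " > " + maxLength);
        }
        int i = 0;
        while (i < s.length()) {
            if (s.charAt(i) == 'x') {
                i += 1;                       // consumed "x"
            } else if (s.startsWith("ab", i)) {
                i += 2;                       // consumed "ab"
            } else {
                return false;                 // neither alternative fits here
            }
        }
        return true;                          // reached the end, empty input included
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < 100_000; n++) {
            sb.append("xab");
        }
        System.out.println(matchesXOrAbRepeated(sb.toString(), 1_000_000));
        System.out.println(matchesXOrAbRepeated("xabxa", 1_000_000));
    }
}
```

Because this is a loop rather than recursion, stack usage stays constant however long the input is. That works here only because the two alternatives never compete at the same position; a pattern that needs real backtracking would need a more careful parser.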


What about using JRegex or Jakarta Regex instead?


If a regex of the form (x|ab)* is causing stack overflows or other crashes in your regex engine (as mentioned in the madbean.com link in the original question), here are a few tips for rewriting such a regular expression.

The regular expression (x|ab)* consists of a capturing group with two alternatives that are mutually exclusive. This regex can be optimized in 3 ways, depending on the features supported by your regex flavor. The java.util.regex flavor supports all 3.

The capturing group will store the text matched during its last iteration after a successful match, which is either x or ab. Since you probably don't care about the last iteration, you can tell the regex engine that you don't care and use a non-capturing group: (?:x|ab)*. How much of a speed increase this gives depends on how the regex engine keeps track of capturing groups.

The alternatives are mutually exclusive. If x matches, there is no point in trying to match ab at the same position. You can tell the regex engine that by using an atomic group: (?>x|ab)*. Atomic groups are non-capturing, so this preserves the previous optimization.

Your repeated group (?>x|ab)* is not followed by anything that could match the same text as x or ab. Thus, the quantifier * can match as many iterations as possible, without ever having to give back to allow the remainder of the regex to match. You can tell the regex engine that by using a possessive quantifier: (?>x|ab)*+.

Depending on how the java.util.regex engine handles backtracking and the suppression thereof via atomic groups and possessive quantifiers, any of these optimizations or the combination of them may very well avoid the stack overflow. Even if it doesn't and you choose to use a different regex engine, these techniques can still improve the performance of your regular expressions.
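For reference, the four forms discussed above can be written with java.util.regex as in the sketch below. The class and constant names are made up for illustration. The example only checks that the variants accept and reject the same short inputs; whether any of them avoids the StackOverflowError on long inputs depends on the JVM version and is not something this sketch demonstrates.

```java
import java.util.regex.Pattern;

// Hypothetical holder for the four equivalent patterns from the answer above.
class RegexVariants {
    static final Pattern CAPTURING     = Pattern.compile("(x|ab)*");
    static final Pattern NON_CAPTURING = Pattern.compile("(?:x|ab)*");
    static final Pattern ATOMIC        = Pattern.compile("(?>x|ab)*");
    static final Pattern POSSESSIVE    = Pattern.compile("(?>x|ab)*+");

    static final Pattern[] ALL = { CAPTURING, NON_CAPTURING, ATOMIC, POSSESSIVE };

    // True if every variant gives the same full-match verdict as the original.
    static boolean allAgree(String input) {
        boolean expected = CAPTURING.matcher(input).matches();
        for (Pattern p : ALL) {
            if (p.matcher(input).matches() != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String input : new String[] { "", "x", "ab", "xabxab", "xabxa", "ba" }) {
            System.out.println("\"" + input + "\" matches="
                + POSSESSIVE.matcher(input).matches()
                + " allAgree=" + allAgree(input));
        }
        // Only the first variant has a capturing group.
        System.out.println("groups: " + CAPTURING.matcher("").groupCount()
            + " vs " + POSSESSIVE.matcher("").groupCount());
    }
}
```

If you swap one of these into existing code, it is worth running your real inputs through both the old and the new pattern first, since the equivalence relies on nothing after the group being able to match x or ab.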
