
Measure Regex Performance [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

Is there an easy way to measure the performance of all regular expressions within a Java application?


Most IDEs have a profiling option that shows which operations are called, how often, and how much time they take. If you write your application so that all regex processing is done inside a helper method, you will see how that method performs in the profile.
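
A minimal sketch of such a helper, assuming the application only needs a "does this pattern occur in the input" check. The class name `RegexHelper` and the method `find` are made up for illustration; they are not part of any library.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the class and method names are illustrative only.
final class RegexHelper {
    private RegexHelper() {}

    // Every regex search in the application is routed through this one method,
    // so it appears as a single entry in a profiler's call tree.
    static boolean find(Pattern pattern, CharSequence input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(find(digits, "order 42"));   // true
        System.out.println(find(digits, "no numbers")); // false
    }
}
```

The profiler then reports the total time spent in `RegexHelper.find`, although it still will not say which regex or which input was slow.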


Is there an easy way to measure the performance of all regular expressions with an application?

Literally impossible. There are an infinite number of possible regexes.

Is there an easy way to measure the performance of all regular expressions used in an application?

There is no easy way. But you could try the following approaches:

  • Write a static analyser to identify all places where Pattern.compile(...) is called, and extract the literal string containing the regex:

    • Regexes could be created dynamically.

    • The actual performance depends on the input strings as well as the regex.

  • Run a generic profiler and see which calls to the matcher are taking lots of time.

    • You can identify the statements which are the bottlenecks, but this doesn't tell you the input string or (in all cases) the regex.
  • Hack the relevant methods of Pattern and Matcher to log timing information and capture regexes and inputs.

    • Involves modifying system classes - bad idea.

    • You could possibly use AOP or bytecode modification - cleaner, but more complicated.

  • Create your own wrapper for Pattern and Matcher to do the above, and use them in your code in place of the standard classes.

    • Hard to find / change all occurrences; e.g. in 3rd-party libraries or in system classes like String.split(...).
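
The wrapper approach from the last bullet could look roughly like the sketch below. `TimedPattern` and all of its methods are hypothetical names, it wraps only `matches`, and it measures wall-clock time with `System.nanoTime()`. It records the regex and the timing but not the input strings; those could be logged in `record` if needed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.regex.Pattern;

// Hypothetical wrapper around Pattern; names are illustrative only.
final class TimedPattern {
    // Per-regex totals: [0] = number of calls, [1] = elapsed nanoseconds.
    private static final Map<String, LongAdder[]> STATS = new ConcurrentHashMap<>();

    private final Pattern pattern;

    private TimedPattern(Pattern pattern) {
        this.pattern = pattern;
    }

    static TimedPattern compile(String regex) {
        return new TimedPattern(Pattern.compile(regex));
    }

    // Performs a full-input match and records how long it took.
    boolean matches(CharSequence input) {
        long start = System.nanoTime();
        try {
            return pattern.matcher(input).matches();
        } finally {
            record(pattern.pattern(), System.nanoTime() - start);
        }
    }

    private static void record(String regex, long nanos) {
        LongAdder[] s = STATS.computeIfAbsent(regex,
                k -> new LongAdder[] { new LongAdder(), new LongAdder() });
        s[0].increment();
        s[1].add(nanos);
    }

    static long callCount(String regex) {
        LongAdder[] s = STATS.get(regex);
        return s == null ? 0 : s[0].sum();
    }

    static long totalNanos(String regex) {
        LongAdder[] s = STATS.get(regex);
        return s == null ? 0 : s[1].sum();
    }

    public static void main(String[] args) {
        String regex = "\\w+@\\w+\\.com";
        TimedPattern email = TimedPattern.compile(regex);
        email.matches("alice@example.com");
        email.matches("not an address");
        System.out.println(regex + ": " + callCount(regex) + " calls, "
                + totalNanos(regex) + " ns total");
    }
}
```

As the bullet notes, this only covers code that you change to use the wrapper; calls made inside libraries or through `String.split(...)` are not counted.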


If you want to compare the performance of regular expressions given various inputs, you want to measure the CPU time required to perform matches against the regular expression (as opposed to elapsed wall-clock time).

Here's what I would be inclined to do. Write a JUnit test for each regular expression/input pair. You can use JUnit assertions to verify that your regular expressions are matching what you intend and nothing you don't. You can then add extra statements to your test cases to measure the CPU time consumed for each input-regex pair. Some brief research shows that one way to measure CPU time in Java is to use a ThreadMXBean (an instance can be obtained by calling ManagementFactory.getThreadMXBean()). The interface includes methods for checking whether CPU time measurements are supported and for getting the CPU time. You simply want to get the CPU time immediately before and immediately after each match, and the difference is the amount of CPU time the match required.
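
A rough sketch of the before/after measurement, written as a plain class rather than a JUnit test so it stands alone. `RegexCpuTimer` and `cpuNanos` are hypothetical names. The loop over `iterations` is my own assumption: the resolution of thread CPU time depends on the platform, so a single match may be too short to register.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class RegexCpuTimer {
    private RegexCpuTimer() {}

    // Returns the CPU nanoseconds the current thread spent on `iterations`
    // matches, or -1 if the JVM cannot measure CPU time for this thread.
    static long cpuNanos(Pattern pattern, String input, int iterations) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            return -1;
        }
        long before = bean.getCurrentThreadCpuTime();
        for (int i = 0; i < iterations; i++) {
            pattern.matcher(input).matches();
        }
        long after = bean.getCurrentThreadCpuTime();
        return after - before;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(a|b)*c");
        long nanos = cpuNanos(pattern, "abababababc", 10_000);
        System.out.println("CPU time for 10000 matches: " + nanos + " ns");
    }
}
```

In a JUnit test you would call `cpuNanos` after the usual assertions on the match result. The first runs also include JIT warm-up, so treat early numbers with caution.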

Hope this helps! Maybe someone else knows of a better interface/library for measuring CPU time, because it seems ugly to use ThreadMXBean in a unit test. Note also that it's generally considered poor practice for your unit tests to produce output, so you might consider removing any print statements once you've finished investigating the performance of your regexes.
