Calculating regex efficiency
How would you go about calculating / finding the number of operations a regex takes to match over a given string? I'd like to develop a program that would allow you to rank regexs in order of efficiency.
Also, is it possible to break out of a regex if the number of开发者_高级运维 operations exceeds a given threshold? I'm hoping to turn this into a web app, so I don't want users entering regexes that could potentially kill the server (if that's even possible).
Many thanks.
Edit: Just to clarify, I'm referring to the superset of plain regexes that includes backtracking (which is therefore non-linear).
The way to find out how many operations it will take to parse a given string is to parse it and count the number of operations. You could do somewhat limited static analysis, but a definitive answer would be tantamount to solving the halting problem.
Trying to rank expressions for any input is even more complex. Take the expression A[0-9]+
- The string "A999" will match, and take roughly O(n) time.
- The string "B943" will immediately fail, taking O(1) time.
A regular expression parser is fundamentally just a program. It is almost always not possible to say one program is faster than another in general, only for specific input.
You could try to use static analysis based on some understanding of what the input might be. For example, an expression which can immediately eliminate a large portion of the common inputs might be faster than one which doesn't. I would say that the only way to do this is to also accept a dataset of expressions with a similar distribution to those being parsed and either do benchmarks [easy] or analysis [hard] using that data.
精彩评论