Regex is blocking the program

I have the following regex:

Dim origen As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45    423 /z:65 /a:23  /m:39 /t:45rt "

Dim str As String = "(^|\s)/p:""\w:(\\(\w+[\s]*\w+)+)+\\\w+.\w+""(\s|$)"
Dim ar As Integer

Dim getfile As New Regex(str)
Dim mgetfile As MatchCollection = getfile.Matches(origen)
ar = mgetfile.Count

When I run this, it works: it matches /p:"c:\mis doc umentos\mis imagenes construida\archivo.txt", which is basically the path to a file.

But if I change the origen string to

Dim origen As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45    423 /z:65 /a:23  /m:39 /t:45rt "

Note that the closing quote after the file name is now followed directly by "/cid:45" (no space), so the pattern should not match. But instead of getting mgetfile.Count = 0, the program blocks, and when I debug it I get "property evaluation failed".


Can you clean up the whole expression to just:

str = "/p:"".*?"""
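A minimal VB.NET sketch of this suggestion, assuming .NET's System.Text.RegularExpressions. The module and function names (LazyQuoteDemo, FindPathArgument) are made up for illustration. The lazy .*? stops at the first closing quote, and because the pattern has only one quantifier, a failed attempt costs at most one pass over the rest of the string.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LazyQuoteDemo
    ' The two inputs from the question; in Bad the closing quote is followed directly by /cid:45.
    Public Const Good As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45    423 /z:65 /a:23  /m:39 /t:45rt "
    Public Const Bad As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45    423 /z:65 /a:23  /m:39 /t:45rt "

    ' Returns the first /p:"..." argument, or Nothing if there is none.
    ' The lazy .*? expands one character at a time and stops at the first closing quote.
    Public Function FindPathArgument(input As String) As String
        Dim m As Match = Regex.Match(input, "/p:"".*?""")
        Return If(m.Success, m.Value, Nothing)
    End Function

    Sub Main()
        Console.WriteLine(FindPathArgument(Good))
        Console.WriteLine(FindPathArgument(Bad))
    End Sub
End Module
```

Both calls print /p:"c:\mis doc umentos\mis imagenes construida\archivo.txt". This pattern does not look at what follows the closing quote, so it also accepts the second input; whether that is acceptable depends on what the command line is supposed to allow.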


The reason why your program hangs is catastrophic backtracking.

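The parts of your regex (\w+\s*\w+)+ and \w+.\w+ allow so many permutations that the regex engine gets stuck in a near-infinite loop: with nested quantifiers, the number of ways to split a run of word characters grows exponentially with its length. RegexBuddy's debugger gives up after 1,000,000 steps.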

This only happens when the pattern cannot match, because the regex engine then has to try every other permutation the pattern allows before it can report failure. In general, repeated groups that contain repeated quantifiers are dangerous.
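To observe the hang without freezing the program, .NET Framework 4.5 and later let you give a Regex a match timeout; when it expires, the engine throws RegexMatchTimeoutException instead of running on. The sketch below assumes such a runtime. The module and function names are hypothetical and the two-second limit is arbitrary. Whether the limit is actually reached depends on the machine and the runtime version (newer .NET versions optimize some backtracking), so treat the printed result as illustrative rather than guaranteed.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BacktrackingTimeoutDemo
    ' The pattern from the question: (\w+[\s]*\w+)+ nested inside another (...)+.
    Public Const Nested As String = "(^|\s)/p:""\w:(\\(\w+[\s]*\w+)+)+\\\w+.\w+""(\s|$)"

    ' The failing input: the closing quote is followed directly by /cid:45.
    Public Const Bad As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45    423 /z:65 /a:23  /m:39 /t:45rt "

    ' Returns the number of matches, or -1 if the engine gave up after the timeout.
    Public Function CountWithTimeout(pattern As String, input As String, timeout As TimeSpan) As Integer
        Dim re As New Regex(pattern, RegexOptions.None, timeout)
        Try
            ' Matches() is evaluated lazily; reading Count is what actually runs the search.
            Return re.Matches(input).Count
        Catch ex As RegexMatchTimeoutException
            Return -1
        End Try
    End Function

    Sub Main()
        Dim result As Integer = CountWithTimeout(Nested, Bad, TimeSpan.FromSeconds(2))
        If result = -1 Then
            Console.WriteLine("Gave up after 2 seconds: catastrophic backtracking.")
        Else
            Console.WriteLine("Matches: " & result)
        End If
    End Sub
End Module
```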

What are the actual requirements? Should it match only paths made of letters, digits, underscores, spaces and backslashes? Or any string between quotes? Perhaps you could shed some light on this.

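Until then, I suggest the following:

"(?<=^|\s)/p:""\w:(\\(?>[\w\s]+))+\.\w+""(?=\s|$)"

This cleans up a few things: (\\(?>[\w\s]+)) matches a backslash followed by one or more word characters (letters, digits, underscore) or whitespace. Once they have been matched, the regex engine refuses to try a different permutation; that is what the atomic group (?>...) is for. Flavors such as Java and PCRE can express the same thing with the possessive quantifier ++ ([\w\s]++), but .NET does not support possessive quantifiers and throws an ArgumentException for that syntax.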

After that, it matches a literal dot (the unescaped . in your version matches any character) and a sequence of alphanumeric characters, then a quote, and finally it checks that a space or the end of the string follows. If not, the regex fails, and it fails quickly.
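Because .NET does not accept the possessive ++, a VB.NET sketch of this pattern spells the same idea with an atomic group, (?>[\w\s]+), and runs it against both inputs from the question. The module and function names are hypothetical. The first input yields one match; for the second, the lookahead fails after the closing quote, and since the atomic groups cannot be re-split, the search gives up almost immediately.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AtomicPathDemo
    Public Const Good As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45    423 /z:65 /a:23  /m:39 /t:45rt "
    Public Const Bad As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45    423 /z:65 /a:23  /m:39 /t:45rt "

    ' (?>[\w\s]+) is an atomic group: once it has matched a path segment,
    ' the engine will not try shorter splits of that segment when a later part fails.
    Public Const AtomicPath As String = "(?<=^|\s)/p:""\w:(\\(?>[\w\s]+))+\.\w+""(?=\s|$)"

    Public Function CountPathArguments(input As String) As Integer
        Return Regex.Matches(input, AtomicPath).Count
    End Function

    Sub Main()
        Console.WriteLine(CountPathArguments(Good)) ' 1
        Console.WriteLine(CountPathArguments(Bad))  ' 0
    End Sub
End Module
```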

If you only want to match a string between quotes, then

"(?<=^|\s)/p:""[^""]+""(?=\s|$)"

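is the simplest and fastest way: [^""]+ stops at the first quote, so when the lookahead fails, giving back characters can never produce a quote and the attempt is abandoned almost immediately.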
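A sketch of the quotes-only pattern, again in VB.NET with hypothetical names. The third input (Dashed) is made up for illustration: its folder names contain a hyphen and a dot, which a [\w\s]-based path pattern would reject but the negated class accepts.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module QuotedPathDemo
    Public Const Good As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45    423 /z:65 /a:23  /m:39 /t:45rt "
    Public Const Bad As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45    423 /z:65 /a:23  /m:39 /t:45rt "
    ' Hypothetical extra input: folder names with a hyphen and a dot.
    Public Const Dashed As String = "/p:""c:\my-docs\v1.2\notes.txt"" /z:65"

    ' Anything except a quote between the quotes; /p: must follow whitespace or the
    ' start of the string, and the closing quote must precede whitespace or the end.
    Public Const QuotedOnly As String = "(?<=^|\s)/p:""[^""]+""(?=\s|$)"

    Public Function CountQuoted(input As String) As Integer
        Return Regex.Matches(input, QuotedOnly).Count
    End Function

    Sub Main()
        Console.WriteLine(CountQuoted(Good))   ' 1
        Console.WriteLine(CountQuoted(Bad))    ' 0
        Console.WriteLine(CountQuoted(Dashed)) ' 1
    End Sub
End Module
```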


Do you always know that the path starts and ends with a double quote? If so, just use:

"(^|\s)/p:""(.*?)""(.*$)"
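To pull out just the path, read the second capture group: the first holds the leading whitespace (empty at the start of the string) and the third the rest of the line. A VB.NET sketch, with hypothetical module and function names:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CaptureGroupDemo
    Public Const Bad As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45    423 /z:65 /a:23  /m:39 /t:45rt "

    ' Group 1: leading whitespace, group 2: the path, group 3: the rest of the line.
    Public Function ExtractPath(input As String) As String
        Dim m As Match = Regex.Match(input, "(^|\s)/p:""(.*?)""(.*$)")
        Return If(m.Success, m.Groups(2).Value, Nothing)
    End Function

    Sub Main()
        ' Prints the path without quotes: c:\mis doc umentos\mis imagenes construida\archivo.txt
        Console.WriteLine(ExtractPath(Bad))
    End Sub
End Module
```

Like the lazy pattern in the first answer, this one does not check what follows the closing quote, so it returns the path for both inputs from the question.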