Regex is blocking the program
I have the following regex:
Dim origen As String = " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45 423 /z:65 /a:23 /m:39 /t:45rt "
Dim str As String = "(^|\s)/p:""\w:(\\(\w+[\s]*\w+)+)+\\\w+.\w+""(\s|$)"
Dim ar As Integer
Dim getfile As New Regex(str)
Dim mgetfile As MatchCollection = getfile.Matches(origen)
ar = mgetfile.Count
When I evaluate this, it works and matches /p:"c:\mis doc umentos\mis imagenes construida\archivo.txt",
which is basically the path to a file.
But if I change the origen string to
Dim origen As String = " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45 423 /z:65 /a:23 /m:39 /t:45rt "
Note that the end of the file path is now followed directly by "/cid:45", which makes the pattern fail to match. But instead of getting mgetfile.Count = 0, the program hangs; when I debug it, I get "property evaluation failed".
Can you clean up the whole expression to just:
str = "/p:"".*?"""
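A minimal sketch of that simplified pattern in use. The capture group around .*? and the module name are additions for illustration, not part of the suggestion above. Note that this pattern does not require whitespace after the closing quote, so it also matches the second input from the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SimplePathDemo
    Sub Main()
        ' The input from the question where the closing quote is followed directly by /cid.
        Dim origen As String = " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45 423 /z:65 /a:23 /m:39 /t:45rt "

        ' Lazy match: everything from /p:" up to the next double quote.
        ' The parentheses (an assumption for this sketch) capture just the path.
        Dim m As Match = Regex.Match(origen, "/p:""(.*?)""")

        If m.Success Then
            ' Prints: c:\mis doc umentos\mis imagenes construida\archivo.txt
            Console.WriteLine(m.Groups(1).Value)
        End If
    End Sub
End Module
```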
The reason why your program hangs is catastrophic backtracking.
The parts of your regex (\w+\s*\w+)+ and \w+.\w+ allow so many permutations that the regex engine gets stuck in a near-infinite loop. RegexBuddy's debugger quits after 1000000 steps.
This only happens if the pattern can't match successfully, thereby prompting the regex engine to try each and every other permutation the pattern allows. Generally, repeating a group that itself contains repeating quantifiers is dangerous.
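Independently of fixing the pattern, a hang like this can be turned into a catchable error. The sketch below assumes .NET Framework 4.5 or later, where the Regex constructor accepts a match timeout; the one-second limit and the module name are arbitrary choices for illustration. With the original pattern and the failing input, the call would be expected to end in the Catch branch rather than block indefinitely.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TimeoutDemo
    Sub Main()
        ' Failing input and original pattern, copied from the question.
        Dim origen As String = " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45 423 /z:65 /a:23 /m:39 /t:45rt "
        Dim pattern As String = "(^|\s)/p:""\w:(\\(\w+[\s]*\w+)+)+\\\w+.\w+""(\s|$)"

        ' The third argument bounds how long a single match attempt may run.
        Dim getfile As New Regex(pattern, RegexOptions.None, TimeSpan.FromSeconds(1))

        Try
            ' MatchCollection is evaluated lazily; reading Count forces the matching.
            Dim ar As Integer = getfile.Matches(origen).Count
            Console.WriteLine("Matches: {0}", ar)
        Catch ex As RegexMatchTimeoutException
            ' Raised when the engine exceeds the timeout while backtracking.
            Console.WriteLine("Regex gave up after {0} second(s)", ex.MatchTimeout.TotalSeconds)
        End Try
    End Sub
End Module
```

A timeout is only a safety net; the pattern itself still needs to be rewritten so that it fails quickly.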
What are the real requirements? To match a path that only contains letters, numbers, underscores and backslashes? Or just a string between quotes? Perhaps you could shed some light on this...
Until then, I suggest the following:
"(?<=^|\s)/p:""\w:(\\(?>[\w\s]+))+\.\w+""(?=\s|$)"
This cleans up a few things: (\\(?>[\w\s]+))
matches a backslash, followed by one or more alphanumeric and space characters. Once they have been matched, the regex engine refuses to try a different permutation. In engines that support it, this is achieved by using the possessive quantifier ++
instead of just a +
; .NET does not support possessive quantifiers, so the equivalent atomic group (?>...) is used here.
After that, it matches a dot (your version would have matched any character), and a sequence of alphanumeric characters. Then a quote, and then it checks whether a space or end-of-string follows. If not, the regex will fail, and fail quickly.
If you only want to match a string between quotes, then
"(?<=^|\s)/p:""[^""]+""(?=\s|$)"
is the best and fastest way.
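As a rough check, the sketch below runs two fail-fast patterns against both inputs from the question (shortened slightly). The stricter path pattern is written with an atomic group (?>...), which is how .NET expresses the "no backtracking into this part" behavior that a possessive quantifier gives in other engines. The module and variable names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FastFailDemo
    Sub Main()
        ' First input: closing quote followed by a space. Second: followed directly by /cid.
        Dim inputs As String() = {
            " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45 423 /z:65 ",
            " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45 423 /z:65 "
        }

        ' Strict path: drive letter, backslash-separated names, a dot, an extension.
        Dim strictPath As String = "(?<=^|\s)/p:""\w:(\\(?>[\w\s]+))+\.\w+""(?=\s|$)"
        ' Anything between the quotes.
        Dim anyQuoted As String = "(?<=^|\s)/p:""[^""]+""(?=\s|$)"

        For Each origen As String In inputs
            ' Prints "1 1" for the first input and "0 0" for the second.
            Console.WriteLine("{0} {1}",
                              Regex.Matches(origen, strictPath).Count,
                              Regex.Matches(origen, anyQuoted).Count)
        Next
    End Sub
End Module
```

Both patterns return a count of 0 on the second input instead of hanging, because neither contains a repeated group whose contents can be split up in many different ways.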
Do you always know that there are two double quotes at the beginning and end? If so just do:
(^|\s)/p:""(.*?)""(.*$)