Trying to understand .NET regular expressions
I've been doing a lot of reading on .NET regular expressions, and I have written a regular expression whose behavior I can't make any sense of.
(src|href)="\w+|(\w+/)+
The way I read this regular expression:
- Match exactly "src" or "href"
- Followed by ="
- Followed by one or more word characters ([a-zA-Z0-9_]), or one or more repetitions of (one or more word characters followed by /)
This is meant to match something like 'src="Folder', 'src="folder/', 'href="Folder/SubFolder/', etc.
Input:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"> <head>
Using this regular expression, with this input, there is one match.
org/1999/
Can anyone explain this? Neither src nor href appears anywhere in the input, so how can there be any match at all?
What's happening here is that | has the lowest precedence of any regex operator, so it separates the whole regex into two completely separate alternatives. That is, it matches either: (src|href)="\w+
OR (\w+/)+
of which the second alternative is the one being matched:
org/1999/
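The surprising match can be reproduced with a short sketch. It uses Python's re module as a stand-in for the .NET engine, on the assumption that alternation precedence works the same way in both for this pattern. The names PATTERN, HTML, matched_text and attr_group are illustrative only.

```python
import re

# The pattern from the question, unchanged. The top-level | splits it into
# two alternatives: (src|href)="\w+   and   (\w+/)+
PATTERN = r'(src|href)="\w+|(\w+/)+'

# The two input lines from the question.
HTML = '<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml"> <head>'

match = re.search(PATTERN, HTML)
matched_text = match.group(0)  # text matched by the whole pattern
attr_group = match.group(1)    # the (src|href) group; None if it took no part

print(matched_text)  # org/1999/
print(attr_group)    # None
```

The (src|href) group is empty because the match came entirely from the right-hand alternative, which never looks for src or href.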
In your case you'd need to put the last part in parentheses to make it clear what exactly the alternation | applies to:
(src|href)="(\w+|(\w+/)+)
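A quick check of the grouped pattern, again sketched with Python's re module as an assumed stand-in for .NET. One caveat shows up: a backtracking engine tries alternatives left to right, so \w+ wins before (\w+/)+ is attempted and the trailing slashes are left out of the match. REORDERED is a hypothetical variant, not part of the original answer, that lists the longer alternative first; first_match is an illustrative helper.

```python
import re

GROUPED = r'(src|href)="(\w+|(\w+/)+)'    # the pattern suggested above
REORDERED = r'(src|href)="((\w+/)+|\w+)'  # hypothetical: path alternative first

# The two input lines from the question.
HTML = '<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml"> <head>'

def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(first_match(GROUPED, HTML))                         # None
print(first_match(GROUPED, 'src="folder/'))               # src="folder
print(first_match(REORDERED, 'src="folder/'))             # src="folder/
print(first_match(REORDERED, 'href="Folder/SubFolder/'))  # href="Folder/SubFolder/
print(first_match(REORDERED, 'src="Folder'))              # src="Folder
```

Either version no longer matches the DOCTYPE input, since both alternatives now require src=" or href=" in front of them.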
Btw I used Expresso to help work this out.