
Trying to understand .NET regular expressions

I've been doing a lot of reading on .NET regular expressions, and I have developed a regular expression whose behavior I can't make any sense of.

(src|href)="\w+|(\w+/)+

The way I read this regular expression:

  1. Match exactly "src" or "href"
  2. Followed by ="
  3. Followed by one or more word characters (\w, roughly [a-zA-Z0-9_]), or by one or more repetitions of (one or more word characters followed by /)

This is meant to match something like 'src="Folder', 'src="folder/', 'href="Folder/SubFolder/', etc.

Input:

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

Using this regular expression, with this input, there is one match.

org/1999/

Can anyone explain this? Neither src nor href appears anywhere in the input, so how can there be any match at all?


What's happening here is that the | is separating the regex into two completely separate alternatives. That is, it matches either (src|href)="\w+ OR (\w+/)+, and it is the second alternative that matches:

org/1999/
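To see the split for yourself, here is a minimal sketch that runs the original pattern and each of its two halves against the input from the question. It uses Python's re module as a stand-in for .NET (an assumption on my part; for this particular pattern the two engines should behave the same way), and the variable names are only illustrative.

```python
import re

# Input taken from the question.
html = '''<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">

<head>
'''

# The original pattern: the top-level | splits it into two whole alternatives.
original = re.compile(r'(src|href)="\w+|(\w+/)+')

# The same two alternatives, written out as separate patterns for comparison.
left = re.compile(r'(src|href)="\w+')
right = re.compile(r'(\w+/)+')

# group(0) is the full text of each match, regardless of capture groups.
original_matches = [m.group(0) for m in original.finditer(html)]
left_matches = [m.group(0) for m in left.finditer(html)]
right_matches = [m.group(0) for m in right.finditer(html)]

print("original:", original_matches)
print("left only:", left_matches)
print("right only:", right_matches)
```

The left half finds nothing, while the right half on its own finds the same single match as the full pattern, which is why src and href never needed to appear in the input.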

In your case you'd probably need to put the last part in parentheses to make it clear what exactly the alternation | refers to:

(src|href)="(\w+|(\w+/)+)

Btw I used Expresso to help work this out.
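As a quick check of the grouped pattern, the sketch below (again Python's re standing in for .NET, which is my assumption, with illustrative variable names) runs it against the line that produced the stray match and against the three sample strings from the question. One thing it shows is that alternation is tried left to right, so \w+ wins as soon as it matches and the trailing folder/ parts are not included. The reordered variant at the end is my own suggestion, not part of the answer above.

```python
import re

# Grouped pattern from the answer: the | now only applies inside the parentheses.
grouped = re.compile(r'(src|href)="(\w+|(\w+/)+)')

# The line that produced the stray match no longer matches at all.
html_line = '<html xmlns="http://www.w3.org/1999/xhtml">'
stray = grouped.search(html_line)
print(stray)

# Sample strings from the question.
samples = ['src="Folder', 'src="folder/', 'href="Folder/SubFolder/']
grouped_results = [grouped.search(s).group(0) for s in samples]
print(grouped_results)

# Hypothetical variant: try the "folder/" repetition first, then fall back to \w+.
reordered = re.compile(r'(src|href)="((\w+/)+|\w+)')
reordered_results = [reordered.search(s).group(0) for s in samples]
print(reordered_results)
```

With the alternatives in the original order, the second and third samples match only up to the first folder name. With the order swapped, the full path including the trailing slashes is matched.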
