Regular Expression: Getting the URL value from a hyperlink

I have a string that contains HTML. I want to get the href values of all hyperlinks in it using C#.

Target String

<a href="~/abc/cde" rel="new">Link1</a>

<a href="~/abc/ghq">Link2</a>

I want to get the values "~/abc/cde" and "~/abc/ghq".


Use the HTML Agility Pack for parsing HTML. Right on their examples page they have an example of parsing some HTML for the href values:

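 // doc is an HtmlDocument that has already loaded the HTML, e.g. via doc.LoadHtml(html).
 // Current HtmlAgilityPack releases expose the root as doc.DocumentNode.
 // SelectNodes returns null when nothing matches, so guard before iterating.
 HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
 if (links != null)
 {
     foreach (HtmlNode link in links)
     {
         HtmlAttribute att = link.Attributes["href"];

         // Do stuff with att.Value
     }
 }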
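For the target string in the question, a complete program might look like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced; the class name HrefExtractor and the helper GetHrefs are hypothetical names chosen for illustration, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the "HtmlAgilityPack" NuGet package is installed

public static class HrefExtractor
{
    // Hypothetical helper: returns the href value of every <a> element that has one.
    public static List<string> GetHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            return hrefs;
        }

        foreach (HtmlNode link in links)
        {
            // The second argument is the fallback used if the attribute is missing.
            hrefs.Add(link.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }

    public static void Main()
    {
        string html =
            "<a href=\"~/abc/cde\" rel=\"new\">Link1</a>\n" +
            "<a href=\"~/abc/ghq\">Link2</a>";

        foreach (string href in GetHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

Returning a list instead of printing inside the loop keeps the parsing separate from whatever you want to do with the values afterwards.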


Using a regex to parse HTML is not advisable: a regex cannot tell real markup from look-alike text, such as a link that sits inside an HTML comment.

That said, the following regex should do the trick, and also gives you the link HTML in the tag if desired:

Regex regex = new Regex(@"\<a\s[^\<\>]*?href=(?<quote>['""])(?<href>((?!\k<quote>).)*)\k<quote>[^\>]*\>(?<linkHtml>((?!\</a\s*\>).)*)\</a\s*\>", RegexOptions.IgnoreCase|RegexOptions.ExplicitCapture);
for (Match match = regex.Match(inputHtml); match.Success; match=match.NextMatch()) {
  Console.WriteLine(match.Groups["href"]);
}


Here is a snippet of the regex (use the RegexOptions.IgnorePatternWhitespace option so that the layout and # comments are ignored):

(?:<)(?<Tag>[^\s/>]+)       # Extract the tag name.
(?![/>])                    # Stop if /> is found
# -- Extract Attributes Key Value Pairs  --

((?:\s+)             # One to many spaces start the attribute
 (?<Key>[^=]+)       # Name/key of the attribute
 (?:=)               # Equals sign needs to be matched, but not captured.

(?([\x22\x27])              # If quotes are found
  (?:[\x22\x27])
  (?<Value>[^\x22\x27]+)    # Place the value into named Capture
  (?:[\x22\x27])
 |                          # Else no quotes
   (?<Value>[^\s/>]*)       # Place the value into named Capture
 )
)+                  # -- One to many attributes found!

This will give you every tag and you can filter out what is needed and target the attribute you want.
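As a rough sketch of that filtering step, the pattern above can be compiled with RegexOptions.IgnorePatternWhitespace and the Key and Value captures of each match walked in parallel. The class TagAttributeFilter and the method GetAttributeValues are hypothetical names used only for this example, and the approach carries the same caveats as any regex run over HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagAttributeFilter
{
    // Same pattern as above; the layout and # comments rely on IgnorePatternWhitespace.
    private const string Pattern = @"
(?:<)(?<Tag>[^\s/>]+)       # Extract the tag name.
(?![/>])                    # Stop if /> is found
((?:\s+)                    # One to many spaces start the attribute
 (?<Key>[^=]+)              # Name/key of the attribute
 (?:=)                      # Equals sign needs to be matched, but not captured.
 (?([\x22\x27])             # If quotes are found
   (?:[\x22\x27])
   (?<Value>[^\x22\x27]+)   # Place the value into named Capture
   (?:[\x22\x27])
  |                         # Else no quotes
   (?<Value>[^\s/>]*)       # Place the value into named Capture
 )
)+                          # One to many attributes found";

    // Hypothetical helper: values of attribute `key` on every `tag` element in `html`.
    public static List<string> GetAttributeValues(string html, string tag, string key)
    {
        var results = new List<string>();
        var regex = new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);

        foreach (Match match in regex.Matches(html))
        {
            if (!string.Equals(match.Groups["Tag"].Value, tag, StringComparison.OrdinalIgnoreCase))
            {
                continue; // Not the tag we are interested in.
            }

            // Each repetition of the attribute group adds one Key and one Value capture.
            CaptureCollection keys = match.Groups["Key"].Captures;
            CaptureCollection values = match.Groups["Value"].Captures;
            int count = Math.Min(keys.Count, values.Count);

            for (int i = 0; i < count; i++)
            {
                if (string.Equals(keys[i].Value.Trim(), key, StringComparison.OrdinalIgnoreCase))
                {
                    results.Add(values[i].Value);
                }
            }
        }
        return results;
    }

    public static void Main()
    {
        string html =
            "<a href=\"~/abc/cde\" rel=\"new\">Link1</a>\n" +
            "<a href=\"~/abc/ghq\">Link2</a>";

        foreach (string href in GetAttributeValues(html, "a", "href"))
        {
            Console.WriteLine(href);
        }
    }
}
```

Passing "rel" instead of "href" to the same helper would pull out the rel attributes, which is the benefit of capturing every key/value pair rather than hard-coding one attribute in the pattern.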

I've written more about this in my blog (C# Regex Linq: Extract an Html Node with Attributes of Varying Types).
