Grabbing meta-tags and comments using HTML Agility Pack

I've looked for tutorials on using HTML Agility Pack, as it seems to do everything I want, but for such a powerful tool there seems to be surprisingly little written about it on the Internet.

I am writing a simple method that will retrieve any given tag based on name:

public string[] GetTagsByName(string TagName, string Source) {
    ...
}

This could easily be done with a regular expression, but we all know that regex is the wrong tool for parsing HTML. So far I have the following code:

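...
// TODO: Clear Comments (can this be done or should I use RegEx?)
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(Source);
ArrayList tags = new ArrayList();
string xpath = "//" + TagName;
// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
if (nodes != null) {
    // "//name" selects element nodes, so iterate as HtmlNode rather than HtmlTextNode
    foreach (HtmlNode node in nodes) {
        tags.Add(node.InnerText);
    }
}
return (string[])tags.ToArray(typeof(String));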

I would like to first strip all comments from the HTML, then return the matching tags based on name. If possible, I'd also like to return certain meta tags based on an attribute, such as robots. I'm not that great with XPath, so any help with that would be good.

Any help would be much appreciated.


HtmlAgilityPack's HtmlDocument implements IXPathNavigable, so it uses the standard .NET XPath engine. Any XPath 1.0 documentation applies, especially documentation that covers System.Xml.XPath.
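
To get a feel for that standard engine without HtmlAgilityPack in the picture, here is a minimal sketch that runs XPath 1.0 queries against System.Xml's XmlDocument. The class and method names (XmlXPathSketch, MetaContent, CountComments) are made up for illustration, and the input has to be well-formed XML, which real-world HTML often is not.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class XmlXPathSketch
{
    // Returns the content attribute of every <meta> element whose name attribute matches.
    // Assumes metaName contains no single quote, since it is pasted into the XPath string.
    public static List<string> MetaContent(string xhtml, string metaName)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed

        var result = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//meta[@name='" + metaName + "']"))
        {
            XmlAttribute content = node.Attributes["content"];
            if (content != null)
            {
                result.Add(content.Value);
            }
        }
        return result;
    }

    // Counts comment nodes anywhere in the document.
    public static int CountComments(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        return doc.SelectNodes("//comment()").Count;
    }
}
```

The predicate syntax [@name='...'] is plain XPath 1.0, so the same expression string should carry over to HtmlAgilityPack's SelectNodes.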

"//comment()" finds all comments
"//meta" finds all "meta" elements

HtmlDocument was designed to look very much like XmlDocument, so examples and tutorials about it will be somewhat applicable.
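
Putting the two expressions above to work, here is a rough sketch of stripping comments and then picking meta tags by attribute. TagExtractor, StripComments and GetMetaContent are names I made up. The doctype check is an assumption on my part: as far as I know, HtmlAgilityPack reports the doctype as a comment node, so it is skipped here to avoid deleting it.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class TagExtractor
{
    // Removes every comment node from the document in place.
    public static void StripComments(HtmlDocument doc)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection comments = doc.DocumentNode.SelectNodes("//comment()");
        if (comments == null)
        {
            return;
        }
        // Copy to a list first so removing nodes does not disturb the enumeration.
        foreach (HtmlNode comment in comments.ToList())
        {
            // Assumption: the doctype may show up as a comment node; keep it.
            if (comment.OuterHtml.StartsWith("<!DOCTYPE", StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }
            comment.Remove();
        }
    }

    // Returns the content attribute of every <meta name="metaName"> element.
    // Assumes metaName contains no single quote, since it is pasted into the XPath string.
    public static string[] GetMetaContent(string source, string metaName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(source);
        StripComments(doc);

        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//meta[@name='" + metaName + "']");
        if (nodes == null)
        {
            return new string[0];
        }
        return nodes.Select(n => n.GetAttributeValue("content", "")).ToArray();
    }
}
```

Calling TagExtractor.GetMetaContent(html, "robots") would then cover the robots case from the question. Note that the comparison in @name='robots' is case-sensitive on the attribute value, so a page that writes name="ROBOTS" would need a different predicate.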

Some MSDN links:

  • XPath Reference
  • Examples
  • XPath functions