
Parse value from format constString1_Value_constString2

I need to parse an HTML string and extract the value from strings in this format:

title="Profil">VALUE</a>

The value can have any number of characters and it must end with </a>.


This can be very simple using an HTML parser and some XPath, which is probably a better choice than a regex. Here's an example using the HTML Agility Pack:

using HtmlAgilityPack;

HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load(@"http://jsbin.com/onoho3");
// SelectSingleNode returns null when no element matches the XPath.
HtmlNode node = doc.DocumentNode.SelectSingleNode("//a[@title='Profil']");
string myValue = node?.InnerText;

Of course, you can also load the document from a string:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

If you do need a regex, a few possibilities are:
title="Profil">.*?</a>, title="Profil">[^<>]*</a>, or title="Profil">\w*</a>, depending on exactly what you need. Since you don't have any special characters, the regex is straightforward.
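
To capture the value rather than just match it, one of the patterns above can be wrapped in a group. The following is a minimal sketch using the [^<>]* variant; the class name ProfilValueExtractor and the method Extract are hypothetical names chosen for illustration, and it assumes the value contains no nested tags.

```csharp
using System.Text.RegularExpressions;

static class ProfilValueExtractor
{
    // Hypothetical helper: returns the text between title="Profil"> and the
    // next </a>, or null when the pattern is not found.
    public static string Extract(string html)
    {
        // [^<>]* cannot cross a tag boundary, so the capture ends before </a>.
        Match match = Regex.Match(html, "title=\"Profil\">([^<>]*)</a>");
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

For an input such as <a title="Profil">John Doe</a>, Extract should return John Doe. If the value may contain nested markup, this pattern will not match, and the parser-based approach is the safer route.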


I would suggest using the HTML Agility Pack to process HTML documents. It can be found here:

http://htmlagilitypack.codeplex.com/


If you really have to use a regex and your text must always end with a dot (.), you can use this:

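using System.Text.RegularExpressions;

// Lazy quantifier so the capture stops at the first ".</a>" after the title attribute.
Regex valuePattern = new Regex(@"title=""Profil"">(.*?\.)</a>");
string value = "";
Match match = valuePattern.Match(text);

if (match.Success)
    value = match.Groups[1].Value;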

For parsing HTML, though, I would suggest HtmlAgilityPack as well. It makes many common parsing problems much easier.
