Parse a value from the format constString1_Value_constString2
I need to parse an HTML string. Specifically, I need to extract the value
from strings in this format:
title="Profil">VALUE</a>
The value can have any number of characters, and it must end with </a>.
This can be very simple using an HTML parser and some XPath, which is probably a better choice than a regex. Here's an example using the HTML Agility Pack:
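using HtmlAgilityPack;

HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load(@"http://jsbin.com/onoho3");
// Select the first <a> element whose title attribute is "Profil"
HtmlNode node = doc.DocumentNode.SelectSingleNode("//a[@title='Profil']");
// SelectSingleNode returns null when nothing matches, so guard against it
string myValue = node?.InnerText;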
Of course, you can also load the document from a string:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html); // html holds your HTML string
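If you do need a regex, a few possibilities are:
title="Profil">.*?</a>
title="Profil">[^<>]*</a>
title="Profil">\w*</a>
depending on exactly what you need. The first accepts any characters up to the first </a>, the second accepts anything except angle brackets, and the third accepts only letters, digits, and underscores. Since your value has no special characters, the regex is straightforward.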
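A minimal C# sketch of how the three patterns differ, assuming a capture group (...) is added around the value so it can be read from Groups[1] (as written, the patterns match the whole fragment, tags included). The class name RegexAlternatives, the Extract helper, and the sample link are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

class RegexAlternatives
{
    // The three candidate patterns, each with a capture group (...) added
    // around the value (an assumption) so it can be read from Groups[1].
    static readonly string[] Patterns =
    {
        @"title=""Profil"">(.*?)</a>",    // lazy: any characters except newline
        @"title=""Profil"">([^<>]*)</a>", // anything except angle brackets
        @"title=""Profil"">(\w*)</a>",    // word characters only (letters, digits, _)
    };

    // Returns the captured value, or null when the pattern does not match.
    public static string Extract(string html, string pattern)
    {
        Match m = Regex.Match(html, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // Hypothetical sample input; note the value contains a space.
        string html = @"<a href=""/user/42"" title=""Profil"">Jan Novak</a>";
        foreach (string p in Patterns)
        {
            string result = Extract(html, p) ?? "(no match)";
            Console.WriteLine(p + " -> " + result);
        }
    }
}
```

With this sample, the first two patterns return "Jan Novak", while the \w* pattern finds no match because the space is not a word character.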
I would suggest using the HTML Agility Pack to process HTML documents; it can be found here:
http://htmlagilitypack.codeplex.com/
If you really have to use a regex, and your value always ends with a dot (.), you can use this:
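using System.Text.RegularExpressions;

// Capture the value up to and including the dot; the lazy .*? stops at
// the first ".</a>" instead of running on to a later link on the same line
Regex valuePattern = new Regex(@"title=""Profil"">(.*?\.)</a>");
string value = "";
Match match = valuePattern.Match(text);
if (match.Success)
    value = match.Groups[1].Value;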
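The choice between a greedy .* and a lazy .*? matters when one line holds several Profil links. A small sketch comparing both on a hypothetical input; DotValue, Extract, and the sample text are illustrative names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

class DotValue
{
    // Greedy: .* grabs as much as possible, then backtracks to the LAST
    // ".</a>" on the line.
    public const string Greedy = @"title=""Profil"">(.*\.)</a>";
    // Lazy: .*? grabs as little as possible, so it stops at the FIRST ".</a>".
    public const string Lazy = @"title=""Profil"">(.*?\.)</a>";

    // Returns the captured value, or "" when there is no match.
    public static string Extract(string text, string pattern)
    {
        Match match = Regex.Match(text, pattern);
        return match.Success ? match.Groups[1].Value : "";
    }

    static void Main()
    {
        // Hypothetical input with two Profil links on one line.
        string text = @"<a title=""Profil"">first.</a> | <a title=""Profil"">second.</a>";
        Console.WriteLine("greedy: " + Extract(text, Greedy));
        Console.WriteLine("lazy:   " + Extract(text, Lazy));
    }
}
```

Here the greedy pattern runs on to the last ".</a>" and captures first.</a> | <a title="Profil">second., while the lazy pattern captures just first.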
For parsing HTML, though, I would also suggest HtmlAgilityPack; it makes many common parsing problems much easier.