How to get a string from an HTML file?
How should I search for and extract a string from HTML files using C# in ASP.NET? This is the code:
private string getHtml(string key)
{
    StreamReader f = new StreamReader("path");
    string htmlTag = key;
    string str = f.ReadToEnd().ToString();
    Match m = Regex.Match(str, "<" + htmlTag + ">" + "(.*)" + "</" +
        htmlTag + ">", RegexOptions.Singleline);
    Console.WriteLine(m.Groups[0]);
    return str;
}
In your regex, try changing this:
"(.*)"
to this:
"([^<]*)"
So, instead of matching ANY character, you match any characters up to (but not including) the next less-than symbol.
You might also want to change this:
"</" + htmlTag + ">"
to this:
"</ ?" + htmlTag + ">"
This allows for a space after the slash. You can ignore this second suggestion if you have full control over the HTML documents and know exactly how they were coded.
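Putting both suggestions together, a minimal sketch might look like the following. The class name HtmlTagText and the method name GetTagText are made up for illustration, and the code assumes the opening tag has no attributes and the wanted text contains no nested tags. It returns Groups[1] (the captured text) rather than Groups[0], which is the whole match including the tags.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question's code.
public static class HtmlTagText
{
    // Returns the text between <tag> and </tag>, or null if nothing matches.
    // Assumption: the opening tag has no attributes and the content has no nested tags.
    public static string GetTagText(string html, string tag)
    {
        // Escape the tag name so it is treated literally inside the pattern.
        string name = Regex.Escape(tag);

        // ([^<]*) captures everything up to the next '<';
        // "</ ?" tolerates an optional space after the slash.
        string pattern = "<" + name + ">([^<]*)</ ?" + name + ">";

        Match m = Regex.Match(html, pattern);

        // Groups[0] is the entire match (tags included); Groups[1] is the captured text.
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For example, HtmlTagText.GetTagText("<title>My page</title>", "title") gives "My page", while a tag whose content contains other tags gives null, because [^<]* stops at the first less-than symbol.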
Instead of a regex, you could use the Html Agility Pack, an HTML parser for .NET, available here: http://htmlagilitypack.codeplex.com/