C#: rendering HTML into text
I want to be able to take HTML code and render plain text out of it.
In other words, this would be my input:
<h3>some text</h3>
I want the result to look like this:
some text
How would I do it?
I would suggest trying the HTML Agility Pack for .NET:
Html Agility Pack - Codeplex
Attempting to parse HTML with anything else is, for the most part, unreliable.
Whatever you do, DON'T TRY TO PARSE HTML WITH REGEX!
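As a rough sketch of what that could look like for the input in the question: the snippet below assumes the HtmlAgilityPack NuGet package is referenced, and the class name HapExample and method name ToPlainText are made up for illustration. It loads the markup into an HtmlDocument, reads InnerText from the root node, and decodes entities such as &amp;amp; with HtmlEntity.DeEntitize.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

static class HapExample // hypothetical name, for illustration only
{
    // Returns the concatenated text nodes of the document, entities decoded.
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // InnerText drops the tags but keeps the text between them.
        string raw = doc.DocumentNode.InnerText;
        return HtmlEntity.DeEntitize(raw).Trim();
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(HapExample.ToPlainText("<h3>some text</h3>"));
    }
}
```

Note that InnerText also includes the contents of script and style elements if the page has any, so real pages may need those nodes removed first.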
Use regex.
String result = Regex.Replace(your_text_goes_here, @"<[^>]*>", String.Empty);
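For completeness, here is the one-liner above wrapped into a small runnable program. This is only a sketch: the class name HtmlToText and method name StripTags are made up for illustration, and the WebUtility.HtmlDecode call is an addition that was not part of the original line (it turns entities such as &amp;amp; back into characters).

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class HtmlToText // hypothetical name, for illustration only
{
    public static string StripTags(string html)
    {
        // Remove anything that looks like a tag: '<', then any run of non-'>' characters, then '>'.
        string noTags = Regex.Replace(html, @"<[^>]*>", string.Empty);
        // Decode HTML entities left behind in the text.
        return WebUtility.HtmlDecode(noTags).Trim();
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(HtmlToText.StripTags("<h3>some text</h3>"));      // some text
        Console.WriteLine(HtmlToText.StripTags("<p>Fish &amp; chips</p>")); // Fish & chips
    }
}
```

This works for simple fragments like the one in the question. It will misbehave on markup where a '>' appears inside an attribute value or a comment, and it leaves the contents of script and style elements in the output.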
You would need to use some form of HTML parser. You could use an existing regex or build your own; however, regex-based approaches aren't always 100% reliable. I would suggest using a third-party library like HtmlAgilityPack (I have used this one and would recommend it).
Poor Man's HTML Parser
string s =
@"
<html>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>
";
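// Split on '<' so that each piece looks like "tag>text that follows the tag".
foreach (var item in s.Split('<'))
{
    int x = item.IndexOf('>');
    if (x != -1)
    {
        // Keep only what comes after the tag's closing '>', without surrounding whitespace.
        string text = item.Substring(x + 1).Trim();
        if (text.Length > 0)
        {
            Console.WriteLine(text);
        }
    }
}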