Remove HTML tags and entities from a string
I would like a regex to remove HTML tags and entities such as &nbsp; and &quot; from a string. The regex I have removes the HTML tags but not the entities. I'm using .NET 4.
Thanks
CODE:
String result = Regex.Replace(blogText, @"<[^>]*>", String.Empty);
Don't use Regular Expressions, use the HTML Agility pack:
http://www.codeplex.com/htmlagilitypack
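For illustration, here is a minimal sketch of what that could look like. It assumes the HtmlAgilityPack assembly is referenced in the project; the class name HtmlToText and the method ToPlainText are made up for this example. Note that InnerText may still include the contents of script or style elements, so treat this as a starting point rather than a complete solution.

```csharp
using System;
using HtmlAgilityPack; // third-party library, must be referenced separately

// Hypothetical helper, not part of the library.
public static class HtmlToText
{
    public static string ToPlainText(string html)
    {
        // Parse the markup into a DOM instead of pattern-matching on it.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText concatenates the text nodes and leaves the tags out.
        string text = doc.DocumentNode.InnerText;

        // Turn entities such as &nbsp; or &quot; into the characters they stand for.
        return HtmlEntity.DeEntitize(text);
    }
}
```

Usage would be along the lines of String result = HtmlToText.ToPlainText(blogText);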
If you want to build on what you already created, you can change it to the following:
String result = Regex.Replace(blogText, @"<[^>]*>|&\w+;", String.Empty);
It means...
- Either match tags as you defined...
- ...or match a & followed by at least one word character (\w), as many as possible, and then the closing ; so that no stray semicolon is left behind.
Neither of the two alternatives works in all nasty cases (for example, a > inside an attribute value ends the tag match too early), but it usually does the job.
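As a rough sketch of the regex route using only the framework itself: the first method deletes tags and entities outright, the second deletes tags and then decodes the entities instead of dropping them. The names HtmlStripper, StripTagsAndEntities and StripTagsDecodeEntities are invented for this example. The entity part of the pattern requires the closing ; and allows an optional # so that numeric entities like &#160; are consumed whole. System.Net.WebUtility is available from .NET 4 onwards.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only.
public static class HtmlStripper
{
    // Deletes tags and entities (named like &nbsp;, numeric like &#160;).
    public static string StripTagsAndEntities(string html)
    {
        return Regex.Replace(html, @"<[^>]*>|&#?\w+;", String.Empty);
    }

    // Deletes tags, then converts entities into the characters they represent,
    // e.g. &quot; becomes " and &nbsp; becomes a no-break space (U+00A0).
    public static string StripTagsDecodeEntities(string html)
    {
        string withoutTags = Regex.Replace(html, @"<[^>]*>", String.Empty);
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

With blogText set to <p>Tom&nbsp;&amp;&nbsp;Jerry</p>, StripTagsAndEntities returns TomJerry, while StripTagsDecodeEntities keeps the ampersand and the two no-break spaces. Which one you want depends on whether the entities carry text you care about.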