ASP.NET Regex Control
I am doing string matching in ASP.NET (C#). I have to convert HTML and .aspx
pages into plain text (the text a browser would display). The pages contain <style> blocks,
JavaScript <script> blocks and so on. I'm using the Regex.Replace method:
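using System.Net;
using System.Text.RegularExpressions;

//Removing scripts and styles (case-insensitive, and across line breaks)
str = Regex.Replace(str, @"<script\b[^>]*>.*?</script>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);
str = Regex.Replace(str, @"<style\b[^>]*>.*?</style>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);
//For the page title: capture the text of the first <title> element only
Regex ex = new Regex(@"<title\b[^>]*>(.*?)</title>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
string title = ex.Match(str).Groups[1].Value.Trim();
//Removing HTML tags (Singleline so tags that span several lines are removed too)
str = Regex.Replace(str, "<.*?>", "", RegexOptions.Singleline);
//Decoding entities such as &amp; and &nbsp;
str = WebUtility.HtmlDecode(str);
//Replacing line breaks with a space so words on adjacent lines don't run together
str = str.Replace("\r\n", " ");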
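A small stand-alone check of the tag-stripping pattern <.*?> from the code above shows where it goes wrong. This is a sketch: the two sample inputs are made up for illustration, and RegexOptions.Singleline is added so that line breaks are not the cause of the failures. A '>' inside a quoted attribute value and markup inside an HTML comment are both valid HTML, yet the lazy pattern mis-handles both.

```csharp
using System;
using System.Text.RegularExpressions;

// Same lazy pattern as in the question: "<" up to the first ">".
static string StripTags(string html) =>
    Regex.Replace(html, "<.*?>", "", RegexOptions.Singleline);

// A '>' inside a quoted attribute value ends the "tag" too early,
// so the rest of the attribute leaks into the output.
string attrResult = StripTags("<p title=\"a > b\">Hello</p>");
Console.WriteLine(attrResult);    // attrResult == " b\">Hello"

// Text inside a comment is not visible in a browser, but the pattern only
// removes the tag-like pieces, so the comment's content leaks through.
string commentResult = StripTags("<!-- <b>hidden</b> -->visible");
Console.WriteLine(commentResult); // commentResult == "hidden -->visible"
```

A parser knows about quoted attributes and comments, so it handles both inputs; a regex needs a new special case for each construct like these.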
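You can't reliably use Regex to strip HTML: real pages contain comments, '>' characters inside attribute values, unclosed tags and other constructs that simple patterns mis-handle. You need an HTML parsing library. I've used the HTML Agility Pack successfully in the past.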
http://htmlagilitypack.codeplex.com/
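With the Agility Pack, the conversion might look roughly like the following sketch. It assumes the HtmlAgilityPack NuGet package is referenced. HtmlToText and the sample page are hypothetical names and data for illustration, and the choice of which nodes to drop (script, style, title, comments) is an assumption about what "browser view text" should exclude.

```csharp
// Sketch only: requires the HtmlAgilityPack NuGet package.
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical helper: returns the page title and the visible body text.
static (string Title, string Text) HtmlToText(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // Read the title before it is removed from the tree.
    string title = HtmlEntity.DeEntitize(
        doc.DocumentNode.SelectSingleNode("//title")?.InnerText ?? "").Trim();

    // Drop nodes whose text a browser does not show in the page body.
    // SelectNodes returns null, not an empty collection, when nothing matches.
    var hidden = doc.DocumentNode.SelectNodes("//script|//style|//title|//comment()");
    if (hidden != null)
    {
        foreach (HtmlNode node in hidden)
            node.Remove();
    }

    // Decode entities, then collapse whitespace runs (regex is fine on plain text).
    string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    text = Regex.Replace(text, @"\s+", " ").Trim();
    return (title, text);
}

var page = HtmlToText(
    "<html><head><title>My &amp; Page</title><style>p{color:red}</style></head>" +
    "<body><script>alert(1)</script><p>Hello &amp; welcome</p></body></html>");
Console.WriteLine(page.Title);
Console.WriteLine(page.Text);
```

The parser builds a tree first, so removing a script or style element removes exactly that element's content, regardless of what characters appear inside it.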