ASP.NET Regex Control

I am doing string matching in ASP.NET with C#. I have to convert HTML and .aspx pages into plain text (the text a browser would display). The HTML pages contain <style> blocks, <script> (JavaScript) blocks, and so on. I'm using the Regex.Replace method:

// Remove JavaScript blocks (case-insensitive, so <SCRIPT> is matched too)
str = Regex.Replace(str, "<script.*?>.*?</script>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);

// Extract the page title (lazy match, so it stops at the first </title>)
string regex = @"(?<=<title[^>]*>)([\s\S]*?)(?=</title>)";
Regex ex = new Regex(regex, RegexOptions.IgnoreCase);
string title = ex.Match(str).Value.Trim();

// Remove the remaining HTML tags
str = Regex.Replace(str, "<.*?>", "", RegexOptions.Singleline);
str = str.Replace("\r\n", "");


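You can't reliably strip HTML with Regex: comments, attribute values that contain ">", and malformed markup all defeat patterns like "<.*?>". You need an HTML parsing library. I've used the HTML Agility Pack successfully in the past.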

http://htmlagilitypack.codeplex.com/
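
As a rough illustration of the parser-based approach, here is a minimal sketch that assumes the HtmlAgilityPack NuGet package is referenced. The class name HtmlToText and the method Convert are hypothetical names chosen for this example, and collapsing whitespace at the end is my own choice, not something the library requires. It reads the title, drops <script>, <style> and comment nodes, then takes the remaining text of the body.

```csharp
using System.Text.RegularExpressions;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper for illustration; not part of HtmlAgilityPack itself.
public static class HtmlToText
{
    // Returns the visible text of the page and outputs the <title> text.
    public static string Convert(string html, out string title)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when the page has no <title>.
        HtmlNode titleNode = doc.DocumentNode.SelectSingleNode("//title");
        title = titleNode == null
            ? ""
            : HtmlEntity.DeEntitize(titleNode.InnerText).Trim();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection hidden =
            doc.DocumentNode.SelectNodes("//script|//style|//comment()");
        if (hidden != null)
        {
            foreach (HtmlNode node in hidden)
            {
                node.Remove();
            }
        }

        // Prefer the <body> so that <head> content such as the title is excluded;
        // fall back to the whole document for fragments without a <body>.
        HtmlNode root = doc.DocumentNode.SelectSingleNode("//body") ?? doc.DocumentNode;

        // Decode entities such as &amp; and collapse runs of whitespace.
        string text = HtmlEntity.DeEntitize(root.InnerText);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Regex is still used here, but only to tidy whitespace in text that the parser has already extracted, not to interpret the markup. A call might look like `string text = HtmlToText.Convert(str, out string title);`.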
