Remove HTML code from text without regular expressions

2023-01-01 21:43

I am working on indexing feeds from the Internet, and I would like to remove the HTML code that appears in some of them. I have used regular expressions for the cases I have seen, but I would like a way to remove all of it automatically, because I don't know whether I have seen every kind of HTML that can appear in my feeds. Is there any way to do this? Here is an example of the kind of text I would like to clean up:

/0831/oly_g_liukin_576.jpg" height="49" width="41" /> BEIJING - AUGUST 15: Nastia Liukin of the...

Use the Jsoup library. It is a very good utility for stripping HTML from a string:

http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer
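If adding a dependency such as Jsoup is not an option, the JDK itself ships an older (HTML 3.2-era) parser in javax.swing.text.html. The sketch below swaps in that technique; it is not the Jsoup approach from this answer. It collects only the text runs the parser reports, so tags are never seen by your code. The class name HtmlTextExtractor and the single-space separator between text runs are illustrative assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: parser-based text extraction using only the JDK's
// built-in Swing HTML parser, as a stand-in when Jsoup is unavailable.
final class HtmlTextExtractor extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    // Called by the parser for each run of character data between tags.
    @Override
    public void handleText(char[] data, int pos) {
        if (text.length() > 0) {
            text.append(' '); // keep words from adjacent elements apart
        }
        text.append(data);
    }

    static String extractText(String html) {
        HtmlTextExtractor callback = new HtmlTextExtractor();
        try {
            // true = ignore any charset declared in a <meta> tag
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.text.toString();
    }
}
```

Because this parser only knows HTML 3.2, it is less forgiving of modern markup than Jsoup, so treat it as a fallback rather than a replacement.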

In C#, it could look something like this (it removes everything between "<" and ">", i.e. the HTML tags):

```csharp
public static string RemoveHtmlTagsFromString(string source)
{
   // The output can never be longer than the input.
   char[] array = new char[source.Length];
   int arrayIndex = 0;
   bool inside = false; // true while between '<' and '>'

   foreach (char ch in source)
   {
       if (ch == '<')
       {
           inside = true;
           continue;
       }

       if (ch == '>')
       {
           inside = false;
           continue;
       }

       // Copy only characters that lie outside a tag.
       if (!inside)
       {
           array[arrayIndex] = ch;
           arrayIndex++;
       }
   }
   return new string(array, 0, arrayIndex);
}
```
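For readers following the Jsoup (Java) answer, here is a rough Java port of the same character scan, using a StringBuilder instead of a fixed-size array. One addition is an assumption, not part of the C# answer: feed snippets like the one in the question can start in the middle of a tag (... width="41" />), and the plain scan would leave that attribute text in the output because no "<" was ever seen. This sketch treats a ">" that appears before any "<" as the end of such a truncated tag and discards the text collected before it. The names TagStripper and stripTags are hypothetical.

```java
// Hypothetical Java port of the character-scan approach above.
final class TagStripper {
    private TagStripper() {}

    static String stripTags(String source) {
        StringBuilder out = new StringBuilder(source.length());
        boolean inside = false;     // currently between '<' and '>'
        boolean sawBracket = false; // has any '<' or '>' been seen yet?

        for (int i = 0; i < source.length(); i++) {
            char ch = source.charAt(i);
            if (ch == '<') {
                inside = true;
                sawBracket = true;
                continue;
            }
            if (ch == '>') {
                // Assumption: a '>' before any '<' closes a tag that was cut
                // off at the start of the snippet, so drop what we collected.
                if (!sawBracket) {
                    out.setLength(0);
                }
                inside = false;
                sawBracket = true;
                continue;
            }
            if (!inside) {
                out.append(ch); // keep characters outside tags
            }
        }
        return out.toString();
    }
}
```

Like the C# version, this scan does not understand comments, CDATA, or a literal ">" inside an attribute value, and it does not decode entities such as &amp;. A real parser such as Jsoup handles those cases. The truncated-tag heuristic also discards any leading text that happens to contain a bare ">".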
