RegEx to strip BBCode tags from a string

I'm working on a feature that uses the jQuery MarkItUp! editor as a BBCode editor. I'm only allowing a small subset of BBCodes, including the following:

[b]
[i]
[quote]
[quote=Mr Incredible]
[img]
[url]
[youtube]

I have a 1,500-character "Description" field that uses the editor, but I'm also planning to store a 150-character digest of the description with all of the BBCode stripped out.

I'm currently using a simple RegEx to do this in C#. It basically nukes embedded BBCodes in a string, but it leaves behind a lot of "noisy content" like the [img] URL or the [youtube] video ID that I'd also like to remove from the digest.

Here's my current RegEx:

  public static String StripBBCode(string bbCode)
  {
     string r = Regex.Replace(bbCode,
     @"\[(.*?)\]",
     String.Empty, RegexOptions.IgnoreCase);

     // Finally, replace all newlines with a space
     r = Regex.Replace(r,
     @"(\r\n|\n\r|\r|\n)+",
     @" ", RegexOptions.IgnoreCase);

     return r;
  }

If I run the following string through this function, I get the result shown below:

source

This is [b]bold[/b]. This is [i]italic[/i].

Here is an image:
[img]http://www.phatmac.com/Pics/Movies/Incredibles.jpg[/img]

Here is a link to [url=http://espn.go.com]ESPN[/url].

Here is a YouTube video:

[youtube]WJ0UkZ3W4FA[/youtube]

result

This is bold. This is italic. Here is an image: http://www.phatmac.com/Pics/Movies/Incredibles.jpg Here is a link to ESPN. Here is a YouTube video: WJ0UkZ3W4FA

Here's what I want to get back

This is bold. This is italic. Here is an image: Here is a link to ESPN. Here is a YouTube video:

How can I modify my StripBBCode() function to achieve this?

EDITED

The suggestion from David below in the first answer was correct.

Here's what I'm using now:

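 // Remove [youtube] and [img] tags together with their contents.
 // The lazy .*? stops at the first closing tag, so two tags on one
 // line are not merged into a single match.
 string r = Regex.Replace(s,
    @"\[youtube\].*?\[/youtube\]",
    String.Empty, RegexOptions.IgnoreCase);

 r = Regex.Replace(r,
    @"\[img\].*?\[/img\]",
    String.Empty, RegexOptions.IgnoreCase);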


You've got two kinds of tags: a few where you want the content removed along with the tag, and the rest where you only want the tags themselves removed.

First replace [img].*[/img] and [youtube].*[/youtube] (and any other tag whose contents should go) with String.Empty, then do your existing removal of [.*].
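As a rough sketch of that two-pass idea, something like the following might work. The class name BBCodeDigest, the combined (img|youtube) pattern with the \1 backreference, the Singleline option, and the final whitespace collapse are my own assumptions, not something from the question, so treat it as a starting point rather than a finished implementation:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BBCodeDigest
{
    public static string StripBBCode(string bbCode)
    {
        // Pass 1: remove [img] and [youtube] together with everything between them.
        // \1 refers back to the captured tag name, so [img] only closes with [/img].
        // Singleline lets '.' cross newlines in case the content is wrapped.
        string r = Regex.Replace(bbCode,
            @"\[(img|youtube)\].*?\[/\1\]",
            String.Empty,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Pass 2: remove every remaining tag but keep the text around it.
        r = Regex.Replace(r, @"\[.*?\]", String.Empty);

        // Pass 3: collapse newlines and repeated whitespace into single spaces.
        r = Regex.Replace(r, @"\s+", " ");

        return r.Trim();
    }
}
```

The order matters: if the generic tag removal ran first, the [img] and [youtube] markers would already be gone and there would be nothing left to identify the URL or video ID as noise.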

Edit:

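I'm not a regex expert either, but I think @"\[img\].*?\[/img\]" is what you want: the lazy .*? stops at the first [/img] instead of running on to the last one on the line. I also don't think you need the parentheses in @"\[(.*?)\]". In this context parentheses create a capturing group, which saves the matched text so you can refer to it again with \1, and nothing in your pattern refers back to it.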
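To see why the ? matters, here is a small comparison. The class name GreedyVsLazy and the sample string are made up for illustration; the two patterns are the greedy and lazy variants discussed above:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Hypothetical sample input with two [img] tags on the same line.
    public const string Sample = "A [img]one.jpg[/img] B [img]two.jpg[/img] C";

    // Greedy: .* runs to the LAST [/img], so the text " B " is lost too.
    public static string Greedy(string s) =>
        Regex.Replace(s, @"\[img\].*\[/img\]", String.Empty);

    // Lazy: .*? stops at the FIRST [/img], so each tag is removed on its own.
    public static string Lazy(string s) =>
        Regex.Replace(s, @"\[img\].*?\[/img\]", String.Empty);
}
```

With the greedy version the "B" between the two images disappears along with them; with the lazy version it survives, and only the two tagged URLs are removed.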
