RegEx to strip BBCode tags from a string
I'm working on a feature that uses the JQuery MarkItUp! editor as a BBCode editor. I'm only allowing a small subset of BBCodes including the following:
[b]
[i]
[quote]
[quote=Mr Incredible]
[img]
[url]
[youtube]
I have a 1,500-character "Description" field that uses the editor, but I'm also planning to store a 150-character digest of the description with all of the BBCode stripped out.
I'm currently using a simple RegEx to do this in C#. It basically nukes embedded BBCodes in a string, but it leaves behind a lot of "noisy content" like the [img] URL or the [youtube] video ID that I'd also like to remove from the digest.
Here's my current RegEx:
public static String StripBBCode(string bbCode)
{
    string r = Regex.Replace(bbCode,
        @"\[(.*?)\]",
        String.Empty, RegexOptions.IgnoreCase);
    // Finally, replace all newlines with a space
    r = Regex.Replace(r,
        @"(\r\n|\n\r|\r|\n)+",
        @" ", RegexOptions.IgnoreCase);
    return r;
}
If I run the following string through this function, I get the result shown below:
source
This is [b]bold[/b]. This is [i]italic[/i].
Here is an image:
[img]http://www.phatmac.com/Pics/Movies/Incredibles.jpg[/img]
Here is a link to [url=http://espn.go.com]ESPN[/url].
Here is a YouTube video:
[youtube]WJ0UkZ3W4FA[/youtube]
result
This is bold. This is italic. Here is an image: http://www.phatmac.com/Pics/Movies/Incredibles.jpg Here is a link to ESPN. Here is a YouTube video: WJ0UkZ3W4FA
Here's what I want to get back:
This is bold. This is italic. Here is an image: Here is a link to ESPN. Here is a YouTube video:
How can I modify my StripBBCode() function to achieve this?
EDITED
The suggestion from David below in the first answer was correct.
Here's what I'm using now:
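// .*? is lazy: it stops at the first closing tag, so two tags
// on the same line are removed separately instead of as one span.
string r = Regex.Replace(s,
    @"\[youtube\].*?\[/youtube\]",
    String.Empty, RegexOptions.IgnoreCase);
r = Regex.Replace(r,
    @"\[img\].*?\[/img\]",
    String.Empty, RegexOptions.IgnoreCase);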
You've got a few tags where you want the content removed along with the tags, and the rest where you only want the tags themselves removed.
First replace [img].*[/img] and [youtube].*[/youtube] (plus any other tags whose contents should go) with string.Empty, then do your removal of [.*].
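The two passes described here could be combined along the following lines. This is only a sketch, not a drop-in replacement: the class name BBCodeDigest, the choice of img and youtube as the "remove contents too" tags, and the final Trim() are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BBCodeDigest
{
    // Pass 1: tags whose inner text is noise for a digest (assumed list; extend as needed).
    // \1 requires the closing tag to name the same tag that was opened.
    private static readonly Regex ContentTags = new Regex(
        @"\[(img|youtube)\b[^\]]*\].*?\[/\1\]",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Pass 2: any remaining opening or closing tag, e.g. [b], [/b], [quote=Mr Incredible].
    private static readonly Regex AnyTag = new Regex(@"\[[^\]]*\]");

    // Pass 3: runs of newline characters collapse to a single space.
    private static readonly Regex Newlines = new Regex(@"[\r\n]+");

    public static string StripBBCode(string bbCode)
    {
        string r = ContentTags.Replace(bbCode, string.Empty);
        r = AnyTag.Replace(r, string.Empty);
        r = Newlines.Replace(r, " ");
        return r.Trim();
    }
}
```

Note that the second pass removes anything in square brackets, so ordinary text such as "[sic]" would be stripped as well. If that matters, the pattern could be narrowed to the allowed tag names.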
Edit:
I'm not a regex expert either, but I think @"\[img\].*?\[/img\]" is what you want. The ? makes .* lazy, so it stops at the first [/img] instead of the last one. I don't think you need the parentheses in @"\[(.*?)\]"; I think in this context parentheses mean "save the matched text" so you can match it again with \1.
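To make the difference concrete, here is a small sketch comparing the greedy and lazy patterns, plus one case where the parentheses and \1 are actually useful. The class name, method names, and sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Hypothetical sample: two [img] tags on one line with text in between.
    public const string Sample = "A [img]one.jpg[/img] B [img]two.jpg[/img] C";

    // Greedy: .* runs to the LAST [/img] on the line, so " B " is removed too.
    public static string Greedy(string s) =>
        Regex.Replace(s, @"\[img\].*\[/img\]", string.Empty, RegexOptions.IgnoreCase);

    // Lazy: .*? stops at the FIRST [/img], so each tag pair is removed on its own.
    public static string Lazy(string s) =>
        Regex.Replace(s, @"\[img\].*?\[/img\]", string.Empty, RegexOptions.IgnoreCase);

    // Capturing groups: \1 re-matches the tag name captured by (b|i),
    // and $2 in the replacement is the text captured between the tags.
    public static string Unwrap(string s) =>
        Regex.Replace(s, @"\[(b|i)\](.*?)\[/\1\]", "$2", RegexOptions.IgnoreCase);
}
```

With Sample, Greedy keeps only "A" and "C", while Lazy keeps "A", "B", and "C". Unwrap turns "[b]bold[/b]" into "bold" but leaves a mismatched pair like "[b]x[/i]" alone, because \1 has to match the same tag name.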