Stripping script tags from HTML input
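using System.Text.RegularExpressions;

public static string MakeWebSafe(this string x) {
    // Matches an opening <script ...> tag or a closing </script ...> tag, case-insensitively
    const string RegexRemove = @"(<\s*script[^>]*>)|(<\s*/\s*script[^>]*>)";
    return Regex.Replace(x, RegexRemove, string.Empty, RegexOptions.IgnoreCase);
}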
Is there any reason this implementation isn't good enough? Can you break it? Is there anything I haven't considered? If you use or have used something different, what are its advantages?
I'm aware this leaves the body of the script in the text, but that's okay for this project.
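A minimal sketch of the behaviour described above, assuming the extension method is placed in a static class (the names `StringExtensions` and `Demo` are hypothetical; C# requires extension methods to live in a static, non-nested class). Only the tags are removed; the script body survives as plain text.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container class for the question's extension method.
public static class StringExtensions
{
    public static string MakeWebSafe(this string x)
    {
        // Same pattern as the question: opening <script ...> or closing </script ...> tags.
        const string RegexRemove = @"(<\s*script[^>]*>)|(<\s*/\s*script[^>]*>)";
        return Regex.Replace(x, RegexRemove, string.Empty, RegexOptions.IgnoreCase);
    }
}

public static class Demo
{
    public static void Main()
    {
        // The tags are stripped, but the script body remains in the text.
        Console.WriteLine("<script>alert(1)</script>".MakeWebSafe()); // prints "alert(1)"

        // RegexOptions.IgnoreCase also catches mixed-case tags with attributes.
        Console.WriteLine("<SCRIPT src=x.js></Script>".MakeWebSafe()); // prints an empty line
    }
}
```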
UPDATE
Don't do the above! I went with this in the end: HTML Agility Pack strip tags NOT IN whitelist.
Have you considered this kind of scenario?
<scri<script>pt type="text/javascript">
causehavoc();
</scr</script>ipt>
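A short sketch showing why the input above defeats the original method (the class names here are hypothetical; `MakeWebSafe` is copied unchanged from the question). `Regex.Replace` makes a single pass over the input, so removing the inner `<script>` and `</script>` splices the surrounding fragments back into a complete, working script element.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // The question's original method, unchanged.
    public static string MakeWebSafe(this string x)
    {
        const string RegexRemove = @"(<\s*script[^>]*>)|(<\s*/\s*script[^>]*>)";
        return Regex.Replace(x, RegexRemove, string.Empty, RegexOptions.IgnoreCase);
    }
}

public static class BypassDemo
{
    public static void Main()
    {
        string attack = "<scri<script>pt type=\"text/javascript\">\ncausehavoc();\n</scr</script>ipt>";

        // "<scri" + "pt type=..." and "</scr" + "ipt>" are rejoined after the inner tags go.
        string result = attack.MakeWebSafe();
        Console.WriteLine(result);
        // <script type="text/javascript">
        // causehavoc();
        // </script>
    }
}
```

Running the replacement in a loop until the output stops changing would close this particular hole, but it still only targets one tag name, which is why the answers below lean towards removing or encoding everything instead.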
The best thing to do is to remove all tags, HTML-encode the input, or use BBCode.
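A minimal sketch of the "encode" option, using `System.Net.WebUtility.HtmlEncode` from the .NET base library (swapped in here as one possible encoder; the answer doesn't name one). The helper `EncodeForHtml` and the class name are hypothetical. Instead of hunting for dangerous tags, every `<`, `>` and `&` becomes an entity, so the browser displays the markup as text rather than interpreting it.

```csharp
using System;
using System.Net;

public static class HtmlEncodeDemo
{
    // Hypothetical helper: treat all user input as text, never as markup.
    public static string EncodeForHtml(string input) => WebUtility.HtmlEncode(input);

    public static void Main()
    {
        // The nested-tag trick no longer matters: nothing is left to be parsed as a tag.
        string attack = "<scri<script>pt>causehavoc();</scr</script>ipt>";
        Console.WriteLine(EncodeForHtml(attack));

        Console.WriteLine(EncodeForHtml("<script>alert(1)</script>"));
        // prints "&lt;script&gt;alert(1)&lt;/script&gt;"
    }
}
```

The trade-off is that encoding discards all formatting; where users need some markup, a whitelist (as in the update above) or BBCode is the usual alternative.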
Yes, your regex can be circumvented by Unicode-encoding the script tags. When it comes to security, I would suggest looking to more robust libraries. Take a look at the Microsoft Web Protection Library.