Best practice for preventing saving malicious client script in HTML
We have an ASP.NET custom control that lets users enter HTML (similar to a Rich text box). We noticed that a user can potentially inject malicious client scripts within the <script>
tag in the HTML view. I can validate HTML code on save to ensure that I remove any <script>
elements.
Is this all I need to do? Are all other tags other than the <script>
tag safe? If you were an attacker, what else would you attemp开发者_StackOverflow中文版t to do?
Any best practices I need to follow?
EDIT - How is the MS anti Xss library different from the native HtmlEncode for my purpose?
XSS (Cross Site Scripting) is a big a difficult subject to tackle correctly.
Instead of black-listing some tags (and missing some of the ways you may be attacked), it is better to decide on a set of tags that are OK for your site and only allowing them.
This in itself will not be enough, as you will have to catch all possible encodings an attacker might try and there are other things an attacker might try. There are anti-xss libraries that help - here is one from Microsoft.
For more information and guidance, see this OWASP article.
Have a look at this page:
http://ha.ckers.org/xss.html
to get an idea of different XSS attacks that somebody may try.
There's a whole lot to do when it comes to filtering out JavaScript from HTML. Here's a short list of some of the bigger points:
- Multiple passes over the input is required to make sure that what you removed before doesn't create a new injection. If you're doing a single pass, things like
<scr<script></script>ipt>alert("XSS!");</scr<script></script>ipt>
will get past you since after your remove<script>
tags from the string, you'll have created a new one. - Strip the use of the
javascript:
protocol inhref
andsrc
attributes. - Strip embedded event handler attributes like
onmouseover/out
,onclick
,onkeypress
, etc. - White lists are safer than black lists. Only allow tags and attributes that you know are safe.
- Make sure you're dealing with all the same character encoding. If you treat the input like ASCII (single byte) and the input has Unicode (multibyte) characters, you're going to get a nasty surprise.
Here's a more complete cheat sheet. Also, Oli linked to a good article at ha.ckers.org with samples to test your filtration.
Removing only the <script>
tags will not be sufficient as there are lots of methods for encoding / hiding them in input. Most languages now have anti-xss and anti-csrf libraries and functions for filtering input. You should use one of these generally agreed upon libraries to filter your user input.
I'm not sure what the best options are in ASP.NET, but this might shed some light: http://msdn.microsoft.com/en-us/library/ms998274.aspx
This is called a Cross Site Scripting (XSS) attack. They can be very hard to prevent, as there are a lot of surprising ways of getting JavaScript code to execute (javascript:
URLs, sometimes CSS, object
and iframe
tags, etc).
The best approach is to whitelist tags, attributes, and types of URLs (and keep the whitelist as small as possible to do what you need) instead of blacklisting. That means that you only allow certain tags that you know are safe, rather than banning tags that you believe to be dangerous. This way, there are fewer possible ways for people to get an attack into your system, because tags that you didn't think about won't be allowed, rather than blacklisting where if you missed something, you will still have a vulnerability. Here's an example of a whitelist approach to sanitization.
精彩评论