Cleaning an HTML string while keeping some tags and attributes
After I implemented my sanitizing functions (according to the requested specification), my boss decided to change the accepted input. Now he wants to keep some specific tags and their attributes. I suggested implementing a BBCode-like language, which is safer IMHO, but he doesn't want to because it would be too much work.
This time I would like to keep it simple so I will not kill him the next time he asks me to change this thing again. And I know he will.
Is it enough to first use strip_tags with the parameter listing the tags to preserve, and then htmlentities?
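strip_tags does not necessarily result in safe content. strip_tags followed by htmlentities would be safe, in that anything HTML-encoded is safe, but it doesn't make any sense: htmlentities also encodes the tags you just chose to keep, so they are displayed as literal text instead of being rendered as markup.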
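To make that concrete, here is a minimal sketch of the two-step approach from the question. The input string is a made-up example, not something from the original post.

```php
<?php
// Hypothetical user input: one tag we want to keep and one we do not.
$input = '<p>Hello <script>alert(1)</script></p>';

// Step 1: keep only <p>. The <script> tags are removed, their text content stays.
$stripped = strip_tags($input, '<p>');

// Step 2: HTML-encode everything, including the <p> tags kept in step 1.
$encoded = htmlentities($stripped, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n";
echo $encoded, "\n";
```

After the second step the p tags are encoded as well, so the browser shows them as text and nothing of the allowed markup survives.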
Either the user is inputting plain text, in which case it should be output using htmlspecialchars (in preference to htmlentities), or they're inputting HTML markup, in which case you need to parse it properly, fixing broken markup and removing elements/attributes that aren't in a safe whitelist.
If that's what you want, use an existing library to do it (e.g. HTML Purifier), because it's not a trivial task, and if you get it wrong you've given yourself XSS security holes.
You can keep specific tags using strip_tags with this syntax: strip_tags($text, '<p><a>');
That snippet would strip all tags except p and a. Attributes are kept for tags you have allowed (p and a in the above example).
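A short sketch of what that means in practice, using a made-up input string: the allowed tags come through with every attribute untouched, including event handlers and javascript: URLs.

```php
<?php
// Hypothetical input mixing allowed tags (p, a) with a disallowed one (b).
$text = '<p onclick="alert(1)">Hi</p><b>bold</b> <a href="javascript:alert(2)">link</a>';

// Only <p> and <a> survive; strip_tags does not look at their attributes at all.
$result = strip_tags($text, '<p><a>');

echo $result, "\n";
```

The b tags are gone, but onclick and the javascript: href are still present in $result.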
However, this doesn't mean that the attributes are safe. Does he want specific attributes, or does he want to keep all of them on allowed tags? In the first case, you would need to parse each tag, remove the attributes that are not wanted, and sanitize the values of the rest. To keep all attributes on allowed tags, you still need to sanitize them. I would recommend running htmlentities on the attribute values to sanitize them (for display, I would assume).
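For the first case, one possible sketch uses PHP's DOM extension to drop every attribute that is not on a per-tag whitelist. The function name keepAllowedAttributes and the $allowed array are hypothetical, chosen only for illustration. The sketch filters attribute names only: it does not validate the values (for example the URL scheme in href), so it is not a complete sanitizer on its own.

```php
<?php
// Hypothetical whitelist: tag name => attribute names allowed on that tag.
$allowed = ['p' => [], 'a' => ['href', 'title']];

function keepAllowedAttributes(string $html, array $allowed): string
{
    // Drop every tag that is not a key of the whitelist.
    $html = strip_tags($html, '<' . implode('><', array_keys($allowed)) . '>');

    // Parse the remainder inside a wrapper element so it has one known root.
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0);

    foreach ($root->getElementsByTagName('*') as $element) {
        $keep = $allowed[$element->nodeName] ?? [];
        // Copy the attribute list first so removal does not disturb the iteration.
        foreach (iterator_to_array($element->attributes, false) as $attribute) {
            if (!in_array($attribute->nodeName, $keep, true)) {
                $element->removeAttribute($attribute->nodeName);
            }
        }
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$clean = keepAllowedAttributes(
    '<p onclick="alert(1)">Hi</p><a href="http://example.com/" style="color:red">link</a>',
    $allowed
);
echo $clean, "\n";
```

Here onclick and style are removed while href is kept. Changing what is accepted later only means editing the $allowed array, which may help when the requirements change again.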