Is it a bad idea to escape HTML before inserting into a database instead of upon output?
I've been working on a system which doesn't allow HTML formatting. The method I currently use is to escape HTML entities before they get inserted into the database. I've been told that I should insert the raw text into the database, and escape HTML entities on output.
The other similar questions I've seen here seem to cover cases where HTML can still be used for formatting, so I'm asking about a case where HTML wouldn't be used at all.
You also restrict yourself by performing the escaping before inserting into your DB. Let's say you decide to output not HTML but JSON, plain text, etc.
If you have stored escaped HTML in your DB, you would first have to 'unescape' the value stored in the DB, just to re-escape it into a different format.
Also see this perfect OWASP article on XSS prevention.
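To illustrate the point above, here is a minimal sketch, assuming Python's standard html and json modules, of storing raw text and escaping only at output time, once per output format. The names render_html and render_json are hypothetical helpers for illustration, not part of any framework.

```python
import html
import json

# Stored exactly as the user typed it (escape-on-output approach).
raw = 'Tom & Jerry say "3 < 4"'

def render_html(text: str) -> str:
    # Hypothetical helper: HTML escaping happens only when building HTML.
    return html.escape(text)

def render_json(text: str) -> str:
    # Hypothetical helper: JSON output needs JSON escaping, not HTML escaping.
    return json.dumps({"title": text})

print(render_html(raw))
print(render_json(raw))

# With escape-on-input, the stored value must be unescaped first,
# just to be re-escaped for the new format.
stored_escaped = html.escape(raw)
print(render_json(html.unescape(stored_escaped)))
```

Keeping the raw value means each output format applies exactly one escaping step that matches that format.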
Yes, because at some stage you'll want access to the original input. This is because:
- You never know how you want to display it - in JSON, in HTML, as an SMS?
- You may need to show it back to the user as is.
I do see your point about never wanting HTML entered. What are you using to strip HTML tags? If it's a regex, then look out for confused users who might type something like this:
3<4 :->
They'll only get the 3 if it is a regex.
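A quick way to see the problem: the sketch below uses a naive tag-stripping pattern, <[^>]*>, which is an assumed example of the kind of regex the answer warns about (strip_tags_naive is a hypothetical name). Escaping instead of stripping keeps the user's text intact.

```python
import html
import re

# Assumed naive tag-stripping regex: anything from '<' to the next '>'.
TAG_RE = re.compile(r"<[^>]*>")

def strip_tags_naive(text: str) -> str:
    # Hypothetical helper: removes everything that looks like a tag.
    return TAG_RE.sub("", text)

user_input = "3<4 :->"
print(strip_tags_naive(user_input))  # → 3

# Escaping (at output time) preserves what the user meant.
escaped = html.escape(user_input)
print(escaped)  # → 3&lt;4 :-&gt;
```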
Suppose you have the text R&B and store it as R&amp;B. If someone searches for R&B, it won't match with a search SQL such as SELECT * FROM table WHERE title LIKE ?
The same goes for equality, sorting, etc.
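The search mismatch can be reproduced with an in-memory SQLite database. This is a sketch assuming Python's sqlite3 module; the items table and the search and find_exact helpers are hypothetical.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT)")
# Escape-on-input: the stored value is R&amp;B, not R&B.
conn.execute("INSERT INTO items (title) VALUES (?)", (html.escape("R&B"),))

def search(term):
    # Hypothetical helper using the LIKE query from the answer.
    rows = conn.execute("SELECT title FROM items WHERE title LIKE ?", (term,))
    return [r[0] for r in rows]

def find_exact(term):
    # Hypothetical helper: equality comparison is affected the same way.
    rows = conn.execute("SELECT title FROM items WHERE title = ?", (term,))
    return [r[0] for r in rows]

print(search("R&B"))      # [] : the user's search term no longer matches
print(find_exact("R&B"))  # [] : neither does an equality check
print(search("R&amp;B"))  # ['R&amp;B'] : only the escaped form matches
```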
Or if someone searches for life span, it could return extraneous matches with the escaped &lt;span&gt;'s. This is a bit orthogonal, though, and can be solved by using an external service like Elasticsearch, or by storing a raw text version in another field, similar to what @limscoder suggested.
If you expose the data via an API, the consumers may not expect the data to be escaped. Adding documentation may help.
A few months later, a new team member joins. As a well-trained developer, he always uses HTML escaping, only to see everything double-escaped (e.g. titles show up like He said &quot;nuff&quot; instead of He said "nuff").
Some escaping functions have additional options. Forgetting to use the same functions/options while un-escaping could result in a different value than the original.
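Both failure modes can be sketched with the Python standard library. This assumes html.escape is the escaping function used at insert time; xml.sax.saxutils.unescape stands in for "a different un-escaping function" and is only an illustration of a mismatch.

```python
import html
from xml.sax.saxutils import unescape as xml_unescape

title = 'He said "nuff"'

# Developer 1 escapes before saving to the database.
stored = html.escape(title)      # 'He said &quot;nuff&quot;'

# Developer 2 follows good practice and escapes again on output.
shown = html.escape(stored)      # 'He said &amp;quot;nuff&amp;quot;'
# A browser renders `shown` as: He said &quot;nuff&quot;

# Un-escaping with a different function than the one used to escape:
# saxutils.unescape only handles &amp;, &lt; and &gt; by default,
# so &quot; survives and the original value is not recovered.
recovered = xml_unescape(stored)  # 'He said &quot;nuff&quot;'

# Only the matching function round-trips correctly.
print(html.unescape(stored) == title)  # True
```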
It's more likely to happen with multiple developers/consumers working on the same data.
I usually store both versions of the text. The escaped/formatted text is used when a normal page request is made to avoid the overhead of escaping/formatting every time. The original/raw text is used when a user needs to edit an existing entry, and the escaping/formatting only occurs when the text is created or changed. This strategy works great unless you have tight storage space constraints, since you will be duplicating data.
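The "store both versions" approach might look roughly like this. It is a sketch assuming SQLite and html.escape; the posts table, its body_raw/body_html columns and the helper functions are hypothetical names chosen for illustration.

```python
import html
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# Hypothetical schema: raw text for editing, escaped text for display.
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, body_raw TEXT, body_html TEXT)"
)

def save_post(body: str, post_id: Optional[int] = None) -> int:
    # Escape once, when the text is created or changed, and keep the original.
    escaped = html.escape(body)
    if post_id is None:
        cur = conn.execute(
            "INSERT INTO posts (body_raw, body_html) VALUES (?, ?)", (body, escaped)
        )
        return cur.lastrowid
    conn.execute(
        "UPDATE posts SET body_raw = ?, body_html = ? WHERE id = ?",
        (body, escaped, post_id),
    )
    return post_id

def body_for_page(post_id: int) -> str:
    # Normal page request: no escaping work per request.
    row = conn.execute("SELECT body_html FROM posts WHERE id = ?", (post_id,)).fetchone()
    return row[0]

def body_for_edit(post_id: int) -> str:
    # Editing: hand back exactly what the user typed.
    row = conn.execute("SELECT body_raw FROM posts WHERE id = ?", (post_id,)).fetchone()
    return row[0]

pid = save_post("3<4 :->")
print(body_for_page(pid))
print(body_for_edit(pid))
```

Note that the raw text still has to be escaped when it is placed inside the HTML of an edit form (for example, in a textarea); the raw column just means that escaping is done against the original rather than against an already-escaped value.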