Filtering user input - clarification needed
I would like to clarify what is the proper way to filter user input with php. For example I have a web form that a user enters information into. When submitted the data from the form will be entered into a database.
My understanding is that you don't sanitize the data going into the database, apart from escaping it with something like mysql_escape_string; instead, you sanitize it when displaying it on the front end with something like htmlentities or htmlspecialchars. However, if you want, you can validate/filter the user input when the form is submitted to make sure the data is in the proper format. For example, if a field is for an email address, you validate that it has the proper email format. Is that correct?
My next question is what do you do with the data when you re-display it in a web form? Let's say the user is allowed to edit the information in that form after they filled it out and the information was added to the database. They then go back in and see the data they originally entered in the fields. Do you have to sanitize the data for it to show correctly in the form fields? For example, there is a field called My Title, and the person enters My title is "Manager". Note the quotation marks around Manager: when you output the value as is into the form field, it breaks because of them:
<input type="text" name="title" value="My title is "Manager"">
So don't you have to do something like htmlentities to turn the quotation marks into their HTML entities? Otherwise the value of the field would just be My title is
Hope this makes sense.
Nothing says you can't sanitize data before database insertion. After all, if your script/site/company has a certain policy regarding what's acceptable in a form field, it's best to strip out anything that's not allowed before saving it. That way you only sanitize once, before data insertion/update, rather than EVERY TIME you retrieve the data.
If you allow HTML entities for (say) accented characters, but not HTML tags, then you have to check for both invalid entities (&foobar;?) and HTML tags. Since you don't allow them, don't bother storing them. If you require a valid email address, then check that it's RFC 5322 compliant and only store it once the user has entered proper data. (Whether that email address actually exists is another matter.)
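As a rough illustration of checking input before storing it, here is a minimal sketch. The helper names validate_email() and sanitize_plain_text() are made up for this example. Note that PHP's built-in FILTER_VALIDATE_EMAIL is a syntax check that does not accept everything RFC 5322 permits, so treat it as an approximation of the rule above rather than a full RFC check.

```php
<?php
// Hypothetical helper: returns the trimmed address if its syntax looks valid,
// otherwise null. It does not check whether the mailbox actually exists.
function validate_email(string $input): ?string
{
    $email = trim($input);
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false ? $email : null;
}

// Hypothetical helper: sanitize a field whose policy is "no HTML tags allowed".
// strip_tags() removes the tags themselves but keeps the text between them.
function sanitize_plain_text(string $input): string
{
    return trim(strip_tags($input));
}
```

A null result from validate_email() would be the cue to show the form again with an error instead of storing anything.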
Now, let's get one thing straight. There's a difference between sanitization and escaping. Sanitization means literally to clean up - you're removing anything you don't want from the data. You can either silently drop it, or present an error to the user and tell them to fix it. On the other hand, escaping is just a means of encoding data so it's displayed properly.
With your My title is "Manager" string, you don't need to sanitize it, as there's nothing really wrong or offensive about it. What you do need to do is escape it, with at least htmlspecialchars(), so that the embedded double quotes don't "break" your form. If you embed it verbatim, most browsers will see it as having value="My title is" and some bogus attribute/garbage Manager"". So, you run it through htmlspecialchars() and end up with My title is &quot;Manager&quot;, which embeds into the value="" attribute perfectly with no trouble. No sanitization, just proper encoding.
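A small sketch of that escaping step, assuming UTF-8 output. The function name title_input() is hypothetical; htmlspecialchars() with ENT_QUOTES is the only real API involved.

```php
<?php
// Hypothetical helper: build the form field with the value escaped for use
// inside a double-quoted HTML attribute.
function title_input(string $title): string
{
    // ENT_QUOTES encodes both double and single quotes, so the value is safe
    // whichever quote style the attribute uses.
    $escaped = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    return '<input type="text" name="title" value="' . $escaped . '">';
}
```

Calling title_input('My title is "Manager"') returns <input type="text" name="title" value="My title is &quot;Manager&quot;">, and the browser shows the original text, quotes included, in the field.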
Now, when that form is submitted, then you do have to sanitize/validate again, as the data's been in the hands of a potentially malicious user, and the data could have been changed to My title is <script>document.location='http://attacksite.com';</script>pwn me.
Basically, the workflow should be:
1. present form to user
2. get data submitted
3. sanitize data
4. if form is not correctly filled out, display errors and go to 1
5. escape data for SQL query
6. insert into database
Then, later:
1. retrieve data from database
2. escape/encode as appropriate for however it will be displayed
3. display data. If the data's going into a form, do steps 1-6 of the first list as before.
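The workflow above can be sketched roughly as follows. This is only an illustration under several assumptions: the table name profiles, the function names, and the use of PDO are all invented for the example, and a prepared statement is swapped in for manual escaping of the SQL query, since it keeps the data separate from the SQL text.

```php
<?php
// Sanitize and validate the submitted field; returns [cleanTitle, errors].
function clean_title(array $post): array
{
    $title = trim(strip_tags((string) ($post['title'] ?? '')));
    $errors = [];
    if ($title === '') {
        $errors[] = 'Title is required.';
    }
    return [$title, $errors];
}

// Store the value. The bound parameter takes the place of hand-escaping.
function save_title(PDO $db, string $title): void
{
    $stmt = $db->prepare('INSERT INTO profiles (title) VALUES (:title)');
    $stmt->execute([':title' => $title]);
}

// Later: read the most recent value back and encode it for an HTML attribute.
function render_edit_field(PDO $db): string
{
    $row = $db->query('SELECT title FROM profiles ORDER BY id DESC LIMIT 1');
    $title = (string) $row->fetchColumn();
    return '<input type="text" name="title" value="'
        . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '">';
}
```

In a request handler you would call clean_title($_POST), re-display the form if the errors array is not empty, and otherwise call save_title(). The stored value stays unescaped in the database and is only HTML-encoded at the moment render_edit_field() outputs it.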