HTML form, character sets, and the accept-charset attribute
Is there a default character set used by HTML forms? Or is there a default accept-charset attribute that is used?
We're experiencing some problems with characters and character sets in our online forms.
The HTML pages are set to use the character set ISO-8859-1 (using a content meta tag), but there is no specific accept-charset attribute set in the forms.
The databases in the back end use UTF-8 encoding.
I'm not sure why there are two different character sets used here - that decision was a bit before my time, and can't be easily changed.
Most of the time, everything runs quite happily. The problem comes when someone enters a character that's not contained in the ISO-8859 character set - it displays correctly in the browser, but comes through to the back end as an unknown entity. Really bizarrely, it then transfers back to the browser correctly.
I've assumed so far that even if a user enters a character into the form that's not in the ISO-8859 charset, the page will use the character set from the meta tag when sending the data to the server, causing the odd entity to be displayed in the database. Does this sound like a feasible explanation, and - if so - would changing the content type of the HTML pages be a reasonable solution to the problem?
Cheers.
By default, browsers send the text from form inputs in the same charset in which the page is served. The accept-charset attribute can cause problems; if you use it, make sure it specifies the same charset as your page.
The reason it's an unknown entity is that your database is treating the bytes as UTF-8. When the text comes back to the page, it's just the same bytes, this time interpreted as ISO-8859-1, so it displays correctly.
However, this may cause problems if you use any of your database's string functions on the text while the database is treating it as UTF-8.
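As a rough illustration of the round trip described above, here is a minimal Python sketch. It is not your actual stack: the sample string "café" and the variable names are made up for the example, and the "database" is simulated by decoding the same bytes as UTF-8.

```python
# Hypothetical sample text; "é" exists in ISO-8859-1 as the single byte 0xE9.
text = "café"

# What the browser sends when the page is served as ISO-8859-1.
sent = text.encode("iso-8859-1")

# What a UTF-8 consumer (standing in for the database) makes of those bytes:
# 0xE9 alone is not valid UTF-8, so it becomes the replacement character.
seen_as_utf8 = sent.decode("utf-8", errors="replace")

# What the ISO-8859-1 page makes of the very same bytes on the way back.
seen_by_page = sent.decode("iso-8859-1")

print(sent)          # raw bytes as submitted
print(seen_as_utf8)  # garbled when interpreted as UTF-8
print(seen_by_page)  # original text when interpreted as ISO-8859-1
```

The bytes never change; only the interpretation does. That is why the page looks fine while anything that reads the stored value as UTF-8, including the database's own string functions, sees a broken character.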