How do I ensure that the text entered in a form is encoded as UTF-8?

I have an HTML text box into which users may enter text. I would like to ensure that all text entered in the box is either encoded in UTF-8 or converted to UTF-8 when a user finishes typing. Furthermore, I don't quite understand how the various UTF encodings are chosen when text is entered into a text box.

Generally I'm curious about the following:

  • How does a browser determine which encodings to use when a user is typing into a text box?
  • How can JavaScript determine the encoding of a string value in an HTML text box?
  • Can I force the browser to only use UTF-8 encoding?
  • How can I convert strings from arbitrary encodings to UTF-8? I assume there is a JavaScript library for this.

**Edit**

Removed some questions unnecessary to my goals.

This tutorial helped me understand JavaScript character codes better, but it is buggy and does not actually translate character codes to UTF-8 in all cases: http://www.webtoolkit.info/javascript-base64.html


  • How does a browser determine which encodings to use when a user is typing into a text box?

It uses the encoding the page was decoded as by default. According to the spec, you should be able to override this with the accept-charset attribute of the <form> element, but IE is buggy, so you shouldn't rely on this (I've seen several different sources describe several different bugs, and I don't have all the relevant versions of IE in front of me to test, so I'll leave it at that).
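
For what it's worth, here is a minimal sketch of a form that is served and submitted as UTF-8. The action URL and field name are placeholders, and as noted above, the page's own encoding is what you should actually rely on rather than accept-charset:

```html
<!-- Sketch: declare the page itself as UTF-8 so form submissions default to UTF-8.
     Ideally also send an HTTP header of Content-Type: text/html; charset=utf-8. -->
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
</head>
<body>
  <!-- accept-charset="UTF-8" is the spec-defined override, but it is unreliable in IE,
       so treat the page encoding as the real control. -->
  <form action="/submit" method="post" accept-charset="UTF-8">
    <input type="text" name="comment">
    <input type="submit" value="Send">
  </form>
</body>
</html>
```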

  • How can JavaScript determine the encoding of a string value in an HTML text box?

All strings in JavaScript are encoded in UTF-16. The browser will map everything into UTF-16 for JavaScript, and from UTF-16 into whatever the page is encoded in.
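
If what you actually need is the UTF-8 byte representation of such a string (say, before hashing it or transmitting it yourself), a minimal sketch using the standard TextEncoder/TextDecoder APIs, which always work in UTF-8, looks like this; the `input[name="comment"]` selector is just a placeholder for your text box:

```javascript
// Sketch: getting the UTF-8 bytes of a JavaScript (UTF-16) string.
const box = document.querySelector('input[name="comment"]'); // placeholder text box

// TextEncoder always encodes to UTF-8, regardless of the page's encoding.
const utf8Bytes = new TextEncoder().encode(box.value); // Uint8Array of UTF-8 bytes

// And back again, if you ever receive raw UTF-8 bytes:
const decoded = new TextDecoder('utf-8').decode(utf8Bytes);
console.log(decoded === box.value); // true
```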

UTF-16 is an encoding that grew out of UCS-2. Originally, it was thought that 65,536 code points would be enough for all of Unicode, and so a 16-bit character encoding would be sufficient. It turned out that this is not the case, and so the character set was expanded to 1,114,112 code points. In order to maintain backwards compatibility, a few unused ranges of the 16-bit character set were set aside for surrogate pairs, in which two 16-bit code units are used to encode a single character. Read up on UTF-16 and UCS-2 on Wikipedia for details.

The upshot is that when you have a string str in JavaScript, str.length does not give you the number of characters; it gives you the number of code units, where two code units may be used to encode a single character if that character is not within the Basic Multilingual Plane. For instance, "abc".length gives you 3, but a string consisting of a single character outside the BMP, such as an emoji, has a length of 2.
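
To make the code-unit/code-point distinction concrete (the emoji below is just one example of a character outside the BMP):

```javascript
// Sketch: .length counts UTF-16 code units, not characters.
console.log("abc".length);                      // 3 – three BMP characters, one code unit each
console.log("😀".length);                       // 2 – one character, stored as a surrogate pair

console.log("😀".charCodeAt(0).toString(16));   // "d83d"  – the high surrogate code unit
console.log("😀".codePointAt(0).toString(16));  // "1f600" – the actual Unicode code point

// Iterating by code points (the spread operator walks the string character by character):
console.log([..."😀abc"].length);               // 4
```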
