Browser codepage detection

2023-01-25 15:26

I have an ASP.NET page where a user can enter some text in a TEXTAREA and submit it to the server. This text will be stored in a database and presented in a WinForms application.

How can I make sure that the WinForms application presents the exact characters that the user entered in the TEXTAREA?

That is, could I run into problems if the user enters language-specific letters such as Æ, Ø and Å, which are Danish letters?

Those letters have different codes depending on the codepage, so as far as I can see, I need to know what codepage the TEXTAREA control is showing its input in. Or am I missing something here?

I have tried to find material on this on the net, but it is difficult to find something that addresses this issue. I typically found pages talking about what codepage the server requires the browser to use, in order to display the sent data correctly.

But my question goes the other way, i.e. from client to server.

You could also use HEBCI (HTML Entity-Based Codepage Inference) if you REALLY want to be sure that users sending text with crappy browsers don't corrupt your data backbone.

In essence this is how it works:

Every codepage has its own fingerprint. For instance, the single entity "&ordm;" (º) could be used to distinguish between the Big Three: ISO-8859-1/Windows-1252 (=BA), MacRoman (=BC), and UTF-8 (=C2BA).
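To make the fingerprint idea concrete, here is a minimal Python sketch that encodes º (U+00BA) under the encodings named above and prints the resulting bytes as hex. The Python codec names in `CODECS` are an assumption of this sketch and are not part of HEBCI itself.

```python
# Encode the masculine ordinal indicator (U+00BA, the &ordm; entity) under
# the encodings named above and show the resulting bytes as hex.
CHAR = "\u00ba"

# Assumption of this sketch: these Python codec names stand in for the
# codepage labels used in the text.
CODECS = {
    "ISO-8859-1": "latin-1",
    "Windows-1252": "cp1252",
    "MacRoman": "mac_roman",
    "UTF-8": "utf-8",
}


def fingerprint(char: str) -> dict[str, str]:
    """Map each encoding label to the hex bytes it produces for char."""
    return {label: char.encode(codec).hex() for label, codec in CODECS.items()}


if __name__ == "__main__":
    for label, hex_bytes in fingerprint(CHAR).items():
        print(f"{label:13} {hex_bytes}")
```

ISO-8859-1 and Windows-1252 give the same byte for this character, which is why the text groups them together; telling those two apart needs a further probe such as the euro sign.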

In a form, you simply add a hidden input containing those fingerprints as entities (like &deg;, &divide;, and &mdash;), and when the user submits the form you check the returned hex values against your fingerprint table. Only if this does not give a match do you continue with other fallback solutions.
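As a rough illustration of the hidden input described above, the following Python sketch builds the markup. The field name `hebci_fp` and the comma separator are hypothetical choices made for this example, not something the technique prescribes.

```python
# Hypothetical names: the field name "hebci_fp" and the comma separator are
# choices made for this sketch only.
FINGERPRINT_ENTITIES = ["deg", "divide", "mdash"]


def hidden_fingerprint_input(entities, name="hebci_fp"):
    """Return an <input type="hidden"> whose value lists the probe entities."""
    # The browser expands each entity into a character and then submits it
    # in whatever encoding it uses for the form.
    value = ",".join(f"&{entity};" for entity in entities)
    return f'<input type="hidden" name="{name}" value="{value}">'


if __name__ == "__main__":
    print(hidden_fingerprint_input(FINGERPRINT_ENTITIES))
```

An ASCII separator keeps the probes easy to split apart on the server, since a comma is a single byte in all the single-byte and UTF-8 encodings discussed here.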

A slightly larger implementation works great with only five codepoints:

# Entities used as probes, in this order: &deg; &divide; &mdash; &bdquo; &euro;
my @fp_ents = qw/deg divide mdash bdquo euro/;

# Hex bytes each codepage returns for the probes above, in the same order.
# An empty string marks an entity with no fingerprint listed for that codepage.
my %fingerprints = (
  "UTF-8"        => ['c2b0','c3b7','e28094','e2809e','e282ac'],
  "WINDOWS-1252" => ['b0','f7','97','84','80'],
  "MAC"          => ['a1','d6','d1','e3','db'],
  "MS-HEBR"      => ['b0','ba','97','84','80'],
  "MAC-CYRILLIC" => ['a1','d6','d1','d7',''],
  "MS-GREEK"     => ['b0','','97','84','80'],
  "MAC-IS"       => ['a1','d6','d0','e3',''],
  "MS-CYRL"      => ['b0','','97','84','88'],
  "MS932"        => ['818b','8180','815c','',''],
  "WINDOWS-31J"  => ['818b','8180','815c','',''],
  "WINDOWS-936"  => ['a1e3','a1c2','a1aa','',''],
  "MS_KANJI"     => ['818b','8180','','',''],
  "ISO-8859-15"  => ['b0','f7','','','a4'],
  "ISO-8859-1"   => ['b0','f7','','',''],
  "CSIBM864"     => ['80','dd','','',''],
);
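The lookup step could be sketched as follows, in Python rather than Perl. This is one possible matching rule, not the original implementation. It assumes the hidden field arrives percent-encoded with the five probes separated by commas, treats an empty table entry as "not checked", and picks the candidate with the most matching entries. The function name `infer_codepage` is hypothetical, and only a few rows of the table are copied over.

```python
from urllib.parse import unquote_to_bytes

# A few rows copied from the table above; order follows
# deg, divide, mdash, bdquo, euro. An empty string is not checked.
FINGERPRINTS = {
    "UTF-8":        ["c2b0", "c3b7", "e28094", "e2809e", "e282ac"],
    "WINDOWS-1252": ["b0", "f7", "97", "84", "80"],
    "MAC":          ["a1", "d6", "d1", "e3", "db"],
    "ISO-8859-15":  ["b0", "f7", "", "", "a4"],
    "ISO-8859-1":   ["b0", "f7", "", "", ""],
}


def infer_codepage(raw_field: str):
    """Guess the encoding from the percent-encoded hidden-field value.

    Returns the best-matching label, or None so the caller can fall back
    to another detection method.
    """
    # Decode %XX escapes to raw bytes, then split on the ASCII comma.
    parts = [part.hex() for part in unquote_to_bytes(raw_field).split(b",")]
    best, best_score = None, 0
    for label, expected in FINGERPRINTS.items():
        if len(parts) != len(expected):
            continue
        # Every non-empty table entry must match the received bytes exactly.
        known = [(e, p) for e, p in zip(expected, parts) if e]
        if all(e == p for e, p in known) and len(known) > best_score:
            best, best_score = label, len(known)
    return best


if __name__ == "__main__":
    print(infer_codepage("%B0,%F7,%97,%84,%80"))
```

Preferring the candidate with the most matched entries is what separates Windows-1252 from ISO-8859-1 here, since both agree on the first two probes. Rows with identical fingerprints, such as MS932 and WINDOWS-31J in the full table, cannot be told apart by this rule.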

You can look at the content-type header to find out the encoding.
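As a minimal sketch of that check, the Python snippet below pulls the `charset` parameter out of a Content-Type header value using the standard library's `email.message.Message`. The helper name `charset_from_content_type` is hypothetical. The parameter may be absent on a form submission, in which case the sketch returns `None`.

```python
from email.message import Message


def charset_from_content_type(header_value: str):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset()


if __name__ == "__main__":
    print(charset_from_content_type(
        "application/x-www-form-urlencoded; charset=UTF-8"))
```

When the result is `None`, the header alone does not settle the question and another method, such as the fingerprint approach, is still needed.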

For more details see this SO answer to a related question.
