Best way to store text with an undetermined code page in a MySQL database

2023-02-16 04:25

I am currently writing an application (App1) which retrieves portions of text remotely from another application (let's call it App2). There are several instances of App2 around the world, and they all interpret their strings according to their local system code page. App2 is not Unicode-aware.

App1 retrieves the text from App2 without any hint as to the text's code page, but at a later point a manual process is expected to select the code page needed to interpret the text correctly.

Previous attempts to automatically determine the code page of the text have failed.

In the meantime, pending the manual determination, this data must be stored in a MySQL database.

What is the best way to store this data? Specifically, what CHARSET and COLLATION would be best employed here?

I believe that MySQL will not tolerate inserting characters into a field if they are not valid for the field's charset.
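To see why a text column can reject this data, consider bytes produced by a system using Windows-1252 (cp1252, an assumed example code page): they are generally not valid UTF-8. If the connection declares them as utf8mb4, MySQL in strict SQL mode typically refuses them with an "Incorrect string value" error rather than storing them. The Python sketch below checks validity the same way, with a strict decode; the helper name is hypothetical.

```python
def is_valid_utf8(payload: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        payload.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# "café" as a cp1252 system would send it: b"caf\xe9"
legacy_bytes = "café".encode("cp1252")

print(is_valid_utf8(legacy_bytes))            # False: 0xE9 starts an incomplete UTF-8 sequence
print(is_valid_utf8("café".encode("utf-8")))  # True
```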

It would be ideal if I could detect the code page and convert the data to Unicode before inserting it into the database, but I am at a loss as to how this can be done consistently and reliably.

If you really do not know the character set, then you can only store it as binary data, for example in a VARBINARY or BLOB column (character set binary). That preserves the contents exactly; nothing gets mangled. When it comes to using it as text, you will have to guess the encoding.
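A minimal sketch of the binary approach, assuming a hypothetical table named raw_text with a nullable codepage column reserved for the later manual step. To keep it runnable without a server it uses Python's built-in sqlite3 module as a stand-in for MySQL; the comment shows roughly what the MySQL DDL might look like, but it is illustrative and not executed.

```python
import sqlite3

# Roughly equivalent MySQL DDL (illustrative, not executed here):
#   CREATE TABLE raw_text (
#       id       INT AUTO_INCREMENT PRIMARY KEY,
#       payload  BLOB NOT NULL,     -- BLOB uses the "binary" character set
#       codepage VARCHAR(32) NULL   -- hypothetical: filled in by the manual process
#   );


def store_raw(conn: sqlite3.Connection, payload: bytes) -> int:
    """Insert the bytes exactly as received; no decoding is attempted."""
    cur = conn.execute("INSERT INTO raw_text (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def load_raw(conn: sqlite3.Connection, row_id: int) -> bytes:
    """Fetch the stored bytes back unchanged."""
    row = conn.execute(
        "SELECT payload FROM raw_text WHERE id = ?", (row_id,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_text ("
    "id INTEGER PRIMARY KEY, payload BLOB NOT NULL, codepage TEXT)"
)

# Bytes as an App2 instance on a cp1252 system might send them (assumed example).
original = "Grüße".encode("cp1252")
row_id = store_raw(conn, original)
print(load_raw(conn, row_id) == original)  # True: byte-for-byte round trip
```

Keeping the code page in a separate column means the raw bytes never have to be rewritten while the encoding is still unknown.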

What is the best way to store this data?

The only sane way is for App2 to send along information about which encoding the data is in.

Using that information, you could convert it to Unicode before inserting it into the database. That would be optimal.
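Once the code page is known, whether App2 reports it or the manual process picks it, conversion amounts to a strict decode of the stored bytes, after which the text can go into a Unicode column (for example utf8mb4). A minimal Python sketch, with a hypothetical helper name and cp1252 as an assumed example code page:

```python
def convert_to_unicode(payload: bytes, codepage: str) -> str:
    """Decode raw bytes using a code page chosen by App2 or the manual process.

    errors="strict" (the default) raises UnicodeDecodeError instead of
    silently substituting characters, so some wrong choices are noticed.
    """
    return payload.decode(codepage)


raw = b"Gr\xfc\xdfe"                       # bytes previously stored as binary
text = convert_to_unicode(raw, "cp1252")   # code page chosen manually (example)
print(text)                                # Grüße
utf8_bytes = text.encode("utf-8")          # UTF-8 form, as a utf8mb4 column stores it
```

Strict decoding only catches bytes that are undefined in the chosen code page; a wrong but complete code page still decodes without error, so the manual check remains necessary.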

Most multi-byte string libraries have functions that guess the encoding by looking for tell-tale byte values, but these guesses are very unreliable, especially when the incoming data could be in any encoding.
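To illustrate why such guesses are unreliable: most byte sequences decode without error under most single-byte code pages, so a decode check alone cannot tell them apart. In this Python sketch (the helper name and candidate list are examples), the same two bytes are accepted by three Windows code pages, each giving a different, plausible-looking result; only UTF-8 is ruled out.

```python
def plausible_decodings(payload: bytes, candidates: list[str]) -> dict[str, str]:
    """Return every candidate code page under which the bytes decode cleanly."""
    results = {}
    for codepage in candidates:
        try:
            results[codepage] = payload.decode(codepage)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
    return results


candidates = ["utf-8", "cp1252", "cp1251", "cp1253"]
for cp, text in plausible_decodings(b"\xc4\xe5", candidates).items():
    print(cp, text)
# cp1252 Äå
# cp1251 Де
# cp1253 Δε
```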
