convert database field encoding in Jet Database / Delphi

I have a legacy application written in Delphi which uses a Jet Database as its back-end for storing data and I need to export the data to a new format.

Opening the database with MS Access (Windows) or MDBViewer (Linux), the "MEMO" fields (the equivalent of MySQL's TEXT) show only garbage resembling Asian characters. When I run the legacy application, the fields' contents display correctly.

Is there a way I can try every possible character encoding and convert the data to recover it (I'm comfortable with PHP and C#)? I read something about the BOM (byte-order mark) that might be related; any ideas?

Thanks!


Current MS Access versions (Jet 4.0 and later) store string values in Unicode (UTF-16LE, optionally compressed). Older ones simply followed the code page of the machine on which the text was entered.

Most encodings do indeed use some marker bytes to indicate the encoding of what follows. Whether or not you have the benefit of that, really depends on the legacy app. If that simply followed a single encoding, or relied on the machine's code page, then you'll have to do some clever recognizing yourself.

Quick checks

UTF-8

If there is a marker, it would be $EFBBBF. If there isn't, you can make an educated guess that it is UTF-8 when the bytes form valid UTF-8 sequences and runs of ASCII (0-127) characters can be seen in the string.
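As a minimal sketch of that check (Python here rather than Delphi, and assuming you have already extracted the raw memo bytes into `data`):

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic UTF-8 check: a BOM is present, or the bytes decode cleanly."""
    if data.startswith(b"\xEF\xBB\xBF"):  # the $EFBBBF marker
        return True
    try:
        data.decode("utf-8", errors="strict")  # strict: any invalid sequence raises
        return True
    except UnicodeDecodeError:
        return False
```

Note that pure-ASCII data passes this check too, since ASCII is a subset of UTF-8; that is harmless because the decoded text is identical either way.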

UTF-16

Comes in two flavours: Little Endian (LE) and Big Endian (BE). For characters within the Basic Multilingual Plane, both use two bytes per character. The difference between the two is that for ASCII characters, one starts with a zero byte, the other ends with it.

If there is a marker, UTF-16LE is designated by $FFFE and UTF-16BE by $FEFF. If neither of those markers is present, having alternating zero and non-zero bytes in the memo field is a fair indication. Your first bet should be UTF-16LE, as that is the Windows standard and UTF-16BE is used a lot less. (For ASCII characters, UTF-16LE puts the zero byte second, UTF-16BE puts it first.)
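A sketch of both checks, again in Python on the raw memo bytes; the 50% zero-byte threshold is an assumption that works for mostly-ASCII text, not a fixed rule:

```python
def guess_utf16(data: bytes):
    """Return 'utf-16-le', 'utf-16-be', or None, using the BOM or zero-byte pattern."""
    if data.startswith(b"\xFF\xFE"):
        return "utf-16-le"
    if data.startswith(b"\xFE\xFF"):
        return "utf-16-be"
    if len(data) >= 2 and len(data) % 2 == 0:
        pairs = len(data) // 2
        even_zeros = sum(1 for b in data[0::2] if b == 0)  # first byte of each pair
        odd_zeros = sum(1 for b in data[1::2] if b == 0)   # second byte of each pair
        # Mostly-ASCII UTF-16LE has zeros in the second byte of each pair, BE in the first.
        if odd_zeros > pairs * 0.5 and even_zeros == 0:
            return "utf-16-le"
        if even_zeros > pairs * 0.5 and odd_zeros == 0:
            return "utf-16-be"
    return None
```

If this returns an encoding name, `data.decode(name)` should yield readable text.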

Other

If you can exclude UTF-8 and UTF-16, you could try to figure out whether one of the other UTF encodings was used. I wouldn't spend the time, though; chances are that the program simply relied on the machine's code page. Seeing as you are dealing with a lot of "Asian looking" characters, your best bet would be to check the MBCS code pages (Multi-Byte Character Set code pages). See MSDN for more details. As I have never dealt with them myself, I'm afraid I can't be of more help here though.
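One way to narrow that down is to decode the bytes under each candidate code page and keep only the ones that decode without errors. The list of candidates below is an assumption; the right one depends on the locale the legacy app ran under:

```python
# Common double-byte Windows code pages (assumed candidates, not an exhaustive list).
CANDIDATES = [
    "cp932",  # Shift-JIS (Japanese)
    "cp936",  # GBK (Simplified Chinese)
    "cp949",  # Korean
    "cp950",  # Big5 (Traditional Chinese)
]

def try_code_pages(data: bytes):
    """Return (encoding, text) pairs for every candidate that decodes cleanly."""
    results = []
    for enc in CANDIDATES:
        try:
            results.append((enc, data.decode(enc)))
        except UnicodeDecodeError:
            pass
    return results
```

A clean decode is necessary but not sufficient (several code pages may accept the same bytes), so a human still has to pick the result that reads sensibly.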

Trying encodings

If you do have to start trying out every encoding there is, you may want to have a look at the DIConvertors library. It's pretty good at converting between encodings. IIRC it can also recognize encodings, but even if not, it should help get you started with your own detection. It can be found at http://www.yunqa.de/delphi/doku.php/products/converters/index
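The brute-force approach itself is simple enough to script in any language: decode the same bytes under every candidate and let a human eyeball the results. A hedged sketch (the encoding list is just an illustration; pass whichever candidates apply):

```python
def brute_force(data: bytes, encodings):
    """Decode `data` under each candidate; failed decodes map to None."""
    out = {}
    for enc in encodings:
        try:
            out[enc] = data.decode(enc)
        except (UnicodeDecodeError, LookupError):  # LookupError: unknown codec name
            out[enc] = None
    return out

# Example: print the survivors for inspection.
for enc, text in brute_force("héllo".encode("utf-8"),
                             ["ascii", "utf-8", "latin-1"]).items():
    if text is not None:
        print(enc, "->", text)
```

Run this over a memo field whose correct contents you know from the legacy app; the encoding that reproduces that known text is almost certainly the one used throughout.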
