
What encoding does InstallShield expect string table entries in non-Latin alphabets to use?

I work on an app that gets distributed via a single installer containing multiple localizations. The build process includes a script that updates the .ism string table with translations for each supported language.

This works fine for languages like French and German. But when testing the installer in, e.g., Japanese, the text shows up as a series of squares. It's unlikely to be a font problem, since the InstallShield-supplied strings show up fine; only the string table entries are mangled. So the problem seems to be that the strings are in the wrong encoding.

The .ism is in XML format, with UTF-8 declared as its encoding, so I assumed the strings needed to be UTF-8 encoded as well. Do they actually need to use the encoding of the target platform? Is there any concern, then, about targets having different encodings, e.g. Chinese systems using one GB encoding versus another? What is the right thing to do here?

Edit: Using InstallShield 2009, since there is apparently a difference between that and 2010.


In InstallShield 2009 and earlier, the encoding is a base-64 encoding of the binary string in the ANSI encoding specific to the language in question (e.g. CP932 for Japanese). In InstallShield 2010 and later, it will still accept that or use UTF-8, depending on other columns in that table.


Thanks (up-voted his answer) go to Michael Urman, for pointing us in the right direction. But this is the actual working (with InstallShield 2009) algorithm, reverse-engineered by a co-worker:

  1. Start with a Unicode (multi-byte-character) string
  2. Write out the length as the encoded-length field in the ism-file
  3. Encode the string as UTF-16-little-endian
  4. Base-64 encode the bytes using the uuencode dictionary, except with ` (back-tick) in place of the space character
  5. Write the result to the ism-file, escaping XML entities

Be aware that base-64ing using the uuencode dictionary is not the same as using the uuencode algorithm. Standard uuencode produces a set of newline-separated lines, including a header, footers and one or more data lines, each of which begins with a length-character. If you're implementing this using a uuencode codec, you'll need to strip all of that off.
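Steps 3 to 5 above can be sketched in Python roughly as follows. This is a minimal sketch, not a verified implementation: the function names `uu_base64` and `encode_ism_string` are made up for illustration, zero-padding the last group to a multiple of three bytes is an assumption, and the encoded-length field from step 2 is left out because the steps do not say which length it holds.

```python
from xml.sax.saxutils import escape


def uu_char(value):
    """Map a 6-bit value to the uuencode dictionary, with ` standing in for space."""
    return "`" if value == 0 else chr(32 + value)


def uu_base64(text):
    """UTF-16-LE encode `text`, then base-64 it with the uuencode dictionary.

    No uuencode header, footer, or per-line length character is produced.
    Assumption: the final group is padded with zero bytes.
    """
    data = text.encode("utf-16-le")
    data += b"\x00" * (-len(data) % 3)
    out = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i:i + 3]
        out.append(uu_char(b0 >> 2))
        out.append(uu_char(((b0 & 0x03) << 4) | (b1 >> 4)))
        out.append(uu_char(((b1 & 0x0F) << 2) | (b2 >> 6)))
        out.append(uu_char(b2 & 0x3F))
    return "".join(out)


def encode_ism_string(text):
    """Return the encoded string with &, < and > escaped for the XML file."""
    return escape(uu_base64(text))
```

For example, `uu_base64("Sh")` gives `4P!H` followed by four back-ticks. Compare the result against a string that InstallShield itself wrote before relying on the padding behaviour.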


I'm also trying to figure this out...

I've inherited some InstallShield 12 (which is pre-2009) projects with string table entries containing characters outside the range of base64 'target' characters.

For example, one of the Japanese strings is: 4P!H&$9!O'<4!R&\=!E&,=``@$(80!C&L=0!P"00!G`&4`;@!T`)(PI##S,+DPR##\,.LP5S!^,%DP`C

After much searching I happened upon Base85 encoding, which looks much closer to being plausible, but have not yet verified this to be the solution.
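A quick way to test a guess like this is to check which alphabet the characters fall in. The sketch below is only a sanity check under stated assumptions: `SAMPLE` is the string quoted above (which may have been altered when it was pasted), and `in_uu_alphabet` and `decode_group` are hypothetical helper names. It checks whether every character lies in the uuencode dictionary range (`!` through back-tick) and decodes one four-character group with that dictionary.

```python
# The Japanese string quoted above, kept verbatim in a raw string.
SAMPLE = r'''4P!H&$9!O'<4!R&\=!E&,=``@$(80!C&L=0!P"00!G`&4`;@!T`)(PI##S,+DPR##\,.LP5S!^,%DP`C'''


def in_uu_alphabet(s):
    """True if every character is in the uuencode dictionary, using ` for space."""
    return all(0x21 <= ord(c) <= 0x60 for c in s)


def decode_group(four):
    """Decode four uuencode-dictionary characters into three bytes."""
    v = [(ord(c) - 32) & 63 for c in four]
    return bytes([
        ((v[0] << 2) | (v[1] >> 4)) & 0xFF,
        ((v[1] << 4) | (v[2] >> 2)) & 0xFF,
        ((v[2] << 6) | v[3]) & 0xFF,
    ])


print(in_uu_alphabet(SAMPLE))      # True
print(decode_group(SAMPLE[:4]))    # b'S\x00h'
```

The sample contains no lowercase letters, which Base85 output normally would, and its first group decodes to bytes that look like the start of UTF-16-little-endian text. That is suggestive rather than conclusive, since the rest of the pasted string may not be intact.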
