
ISO encoding with Japanese Frame file

I have Japanese content that is being converted to MS Help with a third-party tool. The problem is that the tool isn't using UTF-8 encoding and is creating an .xml file with garbage characters:

    <param name="Name" value="&#195;&#137;A&#195;&#137;v&#195;&#137;&#195;&#164;&#195;&#137;P&#195;&#133;&#195;&#137;V&#195;&#137;&#195;&#161;&#195;&#137;&#195;&#172;&#195;&#135;&#8224;&#195;&#135;'&#195;&#135;&#195;&#139;&#195;&#135;&#195;&#152;&#195;&#133;&#501;&#195;&#135;&#195;&#039;&#195;&#135;&#195;&#039;]">
    <param name="Name" value="Test File">
    <param name="Local" value="applications.htm#Xau1044547">

I tried playing around with the encoding and it now produces:

    <param name="Name" value="ÉAÉvÉäÉPÅ">
    <param name="Name" value="Test">
    <param name="Local" value="applications.htm#Xau1044547">

With UTF-8 encoding (using another tool), the output is correct and looks like this:

    <param name="Name" value="アプリケーション">
    <param name="Name" value="Small Business アプリケーションの起動 ">
    <param name="Local" value="applications1.html#wp1044548">

Is there any Java API I can use to decode and re-encode the files to get the correct output? I am not sure what encoding the tool is using, but I am guessing it's "ISO-8859-1".

Thanks.


Your problem is that you need to use two encodings correctly:

  • Find out what encoding your "Japanese content" uses
  • Make sure the tool correctly uses that encoding to read that content
  • Make sure the tool uses UTF-8 to encode the output file and correctly declares that in its header.
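If the tool itself cannot be configured, the same steps can be done in Java after the fact. The following is only a minimal sketch, assuming you already know (or have guessed) the source encoding; the class and method names (`Transcoder`, `decodeStrict`, `toUtf8`, `transcodeFile`) are made up for illustration. It decodes strictly, so a wrong guess for the source charset fails with an exception instead of silently producing replacement characters.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any tool mentioned in the question.
final class Transcoder {

    // Decode bytes with the given charset; throw on bytes that are
    // invalid in that charset rather than substituting U+FFFD.
    static String decodeStrict(byte[] input, Charset source)
            throws CharacterCodingException {
        return source.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(input))
                .toString();
    }

    // Re-encode the decoded text as UTF-8.
    static byte[] toUtf8(byte[] input, Charset source)
            throws CharacterCodingException {
        return decodeStrict(input, source).getBytes(StandardCharsets.UTF_8);
    }

    // Read a whole file in the source charset and write it back as UTF-8.
    static void transcodeFile(Path in, Path out, Charset source)
            throws IOException {
        Files.write(out, toUtf8(Files.readAllBytes(in), source));
    }
}
```

For Japanese content the usual candidates are Shift_JIS, EUC-JP and ISO-2022-JP, obtained with `Charset.forName(...)`; whether they are available depends on the JRE. The encoding declaration in the XML or HTML header still has to be changed to UTF-8 separately.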


It would appear from the first sample that your text is already corrupt at that point. The value of the first "Name" attribute is being represented with HTML character escape codes (decimal NCRs).

That being said, the 2nd sample (value="ÉAÉvÉäÉPÅ") and the 3rd sample (value="アプリケーション") do not match the 1st.
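One observation that may help narrow things down, offered as a guess to verify rather than a diagnosis: the characters in the 2nd sample line up with the Shift_JIS bytes of アプリケーション read as Mac Roman (for example, É is 0x83 and A is 0x41 in Mac Roman, and 0x83 0x41 is ア in Shift_JIS). If the text was only decoded with the wrong charset and no bytes were lost, the damage can be undone by turning the garbled characters back into bytes and decoding them again. A minimal sketch, with a hypothetical class and method name:

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only.
final class MojibakeRepair {

    // Undo a wrong decode: recover the original bytes by encoding the
    // garbled text with the charset that was wrongly applied, then
    // decode those bytes with the charset the content really uses.
    // This only works if the wrong decode was lossless, i.e. no byte
    // was replaced by '?' or U+FFFD along the way.
    static String redecode(String garbled, Charset wrongCharset,
                           Charset actualCharset) {
        byte[] originalBytes = garbled.getBytes(wrongCharset);
        return new String(originalBytes, actualCharset);
    }
}
```

Under the assumption above, the charsets to try would be `Charset.forName("x-MacRoman")` as the wrong one and `Charset.forName("Shift_JIS")` as the actual one, provided your JRE ships them.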

If HTML character escapes are indeed what the output should be, then the output encoding would be ASCII or some other variant, and the value would then be:

value="&#12450;&#12503;&#12522;&#12465;&#12540;&#12471;&#12519;&#12531;"
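For reference, escapes of this kind can be generated in Java by walking the string's code points. This is a rough sketch with a made-up class name (`NcrEscaper`); it leaves ASCII untouched and does not escape XML-special characters such as `&`, `<` or `"`, which a real attribute writer would also need to handle.

```java
// Hypothetical helper for illustration only.
final class NcrEscaper {

    // Replace every non-ASCII code point with a decimal numeric
    // character reference (&#NNNN;), leaving ASCII as-is.
    static String toDecimalNcr(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }
}
```

Because it iterates over code points rather than `char` values, characters outside the Basic Multilingual Plane come out as a single reference instead of two surrogate halves.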

I think you would need to reconfirm how this 3rd party tool is outputting the XML.
