
How to translate double-UTF-8-decoder code in Python to Lua

I have this legacy code snippet, which (apparently) decodes double-encoded UTF-8 text back to normal UTF-8:

# Run with python3!
import codecs
import sys

s = codecs.open('doubleutf8.dat', 'r', 'utf-8').read()
sys.stdout.write(
    s
    .encode('raw_unicode_escape')
    .decode('utf-8')
)

I need to translate it to Lua, and imitate all possible decoding side-effects (if any).
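For reference, here is a minimal sketch of what the snippet appears to undo. It assumes the data was double-encoded by misreading UTF-8 bytes as Latin-1 and then encoding them as UTF-8 again. The sample string and variable names are illustrative and are not taken from the real doubleutf8.dat:

```python
# Illustrative sample text (an assumption, not the real file contents).
original = "нижний новгород"

# Simulate double encoding: the UTF-8 bytes are misread as Latin-1 text
# and that text is encoded as UTF-8 a second time.
double_encoded = original.encode("utf-8").decode("latin-1").encode("utf-8")

# Same pipeline as the legacy snippet, applied to bytes instead of a file:
# decode UTF-8, map each character back to one byte, decode UTF-8 again.
recovered = (
    double_encoded
    .decode("utf-8")
    .encode("raw_unicode_escape")
    .decode("utf-8")
)
```

Under that assumption the round trip restores the original text, so any Lua port has to reproduce the middle "one character back to one byte" step.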

Limitations: I may use any available Lua module for UTF-8 handling, preferably a stable one with LuaRocks support. I will not use Lupa or any other Lua-Python bridging solution, nor will I call os.execute() to invoke Python.


You can use lua-iconv, the Lua binding to the iconv library. With it you can convert between character encodings as much as you like.

It is also available in LuaRocks.

Edit: using this answer I have been able to correctly decode the data using the following Lua code:

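local iconv = require 'iconv'
-- convert from utf8 to latin1 (iconv.new takes the target encoding first)
local decoder = iconv.new('latin1', 'utf8')
-- read the whole file as raw bytes
local file = assert(io.open('doubleutf8.dat', 'rb'))
local data = file:read('*a')
file:close()
-- decodedData is encoded in utf8; err is set if the conversion failed
local decodedData, err = decoder:iconv(data)
if err then
  error('iconv conversion failed, error code ' .. tostring(err))
end
-- if your terminal understands utf8, prints "нижний новгород"
-- if not, you can further convert it from utf8 to any encoding, like KOI8-R
print(decodedData)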
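One caveat on side effects, sketched in Python since that is where the original behaviour is defined: `raw_unicode_escape` matches Latin-1 only for code points up to U+00FF. Anything above that is written as a literal `\uXXXX` escape instead of raising an error, whereas a strict Latin-1 conversion rejects it. With lua-iconv the same input would presumably surface as an error code from `decoder:iconv` (an assumption based on its documented `ERROR_INVALID` return, not tested here). The helper name below is hypothetical:

```python
def middle_step(text: str) -> bytes:
    """Hypothetical helper: the legacy snippet's middle encoding step."""
    return text.encode("raw_unicode_escape")

plain = "\u00d0\u00bd"   # both characters fit in Latin-1
mixed = "\u00d0\u20ac"   # U+20AC (euro sign) does not fit in Latin-1

# For Latin-1-range text the two encodings produce identical bytes.
same_as_latin1 = middle_step(plain) == plain.encode("latin-1")

# Above U+00FF, raw_unicode_escape emits a literal backslash escape.
escaped = middle_step(mixed)

# A strict Latin-1 encode refuses the same input.
try:
    mixed.encode("latin-1")
    latin1_rejects = False
except UnicodeEncodeError:
    latin1_rejects = True
```

If the input file only ever contains genuinely double-encoded text, every decoded character is below U+0100 and the two approaches should agree; the difference matters only for files that mix in characters outside that range.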