
Python - converting wide-char strings from a binary file to Python unicode strings

It's been a long day and I'm a bit stumped.


I'm reading a binary file that contains lots of wide-char strings, and I want to dump them out as Python unicode strings. (To unpack the non-string data I'm using the struct module, but I don't know how to do the same with the strings.)

For example, reading the word "Series":

myfile = open("test.lei", "rb")
myfile.seek(44)
data = myfile.read(12)

# data is now 'S\x00e\x00r\x00i\x00e\x00s\x00'

How can I decode that raw wide-char data into a Python unicode string?

Edit: I'm using Python 2.6


>>> data = 'S\x00e\x00r\x00i\x00e\x00s\x00'
>>> data.decode('utf-16')
u'Series'
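Since the question mentions struct, here is a minimal sketch that pulls a string field out with struct's 's' format and then decodes it. The header layout (an int at offset 0, the string at offset 44) is a hypothetical stand-in for test.lei. It names 'utf-16-le' explicitly because, when there is no BOM, the plain 'utf-16' codec falls back to the machine's native byte order, so 'utf-16' only gives this result on little-endian hosts.

```python
import struct

# Hypothetical header standing in for the start of test.lei: an unsigned
# 32-bit int at offset 0, zero padding, then a 12-byte UTF-16LE string
# at offset 44 (layout assumed for illustration only).
header = struct.pack("<I", 7) + b"\x00" * 40 + u"Series".encode("utf-16-le")

# Numeric fields come out of struct as Python ints...
(count,) = struct.unpack_from("<I", header, 0)

# ...while 's' fields come out as raw bytes, which still need decoding.
(raw_name,) = struct.unpack_from("<12s", header, 44)

# Name the byte order explicitly instead of relying on plain 'utf-16',
# which assumes native byte order when the data has no BOM.
name = raw_name.decode("utf-16-le")
print(name)  # Series
```

struct itself does no text decoding, so the pattern is the same for every string field: unpack the fixed number of bytes, then call decode() on them.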


I'd also recommend calling rstrip('\x00') after decoding, to remove trailing '\x00' (NUL padding) characters, unless of course you actually need them.

>>> data = 'S\x00o\x00m\x00e\x00\x20\x00D\x00a\x00t\x00a\x00\x00\x00\x00\x00'
>>> print '"%s"' % data.decode('utf-16').rstrip('\x00')
"Some Data"

Without rstrip('\x00'), the two trailing NUL characters stay in the result (when printed, they may look like trailing blanks):

>>> data.decode('utf-16')
u'Some Data\x00\x00'
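The order matters here: decode first, then strip. A minimal sketch (the helper name is hypothetical, and it assumes little-endian data as in the question) that wraps the decode-then-strip step and shows why stripping the raw bytes first goes wrong:

```python
def decode_padded_utf16le(raw):
    """Hypothetical helper: decode a fixed-width, NUL-padded UTF-16LE field.

    Decoding happens before stripping, so only whole U+0000 characters
    are removed, never the zero high byte of a real character.
    """
    return raw.decode("utf-16-le").rstrip(u"\x00")


padded = b"S\x00o\x00m\x00e\x00 \x00D\x00a\x00t\x00a\x00\x00\x00\x00\x00"
print('"%s"' % decode_padded_utf16le(padded))  # "Some Data"

# Stripping the raw bytes first also eats the 0x00 high byte of the final
# 'a', leaving an odd number of bytes that the strict decoder rejects.
try:
    padded.rstrip(b"\x00").decode("utf-16-le")
except UnicodeDecodeError as exc:
    print("stripping before decoding failed: %s" % exc.reason)
```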


If the string is stored little-endian and is known to contain no characters beyond U+00FF, another option, which produces a plain string rather than a unicode object, is to drop the zero bytes by slicing:

>>> 'S\x00e\x00r\x00i\x00e\x00s\x00'[::2]
'Series'
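The slicing trick is only safe while its assumptions hold. A minimal sketch (the helper name is hypothetical) that checks them before slicing: the data must have an even length and every high (odd-indexed) byte must be zero. When that holds, the result is the same as decoding as UTF-16LE and re-encoding as Latin-1.

```python
def utf16le_to_latin1_bytes(raw):
    """Hypothetical helper: UTF-16LE bytes -> Latin-1 bytes via slicing.

    Only valid when every character is <= U+00FF, i.e. every high
    (odd-indexed) byte is zero; otherwise raise instead of corrupting data.
    """
    if len(raw) % 2:
        raise ValueError("UTF-16 data must have an even length")
    if raw[1::2] != b"\x00" * (len(raw) // 2):
        raise ValueError("data contains characters beyond U+00FF")
    return raw[::2]  # keep only the low byte of each code unit


data = b"S\x00e\x00r\x00i\x00e\x00s\x00"
print(utf16le_to_latin1_bytes(data) == data.decode("utf-16-le").encode("latin-1"))  # True

# A euro sign (U+20AC) has a non-zero high byte, so it is rejected.
try:
    utf16le_to_latin1_bytes(u"\u20ac".encode("utf-16-le"))
except ValueError as exc:
    print("rejected: %s" % exc)
```

Without the check, raw[::2] would silently turn the euro sign into the single byte 0xAC.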


Hmm, why do you say open() is preferable to file()? The Python 2.5 reference says:

3.9 File Objects: File objects are implemented using C's stdio package and can be created with the built-in constructor file() described in section 2.1, "Built-in Functions." (Footnote 3.6: file() is new in Python 2.2. The older built-in open() is an alias for file().)
