Similar C string format in Python

I need to read a file containing some strange string lines like: \x72\xFE\x20TEST_STRING\0\0\0

But when I print this string (with repr()), it prints this: r\xfe TEST_STRING\x00\x00\x00

Example:

>>> test = '\x72\xFE\x20TEST_STRING\0\0\0'
>>> print test
r? TEST_STRING
>>> print repr(test)
'r\xfe TEST_STRING\x00\x00\x00'

How can I get the same line from a file in both Python and my editor? Is Python changing the encoding during string manipulation?


You should use Python's raw strings, like this (note the 'r' in front of the string):

test = r'\x72\xFE\x20TEST_STRING\0\0\0'

Then it won't try to interpret the escapes as special characters.
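To make the difference concrete, here is a minimal sketch written for Python 3, assuming `bytes` literals as the stand-in for the Python 2 `str` used in this thread. The plain literal holds the resolved bytes, while the raw literal keeps every backslash as an ordinary character:

```python
# Plain literal: escapes are resolved, so \x72 becomes the single byte 'r'.
plain = b'\x72\xFE\x20TEST_STRING\0\0\0'
# Raw literal: backslashes are kept, so \x72 stays four separate characters.
raw = rb'\x72\xFE\x20TEST_STRING\0\0\0'

print(len(plain))  # 17: three escaped bytes, 11 letters, three NUL bytes
print(len(raw))    # 29: every escape sequence is kept as literal text
print(raw)         # repr shows each kept backslash doubled
```

The raw version is what you get when the file itself contains the backslash sequences as text.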

When reading from a text file, Python shouldn't try to interpret the string as containing multi-byte Unicode characters. You should get exactly what's in the file:

In [22]: fp = open("test.txt", "r")

In [23]: s = fp.read()

In [24]: s
Out[24]: '\\x72\\xFE\\x20TEST_STRING\\0\\0\\0\n\n'

In [25]: print s
\x72\xFE\x20TEST_STRING\0\0\0
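If the file really contains the backslash sequences as text and you want the bytes they describe, one option (a sketch assuming Python 3 and an ASCII-only file) is to read the file in binary mode, decode it with the standard `unicode_escape` codec, and map the result back to bytes with Latin-1. The helper name `unescape_line` and the temporary file are hypothetical stand-ins for `test.txt` above:

```python
import codecs
import os
import tempfile


def unescape_line(raw_line: bytes) -> bytes:
    """Turn literal escape text such as b'\\x72\\0' into the bytes it describes.

    Assumes ASCII input: 'unicode_escape' maps \\xNN to code point U+00NN, and
    Latin-1 maps U+0000..U+00FF back to the single bytes 0x00..0xFF.
    """
    text = codecs.decode(raw_line, 'unicode_escape')
    return text.encode('latin-1')


# Write a throwaway file holding the escape sequences as plain text.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as fp:
    fp.write(b'\\x72\\xFE\\x20TEST_STRING\\0\\0\\0\n')
    path = fp.name

# Read it back in binary mode so no text decoding happens on the way in.
with open(path, 'rb') as fp:
    line = fp.readline().rstrip(b'\n')
os.remove(path)

decoded = unescape_line(line)
print(decoded)  # the same value repr() showed in the question
```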


\x20 is a space. When you put that into a Python string it is stored exactly the same way as a space.

If you have printable characters in a string, it does not matter whether they were typed as the actual character or as an escape sequence: they are represented the same way because they are, in fact, the same value.

Consider the following examples:

>>> ' ' == '\x20'
True

>>> hex(ord('a'))
'0x61'
>>> '\x61'
'a'
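The same holds for the whole string from the question. A short Python 3 check (again using `bytes` literals in place of Python 2 `str`) shows that the escaped spelling and the repr()-style spelling produce one and the same value:

```python
escaped = b'\x72\xFE\x20TEST_STRING\0\0\0'   # as typed in the question
resolved = b'r\xfe TEST_STRING\x00\x00\x00'  # as repr() displays it

print(escaped == resolved)  # True: identical bytes
print(list(escaped[:3]))    # [114, 254, 32], i.e. 0x72, 0xFE, 0x20
```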


Python did not change the encoding.

When printing, Python simply output the characters in your string: chr(0x72) is "r"; chr(0xfe) is not printable, so your terminal shows "?"; chr(0x20), that is chr(32), is a space " "; and the zero bytes produce no visible output at all.

repr() shows the "r" as-is, keeps chr(0xfe) as the escape \xfe, and writes each chr(0) in full hexadecimal notation as \x00.
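A simplified model of that per-byte rule, written for Python 3, is sketched below. `bytes_repr_body` is a hypothetical helper, not how CPython implements repr(); it only handles the cases that occur in this string:

```python
data = b'\x72\xFE\x20TEST_STRING\0\0\0'


def bytes_repr_body(raw: bytes) -> str:
    """Simplified model of how repr() spells each byte (hypothetical helper).

    Printable ASCII (0x20-0x7E) is kept as the character itself; everything
    else becomes a \\xNN escape. The real repr() also uses \\t, \\n, \\r and
    special handling for quotes and backslashes; none occur in this data.
    """
    out = []
    for b in raw:
        if 0x20 <= b <= 0x7E and b not in (0x27, 0x5C):  # skip ' and \
            out.append(chr(b))
        else:
            out.append('\\x%02x' % b)
    return ''.join(out)


print(bytes_repr_body(data))  # r\xfe TEST_STRING\x00\x00\x00
print(repr(data))             # the same text wrapped in b'...'
```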

So if you want the same line in your editor and from repr(), you have to type the string in your editor in the same notation repr() uses; that is, you write

test='r\xfe TEST_STRING\x00\x00\x00'

and repr(test) should then print the same string.
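One way to confirm that the typed text and repr()'s output match is to parse repr()'s output back into a value. A Python 3 sketch using the standard `ast.literal_eval`:

```python
import ast

test = b'r\xfe TEST_STRING\x00\x00\x00'
shown = repr(test)               # the text repr() prints
again = ast.literal_eval(shown)  # parse that text back into a value

print(shown)
print(again == test)             # True: repr() output round-trips
```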


To avoid having Python interpret the backslashes as escape characters, prefix your string with an "r" character:

    >>> test = r'\x72\xFE\x20TEST_STRING\0\0\0'
    >>> print test
    \x72\xFE\x20TEST_STRING\0\0\0
