Unpack string with hexadecimals

I have a string that contains a float value in hexadecimal characters like this:

"\\64\\2e\\9b\\38"

I want to extract the float, but to do that I have to make Python see the string as 4 hex characters instead of 16 regular characters. First I tried replacing the backslashes, but I got an error:

>>> hexstring.replace("\\", "\x")
ValueError: invalid \x escape

I've discovered

struct.unpack("f", "\x64\x2e\x9b\x38") 

does exactly what I want, but how do I convert the string?
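For readers on Python 3: `struct.unpack` no longer accepts a `str`, so the call above needs a bytes literal. A minimal sketch; the `'<'` prefix (little-endian, standard size) is an addition of mine so the result does not depend on the host's native byte order, which plain `"f"` uses.

```python
import struct

# Python 3: struct.unpack needs a bytes-like object, so the literal gets a b prefix.
# '<f' reads one little-endian 4-byte IEEE 754 float; plain 'f' uses native byte order.
(value,) = struct.unpack('<f', b'\x64\x2e\x9b\x38')
print(value)
```

On a little-endian machine this should give the same value as the Python 2 `Out[38]` result further down.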


Whenever I see a (malformed) string, such as one composed of this list of characters:

['\\', '\\', '6', '4', '\\', '\\', '2', 'e', '\\', '\\', '9', 'b', '\\', '\\', '3', '8']

when what was intended was this list of characters

['\x64', '\x2e', '\x9b', '\x38']

I reach for the decode('string_escape') method.

But to use it, we need to replace the two characters r'\\' with r'\x'. You can use the replace(...) method for that.
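The replace step on its own, sketched in Python 3: using raw strings for both arguments is what avoids the `invalid \x escape` error from the question, since `r'\x'` is simply a backslash followed by `x`. Each doubled backslash becomes `\x`, leaving the *text* of four `\xNN` escapes for an escape decoder to interpret.

```python
# r'\\' is two backslash characters; r'\x' is a backslash followed by 'x'.
hexstring = r'\\64\\2e\\9b\\38'
escaped = hexstring.replace(r'\\', r'\x')
print(escaped)  # \x64\x2e\x9b\x38 (still plain text, not yet four bytes)
```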

In [37]: hexstring=r'\\64\\2e\\9b\\38'

In [38]: struct.unpack('f',(hexstring.replace(r'\\',r'\x').decode('string_escape')))
Out[38]: (7.3996168794110417e-05,)

In [39]: struct.unpack("f", "\x64\x2e\x9b\x38")
Out[39]: (7.3996168794110417e-05,)

PS. This use of the decode method works in Python 2 but will not work in Python 3. In Python 3 the decode method (bytes.decode) is meant strictly for converting bytes objects to str objects (err, what Python 2 calls unicode objects), whereas in the example above, decode is actually converting a string object to a string object. Most decoding codecs in Python 2 do convert string objects to unicode objects, but a few, like 'string_escape', do not. In Python 3 those have generally been moved to other modules, or are reached in some other way (for example through codecs.decode rather than the decode method).

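In Python 3, the equivalent of hexstring.decode('string_escape') is codecs.escape_decode(hexstring)[0]. Note that codecs.escape_decode is undocumented, and it returns a tuple whose first element is the decoded bytes.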
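A Python 3 sketch of the whole `string_escape` route, assuming CPython (where the undocumented `codecs.escape_decode` exists). The input is encoded to ASCII bytes first, and `'<f'` forces little-endian unpacking; both choices are mine, not the answer's.

```python
import codecs
import struct

hexstring = r'\\64\\2e\\9b\\38'

# Rewrite each doubled backslash as "\x" so the text reads \x64\x2e\x9b\x38.
escaped = hexstring.replace(r'\\', r'\x')

# codecs.escape_decode is undocumented CPython API; it returns a tuple of
# (decoded bytes, number of input bytes consumed), so keep element [0].
raw_bytes = codecs.escape_decode(escaped.encode('ascii'))[0]

# '<f' = one little-endian 4-byte float.
(value,) = struct.unpack('<f', raw_bytes)
print(raw_bytes, value)
```

Because it relies on an undocumented function, the `binascii` route is probably the safer choice for new Python 3 code.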

Edit: Another way, similar in spirit to jsbueno's answer, is to use binascii.unhexlify:

In [76]: import binascii
In [81]: hexstring=r"\\64\\2e\\9b\\38"
In [82]: hexstring.replace('\\','')
Out[82]: '642e9b38'

In [83]: binascii.unhexlify(hexstring.replace('\\',''))
Out[83]: 'd.\x9b8'
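The same idea in Python 3, as a minimal sketch. `bytes.fromhex` is a built-in alternative I have added for comparison; it is not part of the original answer. Both calls turn each pair of hex digits into one byte.

```python
import binascii
import struct

hexstring = r'\\64\\2e\\9b\\38'

# Remove every backslash, leaving only the hex digits '642e9b38'.
digits = hexstring.replace('\\', '')

# Both functions turn each pair of hex digits into one byte; Python 3's
# binascii.unhexlify accepts an ASCII-only str as well as bytes.
via_unhexlify = binascii.unhexlify(digits)
via_fromhex = bytes.fromhex(digits)

# '<f' = one little-endian 4-byte float.
(value,) = struct.unpack('<f', via_unhexlify)
print(via_unhexlify, via_fromhex, value)
```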

These timeit results suggest binascii.unhexlify is the fastest:

In [84]: %timeit binascii.unhexlify(hexstring.replace('\\',''))
1000000 loops, best of 3: 1.42 us per loop

In [85]: %timeit hexstring.replace('\\','').decode('hex_codec')
100000 loops, best of 3: 2.94 us per loop

In [86]: %timeit hexstring.replace(r'\\',r'\x').decode('string_escape')
100000 loops, best of 3: 2.13 us per loop
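To rerun the comparison on Python 3, a rough harness along these lines could be used. It is a sketch: the three lambdas are my ports of the expressions timed above (with `'hex'` reached through `codecs.decode` and `string_escape` replaced by the undocumented `codecs.escape_decode`). Expect absolute numbers to differ from the Python 2 figures and between machines, and note that the lambda wrapper adds call overhead to every measurement.

```python
import binascii
import codecs
import timeit

hexstring = r'\\64\\2e\\9b\\38'
NUMBER = 10_000  # calls per timing run

# Python 3 ports of the three conversions timed above.
candidates = {
    'binascii.unhexlify': lambda: binascii.unhexlify(hexstring.replace('\\', '')),
    'hex codec': lambda: codecs.decode(hexstring.replace('\\', '').encode('ascii'), 'hex'),
    'escape_decode': lambda: codecs.escape_decode(
        hexstring.replace(r'\\', r'\x').encode('ascii'))[0],
}

# Sanity check before timing: all three must yield the same four bytes.
results = {name: fn() for name, fn in candidates.items()}
assert len(set(results.values())) == 1

for name, fn in candidates.items():
    # Best of 3 runs, converted to microseconds per call (includes lambda overhead).
    best = min(timeit.repeat(fn, number=NUMBER, repeat=3))
    print(f'{name}: {best / NUMBER * 1e6:.2f} us per call')
```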

Edit, per the comments:

This answer contains raw strings. The Department of Public Health advises that eating raw or undercooked strings poses a health risk to everyone, but especially to the elderly, young children under age 4, pregnant women and other highly susceptible individuals with compromised immune systems. Thorough cooking of raw strings reduces the risk of illness.


A shorter way to go here is to just get rid of the "\" characters and make Python see each pair of hex digits as a byte, using the "hex_codec":

struct.unpack("f", "\\64\\2e\\9b\\38".replace("\\", "").decode("hex_codec"))
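In Python 3, `str` has no `decode` method and `bytes.decode` only accepts text encodings, so `hex_codec` has to be reached through `codecs.decode`. A minimal sketch under that assumption; the `try` block only demonstrates that the Python 2 spelling is rejected with a `LookupError`, and `'<f'` forces little-endian unpacking.

```python
import codecs
import struct

# Same (non-raw) literal as in the one-liner above: single backslashes.
hexstring = '\\64\\2e\\9b\\38'
digits = hexstring.replace('\\', '').encode('ascii')  # b'642e9b38'

# In Python 3 'hex_codec' is a bytes-to-bytes codec, so go through codecs.decode.
raw_bytes = codecs.decode(digits, 'hex_codec')

# bytes.decode() accepts only text encodings and rejects the Python 2 spelling.
try:
    digits.decode('hex_codec')
    refused = False
except LookupError:
    refused = True

(value,) = struct.unpack('<f', raw_bytes)
print(raw_bytes, value, refused)
```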