
Reading Berkeley DB files from Python: `\x01\x0b\x88\x0c\x01`?

This question in a nutshell: what do \x04++HLMh7EjP3ILSfF\x00 and

'\x01\x0b\x88\x0c\x01-\x10\x02\x06!\x05"\x05#\n$\x0c\'\x0e%\x0b\x01&\x02\'\x06(\n\x00\x00'

mean?

Hi all,

I'm trying to read Palm Pre 2 database files. Some information is available in the documentation, but not enough for me to clearly understand the format.

Using the file command, I learnt that the format was objects.db: Berkeley DB (Btree, version 9, native byte-order).

Trying to open the database directly using bsddb.open() didn't work; I had to use

>>> import bsddb
>>> env = bsddb.db.DBEnv()
>>> env.open(None, bsddb.db.DB_CREATE | bsddb.db.DB_INIT_MPOOL)
>>> internal_db = bsddb.db.DB(env)
>>> internal_db.open(r'C:\objects.db', 'objects.db', bsddb.db.DB_BTREE, bsddb.db.DB_RDONLY)

Now I've opened the database, but the keys and values are encoded in a format which I don't understand. For example, here are some keys: '\x04++HMvu4v2GZbo1Ox\x00', '\x04++HMwBSPR8Zvwkt5\x00', '\x04++HMwF4OJ0R+WeSS\x00', and a value:

'\x01\x0b\xb7\r\x00\x05\xee\x89\x10\x029\x060\x04\x00/\x03\x04++HQqD0wWr_hZP75\x00\x00 \x02"\x06\x00$\x04inbox\x001\x02+\x04+33626320868\x00\x00%\x0e\x00\x00\x01.0\x19\xb3\x10&\x04Ok\x00-\rM\\/\x892\n:\r\x00\x05\xee\x89\'\x04sms\x003\n(\x04successful\x00.\rM\\/\x89\x00'

I tried to decode it from utf8, but I didn't get any convincing results. Do you recognize which encoding is being used? I don't understand the native byte-order part of the output of the file command, could it be related to this?

Thanks!


Educated guess

From simple inspection, the message is probably "sms successful", with a phone number:

>>> unicode(s, errors='ignore')
u'\x01\x0b\r\x00\x05\x10\x029\x060\x04\x00/\x03\x04++HQqD0wWr_hZP75\x00\x00 \x02"\x06\x00$\x04inbox\x001\x02+\x04+33626320868\x00\x00%\x0e\x00\x00\x01.0\x19\x10&\x04Ok\x00-\rM\\/2\n:\r\x00\x05\'\x04sms\x003\n(\x04successful\x00.\rM\\/\x00'

I think the other characters are binary data.
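
One way to check this guess is to pull out only the runs that look like text. In the sample value, each readable string appears to sit between a \x04 byte and a terminating \x00 byte. That framing is an observation from this one record, not a documented format, and extract_strings is a hypothetical helper name. A minimal sketch under that assumption:

```python
import re

# Assumption: text fields are framed as \x04 <printable ASCII> \x00.
# This is inferred from the sample record only, not from any format spec.
TEXT_FIELD = re.compile(br'\x04([\x20-\x7e]+)\x00')


def extract_strings(value):
    """Return the printable ASCII runs found between \\x04 and \\x00."""
    return [m.decode('ascii') for m in TEXT_FIELD.findall(value)]


# The sample value from the question, as a byte string.
value = (
    b'\x01\x0b\xb7\r\x00\x05\xee\x89\x10\x029\x060\x04\x00/\x03'
    b'\x04++HQqD0wWr_hZP75\x00\x00 \x02"\x06\x00$\x04inbox\x001\x02+'
    b'\x04+33626320868\x00\x00%\x0e\x00\x00\x01.0\x19\xb3\x10&\x04Ok\x00'
    b'-\rM\\/\x892\n:\r\x00\x05\xee\x89\'\x04sms\x003\n(\x04successful\x00'
    b'.\rM\\/\x89\x00'
)

print(extract_strings(value))
```

On the sample value this yields what looks like a record id, then inbox, the phone number, Ok, sms and successful. The keys in the question fit the same framing, so the same helper can be tried on them. The bytes between the strings are left uninterpreted here.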

Encodings

Decoding it did not help: both chardet and BeautifulSoup detect this as windows-1252:

>>> s = '\x01\x0b\xb7\r\x00\x05\xee\x89\x10\x029\x060\x04\x00/\x03\x04++HQqD0wWr_hZP75\x00\x00 \x02"\x06\x00$\x04inbox\x001\x02+\x04+33626320868\x00\x00%\x0e\x00\x00\x01.0\x19\xb3\x10&\x04Ok\x00-\rM\\/\x892\n:\r\x00\x05\xee\x89\'\x04sms\x003\n(\x04successful\x00.\rM\\/\x89\x00'
>>> import chardet
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(s)
>>> soup.originalEncoding
'windows-1252'
>>> chardet.detect(s)
{'confidence': 0.5, 'encoding': 'windows-1252'}

However, the 1252 decoding gives nothing meaningful:

>>> s.decode('windows-1252')
u'\x01\x0b\xb7\r\x00\x05\xee\u2030\x10\x029\x060\x04\x00/\x03\x04++HQqD0wWr_hZP75\x00\x00 \x02"\x06\x00$\x04inbox\x001\x02+\x04+33626320868\x00\x00%\x0e\x00\x00\x01.0\x19\xb3\x10&\x04Ok\x00-\rM\\/\u20302\n:\r\x00\x05\xee\u2030\'\x04sms\x003\n(\x04successful\x00.\rM\\/\u2030\x00'
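
The \u2030 in that output has a simple explanation: windows-1252 assigns a character to almost every byte value, so decoding just relabels the binary bytes rather than revealing text. A small sketch using two bytes from the sample value (written so it also runs on Python 3):

```python
# Two non-ASCII bytes taken from the sample value above.
raw = b'\xee\x89'

# windows-1252 maps nearly every byte to some character, so this "succeeds"
# even though the bytes are not text: 0xEE -> U+00EE, 0x89 -> U+2030.
as_cp1252 = raw.decode('windows-1252')

# UTF-8 is stricter: this byte pair is not a valid sequence.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_cp1252), utf8_ok)
```

A successful decode is therefore weak evidence here; the 0.5 confidence reported by chardet points the same way.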


Try using PalmDB or dbsql instead of bsddb.
