Reading BerkleyDB files from python: `\x01\x0b\x88\x0c\x01`?
This question in a nutshell: what do \x04++HLMh7EjP3ILSfF\x00
and
'\x01\x0b\x88\x0c\x01-\x10\x02\x06!\x05"\x05#\n$\x0c\'\x0e%\x0b\x01&\x02\'\x06(\n\x00\x00'
mean?
Hi all,
I'm trying to read palm pre 2 database files. Some information is available in the documentation, but not enough for me to clearly understand the format.
Using the file
command, I learnt that the format was objects.db: Berkeley DB (Btree, version 9, native byte-order)
.
Trying to op开发者_Go百科en the database directly using bsddb.open()
didn't work; I had to use
>>> env = bsddb.db.DBEnv()
>>> env.open(None, bsddb.db.DB_CREATE | bsddb.db.DB_INIT_MPOOL)
>>> internal_db = bsddb.db.DB(env)
>>> internal_db.open('C:\objects.db', 'objects.db', bsddb.db.DB_BTREE, bsddb.db.DB_RDONLY)
Now I've open the database, but the keys and values are encoded in a format which I don't understand: for example, here are some keys: '\x04++HMvu4v2GZbo1Ox\x00', '\x04++HMwBSPR8Zvwkt5\x00', '\x04++HMwF4OJ0R+WeSS\x00'
, and a value:
'\x01\x0b\xb7\r\x00\x05\xee\x89\x10\x029\x060\x04\x00/\x03\x04++HQqD0wWr_hZP75\x00\x00 \x02"\x06\x00$\x04inbox\x001\x02+\x04+33626320868\x00\x00%\x0e\x00\x00\x01.0\x19\xb3\x10&\x04Ok\x00-\rM\\/\x892\n:\r\x00\x05\xee\x89\'\x04sms\x003\n(\x04successful\x00.\rM\\/\x89\x00'
I tried to decode it from utf8, but I didn't get any convincing results. Do you recognize which encoding is being used? I don't understand the native byte-order
part of the output of the file
command, could it be related to this?
Thanks!
Educated guess
From simple inspection, The message is probably "sms successful", with a phone number:
unicode(s, errors='ignore') u'\x01\x0b\r\x00\x05\x10\x029\x060\x04\x00/\x03\x04++HQqD0wWr_hZP75\x00\x00 \x02"\x06\x00$\x04inbox\x001\x02+\x04+33626320868\x00\x00%\x0e\x00\x00\x01.0\x19\x10&\x04Ok\x00-\rM\/2\n:\r\x00\x05\'\x04sms\x003\n(\x04successful\x00.\rM\/\x00'
I think the other characters are binary data.
Encodings
Decoding it did not help - Both chardet and BeautifulSoup detects this as windows-1252
:
>>> s=u'\x01\x0b\xb7\r\x00\x05\xee\u2030\x10\x029\x060\x04\x00/\x03\x04++HQqD0wWr_hZP75\x00\x00 \x02"\x06\x00$\x04inbox\x001\x02+\x04+33626320868\x00\x00%\x0e\x00\x00\x01.0\x19\xb3\x10&\x04Ok\x00-\rM\\/\u20302\n:\r\x00\x05\xee\u2030\'\x04sms\x003\n(\x04successful\x00.\rM\\/\u2030\x00'
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(s)
>>> soup.originalEncoding
'windows-1252'
>>> chardet.detect(s)
{'confidence': 0.5, 'encoding': 'windows-1252'}
However, the 1252 decoding gives nothing meaningful:
>>> s.decode('windows-1252')
u'\x01\x0b\xb7\r\x00\x05\xee\u2030\x10\x029\x060\x04\x00/\x03\x04++HQqD0wWr_hZP75\x00\x00 \x02"\x06\x00$\x04inbox\x001\x02+\x04+33626320868\x00\x00%\x0e\x00\x00\x01.0\x19\xb3\x10&\x04Ok\x00-\rM\\/\u20302\n:\r\x00\x05\xee\u2030\'\x04sms\x003\n(\x04successful\x00.\rM\\/\u2030\x00'
Try using PalmDB or dbsql instead of bsddb
.
精彩评论