How to unpack sqlite3 database written by Google AppEngine bulk downloader
I need to download all instances of fairly large (multi-GB) entity in my app's datastore. I have enough disk space to store the entity's data, but not enough to store both the original data that the bulk downloader retrieves as an SQLite database and the processed version of the data that the downloader writes after applying the transforms specified in my bulkloader.yaml file. Given this, I'm fairly certain that the bulk download operation would successfully retrieve the SQLite database, and then fail when trying to apply the transforms.
This might be okay since there's another system available to which I could move the SQLite database and where I could unpack it. (The other system that's available to me has Python installed but not a version that supports the AppEngine tools -- and I don't have permission to upgrade Python on that machine -- so I cannot do the bulk download directly there.) I could retrieve the data I need if I could write some Python code to load the SQLite database and read its result table, but I cannot figure out what to make of the SQLite data -- when I use the SQLite module to connect to the database and unpack rows of the table, they appear to contain metadata in addition to the data that I'm interested in (the data that my AppEngine app actually placed in the datastore).
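Before involving any AppEngine code, the raw file can be inspected with the standard sqlite3 module on any machine. A minimal sketch (the `result` table and its columns below match what the bulkloader writes, but verify the names against your own file):

```python
import sqlite3

def list_tables(path):
    # Return the names of all tables in the SQLite file at 'path'.
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "select name from sqlite_master where type = 'table' order by name")
        return [name for (name,) in rows]
    finally:
        conn.close()
```

For a bulkloader result database, `list_tables('UserRecord.db')` should include a `result` table; its `value` column holds each entity's raw bytes, which is the metadata-looking blob described above.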
I know that the appcfg.py bulk download process can read this data, since it can transform the data in the ways I specify in bulkloader.yaml, but I haven't located the AppEngine toolkit code that does this unpacking. Any help or pointers would be appreciated.
Entities are stored in the downloaded SQLite database as encoded Protocol Buffers (the same way they're stored in the production environment, and everywhere else - in short, an entity is an encoded PB). You can decode them yourself using the SDK's entity-decoding code (db.proto_to_entity() etc.), but it'll be a bit of work to set everything up.
The relevant code is the ResultDatabase class in bulkloader.py - which you can probably reuse, along with other parts of the bulkloader, to make your job easier.
Here's the code that worked for me:
import sqlite3
from google.appengine.datastore import entity_pb
from google.appengine.api import datastore

# Open the downloaded result database (autocommit mode).
conn = sqlite3.connect('UserRecord.db', isolation_level=None)
cursor = conn.cursor()

# Each row's 'value' column holds one entity as an encoded protocol buffer.
cursor.execute('select id, value from result order by sort_key, id')
for unused_entity_id, entity in cursor:
    # Decode the protocol buffer and convert it back to a datastore Entity.
    entity_proto = entity_pb.EntityProto(contents=entity)
    print datastore.Entity._FromPb(entity_proto)