What is the best way to work with large data sets in the Google App Engine Java Development server?
I am developing a Java application using Google App Engine that depends on a fairly large dataset being present. Without getting into the specifics of my application, I'll just state that working with a small subset of the data is simply not practical. Unfortunately, at the time of this writing, the Google App Engine for Java development server stores the entire datastore in memory. According to Ikai Lan:
The development server datastore stub is an in memory Map that is persisted to disk.
I simply cannot import my entire dataset into the development datastore without running into memory problems. Once the application is pushed into Google's cloud and uses BigTable, there is no issue. But deployment to the cloud takes a long time, which makes development cycles painful, so developing this way is not practical.
I've noticed the Google App Engine for Python development server has an option to use SQLite as the backend datastore which I presume would solve my problem.
dev_appserver.py --use_sqlite
But the Java development server includes no such option (at least none that is documented). What is the best way to get a large dataset working with the Google App Engine Java development server?
There's no magic solution - the only datastore stub for the Java API, currently, is an in-memory one. Short of implementing your own disk-based stub, your only options are to find a way to work with a subset of data for testing, or do your development on appspot.
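If you go the subset route, one simple approach is to seed the local datastore with a bounded slice of your data using the low-level datastore API and batched puts, so each call stays small. A minimal sketch of that idea follows; the "Product" kind, the property names, and the record source are placeholders, not anything from your app:

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import java.util.ArrayList;
    import java.util.List;

    // Loads at most maxEntities records into the dev datastore, writing them
    // in batches so no single put is too large.
    public class SubsetSeeder {
        private static final int BATCH_SIZE = 500;

        public static void seed(Iterable<String[]> records, int maxEntities) {
            DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
            List<Entity> batch = new ArrayList<Entity>();
            int count = 0;
            for (String[] record : records) {
                if (count >= maxEntities) {
                    break;                              // cap the subset size
                }
                Entity entity = new Entity("Product");  // placeholder kind
                entity.setProperty("name", record[0]);  // placeholder properties
                entity.setProperty("price", record[1]);
                batch.add(entity);
                count++;
                if (batch.size() == BATCH_SIZE) {
                    datastore.put(batch);               // batched put
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                datastore.put(batch);                   // flush the remainder
            }
        }
    }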
I've been using the mapper api to import data from the blobstore, as described by Ikai Lan in this blog entry - http://ikaisays.com/2010/08/11/using-the-app-engine-mapper-for-bulk-data-import/.
I've found it to be much faster and more stable than using the remote API bulkloader, especially when loading medium-sized datasets (around 100k entities) into the local datastore.
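For reference, the mapper in that post follows roughly the shape below. This is only a sketch adapted from the pattern described there: the AppEngineMapper, BlobstoreRecordKey, and DatastoreMutationPool classes come from the experimental appengine-mapreduce library of that era and may differ in current releases, and the "Product" kind and CSV layout are made up for illustration.

    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.tools.mapreduce.AppEngineMapper;
    import com.google.appengine.tools.mapreduce.BlobstoreRecordKey;
    import com.google.appengine.tools.mapreduce.DatastoreMutationPool;
    import org.apache.hadoop.io.NullWritable;

    // Each map() call receives one record from the uploaded blob and turns it
    // into a datastore entity, written through the mutation pool so puts are
    // batched rather than issued one at a time.
    public class ImportMapper
            extends AppEngineMapper<BlobstoreRecordKey, byte[], NullWritable, NullWritable> {

        @Override
        public void map(BlobstoreRecordKey key, byte[] segment, Context context) {
            String line = new String(segment);       // one line of the blob
            String[] fields = line.split(",");       // hypothetical CSV layout

            Entity entity = new Entity("Product");   // placeholder kind
            entity.setProperty("name", fields[0]);
            entity.setProperty("price", fields[1]);

            DatastoreMutationPool pool =
                    this.getAppEngineContext(context).getMutationPool();
            pool.put(entity);                        // batched datastore write
        }
    }

Because the job runs on the development server itself rather than over the remote API, there is no per-request network round trip, which is where most of the speedup over the bulkloader comes from.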