Best way to get CSV data into App Engine when bulkloader takes too long/generates errors?

Posted: 2023-02-27 18:57

I have a 10 MB CSV file of geolocation data that I tried to upload to my App Engine datastore yesterday. I followed the instructions in this blog post and used the bulkloader/appcfg tool. The datastore indicated that records were uploaded, but the upload took several hours and used up my entire CPU quota for the day. The process broke down in errors towards the end, before I actually exceeded my quota. Needless to say, 10 MB of data shouldn't require this much time and power.

So, is there some other way to get this CSV data into my App Engine datastore (for a Java app)?

I saw a post by Ikai Lan about using a mapper tool he created for this purpose but it looks rather complicated.

Instead, what about uploading the CSV to Google Docs - is there a way to transfer it to the App Engine datastore from there?

I do daily uploads of 100,000 records (20 MB) through the bulkloader. Settings I played with:
- bulkloader.yaml config: set to auto-generate keys.
- Include the header row in the raw CSV file.
- Speed parameters are set to max (not sure whether reducing them would reduce the CPU consumed).

These settings burn through my 6.5 hrs of free quota in about 4 minutes, but the data does get loaded (maybe the CPU use comes from the indexes being generated).

appcfg.py upload_data --config_file=bulkloader.yaml  --url=http://yourapp.appspot.com/remote_api --filename=data.csv --kind=yourtablename --bandwidth_limit=999999 --rps_limit=100 --batch_size=50 --http_limit=15

(I autogenerate this line with a script and use AutoHotkey to send my credentials.)
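The generating script itself is not shown here, so the following is only a minimal sketch of what such a script might look like. The function name `build_upload_command` and its parameter names are hypothetical; the flags and default values are copied from the command line above. Exposing the speed parameters as arguments makes it easy to experiment with lower values.

```python
import shlex


def build_upload_command(app_url, csv_path, kind,
                         config_file="bulkloader.yaml",
                         bandwidth_limit=999999, rps_limit=100,
                         batch_size=50, http_limit=15):
    """Return the appcfg.py upload_data call as an argument list.

    Hypothetical helper: only the flags that appear in the command
    above are used, with the same values as defaults.
    """
    return [
        "appcfg.py", "upload_data",
        "--config_file=%s" % config_file,
        "--url=%s" % app_url,
        "--filename=%s" % csv_path,
        "--kind=%s" % kind,
        "--bandwidth_limit=%d" % bandwidth_limit,
        "--rps_limit=%d" % rps_limit,
        "--batch_size=%d" % batch_size,
        "--http_limit=%d" % http_limit,
    ]


if __name__ == "__main__":
    cmd = build_upload_command(
        "http://yourapp.appspot.com/remote_api", "data.csv", "yourtablename")
    # Print a shell-safe version of the command for copy and paste.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The argument list can also be passed directly to `subprocess.call` instead of being printed.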

I wrote this gdata connector to pull data out of a Google Docs Spreadsheet and insert it into the datastore, but it uses Bulkloader, so it kind of takes you back to square one of your problem.

http://code.google.com/p/bulkloader-gdata-connector/source/browse/gdata_connector.py

What you could do, however, is look at the source to see how I pull data out of Google Docs, and then create one or more tasks that do the same thing instead of going through bulkloader.

Alternatively, you could upload your document into the blobstore and similarly create a task that reads the CSV data out of the blobstore and creates entities. (I think this would be easier and faster than working with gdata feeds.)
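The core of such a task, reading CSV rows and writing them in batches, could be sketched roughly as follows. This is a plain-Python illustration, not App Engine code: `import_csv`, `batches`, and the sample column names are hypothetical, and `put_batch` is a stand-in for whatever batch write your runtime provides. In a real task the input stream would come from the blobstore rather than an in-memory string, and a Java app would follow the same structure with its own datastore API.

```python
import csv
import io


def batches(rows, batch_size):
    """Group an iterable of rows into lists of at most batch_size rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def import_csv(stream, put_batch, batch_size=50):
    """Read CSV rows (header row first) and hand them to put_batch in groups.

    put_batch is an assumption: a callable that stores one list of
    row dicts, e.g. by converting them to entities and saving them.
    Returns the number of rows processed.
    """
    reader = csv.DictReader(stream)
    total = 0
    for batch in batches(reader, batch_size):
        put_batch(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    # Made-up sample data; the column names are illustrative only.
    sample = io.StringIO("ip_start,ip_end,country\n1,10,US\n11,20,DE\n21,30,FR\n")
    stored = []
    count = import_csv(sample, stored.extend, batch_size=2)
    print(count, "rows handed to the writer")
```

Writing rows in groups rather than one at a time reduces the number of datastore calls. For a large file, the work would likely need to be split across several tasks so that each one stays within the request time limit.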
