CPU Time needed to bulk upload 2 GB database?

I hired a programmer to port my web site -- originally implemented using Django and MySQL -- over to Google App Engine. The database for the original web app is about 2 GB in size, and the largest table has 5 million rows. To port these contents over, as I understand it, the programmer is serializing the database to JSON and then uploading it to Google App Engine.

So far his uploading has used 100 hours of CPU time, as billed by GAE, yet it looks like only about 50 or 100 MB has been loaded into the database. Is that a reasonable amount of CPU time for such a small amount of data? MySQL could load this much data in a few minutes, so I don't understand why GAE would be 1000x slower. Is he doing something inefficiently?


That seems high, and it's likely he's making the server do a lot of work (decoding the JSON, then encoding and storing the entities) that could be done on the client. There's already a bulkloader provided with the SDK, and if that isn't suitable for some reason, remote_api, on which the bulkloader is based, is still a more efficient option than rolling your own.
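
As a reference point, here is a rough sketch of what the SDK bulkloader approach looked like in the Python SDK of that era. The kind name, property names, and file names are placeholders rather than anything from the question, and the exact flags varied a little between SDK versions, so treat this as an outline rather than a recipe.

    # article_loader.py -- hypothetical loader config for a CSV export of one table.
    # Assumes the app already exposes the remote_api handler (e.g. a /remote_api
    # entry in app.yaml).
    import datetime

    from google.appengine.tools import bulkloader


    class ArticleLoader(bulkloader.Loader):
        def __init__(self):
            # Each tuple maps a CSV column to a datastore property via a converter.
            bulkloader.Loader.__init__(self, 'Article', [
                ('title', lambda x: x.decode('utf-8')),
                ('body', lambda x: x.decode('utf-8')),
                ('published', lambda x: datetime.datetime.strptime(x, '%Y-%m-%d')),
            ])


    # The bulkloader tool looks for this module-level list.
    loaders = [ArticleLoader]

The upload itself is then driven from the client machine with something like appcfg.py upload_data --config_file=article_loader.py --filename=articles.csv --kind=Article --url=http://yourapp.appspot.com/remote_api <app-directory>, which keeps the parsing and encoding work on the client and batches the datastore writes for you.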


I have bulk loaded a GB of data, but I wrote my own bulk-load module (based on the interfaces they defined), and it took 25 hours of CPU time.

For more info, you could take a look at App Engine Bulk Loader Performance


That depends a great deal on how he's serializing the data. I STRONGLY suspect he's doing something inefficient, because yes, that's ludicrous for that amount of data. The inefficiency probably lies in the transfer time and the start/stop overhead of each request. If he's serializing each row and posting it to a handler one at a time, I could completely understand it both taking forever and burning a lot of CPU time.
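
To make that concrete, below is a minimal sketch of the batched alternative over remote_api, assuming the Python SDK of that era; the Article model, the app id, and the rows_from_export() helper are hypothetical stand-ins for the real schema and the JSON/CSV export.

    # Hypothetical client-side loader: one remote_api connection, batched puts.
    import getpass

    from google.appengine.ext import db
    from google.appengine.ext.remote_api import remote_api_stub


    class Article(db.Model):
        # Placeholder model; the real kind and properties come from the MySQL schema.
        title = db.StringProperty()
        body = db.TextProperty()


    def rows_from_export():
        # Placeholder: in practice this would stream rows from the exported dump.
        return []


    def auth_func():
        return raw_input('Email: '), getpass.getpass('Password: ')


    # The path must match whatever remote_api handler the app exposes in app.yaml.
    remote_api_stub.ConfigureRemoteApi(None, '/remote_api', auth_func,
                                       'yourapp.appspot.com')

    batch = []
    for row in rows_from_export():
        batch.append(Article(title=row['title'], body=row['body']))
        if len(batch) >= 500:   # batch puts were capped at a few hundred entities
            db.put(batch)       # one RPC for the whole batch
            batch = []
    if batch:
        db.put(batch)

Each db.put() call here writes hundreds of rows in a single round trip, which is where most of the difference from a one-request-per-row handler comes from.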
