GAE: scheduled import of big gzipped file from a third party site

I'm working on a Python web app that needs to import big (in terms of GAE limits) gzipped files from a third-party site on a regular basis. Think of the RDF exports the DMOZ project produces at regular intervals.

This means fetching a 500+ MB gzip file daily, gunzipping it, parsing and processing the contents, and storing the results in GAE's datastore for later use.

What's the proper way to implement this functionality on GAE, keeping in mind the limits on maximum download size, processing time, etc.?


The limit on downloaded file size in App Engine is currently 64MB. As a result, you've got two options:

  • Use HTTP Range headers to download and process the file in chunks (see the sketch after this list).
  • Use an external service to do the download, split it into pieces, and send the pieces to your App Engine app.
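
For the first option, a minimal sketch might fetch the file in Range-sized pieces and gunzip them incrementally with zlib's decompressobj. FEED_URL, CHUNK_SIZE, and the process_line callback are hypothetical names, not part of any real API; in practice you'd drive this from a cron-triggered task queue task and checkpoint the offset, since a single request won't survive 500 MB of processing:

import zlib
from google.appengine.api import urlfetch

FEED_URL = 'http://example.com/export.rdf.gz'  # placeholder URL
CHUNK_SIZE = 1024 * 1024                       # 1 MB per Range request

def import_feed(process_line):
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16+: gzip wrapper
    offset, buffered = 0, ''
    while True:
        headers = {'Range': 'bytes=%d-%d' % (offset, offset + CHUNK_SIZE - 1)}
        result = urlfetch.fetch(FEED_URL, headers=headers, deadline=60)
        if result.status_code != 206:
            # 200 means the server ignored Range; anything else is an error.
            raise RuntimeError('no partial content: %d' % result.status_code)
        buffered += decomp.decompress(result.content)
        lines = buffered.split('\n')
        buffered = lines.pop()             # keep the trailing partial line
        for line in lines:
            process_line(line)
        if len(result.content) < CHUNK_SIZE:
            break                          # short read: reached end of file
        offset += CHUNK_SIZE
    if buffered:
        process_line(buffered)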


My initial gut reaction (without knowing what's inside the gzipped file) is to do the processing somewhere else (AWS?) and then push the processed data to your GAE application in small bits, as in the receiving-handler sketch below.
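
If you go that route, the GAE side only needs a handler that accepts small batches and writes them to the datastore. A rough sketch, assuming the worker POSTs newline-delimited records to a hypothetical /import URL (the Record model is made up too, and authentication is omitted):

import webapp2
from google.appengine.ext import db

class Record(db.Model):
    payload = db.TextProperty()

class ImportHandler(webapp2.RequestHandler):
    def post(self):
        # Each request carries one small batch, well under request limits.
        entities = [Record(payload=line)
                    for line in self.request.body.splitlines() if line]
        db.put(entities)  # single batched datastore write
        self.response.write('stored %d\n' % len(entities))

app = webapp2.WSGIApplication([('/import', ImportHandler)])

The external worker then just loops over the gunzipped file and POSTs a few hundred lines at a time, which keeps each GAE request cheap and retryable.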
