What is the easiest way of deleting all my blobstore data?
What is the best way to remove all of the blobs from the blobstore? I'm using Python.
I have quite a lot of blobs and I'd like to delete them all. I'm currently doing the following:
class deleteBlobs(webapp.RequestHandler):
    def get(self):
        all = blobstore.BlobInfo.all()
        more = (all.count() > 0)
        blobstore.delete(all)
        if more:
            taskqueue.add(url='/deleteBlobs', method='GET')
This seems to be using tons of CPU and, as far as I can tell, doing nothing useful.
I use this approach:
import datetime

from google.appengine.ext import blobstore
from google.appengine.ext import webapp
from google.appengine.ext.webapp import util


class IndexHandler(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write('Hello. Blobstore is being purged.\n\n')
        try:
            query = blobstore.BlobInfo.all()
            index = 0
            blobs = query.fetch(400)
            if len(blobs) > 0:
                for blob in blobs:
                    blob.delete()
                    index += 1
                now = datetime.datetime.now().time()
                self.response.out.write('%d items deleted at %d:%d:%d'
                                        % (index, now.hour, now.minute, now.second))
                if index == 400:
                    # A full batch means there are probably more blobs; reload to continue.
                    self.redirect('/purge')
        except Exception, e:
            self.response.out.write('Error is: ' + repr(e) + '\n')


APP = webapp.WSGIApplication(
    [
        ('/purge', IndexHandler),
    ],
    debug=True)


def main():
    util.run_wsgi_app(APP)


if __name__ == '__main__':
    main()
My experience is that deleting more than 400 blobs at once will fail, so I make it reload after every 400. I tried blobstore.delete(query.fetch(400)), but I think there's a bug right now: nothing happened at all, and nothing was deleted.
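One possible culprit, offered as a guess rather than a confirmed diagnosis: blobstore.delete is documented to take a BlobKey or a list of BlobKeys, while fetch() returns BlobInfo objects, so passing the keys explicitly may behave differently. A minimal sketch:

blobs = query.fetch(400)
# fetch() returns BlobInfo objects, but blobstore.delete expects BlobKeys,
# so extract the keys and delete the whole batch in one call.
blobstore.delete([b.key() for b in blobs])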
You're passing the query object to the delete method, which will iterate over it fetching it in batches, then submit a single enormous delete. This is inefficient because it requires multiple fetches, and won't work if you have more results than you can fetch in the available time or with the available memory. The task will either complete once and not require chaining at all, or more likely, fail repeatedly, since it can't fetch every blob at once.
Also, calling count executes the query just to determine the count, which is a waste of time since you're going to fetch the results anyway.
Instead, you should fetch results in batches using fetch, and delete each batch. Use cursors to set the starting point of each batch and avoid the need for the query to iterate over all the 'tombstoned' records before finding the first live one, and ideally, delete multiple batches per task, using a timer to determine when you should stop and chain the next task.
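Here's a minimal sketch of that pattern, assuming the db-style query that BlobInfo.all() returns supports with_cursor()/cursor() for paging; the /purgeworker URL, BATCH_SIZE, and TIME_LIMIT are illustrative names I've made up, not anything the SDK defines:

import time

from google.appengine.api import taskqueue
from google.appengine.ext import blobstore
from google.appengine.ext import webapp

BATCH_SIZE = 100    # blobs per fetch; a guess, tune for your data
TIME_LIMIT = 30.0   # seconds of work before chaining the next task


class PurgeWorker(webapp.RequestHandler):
    def post(self):
        start = time.time()
        query = blobstore.BlobInfo.all()
        # Resume where the previous task stopped, if it passed a cursor.
        cursor = self.request.get('cursor')
        if cursor:
            query.with_cursor(cursor)
        while time.time() - start < TIME_LIMIT:
            blobs = query.fetch(BATCH_SIZE)
            if not blobs:
                return  # nothing left to delete; stop chaining
            # One delete call per batch, not one RPC per blob.
            blobstore.delete([b.key() for b in blobs])
            # Remember the position just past the deleted batch and
            # continue from there, skipping any tombstoned records.
            cursor = query.cursor()
            query.with_cursor(cursor)
        # Time budget exhausted: hand the cursor to a fresh task.
        taskqueue.add(url='/purgeworker', params={'cursor': cursor})

Map ('/purgeworker', PurgeWorker) in your WSGIApplication and kick it off once with taskqueue.add(url='/purgeworker'); each task then deletes what it can within its budget and re-enqueues itself until the query comes back empty.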