
Batch existence check + save - Django

  • I have a CSV file listing items that I need to store within the database.

  • I need to check which items are not already stored, and if not stored I need to save them within the database.

  • There are 2-5 million rows.



The model is Django's User model.

I have a CSV file of this form:

Item_ID, Surname, Policy_number, Sex, Title, Start_date

This is the code:

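import csv

from django.contrib.auth.models import User

# Python 3: the csv module expects text mode with newline=''
with open('items.csv', newline='') as f:
    reader = csv.reader(f)
    for index, row in enumerate(reader):
        # row[2] is Policy_number (used as the username), row[1] is Surname
        if User.objects.filter(username=row[2]).count():
            continue
        try:
            user = User(username=row[2], last_name=row[1], password='*')
            user.save()
        except Exception as e:
            print(e)
        del user
        del row
        del index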

Any method you would recommend?


Depends on the situation. If the csv data can be converted to a model, something like this should do:

  • load the csv data
  • for each row:
    • check if a model for it exists
    • if not:
      • create a new model based on the data and save it.

Edit:
I think a batch check for existence will be hard. A batch save of the models would be quicker, but depending on the model complexity I think it's safer to just do it per model.
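
A chunked approach is one possible middle ground between checking per row and checking everything at once. The sketch below is only an illustration under some assumptions: `chunked` and `import_missing_users` are hypothetical helper names, `user_model` is assumed to be Django's `User` class (passed in so the snippet does not import Django itself), and the column positions follow the CSV layout from the question. It runs one `username__in` query and one `bulk_create` call per chunk.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def import_missing_users(rows, user_model, batch_size=500):
    """Create users for rows whose username is not stored yet.

    rows: iterable of CSV rows; row[2] is the username, row[1] the surname.
    user_model: assumed to be django.contrib.auth.models.User.
    Returns the number of users created.
    """
    created = 0
    for chunk in chunked(rows, batch_size):
        # Drop duplicates inside the chunk, keeping the first row per username.
        by_username = {}
        for row in chunk:
            by_username.setdefault(row[2], row)

        # One SELECT per chunk instead of one per row.
        existing = set(
            user_model.objects.filter(username__in=list(by_username))
            .values_list("username", flat=True)
        )

        new_users = [
            user_model(username=name, last_name=row[1], password="*")
            for name, row in by_username.items()
            if name not in existing
        ]
        # Hand the whole chunk to the database in one bulk_create call.
        user_model.objects.bulk_create(new_users)
        created += len(new_users)
    return created
```

In a Django project this would be called as `import_missing_users(csv.reader(f), User)`. Keep in mind that `bulk_create` does not call each model's `save()` and does not send `pre_save`/`post_save` signals, which may be the "model complexity" concern raised above. The default `batch_size` of 500 is an arbitrary choice meant to keep the `IN (...)` list small.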


Try this. The count() is going to be VERY expensive.

for row in reader:
    try:
        User.objects.get(username=row[2])
    except User.DoesNotExist:
        user = User(username=row[2], last_name=row[1], password='*')
        user.save()


You will want to load the CSV file, then go through each row using the get_or_create method, which checks whether the object exists and creates it for you if it does not. If you show us the models.py code we may be able to help you out more specifically.
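
As a rough sketch of that suggestion: the function name `import_with_get_or_create` is made up for illustration, and `manager` is assumed to be `User.objects`. Django's `get_or_create` returns an `(object, created)` tuple, and the values in `defaults` are only used when a new row is created.

```python
def import_with_get_or_create(rows, manager):
    """Create one user per unseen username and return how many were created.

    rows: iterable of CSV rows; row[2] is the username, row[1] the surname.
    manager: assumed to be User.objects (anything with get_or_create works).
    """
    created_count = 0
    for row in rows:
        # Look up by username only; defaults are applied only on creation.
        _, created = manager.get_or_create(
            username=row[2],
            defaults={"last_name": row[1], "password": "*"},
        )
        if created:
            created_count += 1
    return created_count
```

Usage inside a project would look like `import_with_get_or_create(csv.reader(f), User.objects)`. This still issues at least one query per row, so with millions of rows it is tidier than the original loop but probably not much faster.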


If your memory can handle the usernames variable, this might be a good optimisation.

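import csv

from django.contrib.auth.models import User

# values_list(..., flat=True) yields plain strings; values() would yield
# dicts, so "row[2] in usernames" would never match
usernames = set(User.objects.values_list('username', flat=True))

with open('items.csv', newline='') as f:
    for row in csv.reader(f):
        if row[2] in usernames:
            continue
        User.objects.create(username=row[2], last_name=row[1], password='*')
        # Remember the new username so duplicate CSV rows are skipped too
        usernames.add(row[2])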

If memory really is a problem, you might take a look at this existing answer, for example: Question about batch save objects in Django
