Python process keeps growing in django db upload script
I'm running a conversion script that commits large amounts of data to a database through Django's ORM. I use manual transaction commits to speed up the process. I have hundreds of files to commit, and each file will create more than a million objects.
I'm using Windows 7 64-bit. I noticed the Python process keeps growing until it consumes more than 800 MB, and this is only for the first file!
The script loops over records in a text file, reusing the same variables and without accumulating any lists or tuples.
I read here that this is a general problem for Python (and perhaps for any program), but I was hoping Django or Python might have some explicit way to reduce the process size...
Here's an overview of the code:
import sys,os
sys.path.append(r'D:\MyProject')
os.environ['DJANGO_SETTINGS_MODULE']='my_project.settings'
from django.core.management import setup_environ
from convert_to_db import settings
from convert_to_db.convert.models import Model1, Model2, Model3
setup_environ(settings)
from django.db import transaction
@transaction.commit_manually
def process_file(filename):
    data_file = open(filename, 'r')
    model1, created = Model1.objects.get_or_create([some condition])
    if created:
        model1.save()
    input_row_i = 0
    while 1:
        line = data_file.readline()
        if line == '':
            break
        input_row_i += 1
        if not (input_row_i % 5000):
            transaction.commit()
        line = line[:-1]  # remove \n
        elements = line.split(',')
        d0 = elements[0]
        d1 = elements[1]
        d2 = elements[2]
        model2, created = Model2.objects.get_or_create([some condition])
        if created:
            model2.save()
        model3 = Model3(d0=d0, d1=d1, d2=d2)
        model3.save()
    data_file.close()
    transaction.commit()

# Some code that calls process_file() per file
First thing, make sure DEBUG = False in your settings.py. When DEBUG = True, every query sent to the database is stored in django.db.connection.queries, and that list will eat a large amount of memory if you import many records. You can check it via the shell:
$ ./manage.py shell
>>> from django.conf import settings
>>> settings.DEBUG
True
>>> settings.DEBUG = False
>>> # django.db.connection.queries will now remain empty ([])
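If you need to leave DEBUG = True for some other reason, you can also clear the query log by hand with django.db.reset_queries(). A minimal sketch (the commit_batch helper is hypothetical, not part of the original script):

    from django.db import reset_queries, transaction

    # Hypothetical helper: call this wherever the script currently commits,
    # so the in-memory query log never holds more than one batch.
    def commit_batch():
        transaction.commit()
        reset_queries()  # empties django.db.connection.queries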
If that does not help, then try spawning a new Process to run process_file for each file. This is not the most efficient approach, but you are trying to keep memory usage down, not save CPU cycles. Something like this should get you started:
from multiprocessing import Process

if __name__ == '__main__':  # the guard is required for multiprocessing on Windows
    for filename in files_to_process:
        p = Process(target=process_file, args=(filename,))
        p.start()
        p.join()
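If you want some parallelism while still getting a fresh process per file, a multiprocessing.Pool with maxtasksperchild=1 is one alternative. A sketch, assuming process_file is importable at module level so it can be pickled:

    from multiprocessing import Pool

    if __name__ == '__main__':
        # maxtasksperchild=1 discards each worker after one file, so any
        # memory it grew is returned to the OS before the next file starts.
        pool = Pool(processes=4, maxtasksperchild=1)
        pool.map(process_file, files_to_process)
        pool.close()
        pool.join()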
It's difficult to say without more detail; what I would suggest is to profile your code and see which section is causing the memory surge.
Once you know which part of the code is hogging memory, you can think about reducing it.
If even after those efforts the memory consumption does not come down, you could do this: since processes get memory allocated in chunks (pages), and releasing pages back while the process is still running is difficult, spawn a child process, do all your memory-intensive work there, pass the results back to the parent process, and let the child die. That way the memory consumed by the child process is returned to the OS and your parent process stays lean...
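A minimal sketch of that parent/child pattern, assuming process_file can be changed to return a small summary object (the worker function here is hypothetical):

    from multiprocessing import Process, Queue

    def worker(filename, result_queue):
        # All memory-intensive work happens here, in the child process.
        summary = process_file(filename)  # assumed to return something small
        result_queue.put(summary)

    if __name__ == '__main__':
        results = []
        for filename in files_to_process:
            q = Queue()
            child = Process(target=worker, args=(filename, q))
            child.start()
            results.append(q.get())  # read before join() to avoid blocking on a full pipe
            child.join()             # child exits; its memory is returned to the OS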