Django: registering unzipped files on the local disk

[I apologize in advance for the length of this question.]

I'm using Django 1.2.3-3+squeeze1 on Debian squeeze.

I am writing an application which uploads zip files to a temporary location on disk, unzips them, and then saves the results to a permanent location. Once unzipped, each file is registered in the database as an instance of a model called FileUpload. The uploaded zip files also correspond to a model, but I'll ignore that for the purposes of this question. FileUpload looks like this:

class FileUpload(models.Model):
    folder = models.ForeignKey(FolderUpload, null=True, blank=True, related_name='parentfolder')
    upload_date = models.DateTimeField(default=datetime.now(), blank=True, editable=False)
    upload = models.FileField(upload_to=file_upload_path)
    name = models.CharField(max_length=100)
    description = models.CharField(blank=True, max_length=200)

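    def save(self, *args, **kwargs):
        # On first save, inherit the path from the parent folder, if there is one.
        if not self.id and self.folder is not None:
            self.path = self.folder.path
        super(FileUpload, self).save(*args, **kwargs)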

I'm also using a form defined by

from django.forms import ChoiceField, Form, ModelForm

class FileUploadForm(ModelForm):
    class Meta:
        model = FileUpload

The function that takes the unzipped files on the disk, registers them with the database, and moves them to the correct place is called addFile. I was previously using this:

def addFile(name, filename, description, content, folder_id=None):
    #f = open(filename, 'r')
    #content = f.read()
    from forms import FileUploadForm
    from django.core.files.uploadedfile import SimpleUploadedFile
    if folder_id == None:
        data = {'name':name, 'description':description}
    else:
        data = {'name':name, 'description':description, 'folder':str(folder_id)}
    file_data = {'upload': SimpleUploadedFile(filename, content)}
    ff = FileUploadForm(data, file_data)
    try:
        zf = ff.save(commit=False)
        zf.save()
    except:
        raise RuntimeError, "Form error is %s."%(ff.errors)
    return zf

This worked, but the problem was that it read the entire file into memory. With large files, and especially given that Python isn't known for its memory economy, this consumed huge amounts of memory. So I switched to this:

from django.core.files.uploadedfile import UploadedFile

class UnzippedFile(UploadedFile):

    def __init__(self, file, filepath, content_type='text/plain', charset=None):
        import os
        self.filepath = filepath
        self.name = os.path.basename(filepath)
        self.size = os.path.getsize(filepath)
        self.content_type = content_type
        self.charset = charset
        super(UnzippedFile, self).__init__(file, self.name, content_type, self.size, charset)

    def temporary_file_path(self):
        """
        Returns the full path of this file.
        """
        return self.filepath

def addFile(filepath, description, file, folder_id=None):
    import os, sys
    from forms import FileUploadForm
    from django.core.files.uploadedfile import UploadedFile
    name = os.path.basename(filepath)
    if folder_id == None:
        data = {'name':name, 'description':description}
    else:
        data = {'name':name, 'description':description, 'folder':str(folder_id)}
    file_data = {'upload': UnzippedFile(file, filepath)}
    ff = FileUploadForm(data, file_data)
    try:
        zf = ff.save(commit=False)
        zf.save()
    except:
        raise
    return zf

I was forced to subclass UploadedFile, since none of the derived classes that were already there (in django/core/files/uploadedfile.py) seemed to do what I wanted.

The temporary_file_path function is there because the Django File Uploads docs say

UploadedFile.temporary_file_path()

Only files uploaded onto disk will have this method; it returns the full path to the temporary uploaded file.

It seems the FileSystemStorage class looks for this attribute in its _save function, as described below.

If n is the relative path of the file in the zip archive, then the usage is

name = os.path.normpath(os.path.basename(n)) # name of file
pathname = os.path.join(dirname, n) # full file path
description = "from zip file '" + zipfilename + "'" # `zipfilename` is the name of the zip file
fname = open(pathname) # file handle
f = addFile(pathname, description, fname)

This works, but I traced through the code, and found that the code was using streaming, when clearly the optimal thing to do in this case would be to just copy the file from the temporary location to the permanent location. The code in question is in django/core/files/storage.py, in the _save function of the FileSystemStorage class. In _save, name is the relative path of the destination, and content is a File object.

def _save(self, name, content):
    full_path = self.path(name)
    directory = os.path.dirname(full_path)
    [...]
    while True:
            try:
                # This file has a file path that we can move.
                if hasattr(content, 'temporary_file_path'):
                    file_move_safe(content.temporary_file_path(), full_path)
                    content.close()

                # This is a normal uploadedfile that we can stream.
                else:
                    # This fun binary flag incantation makes os.open throw an
                    # OSError if the file already exists before we open it.
                    fd = os.open(full_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, 'O_BINARY', 0))
                    try:
                        locks.lock(fd, locks.LOCK_EX)
                        for chunk in content.chunks():
                            os.write(fd, chunk)
                    finally:
                        locks.unlock(fd)
                        os.close(fd)
            except OSError, e:
                if e.errno == errno.EEXIST:
                    # Ooops, the file exists. We need a new file name.
                    name = self.get_available_name(name)
                    full_path = self.path(name)
                else:
                    raise
            else:
                # OK, the file save worked. Break out of the loop.
                break
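
For readers who want to see the two branches in isolation, here is a minimal stand-alone sketch that uses only the standard library. It is not Django code: `DiskBackedFile`, `InMemoryFile`, `save_content` and `demo` are hypothetical names made up for illustration, `shutil.move` stands in for `file_move_safe`, and the locking and the EEXIST retry loop are left out.

```python
import os
import shutil
import tempfile


class DiskBackedFile(object):
    """Hypothetical stand-in for an upload that already lives on disk."""

    def __init__(self, path):
        self.path = path

    def temporary_file_path(self):
        return self.path


class InMemoryFile(object):
    """Hypothetical stand-in for an upload that can only be streamed."""

    def __init__(self, data, chunk_size=4):
        self.data = data
        self.chunk_size = chunk_size

    def chunks(self):
        # Yield the payload a few bytes at a time, like a chunked reader.
        for start in range(0, len(self.data), self.chunk_size):
            yield self.data[start:start + self.chunk_size]


def save_content(content, full_path):
    """Write content to full_path and report which branch was taken."""
    if hasattr(content, 'temporary_file_path'):
        # The source is already on disk, so a move is enough.
        shutil.move(content.temporary_file_path(), full_path)
        return 'moved'
    # Otherwise stream chunk by chunk; O_EXCL fails if the target exists.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, 'O_BINARY', 0)
    fd = os.open(full_path, flags)
    try:
        for chunk in content.chunks():
            os.write(fd, chunk)
    finally:
        os.close(fd)
    return 'streamed'


def demo():
    """Run both branches in a scratch directory and return their labels."""
    workdir = tempfile.mkdtemp()
    try:
        source = os.path.join(workdir, 'unzipped.txt')
        with open(source, 'wb') as handle:
            handle.write(b'hello world')
        first = save_content(DiskBackedFile(source),
                             os.path.join(workdir, 'moved.txt'))
        second = save_content(InMemoryFile(b'hello world'),
                              os.path.join(workdir, 'streamed.txt'))
        return first, second
    finally:
        shutil.rmtree(workdir)
```

The point of the sketch is only that the choice hinges on a single `hasattr` check against the object passed in, so whichever object reaches `_save` must expose `temporary_file_path` itself for the move to happen.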

The _save function is looking for the attribute temporary_file_path. I believe this code is intended to be triggered by the temporary_file_path function mentioned earlier in django/core/files/uploadedfile.py. However, the class that is actually passed (corresponding to the content argument) is <class 'django.db.models.fields.files.FieldFile'>, and here is what the attribute dict (content.__dict__) for this object looks like:

{'_committed': False, 'name': u'foo', 'instance': <FileUpload: foo>,
'_file': <UnzippedFile: foo (text/plain)>, 'storage': <django.core.files.storage.DefaultStorage object at 0x9a70ccc>,
'field': <django.db.models.fields.files.FileField object at 0x9ce9b4c>, 'mode': None}

The temporary_file_path is attached to the UnzippedFile class, which is inside the _file data member. So content._file has a temporary_file_path attribute, not content itself.

This is what a regular file upload looks like. As you can see, it is similar.

[Fri Jun 17 08:05:33 2011] [error] type of content is <class 'django.db.models.fields.files.FieldFile'>
[Fri Jun 17 08:05:33 2011] [error] {'_committed': False, 'name': u'behavior.py',
                                    'instance': <FileUpload: b>, '_file': <TemporaryUploadedFile: behavior.py (text/x-python)>,
                                    'storage': <django.core.files.storage.DefaultStorage object at 0xb8d7fd8c>,
                                    'field': <django.db.models.fields.files.FileField object at 0xb8eb584c>, 'mode': None}

It is difficult for me to follow in any detail how the code gets from the FileUploadForm save to the Storage object. The Django form code in particular is quite obscure.

Anyway, my question, after all this setup, is: how and when is the first branch in _save, the one that calls file_move_safe, supposed to be activated? I'm seeing a mismatch here. Is this a bug? Can anyone clarify?


if hasattr(content, 'temporary_file_path') 

This condition will never be true because, as you state, content does not have a temporary_file_path attribute. However, content._file does, so you can use the following check to get the behaviour you are looking for:

if hasattr(content, '_file'):
   if hasattr(content._file,'temporary_file_path'):
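
The nested check can be folded into a small helper. What follows is a rough sketch rather than Django code: `find_temporary_path`, `FakeUploadedFile` and `FakeFieldFile` are hypothetical names used only so the example runs without Django. Where such a helper would be called from (for instance a custom storage's `_save`) is an assumption on my part, and `_file` is a private attribute, so relying on it may not survive a Django upgrade.

```python
def find_temporary_path(content):
    """Return the on-disk path of content or of content._file, else None."""
    # Look at the object itself first, then at the file it wraps (if any).
    for candidate in (content, getattr(content, '_file', None)):
        method = getattr(candidate, 'temporary_file_path', None)
        if callable(method):
            return method()
    return None


class FakeUploadedFile(object):
    """Hypothetical stand-in for an upload that exposes a disk path."""

    def __init__(self, path):
        self.path = path

    def temporary_file_path(self):
        return self.path


class FakeFieldFile(object):
    """Hypothetical stand-in for a wrapper that keeps the upload in _file."""

    def __init__(self, wrapped):
        self._file = wrapped
```

With these stand-ins, `find_temporary_path(FakeFieldFile(FakeUploadedFile('/tmp/foo')))` returns the inner file's path, while an object with neither attribute yields `None`, in which case the caller would fall back to streaming.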