django: registering unzipped files on the local disk
[I apologize in advance for the length of this question.]
I'm using Django 1.2.3-3+squeeze1 on Debian squeeze.
I am writing an application which uploads zip files to disk in a
temporary location, unzips them, and then saves the results to a
permanent location. The unzipped files are registered in the database as a class called
FileUpload
after they are unzipped. The uploaded zipped files also correspond to a class,
but I'll ignore that for the purposes of this question. FileUpload
looks like this.
class FileUpload(models.Model):
folder = models.ForeignKey(FolderUpload, null=True, blank=True, related_name='parentfolder')
upload_date = models.DateTimeField(default=datetime.now(), blank=True, editable=False)
upload = models.FileField(upload_to=file_upload_path)
name = models.CharField(max_length=100)
description = models.CharField(blank=True, max_le开发者_StackOverflow中文版ngth=200)
def save(self):
if not self.id:
if self.folder == None:
pass
else:
self.path = self.folder.path
super(FileUpload, self).save()
I'm also using a form defined by
from django.forms import ChoiceField, Form, ModelForm
class FileUploadForm(ModelForm):
class Meta:
model = FileUpload
The function that takes the unzipped files on the disk. registers them
with the database, and moves them to the correct place is called
addFile
. I was previously using this:
def addFile(name, filename, description, content, folder_id=None):
#f = open(filename, 'r')
#content = f.read()
from forms import FileUploadForm
from django.core.files.uploadedfile import SimpleUploadedFile
if folder_id == None:
data = {'name':name, 'description':description}
else:
data = {'name':name, 'description':description, 'folder':str(folder_id)}
file_data = {'upload': SimpleUploadedFile(filename, content)}
ff = FileUploadForm(data, file_data)
try:
zf = ff.save(commit=False)
zf.save()
except:
raise RuntimeError, "Form error is %s."%(ff.errors)
return zf
This worked, but the problem was that it dumped the entire file into memory. With large files, and especially given Python isn't known for it's memory economy, this consumed huge amounts of memory. So I switched to this:
from django.core.files.uploadedfile import UploadedFile
class UnzippedFile(UploadedFile):
def __init__(self, file, filepath, content_type='text/plain', charset=None):
import os
self.filepath = filepath
self.name = os.path.basename(filepath)
self.size = os.path.getsize(filepath)
self.content_type = content_type
self.charset = charset
super(UnzippedFile, self).__init__(file, self.name, content_type, self.size, charset)
def temporary_file_path(self):
"""
Returns the full path of this file.
"""
return self.filepath
def addFile(filepath, description, file, folder_id=None):
import os, sys
from forms import FileUploadForm
from django.core.files.uploadedfile import UploadedFile
name = os.path.basename(filepath)
if folder_id == None:
data = {'name':name, 'description':description}
else:
data = {'name':name, 'description':description, 'folder':str(folder_id)}
file_data = {'upload': UnzippedFile(file, filepath)}
ff = FileUploadForm(data, file_data)
try:
zf = ff.save(commit=False)
zf.save()
except:
raise
return zf
I was forced to subclass UploadedFile
, since none of the derived
classes that were already there (in
django/core/files/uploadedfile.py
) seemed to do what I wanted.
The temporary_file_path
function is there because the Django File Uploads docs say
UploadedFile.temporary_file_path()
Only files uploaded onto disk will have this method; it returns the full path to the temporary uploaded file.
It seems the FileSystemStorage
class looks for this attribute in
_save
function as described later.
If n
is the relative path of the file in the zip archive, then the
usage is
name = os.path.normpath(os.path.basename(n)) # name of file
pathname = os.path.join(dirname, n) # full file path
description = "from zip file '" + zipfilename + "'" # `zipfilename` is the name of the zip file
fname = open(pathname) # file handle
f = addFile(pathname, description, fname)
This works, but I traced through the code, and found that the code was
using streaming, when clearly the optimal thing to do in this case
would be to just copy the file from the temporary location to the
permanent location. The code in question is in
django/core/files/storage.py
, in the _save
function of the
FileSystemStorage
class. In _save
, name
is the relative path of
the destination, and content
is a File
object.
def _save(self, name, content):
full_path = self.path(name)
directory = os.path.dirname(full_path)
[...]
while True:
try:
# This file has a file path that we can move.
if hasattr(content, 'temporary_file_path'):
file_move_safe(content.temporary_file_path(), full_path)
content.close()
# This is a normal uploadedfile that we can stream.
else:
# This fun binary flag incantation makes os.open throw an
# OSError if the file already exists before we open it.
fd = os.open(full_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, 'O_BINARY', 0))
try:
locks.lock(fd, locks.LOCK_EX)
for chunk in content.chunks():
os.write(fd, chunk)
finally:
locks.unlock(fd)
os.close(fd)
except OSError, e:
if e.errno == errno.EEXIST:
# Ooops, the file exists. We need a new file name.
name = self.get_available_name(name)
full_path = self.path(name)
else:
raise
else:
# OK, the file save worked. Break out of the loop.
break
The _save
function is looking for the attribute
temporary_file_path
. I believe this code is intended to be triggered
by the temporary_file_path
function mentioned earlier in
django/core/files/uploadedfile.py
. However, the class that is
actually passed (corresponding to the content
argument) is <class
'django.db.models.fields.files.FieldFile'>
, and here is what the
attribute dict (content.__dict__
) for this object looks like:
{'_committed': False, 'name': u'foo', 'instance': <FileUpload: foo>,
'_file': <UnzippedFile: foo (text/plain)>, 'storage':<django.core.files.storage.DefaultStorage object at 0x9a70ccc>,
'field': <django.db.models.fields.files.FileField object at0x9ce9b4c>, 'mode': None}
The temporary_file_path
is attached to the
UnzippedFile
class, which is inside the _file
data member. So
content._file
has a temporary_file_path
attribute, not content
itself.
This is what a regular file upload looks like. As you can see, it is similar.
[Fri Jun 17 08:05:33 2011] [error] type of content is <class 'django.db.models.fields.files.FieldFile'>
[Fri Jun 17 08:05:33 2011] [error] {'_committed': False, 'name': u'behavior.py',
'instance': <FileUpload: b>, '_file': <TemporaryUploadedFile: behavior.py (text/x-python)>,
'storage': <django.core.files.storage.DefaultStorage object at 0xb8d7fd8c>,
'field': <django.db.models.fields.files.FileField object at 0xb8eb584c>, 'mode': None}
It is difficult for me to follow in any detail how the code gets from
the FileUploadForm
save
to the Storage
object. The Django form
code in particular is quite obscure.
Anyway, my question, after all this setup is, how/when is the first
option below, with file_move_safe
supposed to be activated? I'm
seeing a mismatch here. Is this a bug? Can anyone clarify?
if hasattr(content, 'temporary_file_path')
The above will never equal true with this conditional since you state that content does not have a temporary_file_path identifier. However since content._file does you can use the following to get the functionality you are looking for
if hasattr(content, '_file'):
if hasattr(content._file,'temporary_file_path'):
精彩评论