How to limit file size when writing one?
I am using the output streams from the io module to write to files. I want to detect when I have written 1 GB of data to a file and then start writing to a second file. I can't figure out how to determine how much data I have written to the file.
Is there something easy built into io, or do I have to count the bytes manually before each write?
If you are using this file for logging purposes, I suggest using the RotatingFileHandler from the logging module, like this:
import logging
import logging.handlers
file_name = 'test.log'
test_logger = logging.getLogger('Test')
# note: rollover only happens when both maxBytes and backupCount are nonzero
handler = logging.handlers.RotatingFileHandler(file_name, maxBytes=10**9, backupCount=5)
test_logger.addHandler(handler)
N.B.: you can also use this method even if you don't need it for logging, if you like doing hacks :)
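For example, here is a minimal sketch of driving the handler outside a traditional logging setup; the logger name, the 'data.out' file name, and the bare-message formatter are all just placeholder choices:

import logging
import logging.handlers

writer = logging.getLogger('chunk_writer')      # placeholder logger name
writer.setLevel(logging.INFO)
handler = logging.handlers.RotatingFileHandler('data.out', maxBytes=10**9, backupCount=10)
handler.setFormatter(logging.Formatter('%(message)s'))   # write the message text only
writer.addHandler(handler)

writer.info('one line of data')   # each call may roll data.out over to data.out.1, .2, ...

The plain '%(message)s' formatter strips the usual timestamp/level decoration, so the files contain just your data.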
See the Python documentation for File Objects, specifically tell().
Example:
>>> f=open('test.txt','w')
>>> f.write(10*'a')
>>> f.tell()
10L
>>> f.write(100*'a')
>>> f.tell()
110L
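Building on that, a rough sketch of using tell() to roll over to a new file once a limit is reached; the chunk_name() helper and the placeholder data are made up for illustration, and len() is assumed to match the byte count (plain ASCII text):

MAX_BYTES = 10**9  # ~1 GB per file

def chunk_name(base, num):
    # hypothetical naming scheme: test.txt, test.1.txt, test.2.txt, ...
    root, ext = base.rsplit('.', 1)
    return base if num == 0 else '%s.%d.%s' % (root, num, ext)

data_source = ('a' * 1000 for _ in range(5))   # stand-in for your real data
num = 0
out = open(chunk_name('test.txt', num), 'w')
for chunk in data_source:
    if out.tell() + len(chunk) > MAX_BYTES:    # would this write push us past the limit?
        out.close()
        num += 1
        out = open(chunk_name('test.txt', num), 'w')
    out.write(chunk)
out.close()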
See the tell() method on the stream object.
One fairly straightforward approach is to subclass the built-in file class and have it keep track of the amount of output written to the file. Below is some sample code showing how that might be done; it appears to mostly work.
I say mostly because the size of the files produced sometimes went slightly over the maximum while testing, but that's because the test file was opened in "text" mode, and on Windows this means that all the '\n' linefeed characters get converted into '\r\n' (carriage-return, linefeed) pairs, which throws the size accumulator off. Also, as currently written, the bufsize argument that the standard file() and open() functions accept is not supported, so the system's default buffer size and mode will always be used.

Depending on exactly what you're doing, the size issue may not be a big problem; however, for large maximum sizes it might be off significantly. If anyone has a good platform-independent fix for this, by all means let us know.
import os.path

verbose = False

class LtdSizeFile(file):
    ''' A file subclass which limits the size of the file written to approximately "maxsize" bytes '''
    def __init__(self, filename, mode='wt', maxsize=None):
        self.root, self.ext = os.path.splitext(filename)
        self.num = 1
        self.size = 0
        if maxsize is not None and maxsize < 1:
            raise ValueError('"maxsize" argument should be a positive number')
        self.maxsize = maxsize
        file.__init__(self, self._getfilename(), mode)
        if verbose: print 'file "%s" opened' % self._getfilename()

    def close(self):
        file.close(self)
        self.size = 0
        if verbose: print 'file "%s" closed' % self._getfilename()

    def write(self, text):
        lentext = len(text)
        if self.maxsize is None or self.size + lentext <= self.maxsize:
            file.write(self, text)
            self.size += lentext
        else:
            # current file is full -- close it and open the next one in the series
            self.close()
            self.num += 1
            file.__init__(self, self._getfilename(), self.mode)
            if verbose: print 'file "%s" opened' % self._getfilename()
            file.write(self, text)
            self.size += lentext

    def writelines(self, lines):
        for line in lines:
            self.write(line)

    def _getfilename(self):
        return '{0}{1}{2}'.format(self.root, self.num if self.num > 1 else '', self.ext)

if __name__ == '__main__':
    import random
    import string

    def randomword():
        letters = []
        for i in range(random.randrange(2, 7)):
            letters.append(random.choice(string.lowercase))
        return ''.join(letters)

    def randomsentence():
        words = []
        for i in range(random.randrange(2, 10)):
            words.append(randomword())
        words[0] = words[0].capitalize()
        words[-1] = ''.join([words[-1], '.\n'])
        return ' '.join(words)

    lsfile = LtdSizeFile('LtdSizeTest.txt', 'wt', 100)
    for i in range(100):
        sentence = randomsentence()
        if verbose: print ' writing: {!r}'.format(sentence)
        lsfile.write(sentence)
    lsfile.close()
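If the text-mode newline translation described above is the main source of error, one platform-independent workaround (just a sketch; the on_disk_length() helper is made up here and assumes plain ASCII text) is to charge an extra byte per '\n' whenever the platform writes '\r\n':

import os

def on_disk_length(text):
    # on Windows, text mode writes '\n' as '\r\n', so each newline costs one extra byte
    extra = text.count('\n') if os.linesep == '\r\n' else 0
    return len(text) + extra

Inside write() you would then accumulate on_disk_length(text) instead of len(text). Opening the file in binary mode ('wb') and writing pre-encoded data sidesteps the problem entirely, at the cost of handling line endings yourself.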
I noticed an ambiguity in your question. Do you want the file to be (a) over, (b) under, or (c) exactly 1 GiB large before switching?

It's easy to tell if you've gone over: tell() is sufficient for that kind of thing; just check if tell() > 1024*1024*1024: and you'll know.
Checking whether you're under 1 GiB but will go over on your next write is a similar technique: if len(data_to_write) + tell() > 1024*1024*1024: will suffice.
The trickiest thing to do is to get the file to exactly 1 GiB. You will need to tell() the length of the file, and then partition your data appropriately in order to hit the mark precisely.
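A rough sketch of that partitioning; write_exact() and open_next() are illustrative names, and it assumes binary mode (so len() counts bytes) and that one chunk never needs to span more than two files:

LIMIT = 1024 * 1024 * 1024  # 1 GiB

def write_exact(out, data, open_next):
    # split data so the current file lands exactly on LIMIT, then spill the rest
    room = LIMIT - out.tell()
    if len(data) <= room:
        out.write(data)
        return out
    out.write(data[:room])    # fills the current file to exactly LIMIT bytes
    out.close()
    out = open_next()         # caller-supplied factory that opens the next file
    out.write(data[room:])
    return out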
Regardless of exactly which semantics you want, tell() is always going to be at least as slow as doing the counting yourself, and possibly slower. That doesn't mean it's the wrong thing to do; if you're writing the file from a thread, then you almost certainly will want to tell() rather than hope that you've correctly preempted other threads writing to the same file. (And do your locks, etc., but that's another question.)
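For instance, a minimal sketch of the threaded case; the SharedWriter wrapper and its open_next factory are assumptions for illustration, not anything from the question:

import threading

LIMIT = 1024 * 1024 * 1024  # 1 GiB

class SharedWriter(object):
    def __init__(self, open_next):
        self.open_next = open_next        # factory that returns the next open file
        self.lock = threading.Lock()
        self.out = open_next()

    def write(self, data):
        # one lock guards both the tell() check and the write, so no other
        # thread can slip a write in between them
        with self.lock:
            if self.out.tell() + len(data) > LIMIT:
                self.out.close()
                self.out = self.open_next()
            self.out.write(data)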
By the way, I noticed a definite direction in your last couple questions. Are you aware of #twisted and #python IRC channels on Freenode (irc.freenode.net)? You will get timelier, more useful answers.
~ C.
I recommend counting. There's no internal language counter that I'm aware of. Somebody else mentioned using tell(), but an internal counter will take roughly the same amount of work and eliminate the constant OS calls.
# pseudocode
if written + len(new_data) > 10**9:   # ~1 GB
    rotate_file()
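Fleshed out, a counting version might look like the sketch below; the class name, the file-naming scheme, and the 10**9 limit are placeholders, and len() is assumed to match the byte count:

MAX_BYTES = 10**9  # ~1 GB per file

class CountingRotator(object):
    def __init__(self, basename):
        self.basename = basename
        self.num = 0
        self.written = 0                  # bytes written to the current file
        self.out = open(self._name(), 'w')

    def _name(self):
        return '%s.%d' % (self.basename, self.num)

    def write(self, data):
        if self.written + len(data) > MAX_BYTES:
            self.out.close()              # rotate: close the full file, open the next
            self.num += 1
            self.written = 0
            self.out = open(self._name(), 'w')
        self.out.write(data)
        self.written += len(data)

    def close(self):
        self.out.close()

Usage would be something like w = CountingRotator('out.log') followed by repeated w.write(...) calls.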