Limit Python Output File Size

I have a Python program running on Debian that outputs data using a File object. I would like to set a limit on how large my file can be but I don't want to stop writing the file--I just want to remove the oldest line (at the top of the file). My data is written randomly as packets arrive from clients (think web logging).

I know it works, but would it be in my best interest to pull this off by checking the file size with File.tell() and then executing the following system command whenever the file exceeds the limit?

sed -i '1 d' filename 

Once the file reaches the size limit, this would execute sed on every write. Is there a better way?
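
Roughly, the idea would look like this (a sketch; the size limit and helper name are just placeholders):

    import subprocess

    MAX_BYTES = 1024 * 1024  # hypothetical 1 MB limit

    def write_packet(path, line):
        with open(path, 'a') as f:
            f.write(line + '\n')
            too_big = f.tell() > MAX_BYTES
        if too_big:
            # Remove the oldest line (the first one) in place.
            subprocess.call(['sed', '-i', '1 d', path])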


There is a reason that no logging system uses this strategy. You can't remove the first line from a file without rewriting the whole file, so it's extremely slow on a large file. Also, you can't write new data to the file while you're rewriting it.

The normal strategy is to start writing to a new file when the current one becomes too big. You can then delete files that are older than a threshold. This is the "log rotation" that others have mentioned.
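
A minimal sketch of that rotation scheme (the size limit, naming convention, and backup count here are assumptions, not anything from your setup):

    import os

    MAX_BYTES = 1024 * 1024  # hypothetical 1 MB limit per file
    KEEP = 5                 # hypothetical number of old files to keep

    def write_line(path, line):
        with open(path, 'a') as f:
            f.write(line + '\n')
            full = f.tell() >= MAX_BYTES
        if full:
            rotate(path)

    def rotate(path):
        # Shift log.4 -> log.5, ..., log -> log.1; the oldest file is overwritten.
        for i in range(KEEP, 0, -1):
            older = path if i == 1 else '%s.%d' % (path, i - 1)
            newer = '%s.%d' % (path, i)
            if os.path.exists(older):
                os.replace(older, newer)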

If you really want to create a queue where you remove one line of data as you add a new one, I'd suggest using a database instead. MongoDB and other database managers support arrays, but you could do something similar with an SQL database if required.
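
For example, with the sqlite3 module from the standard library you could keep a fixed-size queue of rows (the table name and row limit are made up for illustration):

    import sqlite3

    MAX_ROWS = 1000  # hypothetical limit

    conn = sqlite3.connect('log.db')
    conn.execute('CREATE TABLE IF NOT EXISTS log (id INTEGER PRIMARY KEY, line TEXT)')

    def append(line):
        with conn:  # commits (or rolls back) the transaction
            conn.execute('INSERT INTO log (line) VALUES (?)', (line,))
            (count,) = conn.execute('SELECT COUNT(*) FROM log').fetchone()
            if count > MAX_ROWS:
                # Delete the oldest rows to get back under the limit.
                conn.execute(
                    'DELETE FROM log WHERE id IN '
                    '(SELECT id FROM log ORDER BY id LIMIT ?)',
                    (count - MAX_ROWS,),
                )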


It seems that you are unaware of logrotate; you are looking for a similar implementation. Check these out:

  • Creating logfile with logrotate
  • Logrotate Command Tutorial


The reason Python's logging module does not use this strategy is the performance penalty it entails. If log files rotated by size or age are simply not acceptable, then as I see it you have two basic choices: overwrite the log file in place, or write a temp file and then replace the original.

If overwriting the log file in place, you would first find the integer offset in the file (the position of the first \n byte plus one, say) that will become the 'new zero' (call it X). Then choose a block size, maybe 32K. Then start counting: seek to X + block size * block number, read one block; seek to block size * block number, write the block back. When reading hits EOF, truncate the file to the number of bytes you wrote back (the original length minus X).
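
A sketch of that in-place approach (the function name is mine, and this assumes the file is opened in binary read/write mode):

    BLOCK_SIZE = 32 * 1024  # 32K, as suggested above

    def drop_first_line_in_place(path):
        # Shift everything after the first newline toward the start of the file.
        with open(path, 'r+b') as f:
            new_zero = len(f.readline())  # X: position just past the first '\n'
            if new_zero == 0:
                return  # empty file, nothing to drop
            read_pos, write_pos = new_zero, 0
            while True:
                f.seek(read_pos)
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                f.seek(write_pos)
                f.write(block)
                read_pos += len(block)
                write_pos += len(block)
            f.truncate(write_pos)  # cut off the now-duplicated tail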

If using a temp file, find the 'new zero', copy the remainder of the file to a temp file, then rename it over the original name. Easier than the above, I guess (easier to explain, anyway), but it uses more disk space.
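
And a sketch of the temp-file variant (again, the names are made up; os.rename is atomic on POSIX when both paths are on the same filesystem):

    import os
    import shutil
    import tempfile

    def drop_first_line_via_tempfile(path):
        with open(path, 'rb') as src:
            src.readline()  # skip the first line, i.e. the part we discard
            # Create the temp file in the same directory so the rename stays atomic.
            fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
            with os.fdopen(fd, 'wb') as dst:
                shutil.copyfileobj(src, dst)
        os.rename(tmp_path, path)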

Following all that, write the new data and close the file. This whole procedure has to happen for every log message. Good luck!


You should check out the Python logging module, and more specifically the RotatingFileHandler class. This allows you to write to a file with a fixed maximum size. It doesn't, however, let you limit the number of lines.
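
For example (the file name and limits are placeholders):

    import logging
    from logging.handlers import RotatingFileHandler

    logger = logging.getLogger('packets')
    handler = RotatingFileHandler('packets.log', maxBytes=1024 * 1024, backupCount=5)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    logger.info('packet received from %s', '10.0.0.1')  # example message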


Unless you need near-real-time access to the file from another process, I would probably write each log line to a collections.deque of a fixed size. You could then implement a method that syncs the items (rows) from the deque to lines in a log file on demand.
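
Something along these lines (the maximum length and file name are assumptions):

    from collections import deque

    MAX_LINES = 1000  # hypothetical line limit

    log_lines = deque(maxlen=MAX_LINES)  # oldest entries fall off automatically

    def log(line):
        log_lines.append(line)

    def sync_to_file(path):
        # Rewrite the whole log file from the in-memory deque on demand.
        with open(path, 'w') as f:
            f.writelines(line + '\n' for line in log_lines)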
