Limit Python Output File Size

I have a Python program running on Debian that outputs data using a File object. I would like to set a limit on how large my file can be but I don't want to stop writing the file--I just want to remove the oldest line (at the top of the file). My data is written randomly as packets arrive from clients (think web logging).

I know it works, but would it be in my best interest to pull this off by checking the file size with File.tell() and then executing the following system command whenever the file exceeds the limit?

sed -i '1 d' filename 

Once the file reaches the size limit, this would execute sed on every write. Is there a better way?
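
Roughly, the idea would look like this (a sketch; the size limit and helper name are just placeholders):

    import subprocess

    MAX_BYTES = 1024 * 1024  # hypothetical 1 MB limit

    def write_packet(path, line):
        with open(path, 'a') as f:
            f.write(line + '\n')
            too_big = f.tell() > MAX_BYTES
        if too_big:
            # Remove the oldest line (the first one) in place.
            subprocess.call(['sed', '-i', '1 d', path])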


There is a reason that no logging system uses this strategy. You can't remove the first line from a file without rewriting the whole file, so it's extremely slow on a large file. Also, you can't write new data to the file while you're rewriting it.

The normal strategy is to start writing to a new file when the current one becomes too big. You can then delete files that are older than a threshold. This is the "log rotation" that others have mentioned.
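
A minimal sketch of that rotation scheme (the size limit, naming convention, and backup count here are assumptions, not anything from your setup):

    import os

    MAX_BYTES = 1024 * 1024  # hypothetical 1 MB limit per file
    KEEP = 5                 # hypothetical number of old files to keep

    def write_line(path, line):
        with open(path, 'a') as f:
            f.write(line + '\n')
            full = f.tell() >= MAX_BYTES
        if full:
            rotate(path)

    def rotate(path):
        # Shift log.4 -> log.5, ..., log -> log.1; the oldest file is overwritten.
        for i in range(KEEP, 0, -1):
            older = path if i == 1 else '%s.%d' % (path, i - 1)
            newer = '%s.%d' % (path, i)
            if os.path.exists(older):
                os.replace(older, newer)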

If you really want to create a queue where you remove one line of data as you add a new one, I'd suggest using a database instead. MongoDB and other database managers support arrays, but you could do something similar with an SQL database if required.
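
For example, with the sqlite3 module from the standard library you could keep a fixed-size queue of rows (the table name and row limit are made up for illustration):

    import sqlite3

    MAX_ROWS = 1000  # hypothetical limit

    conn = sqlite3.connect('log.db')
    conn.execute('CREATE TABLE IF NOT EXISTS log (id INTEGER PRIMARY KEY, line TEXT)')

    def append(line):
        with conn:  # commits (or rolls back) the transaction
            conn.execute('INSERT INTO log (line) VALUES (?)', (line,))
            (count,) = conn.execute('SELECT COUNT(*) FROM log').fetchone()
            if count > MAX_ROWS:
                # Delete the oldest rows to get back under the limit.
                conn.execute(
                    'DELETE FROM log WHERE id IN '
                    '(SELECT id FROM log ORDER BY id LIMIT ?)',
                    (count - MAX_ROWS,),
                )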


It seems that you are unaware of logrotate; you are looking for a similar implementation. Check these out:

  • Creating logfile with logrotate
  • Logrotate Command Tutorial


The reason Python's logging module does not use this strategy is the performance penalty it entails. If log files rotated by size or age are simply not acceptable, then as I see it you have two basic choices: overwrite the log file in place, or write a temp file and then replace the original.

If overwriting the log file in place, you would first find the integer offset in the file (the position of the first \n byte plus one, say) that will become the 'new zero' (call it X). Then choose a block size, maybe 32K. Then start counting: seek to X + block size * block number, read one block; seek to block size * block number, write the block back. When reading hits EOF, truncate the file to the number of bytes you wrote back (the original length minus X).
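
A sketch of that in-place approach (the function name is mine, and this assumes the file is opened in binary read/write mode):

    BLOCK_SIZE = 32 * 1024  # 32K, as suggested above

    def drop_first_line_in_place(path):
        # Shift everything after the first newline toward the start of the file.
        with open(path, 'r+b') as f:
            new_zero = len(f.readline())  # X: position just past the first '\n'
            if new_zero == 0:
                return  # empty file, nothing to drop
            read_pos, write_pos = new_zero, 0
            while True:
                f.seek(read_pos)
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                f.seek(write_pos)
                f.write(block)
                read_pos += len(block)
                write_pos += len(block)
            f.truncate(write_pos)  # cut off the now-duplicated tail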

If using a temp file, find the 'new zero', copy the remainder of the file to a temp file, then rename it over the original name. Easier than the above, I guess (easier to explain, anyway), but it uses more disk space.
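
And a sketch of the temp-file variant (again, the names are made up; os.rename is atomic on POSIX when both paths are on the same filesystem):

    import os
    import shutil
    import tempfile

    def drop_first_line_via_tempfile(path):
        with open(path, 'rb') as src:
            src.readline()  # skip the first line, i.e. the part we discard
            # Create the temp file in the same directory so the rename stays atomic.
            fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
            with os.fdopen(fd, 'wb') as dst:
                shutil.copyfileobj(src, dst)
        os.rename(tmp_path, path)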

Following all that, write the new data and close the file. This whole procedure has to happen for every log message. Good luck!


You should check out the Python logging module, and more specifically the RotatingFileHandler class. This allows you to write to a file with a fixed maximum size. It doesn't, however, let you limit the number of lines.
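
For example (the file name and limits are placeholders):

    import logging
    from logging.handlers import RotatingFileHandler

    logger = logging.getLogger('packets')
    handler = RotatingFileHandler('packets.log', maxBytes=1024 * 1024, backupCount=5)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    logger.info('packet received from %s', '10.0.0.1')  # example message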


Unless you need near-real-time access to the file from another process, I would probably write each log line to a collections.deque of a fixed size. You could then implement a method that syncs the items (rows) from the deque to lines in a log file on demand.
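
Something along these lines (the maximum length and file name are assumptions):

    from collections import deque

    MAX_LINES = 1000  # hypothetical line limit

    log_lines = deque(maxlen=MAX_LINES)  # oldest entries fall off automatically

    def log(line):
        log_lines.append(line)

    def sync_to_file(path):
        # Rewrite the whole log file from the in-memory deque on demand.
        with open(path, 'w') as f:
            f.writelines(line + '\n' for line in log_lines)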
