
Writing data to a file: fflush() takes a lot of time

I have a requirement where I have to buffer a lot of data (several GB) for future use. Since there isn't enough RAM available to buffer such a huge amount of data, I decided to store the data in a file.

The catch is that while I am writing the data to the file, other threads might need that "buffered" data, so I have to flush the file stream every time I write something to it. Specifically, the data is video frames that I buffer as pre-recorded data (like a TiVo); other threads may or may not need it at any given point in time, but when they do, they fread from the file and process the frames.

In the general case, the fwrite-fflush combo takes around 150 us, but occasionally (and fairly regularly) it takes more than 1.5 seconds. I can't afford this, as I have to process frames in real time.
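
For context, the write path being described boils down to something like the following. This is a hedged reconstruction, not the actual code; frame, frame_size and the log threshold are placeholders.

    #define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
    #include <stdio.h>
    #include <time.h>

    /* Reconstruction of the write path described above: append one frame,
       flush so reader threads can fread() it, and time the fwrite/fflush
       pair. Note that fflush() only hands the stdio buffer to the kernel
       via write(); it does not guarantee the data is on disk. */
    static void write_frame(FILE *fp, const void *frame, size_t frame_size)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        fwrite(frame, 1, frame_size, fp);
        fflush(fp);

        clock_gettime(CLOCK_MONOTONIC, &t1);
        long us = (t1.tv_sec - t0.tv_sec) * 1000000L +
                  (t1.tv_nsec - t0.tv_nsec) / 1000L;
        if (us > 1000)                    /* flag the occasional slow combo */
            fprintf(stderr, "slow fwrite+fflush: %ld us\n", us);
    }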

I have many questions here:

  1. Is my approach of buffering the data in a file correct? What alternatives do I have?

  2. Any idea why the fwrite-fflush operation suddenly takes more time on some occasions? Note that it goes back to 150 us after the occasional 1.5-second spike.


As for #2: Most modern file systems use a btree-like structure to manage the huge number of directory and data nodes on today's large HDs. As with all btrees, these structures need to be rebalanced from time to time. While that happens, no changes can be made, which is why the write stalls. Usually this isn't a big deal because of the OS's large caches, but your use case is a corner case where it hurts.

What can you do about it? There are two approaches:

  1. Use sockets to communicate and keep the last N frames in RAM (i.e. never write them to disk, or have an independent process write them to disk).

  2. Don't write a new file; overwrite an existing one. Since the location of all data blocks is known in advance, there will be no reorganization in the file system while you write. It will also be a little faster. So the idea is to create a huge file (or use a raw partition) and then overwrite it. When you hit the end of the file, seek back to the start and repeat.

Drawbacks:

With approach #1, you can lose frames. Also, you must make absolutely sure that all clients can read and process the data fast enough or the server might block.

With #2, you must find a way to tell the readers where the current "end of file" is.

So maybe a mixed approach is best:

  1. Create a huge file (several GB). If one file isn't enough, create several.
  2. Open a socket
  3. Write the data to the file. If you reach the end of the file, seek to position 0 and continue writing there (like a cyclic buffer).
  4. Flush the data
  5. Send the start offset and length of the new data to the readers via the socket (a sketch of steps 3-5 follows this list)
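
A minimal sketch of steps 3-5, assuming the big file was already created at a fixed size and opened with "r+b", and that notify_fd is a connected socket to the readers (both names are hypothetical):

    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>     /* write() for the notification socket */

    struct ring_writer {
        FILE    *fp;         /* preallocated file, opened with "r+b" */
        int      notify_fd;  /* connected socket to the reader(s)    */
        uint64_t file_size;  /* fixed size of the file               */
        uint64_t offset;     /* next write position                  */
    };

    /* Step 3: write the frame, wrapping around like a cyclic buffer.
       Step 4: flush. Step 5: tell readers where the new data starts. */
    static int ring_write(struct ring_writer *w, const void *frame, uint64_t len)
    {
        if (w->offset + len > w->file_size)
            w->offset = 0;                        /* wrap around */

        if (fseeko(w->fp, (off_t)w->offset, SEEK_SET) != 0)
            return -1;
        if (fwrite(frame, 1, len, w->fp) != len)
            return -1;
        if (fflush(w->fp) != 0)
            return -1;

        uint64_t msg[2] = { w->offset, len };     /* start + amount */
        write(w->notify_fd, msg, sizeof msg);

        w->offset += len;
        return 0;
    }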

Consider using memory mapped files; that will make everything a bit simpler.
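
If you go the memory mapped route, the write side reduces to a memcpy. A sketch, assuming the preallocated file already exists at the right size:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Map the preallocated ring file once. With MAP_SHARED, other threads
       or processes that map (or read) the same file see the new bytes
       through the shared page cache without an explicit flush. */
    static unsigned char *map_ring_file(const char *path, size_t size)
    {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            return NULL;
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);                    /* the mapping stays valid */
        return p == MAP_FAILED ? NULL : (unsigned char *)p;
    }

Writing a frame then becomes a memcpy into the mapped buffer at the current offset, with the same wrap-around logic as in the sketch above.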


Besides RAM and disk, there are not really any other options, only variations. I think the approach is sound though: you are getting really good file system performance.

The occasional extra time could well be due to the file system looking for more free space (it maintains a short free list, but when that is exhausted a more expensive search is needed) and allocating it to the file. If this is the cause, preallocate the file at its maximum size and write into it using random i/o (fopen(fn, "r+")) so the file is never truncated and its length never changes.
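
A sketch of that preallocation, assuming a known maximum size: write the whole file once with zeros, then reopen it with "r+" so later writes overwrite existing blocks and never change the file length.

    #include <stdio.h>
    #include <string.h>

    /* Create the file at its final size up front so later writes never
       trigger block allocation; then reopen for in-place random i/o. */
    static FILE *open_preallocated(const char *path, long max_bytes)
    {
        FILE *fp = fopen(path, "wb");
        if (!fp)
            return NULL;

        char zeros[1 << 16];
        memset(zeros, 0, sizeof zeros);
        for (long done = 0; done < max_bytes; ) {
            long chunk = max_bytes - done;
            if (chunk > (long)sizeof zeros)
                chunk = (long)sizeof zeros;
            fwrite(zeros, 1, (size_t)chunk, fp);   /* forces real allocation */
            done += chunk;
        }
        fclose(fp);

        return fopen(path, "r+b");   /* overwrite in place, no truncation */
    }

On systems that support it, posix_fallocate() can reserve the blocks without writing the zeros yourself.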

Another technique that might help stabilize file i/o time is to write each frame buffer at a file offset that is aligned to a sector boundary. That way the file system doesn't have to handle an unaligned write by first reading the sector to preserve the bytes that won't be overwritten.
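
One way to get that alignment, as a sketch: round each frame's on-file size up to a whole number of sectors and advance the write offset by the rounded value (the 4096-byte sector size here is an assumption; match the actual device if you can):

    #include <stdint.h>

    #define SECTOR_SIZE 4096u   /* assumed; use the device's real sector size */

    /* Round a length up to the next sector boundary so every frame record
       starts and ends on a sector, avoiding read-modify-write in the FS. */
    static uint64_t sector_align(uint64_t len)
    {
        return (len + SECTOR_SIZE - 1) & ~(uint64_t)(SECTOR_SIZE - 1);
    }

The writer then advances its offset by sector_align(len) instead of len, trading a little wasted space per frame for writes the file system can apply without reading the sector first.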
