How OS performs buffering for a file
I know that when you call fwrite
or fprintf
or rather any other function that writes to a file, the contents aren't immediately flushed to the disk, but buffered in the memory.
Firstly, where do the OS manage these buffers and how. Secondly, if you do the write to a file and later read in the content you wrote and assuming that the OS didn't flush开发者_C百科ed the contents between the time you wrote and read, how it knows that it has to return the read from the buffer? How does it handle this situation.
The reason I want to know this is that I'm interested in implementing my own buffering scheme in user-space, rather than kernel space as done by OS. That is, write to a file would be buffered in user-space and the actual write will only occur at a certain point. Consquently I also need to handle situations where read is called for the content that is still in the buffer. Is it possible to do all this in user-space.
Firstly, where do the OS manage these buffers and how
The functions fwrite
and fprintf
use stdio buffers which already are completely in userspace. The buffers are (likely) static arrays or perhaps malloced memory.
how it knows that it has to return the read from the buffer
It doesn't, so the changes aren't seen. Nothing actually happens to a file until the underlying system call (write
) is called (and even then - read on).
Is it possible to do all this in user-space
No, it's not possible. The good news is that the kernel already has buffers so every write
you do isn't atually translated into an actual write to the file. It is postponed and executed later. If in the meantime somebody tries to read from the file, the kernel is smart enough to serve him from the buffer.
Bits from TLPI:
When working with disk files, the read() and write() system calls don’t directly ini- tiate disk access. Instead, they simply copy data between a user-space buffer and a buffer in the kernel buffer cache.
When performing I/O on a disk file, a successful return from write() doesn’t guarantee that the data has been transferred to disk, because the kernel performs buffering of disk I/O in order to reduce disk activity and expedite write() calls. At some later point, the kernel writes (flushes) its buffer to the disk.
If, in the interim, another process attempts to read these bytes of the file, then the kernel automatically supplies the data from the buffer cache, rather than from (the outdated contents of) the file.
So you may want to find out about sync
and fsync
.
Multiple levels of buffering are generally bad. The reason stdio
buffers are useful is that they minimize the number of system calls performed. If a system call would be cheaper nobody would use stdio buffers anymore.
精彩评论