开发者

types of buffers

Recen开发者_如何学Pythontly an interviewer asked me about the types of buffers. What types of buffers are there ? Actually this question came up when I said I will be writing all the system calls to a log file to monitor the system. He said it will be slow to write each and every call to a file. How to prevent it. I said I will use a buffer and he asked me what type of buffer ? Can some one explain me types of buffers please.


In C under UNIX (and probably other operating systems as well), there are usually two buffers, at least in your given scenario.

The first exists in the C runtime libraries where information to be written is buffered before being delivered to the OS.

The second is in the OS itself, where information is buffered until it can be physically written to the underlying media.

As an example, we wrote a logging library many moons ago that forced information to be written to the disk so that it would be there if either the program crashed or the OS crashed.

This was achieved with the sequence:

fflush (fh); fsync (fileno (fh));

The first of these actually ensured that the information was handed from the C runtime buffers to the operating system, the second that it was written to disk. Keep in mind that this is an expensive operation and should only be done if you absolutely need the information written immediately (we only did it at the SUPER_ENORMOUS_IMPORTANT logging level).

To be honest, I'm not entirely certain why your interviewer thought it would be slow unless you're writing a lot of information. The two levels of buffering already there should perform quite adequately. If it was a problem, then you could just introduce another layer yourself which wrote the messages to an in-memory buffer and then delivered that to a single fprint-type call when it was about to overflow.

But, unless you do it without any function calls, I can't see it being much faster than what the fprint-type buffering already gives you.


Following clarification in comments that this question is actually about buffering inside a kernel:

Basically, you want this to be as fast, efficient and workable (not prone to failure or resource shortages) as possible.

Probably the best bet would be a buffer, either statically allocated or dynamically allocated once at boot time (you want to avoid the possibility that dynamic re-allocation will fail).

Others have suggested a ring (or circular) buffer but I wouldn't go that way (technically) for the following reason: the use of a classical circular buffer means that to write out the data when it has wrapped around will take two independent writes. For example, if your buffer has:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|s|t|r|i|n|g| |t|o| |w|r|i|t|e|.| | | | | | |T|h|i|s| |i|s| |t|h|e| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                 ^           ^
                                 |           |
                   Buffer next --+           +-- Buffer start

then you'll have to write "This is the " followed by "string to write.".

Instead, maintain the next pointer and, if the bytes in the buffer plus the bytes to be added are less than the buffer size, just add them to the buffer with no physical write to the underlying media.

Only if you are going to overflow the buffer do you start doing tricky stuff.

You can take one of two approaches:

  • Either flush the buffer as it stands, set the next pointer back to the start for processing the new message; or
  • Add part of the message to fill up the buffer, then flush it and set the next pointer back to the start for processing the rest of the message.

I would probably opt for the second given that you're going to have to take into account messages that are too big for the buffer anyway.

What I'm talking about is something like this:

initBuffer:
    create buffer of size 10240 bytes.
    set bufferEnd to end of buffer + 1
    set bufferPointer to start of buffer
    return

addToBuffer (size, message):
    while size != 0:
        xfersz = minimum (size, bufferEnd - bufferPointer)
        copy xfersz bytes from message to bufferPointer
        message = message + xfersz
        bufferPointer = bufferPointer + xfersz
        size = size - xfersz
        if bufferPointer == bufferEnd:
            write buffer to underlying media
            set bufferPointer to start of buffer
        endif
    endwhile

That basically handles messages of any size efficiently by reducing the number of physical writes. There will be optimisations of course - it's possible that the message may have been copied into kernel space so it makes little sense to copy it to the buffer if you're going to write it anyway. You may as well write the information from the kernel copy directly to the underlying media and only transfer the last bit to the buffer (since you have to save it).

In addition, you'd probably want to flush an incomplete buffer to the underlying media if nothing had been written for a time. That would reduce the likelihood of missing information on the off chance that the kernel itself crashes.

Aside: Technically, I guess this is sort of a circular buffer but it has special case handling to minimise the number of writes, and no need for a tail pointer because of that optimisation.


There are also ring buffers which have bounded space requirements and are probably best known in the Unix dmesg facility.


What comes to mind for me is time-based buffers and size-based. So you could either just write whatever is in the buffer to file once every x seconds/minutes/hours or whatever. Alternatively, you could wait until there are x log entries or x bytes worth of log data and write them all at once. This is one of the ways that log4net and log4J do it.


Overall, there are "First-In-First-Out" (FIFO) buffers, also known as queues; and there are "Latest*-In-First-Out" (LIFO) buffers, also known as stacks.

To implement FIFO, there are circular buffers, which are usually employed where a fixed-size byte array has been allocated. For example, a keyboard or serial I/O device driver might use this method. This is the usual type of buffer used when it is not possible to dynamically allocate memory (e.g., in a driver which is required for the operation of the Virtual Memory (VM) subsystem of the OS).

Where dynamic memory is available, FIFO can be implemented in many ways, particularly with linked-list derived data structures.

Also, binomial heaps implementing priority queues may be used for the FIFO buffer implementation.

A particular case of neither FIFO nor LIFO buffer is the TCP segment reassembly buffers. These may hold segments received out-of-order ("from the future") which are held pending the receipt of intermediate segments not-yet-arrived.

* My acronym is better, but most would call LIFO "Last In, First Out", not Latest.


Correct me if I'm wrong, but wouldn't using a mmap'd file for the log avoid both the overhead of small write syscalls and the possibility of data loss if the application (but not the OS) crashed? It seems like an ideal balance between performance and reliability to me.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜