
Is it safe to read() from a file as soon as write() returns?

I have a very specific application where I need an auto-increment variable with persistent storage.

To be precise, I store the decimal representation of an int variable in a file. To generate the next number, I read() from the file, convert the contents back to int, add 1 and write() back to the file. I do NOT need concurrent access to this data. Only one thread from one process calls the functions to retrieve the auto-increment number. The program runs in an embedded environment, where no one will have access to the console, so security should not be a concern. If it matters, it runs on Linux 2.6.24 on MIPS.

The problem is, I am not getting 100% reproducible results. Sometimes I get repeated numbers, which is unacceptable for my application.

My implementation is as follows.

On starting the application, I have:

int fd = open("myfile", O_RDWR|O_CREAT|O_SYNC, S_IRWXU|S_IRWXG|S_IRWXO);

And the auto-increment functions:

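int get_current(int fd)
{
    /* Zero-filled and one byte larger than SIZE, so atoi() always sees a
       terminated string, even when the file is empty or the read is short. */
    char value[SIZE + 1] = {0};
    lseek(fd, 0, SEEK_SET);
    read(fd, value, SIZE);
    return atoi(value);
}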

int get_next(int fd)
{
    char value[SIZE];
    int cur = get_current(fd);
    memset(value, 0, SIZE);
    sprintf(value, "%d", cur + 1);
    lseek(fd, 0, SEEK_SET);
    write(fd, value, SIZE);
    //fsync(fd);  /* Could inserting this be the solution? */
    return (cur + 1);
}

I have intentionally left out error checking above for the sake of code readability. I have code in place to check return values of all syscalls.

The code was originally written by another person, and now that I have detected this problem, the first step to solve it is to find out what could have caused it. I am concerned that it could be related to the way file accesses are cached. I know when I write() I have no guarantee the data ever actually reached the physical medium, but is it safe to call read() without having called fsync() and still get predictable results? If it is, then I'm out of ideas ;)

Thanks for reading through.


Yes, it is safe to read immediately after writing. In a Unix-like system, the data is safely in the kernel buffer pool when a write() returns and will be returned to other processes that need to read the data. Similar comments apply when using O_SYNC, O_DSYNC, O_FSYNC (which ensure that data is written to disk) and to Windows systems. Clearly, an asynchronous write will not be complete when the aio_write() call returns, but it will be complete when the completion is signalled.

However, your problem arises because you are not ensuring that you have a single process or thread accessing the file at a time. You must ensure that you get serial access so that you don't get two processes (or threads) reading from the file at the same time. This is the 'lost update' problem in DBMS terms.

You need to ensure that only one process has access at a time. If your processes cooperate, you can use advisory locking (via fcntl() on POSIX systems). If your processes don't cooperate, or you're not sure, you may need to go for mandatory locking, or use some other technique altogether.
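
For the cooperative case, a minimal sketch of advisory locking with fcntl() around the read-increment-write cycle might look like this. The names COUNTER_BUF, lock_whole_file and locked_next are hypothetical, and the file layout (a zero-padded decimal string at offset 0) is assumed from the question.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define COUNTER_BUF 16  /* hypothetical on-disk record size, like SIZE in the question */

/* Take (F_WRLCK) or release (F_UNLCK) an advisory lock on the whole file. */
static int lock_whole_file(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                        /* 0 means "up to end of file" */
    return fcntl(fd, F_SETLKW, &fl);     /* F_SETLKW blocks until the lock is granted */
}

/* Returns the next counter value, or -1 on error. */
int locked_next(int fd)
{
    char buf[COUNTER_BUF + 1];
    ssize_t n;
    int next = -1;

    if (lock_whole_file(fd, F_WRLCK) == -1)
        return -1;

    if (lseek(fd, 0, SEEK_SET) == -1)
        goto out;
    n = read(fd, buf, COUNTER_BUF);
    if (n < 0)
        goto out;
    buf[n] = '\0';                       /* an empty file reads as 0 */

    next = atoi(buf) + 1;
    memset(buf, 0, sizeof buf);
    snprintf(buf, sizeof buf, "%d", next);
    if (lseek(fd, 0, SEEK_SET) == -1 ||
        write(fd, buf, COUNTER_BUF) != COUNTER_BUF)
        next = -1;

out:
    lock_whole_file(fd, F_UNLCK);
    return next;
}
```

Note that fcntl() locks belong to the process, so this serializes cooperating processes but not threads inside one process; those would still need a mutex of their own.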


Yes, if you write() to a file and then read() from it you should see the data you just wrote. The exception is if another process or thread has overwritten the file in the meantime, or if the write() actually failed.
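
To rule out the failed-write case, it may help to treat a short write as a failure too, not just a return value of -1. A rough sketch, where write_fully is a hypothetical helper name and not something from the question's code:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes and EINTR.
   Returns 0 on success, -1 on error (errno is left set by write()). */
int write_fully(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted before writing anything: retry */
            return -1;
        }
        p += n;                          /* skip what was already written */
        len -= (size_t)n;
    }
    return 0;
}
```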


The content of a file is a really bad way to implement an atomic counter. How big will your count get? If it's not huge, one simple method would be to write a single byte (doesn't matter what) to increment the counter, and use fstat() (the st_size field) to read the counter. ftruncate() can reset the counter to zero.
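
A rough sketch of the file-size counter, assuming a single writer as in the question (with several writers, the write and the fstat() are two separate steps, so two of them could read the same size). The function names are made up for illustration.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open the counter file in append mode, so every write lands at the current
   end of the file, including after a reset. */
int open_size_counter(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
}

/* Append one byte, then report the file size as the counter value.
   Returns -1 on error. */
int size_counter_next(int fd)
{
    struct stat st;
    char byte = 'x';                     /* the content does not matter */

    if (write(fd, &byte, 1) != 1)
        return -1;
    if (fstat(fd, &st) == -1)
        return -1;
    return (int)st.st_size;
}

/* Reset the counter to zero by truncating the file. */
int size_counter_reset(int fd)
{
    return ftruncate(fd, 0);
}
```

The obvious cost is one byte of storage per increment, which is why this only makes sense for small counts.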

A cleaner way to implement what you want would be to memory-map the file (with mmap) and store not just the count but also a pthread_mutex_t that's initialized to be process-shared, and lock it when updating the count.
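
Something along these lines, as a sketch only. The struct layout and function names are hypothetical, and it assumes exactly one process passes initialize = 1 (for example at boot) before anyone else maps the file. Depending on the toolchain you may need to link with -lpthread.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical layout of the mapped file: the lock lives next to the count. */
struct shared_counter {
    pthread_mutex_t lock;
    int count;
};

/* Map the counter file. Pass initialize = 1 from exactly one process to set
   up the mutex and zero the count. Returns NULL on failure. */
struct shared_counter *counter_map(const char *path, int initialize)
{
    struct shared_counter *c;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd == -1)
        return NULL;
    if (ftruncate(fd, sizeof *c) == -1) {    /* make the file big enough to map */
        close(fd);
        return NULL;
    }
    c = mmap(NULL, sizeof *c, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                               /* the mapping stays valid after close */
    if (c == MAP_FAILED)
        return NULL;

    if (initialize) {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(&c->lock, &attr);
        pthread_mutexattr_destroy(&attr);
        c->count = 0;
    }
    return c;
}

/* Increment under the lock and return the new value. */
int counter_next(struct shared_counter *c)
{
    int next;

    pthread_mutex_lock(&c->lock);
    next = ++c->count;
    pthread_mutex_unlock(&c->lock);
    return next;
}
```

If the count has to survive a power loss, you would still want an msync() after the update, and the mutex should be re-initialized on startup rather than trusted from a previous run.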

Another way you could use mmap is if you have C1x atomic types (_Atomic int) but you'll have to wait 5-10 years. :-) Or you could use gcc intrinsics or asm for atomic operations. This solution has by far the best performance (mildly better than the pthread_mutex_t approach, and hundreds of times faster than the write approach).
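
With the gcc intrinsics, a sketch could be as small as the following. It assumes your gcc supports the __sync builtins for your target, which is worth checking on MIPS; the function names are again made up.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a file holding a single int. A freshly created file is zero-filled by
   ftruncate(), so the count starts at 0. Returns NULL on failure. */
int *atomic_counter_map(const char *path)
{
    int *counter;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd == -1)
        return NULL;
    if (ftruncate(fd, sizeof *counter) == -1) {
        close(fd);
        return NULL;
    }
    counter = mmap(NULL, sizeof *counter, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);                           /* the mapping stays valid after close */
    return counter == MAP_FAILED ? NULL : counter;
}

/* Atomically add 1 and return the new value (gcc __sync builtin). */
int atomic_counter_next(int *counter)
{
    return __sync_add_and_fetch(counter, 1);
}
```

As with the mutex version, the atomic update only protects against concurrent access; getting the value onto the physical medium still takes an msync().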
