File size lookup in C

I was wondering if there was any significant performance increase in using sys/stat.h versus fseek() and ftell()?


Choosing between fstat() and the fseek()/ftell() combination, there isn't going to be much difference. The single function call should be slightly quicker than the double function call, but the difference won't be great.
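To make that comparison concrete, here is a minimal sketch of both approaches on a stream that is already open. The function names size_via_fstat and size_via_seek are made up for illustration, and the sketch assumes a POSIX system (fileno() and fstat() are not part of standard C).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* One call: ask the kernel about the descriptor behind the stream.
   Returns -1 on failure. (Hypothetical helper name.) */
long long size_via_fstat(FILE *fp)
{
    struct stat st;

    assert(fp != NULL);
    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Two calls: seek to the end and read back the offset, then restore
   the caller's position. Returns -1 on failure. (Hypothetical name.) */
long long size_via_seek(FILE *fp)
{
    long pos, end;

    assert(fp != NULL);
    pos = ftell(fp);
    if (pos < 0 || fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    end = ftell(fp);
    if (fseek(fp, pos, SEEK_SET) != 0)
        return -1;
    return end; /* still -1 if the second ftell() failed */
}
```

One caveat with the fstat() variant: it reports what the kernel knows, so data still sitting in the stdio buffer of a stream opened for writing is not counted until it is flushed.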

Choosing between stat() and the combination isn't a very fair comparison. For the combination calls, the hard work was done when the file was opened, so the inode information is readily available. The stat() call has to parse the file path and then report what it finds. It should almost always be slower - unless you recently opened the file anyway so the kernel has most of the information cached. Even so, the pathname lookup required by stat() is likely to make it slower than the combination.


If you're not sure, try it!

I just coded this test. I generated 10,000 files of 2KB each, and iterated over all of them, asking for their file size.

Results on my machine, measured with the "time" command and averaged over 10 runs:

  • fopen/fseek/ftell/fclose version: 0.22 secs
  • stat version: 0.06 secs

So, the winner (at least on my machine): stat!

Here's the test code:

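#include <stdio.h>
#include <sys/stat.h>

/* Change to "#if 1" to benchmark the stat() version instead. */
#if 0
/* Look up the size by path; returns (size_t)-1 on failure. */
size_t getFileSize(const char *filename)
{
    struct stat st;

    if (stat(filename, &st) != 0)
        return (size_t)-1;
    return (size_t)st.st_size;
}
#else
/* Open the file, seek to the end, and read back the offset;
   returns (size_t)-1 on failure. */
size_t getFileSize(const char *filename)
{
    FILE *fp = fopen(filename, "rb");
    long size = -1;

    if (!fp) {
        printf("ERROR on file %s\n", filename);
        return (size_t)-1;   /* do not touch a NULL stream */
    }
    if (fseek(fp, 0, SEEK_END) == 0)
        size = ftell(fp);
    fclose(fp);
    return (size_t)size;
}
#endif

int main(void)
{
    char buf[256];
    int i;

    /* Expects file_0 .. file_9999, 2048 bytes each, in the current directory. */
    for (i = 0; i < 10000; ++i) {
        snprintf(buf, sizeof buf, "file_%d", i);
        if (getFileSize(buf) != 2048)
            printf("WRONG!\n");
    }
    return 0;
}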
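If you would rather time the lookups inside the process than with the "time" command, something like the following could work. This is only a sketch: time_lookups, size_fn and stat_size are hypothetical names, and it assumes a POSIX system that provides clock_gettime() with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Any lookup with this shape can be timed (hypothetical typedef). */
typedef size_t (*size_fn)(const char *path);

/* Example lookup: size by path via stat(); (size_t)-1 on failure. */
size_t stat_size(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 ? (size_t)st.st_size : (size_t)-1;
}

/* Calls fn(path) reps times and returns the elapsed wall-clock time in
   seconds, or -1.0 if the clock cannot be read. */
double time_lookups(size_fn fn, const char *path, int reps)
{
    struct timespec t0, t1;
    volatile size_t sink = 0; /* keeps the calls from being optimized away */
    int i;

    assert(fn != NULL && path != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;
    for (i = 0; i < reps; ++i)
        sink += fn(path);
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;
    (void)sink;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Passing a function with the same signature as getFileSize() lets you time either variant without recompiling with a different #if. Note that hitting the same path repeatedly mostly measures the warm-cache case, so looping over many different files is closer to the test described above.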


Logically, one would assume that fseek(), when asked to seek to the end of the file, has to consult the same size information that stat() reports in order to know where the end of the file is.

This would make fseek() slower than asking for the size directly, and it also requires you to fopen() the file in the first place.

Still, any performance difference is likely to be negligible, and if you need to open the file for some reason anyway, fseek/ftell likely improves the readability of your code significantly.


stat.h is mainly useful for querying a file's metadata, for example to tell whether a path is a regular file or a directory.

However, if you are going to work with the file's contents, you'll probably want to use ftell() and fseek(), since you are then operating on the file stream itself.

So in terms of performance, it really depends on what you need.

Hope it helps :) Cheers!


Depending on the circumstances, stat() can be hundreds of times faster than seek()/tell(). I am currently toying around with sshfs/FUSE, and getting the file size of a few thousand files with seek()/tell() takes well over a minute, while doing it with stat() takes a second. So the difference is pretty huge when working over sshfs/FUSE.
