Does using fseek and ftell to determine the size of a file have a vulnerability?

2023-03-04 19:40 Q&A author:

I've read posts that show how to use fseek and ftell to determine the size of a file.

FILE *fp;
long file_size;
char *buffer;

fp = fopen("foo.bin", "r");
if (NULL == fp) {
  /* Handle Error */
}

if (fseek(fp, 0 , SEEK_END) != 0) {
  /* Handle Error */
}

file_size = ftell(fp);
buffer = (char*)malloc(file_size);
if (NULL == buffer){
  /* handle error */
}

I was about to use this technique but then I ran into this link that describes a potential vulnerability.

The link recommends using fstat instead. Can anyone comment on this?

The link is one of the many nonsensical pieces of C coding advice from CERT. Their justification is based on liberties the C standard allows an implementation to take, but which are not allowed by POSIX and thus irrelevant in all cases where you have fstat as an alternative.

POSIX requires:

that the "b" modifier for fopen have no effect, i.e. that text and binary mode behave identically. This means their concern about invoking UB on text files is nonsense.
that files have a byte-resolution size set by write operations and truncate operations. This means their concern about random numbers of null bytes at the end of the file is nonsense.

Sadly, with all the nonsense like this that they publish, it's hard to know which CERT publications to take seriously. That's a shame, because lots of them are serious.

If your goal is to find the size of a file, you should definitely use fstat() or its friends. It's a much more direct and expressive method--you are literally asking the system to tell you the file's statistics, rather than going through the more roundabout fseek/ftell method.
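For what it's worth, here is a minimal sketch of the fstat approach applied to an already-open stream, assuming a POSIX system. The helper name stream_size is made up for illustration; fileno() and fstat() are the actual POSIX calls.

```c
/* Ask for the POSIX declarations (fileno) if nothing has done so yet. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: size in bytes of the regular file behind fp,
 * or -1 if fstat fails or fp is not a regular file (pipe, tty, ...). */
static long long stream_size(FILE *fp)
{
    struct stat st;

    assert(fp != NULL);

    /* fstat works on the descriptor underneath the stdio stream. */
    if (fstat(fileno(fp), &st) != 0)
        return -1;

    /* st_size is only meaningful here for regular files. */
    if (!S_ISREG(st.st_mode))
        return -1;

    return (long long)st.st_size;
}
```

Because it queries the descriptor you already have open, the answer refers to the same file you are about to read, not whatever currently lives at that path.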

A bonus tip: if you only want to know if the file is available, use access() rather than opening the file or even stat'ing it. This is an even simpler operation which many programmers aren't aware of.
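A tiny sketch of that check, again assuming POSIX. The function name can_read is hypothetical; access() and R_OK come from unistd.h. Keep in mind the result is only a snapshot of the moment you asked.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: 1 if path exists and the calling process may
 * read it (judged by its real user and group IDs), 0 otherwise. */
static int can_read(const char *path)
{
    assert(path != NULL);
    return access(path, R_OK) == 0;
}
```

Passing F_OK instead of R_OK would test for existence only.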

The reason to not use fstat is that fstat is POSIX, but fopen, ftell and fseek are part of the C Standard.

There may be a system that implements the C Standard but not POSIX. On such a system fstat would not work at all.

I'd tend to agree with their basic conclusion that you generally shouldn't use the fseek/ftell code directly in the mainstream of your code -- but you probably shouldn't use fstat either. If you want the size of a file, most of your code should use something with a clear, direct name like filesize.

Now, it probably is better to implement that using fstat where available, and (for example) FindFirstFile on Windows (the most obvious platform where fstat usually won't be available).
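A rough sketch of what such a wrapper might look like. The name filesize and its return convention (-1 on error) are assumptions on my part. The Windows branch uses FindFirstFileA as suggested above and is untested here; the last branch falls back to plain fseek/ftell for systems that have neither API.

```c
#include <assert.h>
#include <stdio.h>

#if defined(_WIN32)
#include <windows.h>
#elif defined(__unix__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/stat.h>
#endif

/* Hypothetical wrapper: size of the file at path in bytes, -1 on error. */
static long long filesize(const char *path)
{
    assert(path != NULL);

#if defined(_WIN32)
    /* Windows: read the size from the directory entry. */
    WIN32_FIND_DATAA fd;
    HANDLE h = FindFirstFileA(path, &fd);
    if (h == INVALID_HANDLE_VALUE)
        return -1;
    FindClose(h);
    return ((long long)fd.nFileSizeHigh << 32) | fd.nFileSizeLow;
#elif defined(__unix__) || defined(__APPLE__)
    /* POSIX: ask the system for the file's metadata. */
    struct stat st;
    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    return (long long)st.st_size;
#else
    /* Plain C fallback: seek to the end and report the position. */
    FILE *fp = fopen(path, "rb");
    long pos;
    if (fp == NULL)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    pos = ftell(fp);   /* -1L on failure */
    fclose(fp);
    return pos;
#endif
}
```

The rest of the code then calls filesize("foo.bin") and never needs to know which branch was compiled in.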

The other side of the story is that many (most?) of the limitations on fseek with respect to binary files actually originated with CP/M, which didn't explicitly store the size of a file anywhere. The end of a text file was signaled by a control-Z. For a binary file, however, all you really knew was what sectors were used to store the file. In the last sector, you had some amount of unused data that was often (but not always) zero-filled. Unfortunately, there might be zeros that were significant, and/or non-zero values that weren't significant.

If the entire C standard had been written just before being approved (e.g., if it had been started in 1988 and finished in 1989) they'd probably have ignored CP/M completely. For better or worse, however, they started work on the C standard in something like 1982 or so, when CP/M was still in wide enough use that it couldn't be ignored. By the time CP/M was gone, many of the decisions had already been made and I doubt anybody wanted to revisit them.

For most people today, however, there's just no point -- most code won't port to CP/M without massive work; this is one of the relatively minor problems to deal with. Making a modern program run in only 48K (or so) of memory for both the code and data is a much more serious problem (having a maximum of a megabyte or so for mass storage would be another serious problem).

CERT does have one good point though: you probably should not (as is often done) find the size of a file, allocate that much space, and then assume the contents of the file will fit there. Even though the fseek/ftell will give you the correct size with modern systems, that data could be stale by the time you actually read the data, so you could overrun your buffer anyway.
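One way to sidestep the stale-size problem is to not trust any size up front and simply read until fread reports end of file, growing the buffer as needed. This is only a sketch; read_all, the 4096-byte starting capacity, and the doubling strategy are my own choices for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything remaining in fp into a malloc'd
 * buffer. Stores the byte count in *out_len and returns the buffer
 * (caller frees it), or returns NULL on I/O or allocation failure. */
static char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 4096;   /* current buffer capacity */
    size_t len = 0;      /* bytes stored so far */
    char *buf;

    assert(fp != NULL && out_len != NULL);

    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Never ask for more than the space that is left. */
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap)
            break;       /* short read: end of file or error */

        /* Buffer is full, so double it (guarding against overflow). */
        if (cap > (size_t)-1 / 2) {
            free(buf);
            return NULL;
        }
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }

    *out_len = len;
    return buf;
}
```

If you do want to preallocate from a reported size, the same rule applies: pass that size as the limit to fread and check how many bytes actually came back.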

According to the C standard, §7.21.3:

Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.

A letter-of-the-law kind of guy might think this UB can be avoided by calculating file size with:

fseek(file, -1, SEEK_END);
size = ftell(file) + 1;

But the C standard also says this:

A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.

As a result, there is nothing we can do to fix this with regard to fseek / SEEK_END. Still, I would prefer fseek / ftell instead of OS-specific API calls.
