What is the relation between file pointer width and maximum file size?
I am curious about the maximum file size supported by some popular file systems on Linux; I have seen some that go up to the TB scale.
My question is: if the file pointer is 32 bits wide, as on most Linux systems we meet today, doesn't that mean the maximum offset we can address is 2^32 - 1 bytes? Then how can we store a file larger than 4 GB?
Furthermore, even if we can store such a file, how can we locate a position beyond the 2^32 range?
To use files larger than 4 GB, you need "large file support" (LFS) on Linux. One of the changes LFS introduced was that file offsets are 64-bit numbers. This is independent of whether Linux itself is running in 32- or 64-bit mode (e.g. x86 vs. x86-64). See e.g. http://www.suse.de/~aj/linux_lfs.html
LFS was introduced mostly in glibc 2.2 and kernel 2.4.0 (roughly in 2000-2001), so any recent Linux distribution will have it.
To use it on Linux, you can either call the special 64-bit functions (e.g. lseek64 instead of lseek), or set #define _FILE_OFFSET_BITS 64, after which the regular functions use 64-bit offsets.
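As a minimal sketch of the second approach (assuming a Linux/glibc system; the file name is a placeholder), the define widens off_t so a plain lseek() can move past the 4 GiB mark even in a 32-bit build:

    /* Must come before any system header to widen off_t to 64 bits. */
    #define _FILE_OFFSET_BITS 64

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* "big.dat" is a placeholder; any file on an LFS-capable filesystem works. */
        int fd = open("big.dat", O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* A 5 GiB offset does not fit in 32 bits, but off_t is now 64 bits wide. */
        off_t target = (off_t)5 * 1024 * 1024 * 1024;
        if (lseek(fd, target, SEEK_SET) == (off_t)-1) {
            perror("lseek");
            return 1;
        }

        /* Writing one byte here creates a (sparse) file larger than 4 GiB. */
        if (write(fd, "x", 1) != 1)
            perror("write");

        close(fd);
        return 0;
    }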
In Linux, at least, it's trivial to write programs to work with larger files explicitly (i.e., not just using a streaming approach as suggested by kohlehydrat).
See this page, for instance. The trick usually comes down to adding a magic #define before including some of the system headers, which turns on "large file support". This typically doubles the size of the file offset type to 64 bits, which is quite a lot.
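To see that doubling directly, here is a tiny check of my own (not from the linked page): compiled on 32-bit Linux with -D_FILE_OFFSET_BITS=64 it should report 8 bytes instead of 4.

    #include <stdio.h>
    #include <sys/types.h>

    int main(void)
    {
        /* Typically 4 on a plain 32-bit build, 8 when _FILE_OFFSET_BITS=64
           is in effect (and 8 on typical 64-bit builds anyway). */
        printf("sizeof(off_t) = %zu\n", sizeof(off_t));
        return 0;
    }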
There is no relation whatsoever. The FILE * pointer from C stdio is an opaque handle that has no relation to the size of the on-disk file, and the memory it points to can be much bigger than the pointer itself. The function fseek(), which repositions where we read from and write to, already takes a long, and fgetpos() and fsetpos() use an opaque fpos_t.
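As an illustrative sketch of those stdio calls (the file name is a placeholder), fgetpos()/fsetpos() let you save and restore a stream position without ever treating it as an integer of any particular width:

    #include <stdio.h>

    int main(void)
    {
        FILE *fp = fopen("data.bin", "rb");   /* placeholder file name */
        if (!fp) {
            perror("fopen");
            return 1;
        }

        fpos_t pos;
        fgetpos(fp, &pos);            /* remember where we are (opaque value) */

        char buf[256];
        fread(buf, 1, sizeof buf, fp); /* read some data, moving the position */

        fsetpos(fp, &pos);            /* jump back, however large the file is */

        fclose(fp);
        return 0;
    }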
What can make working with large files difficult is off_t, used as the offset type in various system calls. Fortunately, people realized this would be an issue and came up with "Large File Support" (LFS), which is an altered ABI with a wider offset type off_t. (Typically this is done by introducing a new API and #define-ing the old names to invoke it.)
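For example (a sketch assuming _FILE_OFFSET_BITS=64 and a placeholder path), off_t also appears as the st_size field filled in by stat(), and with LFS it can report sizes well beyond 4 GiB even in a 32-bit program:

    #define _FILE_OFFSET_BITS 64   /* widen off_t in stat() and friends */

    #include <stdio.h>
    #include <stdint.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    int main(void)
    {
        struct stat st;
        if (stat("big.dat", &st) != 0) {   /* placeholder path */
            perror("stat");
            return 1;
        }
        /* st_size is an off_t; cast to intmax_t to print it portably. */
        printf("size: %jd bytes\n", (intmax_t)st.st_size);
        return 0;
    }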
You can use lseek64 to handle big files. Ext4 can handle 16 TiB files.
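If you prefer the explicit 64-bit API instead of the _FILE_OFFSET_BITS define, a sketch (assuming glibc, which exposes lseek64 and off64_t behind _LARGEFILE64_SOURCE; the file name is a placeholder) looks like this:

    #define _LARGEFILE64_SOURCE   /* glibc: expose lseek64 / off64_t */

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("big.dat", O_RDONLY);   /* placeholder file name */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* off64_t is 64 bits even on 32-bit systems, so offsets past 4 GiB are fine. */
        off64_t off = lseek64(fd, (off64_t)6 * 1024 * 1024 * 1024, SEEK_SET);
        if (off == (off64_t)-1)
            perror("lseek64");

        close(fd);
        return 0;
    }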
Just repeatedly call read(int fd, void *buf, size_t count);
(So there's no need for a 'pointer' into the file.)
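A sketch of that streaming approach (the file name is a placeholder): read() is simply called until it returns 0, so the total amount processed can exceed any 32-bit count without the program ever handling a file offset itself.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("huge.log", O_RDONLY);   /* placeholder file name */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        char buf[65536];
        unsigned long long total = 0;
        ssize_t n;

        /* Each call advances the kernel's file position; no offset is needed here. */
        while ((n = read(fd, buf, sizeof buf)) > 0)
            total += (unsigned long long)n;

        printf("read %llu bytes\n", total);
        close(fd);
        return 0;
    }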
From the filesystem design point of view, you basically have an index tree (the inodes), which points to several pieces of data (blocks) that form the actual file. Using this model, you could theoretically have files of infinite size.
UNIX has an actual physical limit on file size determined by the number of bytes a signed 32-bit file offset can index: about 2 GiB (2^31 - 1 bytes).
Consider closing the first file just before it reaches 0x7fffffff bytes in length and opening an additional new file; a sketch of this follows.
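A rough sketch of that workaround (the naming scheme, chunk size, and exact rollover limit are my own choices, not prescribed anywhere): write to numbered parts and roll over to a new file before the 0x7fffffff boundary.

    #include <stdio.h>
    #include <string.h>

    /* Stay safely below 0x7fffffff (2^31 - 1) bytes per part. */
    #define PART_LIMIT 0x7ff00000L

    int main(void)
    {
        char name[64];
        int part = 0;
        long written = 0;

        snprintf(name, sizeof name, "output.part%d", part);
        FILE *fp = fopen(name, "wb");
        if (!fp) { perror("fopen"); return 1; }

        char chunk[4096];
        memset(chunk, 'x', sizeof chunk);        /* dummy payload */

        for (long i = 0; i < 1000000; i++) {     /* arbitrary amount of data */
            if (written + (long)sizeof chunk > PART_LIMIT) {
                fclose(fp);                      /* close before hitting the limit */
                snprintf(name, sizeof name, "output.part%d", ++part);
                fp = fopen(name, "wb");
                if (!fp) { perror("fopen"); return 1; }
                written = 0;
            }
            fwrite(chunk, 1, sizeof chunk, fp);
            written += (long)sizeof chunk;
        }

        fclose(fp);
        return 0;
    }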
Some limits of the ext2 file system come from its on-disk data format and from the operating system's kernel. Most of these factors are fixed once, when the file system is built. They depend on the block size and on the ratio of blocks to inodes. In Linux the block size is limited by the architecture's page size.
There are also some userspace programs that can't handle files larger than 2 GB.
The maximum file size is limited to min((b/4)^3 + (b/4)^2 + b/4 + 12, 2^32) * b, due to the i_block array (of EXT2_N_BLOCKS entries) and i_blocks (a 32-bit integer value) representing the number of b-byte "blocks" in the file.
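As a worked example under that formula, take b = 1024 (1 KiB blocks): (b/4)^3 + (b/4)^2 + b/4 + 12 = 256^3 + 256^2 + 256 + 12 = 16,843,020 blocks, i.e. roughly 16 GiB of data, while the i_blocks term gives 2^32 * 1024 = 4 TiB. The indirect-block term is therefore the binding limit, which matches ext2's well-known per-file limit of about 16 GiB with 1 KiB blocks.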