How disk space is allocated for an edited file

2023-02-23 08:51 问答作者：

Assume I save a text file in the HDD disk storage(assume the disk storage is new and so defragmented) and the file name is A with a file size of say 10MB

I presume, the file A occupies some space in the disk as shown, where x is an unoccupied space/memory on the disk

AAAAAAAAAAAAAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Now, I create and save another file B of some size. So B will be saved as

AAAAAAAAAAAAABBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxx - as the disk is defra开发者_StackOverflowgmented, I assume the storage will be contiguous.

Here, what if I edit the file A and reduce the file size to 2MB. Can you say how the memory will be allocated now.

Some options I could think of are

AAAAAAxxxxxxxxxBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxxx

AAxxxAAxxxAxAxxBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxxx

or a totally new location freeing up the bigger chunk for other files.

xxxxxxxxxxxxxxxBBBBBBBBBBBBBBBBAAAAAAxxxxxxxxxxxxxxxxxxxxxx

or is it any other way based on any algorithm or data-structure.

A lot of this would depend upon what type of filesystem you are using (and also how the OS interacts with it). The behavior of an NTFS filesystem in Windows may be nothing like the behavior of an ext3 filesystem in Ubuntu for the same set of logical operations.

Generally speaking, however, most modern filesystems define a file as a series of pointers to blocks on the disk. There is a minimum block size that describes the smallest allocatable block (typically ranging from 512 bytes to 4 KBytes), so files that are less than this size or not some exact multiple of this size will have some amount of extra space allocated to them.

So what happens when you allocate a 10 MB file 'A'? The filesystem reserves 10MB worth of blocks (perhaps even allowing for a few extra blocks at the end to accommodate any minor edits that are made to the file or its metadata) for the file contents. Ideally these blocks will be contiguous, as in your example. When you edit 'A' and make it smaller, the filesystem will release some or all (most likely all since in most cases editing 'A' involves writing out the entire contents of 'A' to disk again, so there's little reason for the filesystem to prefer keeping 'A' in the same physical location over writing the data to a new location somewhere else on the disk) of the blocks allocated to 'A', and update its reference to include any new blocks that were allocated, if necessary.

With that said, in the typical case and using a modern filesystem and OS, I would expect your example to produce the following final state on disk ('b' and 'a' represent extra bytes allocated to 'B' and 'A' that do not contain any meaningful data):

xxxxxxxxxxxxxxxBBBBBBBBBBBBBBBBbbAAAAAAaaxxxxxxxxxxxxxxxxxxxxxx

But real-world results will of course vary by filesystem, OS, and potentially other factors (for instance, when using an SSD data fragmentation becomes irrelevant because any section of the disk can be accessed at very low latency and with no seek penalty but at the same time it becomes important to minimize write cycles so that the device doesn't wear-our prematurely, so the OS may favor leaving 'A' in place as much as possible in that case in order to minimize the number of sectors that need to be overwritten).

So the short answer is, "it depends".

How allocation is done depends entirely on the file system type (e.g. FAT32, NTFS, jfs, reiser, etc. etc.) and the driver software. Your assumption that the file will be stored contiguously is not necessarily true - it may be more performant to store it in a different pattern, depending on hardware. For example, let's say you have a disk with 16 cylinder heads and a blocksize of 512 bytes, then it could be most efficient to store an amount of 8k data on 16 different cylinders.
OTOH, with recent hardware that does not involve rotating mechanical parts, the story changes dramatically - a concept like "fragmentation" becomes suddenly meaningless, because the access time to each block is the same - no matter in which order it is done.

No it's like this:

First you create file A: (here big A stands for data actually used for A and 'a' for reserved data for A, x stands for free).

AAAAAAAAAAAAAaaaaaaaXXXXXXXXXXXXXXXXXXX

Then B is added:

AAAAAAAAAAAAAaaaaaaaBBBBbbbbbbbbbb

Then C is added, but there is no unreserved space left:

AAAAAAAAAAAAAaaaaaaaBBBBbbbbCCCccc

If A is truncated this is what will happen

AAAAAaaaaaaaxxxxxxxxBBBBbbbbCCCccc

If B is now expanded this will happen:

AAAAAaaaaaaaBBBBxxxxxBBBBBBBBCCCccc

You see that the data for B is no longer close to each other, this is called fragmentation. When you run a defragmentation tool the data is placed close together again.

继续阅读：data-structures drives filesystems hard-drive hardware

How disk space is allocated for an edited file

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？