How to make a file sparse?
If I have a big file containing many zeros, how can I efficiently make it a sparse file?
Is the only possibility to read the whole file (including all zeroes, which may partially be stored sparse already) and to rewrite it to a new file, using seek to skip the zero areas?
Or is there a way to do this on an existing file (e.g. File.setSparse(long start, long end))?
I'm looking for a solution in Java or some Linux commands. The filesystem will be ext3 or similar.
A lot's changed in 8 years.
Fallocate

    fallocate -d filename

can be used to punch holes in existing files. From the fallocate(1) man page:
    -d, --dig-holes
           Detect and dig holes. This makes the file sparse in-place,
           without using extra disk space. The minimum size of the hole
           depends on filesystem I/O block size (usually 4096 bytes).
           Also, when using this option, --keep-size is implied. If no
           range is specified by --offset and --length, then the entire
           file is analyzed for holes.

           You can think of this option as doing a "cp --sparse" and then
           renaming the destination file to the original, without the
           need for extra disk space.

           See --punch-hole for a list of supported filesystems.
(That list:)
    Supported for XFS (since Linux 2.6.38), ext4 (since Linux
    3.0), Btrfs (since Linux 3.7) and tmpfs (since Linux 3.5).
tmpfs being on that list is the one I find most interesting. The filesystem itself is efficient enough to only consume as much RAM as it needs to store its contents, but making the contents sparse can potentially increase that efficiency even further.
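Since the question also asked about Java: as far as I know there is still no hole-punching call in the standard library, so one pragmatic option is to shell out to fallocate. A minimal sketch, assuming util-linux's fallocate is on the PATH and the file lives on a supporting filesystem:

    import java.io.IOException;

    public class DigHoles {
        // Punch holes in an existing file by delegating to fallocate(1).
        // The apparent file size is preserved (-d implies --keep-size).
        public static void digHoles(String path) throws IOException, InterruptedException {
            Process p = new ProcessBuilder("fallocate", "-d", path)
                    .inheritIO()   // forward stdout/stderr for diagnostics
                    .start();
            int exit = p.waitFor();
            if (exit != 0) {
                throw new IOException("fallocate -d exited with status " + exit);
            }
        }
    }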
GNU cp
Additionally, somewhere along the way GNU cp gained an understanding of sparse files. Quoting the cp(1) man page regarding its default mode, --sparse=auto:

    sparse SOURCE files are detected by a crude heuristic and the
    corresponding DEST file is made sparse as well.
But there's also --sparse=always, which activates the file-copy equivalent of what fallocate -d does in-place:

    Specify --sparse=always to create a sparse DEST file whenever
    the SOURCE file contains a long enough sequence of zero bytes.
I've finally been able to retire my

    tar cpSf - SOURCE | (cd DESTDIR && tar xpSf -)

one-liner, which for 20 years was my graybeard way of copying sparse files with their sparseness preserved.
Some filesystems on Linux / UNIX have the ability to "punch holes" into an existing file. See:
- LKML posting about the feature
- UNIX file truncation FAQ (search for F_FREESP)
It's not very portable and not done the same way across the board; as of right now, I believe Java's IO libraries do not provide an interface for this.
If hole punching is available, either via fcntl(F_FREESP) or via any other mechanism, it should be significantly faster than a copy/seek loop.
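For completeness, here is roughly what the copy/seek loop from the question looks like with NIO; a minimal sketch, assuming a 4096-byte block size and that the destination filesystem leaves never-written regions as holes:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class SparseCopy {
        private static final int BLOCK = 4096;   // match the filesystem block size

        private static boolean allZero(byte[] buf, int len) {
            for (int i = 0; i < len; i++) {
                if (buf[i] != 0) return false;
            }
            return true;
        }

        public static void copySparse(String src, String dst) throws IOException {
            try (FileChannel in = FileChannel.open(Paths.get(src), StandardOpenOption.READ);
                 FileChannel out = FileChannel.open(Paths.get(dst),
                         StandardOpenOption.WRITE, StandardOpenOption.CREATE_NEW,
                         StandardOpenOption.SPARSE)) {
                ByteBuffer buf = ByteBuffer.allocate(BLOCK);
                byte[] arr = buf.array();
                int n;
                while ((n = in.read(buf)) > 0) {
                    if (allZero(arr, n)) {
                        // Never write the zero block; the unwritten gap becomes a hole.
                        out.position(out.position() + n);
                    } else {
                        buf.flip();
                        while (buf.hasRemaining()) {
                            out.write(buf);
                        }
                    }
                    buf.clear();
                }
                // If the source ends in zeros, extend the copy to the full length.
                long size = in.size();
                if (size > 0 && out.size() < size) {
                    out.write(ByteBuffer.wrap(new byte[] {0}), size - 1);
                }
            }
        }
    }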
I think you would be better off pre-allocating the whole file and maintaining a table/BitSet of the pages/sections which are occupied.
Making a file sparse would result in those sections being fragmented if they were ever re-used. Perhaps saving a few TB of disk space is not worth the performance hit of a highly fragmented file.
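A minimal sketch of that bookkeeping idea (the page size and class shape are invented for illustration; note that setLength only sets the file's size, so a real pre-allocation would write actual blocks or use fallocate):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.BitSet;

    public class PreallocatedStore {
        private static final int PAGE = 4096;
        private final RandomAccessFile file;
        private final BitSet occupied;      // one bit per page: is it in use?

        public PreallocatedStore(String path, int pages) throws IOException {
            file = new RandomAccessFile(path, "rw");
            file.setLength((long) pages * PAGE);   // size the whole file up front
            occupied = new BitSet(pages);
        }

        public void writePage(int page, byte[] data) throws IOException {
            file.seek((long) page * PAGE);
            file.write(data, 0, Math.min(data.length, PAGE));
            occupied.set(page);
        }

        public boolean isOccupied(int page) {
            return occupied.get(page);
        }
    }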
You can use

    truncate -s SIZE filename

in a Linux terminal to create a sparse file having only metadata.

NOTE: SIZE is in bytes by default.
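The rough Java equivalent, assuming the underlying filesystem supports sparse files, is to extend a file without writing any data:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class CreateSparse {
        public static void main(String[] args) throws IOException {
            // Create a 10 GiB file that occupies essentially no disk space:
            // the extended region is all holes until something writes into it.
            try (RandomAccessFile f = new RandomAccessFile("sparse.bin", "rw")) {
                f.setLength(10L * 1024 * 1024 * 1024);
            }
        }
    }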
According to this article, it seems there is currently no easy solution, except for using the FIEMAP ioctl. However, I don't know how you can turn "non-sparse" zero blocks into "sparse" ones.