Sparse Files

Sparse files are files whose metadata reports one size, but which occupy less space than that on the filesystem.
Sparse files are a common way to use disk space efficiently. They can be created with the ‘truncate’ command.
You can also create one programmatically by opening a file, seeking to an offset, writing at least one byte there (or calling ftruncate), and closing the file. Seeking alone, without a write, does not extend the file.
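
A shell sketch of the seek-based approach, using dd as a stand-in for a program that calls lseek() and write(). It writes a single byte at the target offset so that the file is extended to that point. The file name sparse_demo.img and the 1 MiB offset are arbitrary choices for illustration, and the stat format options assume GNU coreutils.

```shell
# Work in a scratch directory so nothing existing is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Seek 1 MiB into the output file, then write a single zero byte there.
# Everything before that byte is a hole on filesystems that support sparse files.
dd if=/dev/zero of=sparse_demo.img bs=1 count=1 seek=1048576 2>/dev/null

# Apparent size in bytes: the 1 MiB offset plus the 1 byte written.
stat -c 'apparent bytes: %s' sparse_demo.img

# Allocated size: %b blocks of %B bytes each.
stat -c 'allocated: %b blocks of %B bytes' sparse_demo.img
```

On a filesystem with sparse-file support, the allocated figure should be far smaller than the apparent size.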

ls -l reports the length of the file in bytes, so a file containing 2 bytes is reported as 2 bytes.
ls -s reports the size in allocated blocks, so for the same 2-byte file, ls -s reports 4K when the filesystem block size is 4K.

du reports size based on the blocks in use. For instance, if a file is 2 bytes and the block size is 4096, du reports the file as 4K.
du -b will report the same size of a file as ls -l, since -b means apparent size.

Neither ls -l nor du -b takes sparse files into account. If a file is sparse, du -b and ls -l report it as though it were not sparse.
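
The difference between apparent and allocated size is easiest to see side by side. The sketch below assumes GNU coreutils and uses a hypothetical file name, holes.img, with a 100 MiB apparent size.

```shell
# Work in a scratch directory so nothing existing is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical file: 100 MiB apparent size, no data written.
truncate -s 100M holes.img

# Apparent size: both commands report 104857600 bytes.
ls -l holes.img
du -b holes.img

# Allocated size: both commands count only the blocks actually in use.
ls -s holes.img
du -k holes.img
```

The last two commands should show a much smaller figure than the first two, since no data blocks were ever written.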

When copying with ‘cp’, use ‘cp --sparse=always’ to keep sparse files sparse.
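
To illustrate what --sparse=always does, the sketch below starts from a file full of real zero bytes rather than from an already sparse file, which makes the effect on the copy visible. It assumes GNU cp, and the file names are made up for the example.

```shell
# Work in a scratch directory so nothing existing is touched.
workdir=$(mktemp -d)
cd "$workdir"

# A 10 MiB file of zero bytes that were actually written out, so it is not sparse.
dd if=/dev/zero of=zeros.img bs=1M count=10 2>/dev/null

# --sparse=always makes cp look for runs of zero bytes and turn them into holes in the copy.
cp --sparse=always zeros.img zeros_sparse.img

# The contents are byte-for-byte identical...
cmp zeros.img zeros_sparse.img && echo "contents match"

# ...but the copy should occupy fewer blocks on a filesystem that supports holes.
du -k zeros.img zeros_sparse.img
```

The exact block counts depend on the filesystem, so treat the du output as indicative rather than fixed.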

‘scp’ is not sparse-aware: if you use scp to copy a sparse file, the holes are written out as zeros and the file takes up more room on the destination host. If you use rsync with the -S option instead, sparse files stay sparse.
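
A minimal sketch of the rsync variant. To keep it runnable without a second machine it copies to a local destination; the remote host and path shown in the comment are hypothetical, as are the file names. The script skips the copy if rsync is not installed.

```shell
# Work in a scratch directory so nothing existing is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical source file: 50 MiB apparent size, no data written.
truncate -s 50M sparse_src.img

if command -v rsync >/dev/null 2>&1; then
    # -S (--sparse) asks rsync to write runs of zeros as holes on the receiving side.
    # A remote copy would look something like:
    #   rsync -S sparse_src.img user@remote-host:/some/path/   (hypothetical host and path)
    rsync -S sparse_src.img sparse_dst.img

    # Apparent sizes match; allocated sizes should both stay small.
    du -k --apparent-size sparse_src.img sparse_dst.img
    du -k sparse_src.img sparse_dst.img
else
    echo "rsync is not installed; skipping"
fi
```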

tar is not sparse-aware by default. If you tar a sparse file, both the tar file and the file you later extract from it have the sparse areas filled with zeros, so more disk blocks are used. Use the ‘-S’ option with tar to make it handle sparse files efficiently.

# create a sparse file of size 1GB
$ truncate -s +1G test

# The first column shows 0, which is the block-based (allocated) size
$ ls -lsh test
0 -rw-rw-r-- 1 orion orion 1.0G Jan  7 14:07 test

# create tar file
$ tar -cvf test.tar test

# test.tar now really takes up 1GB on disk
$ ls -lsh test.tar
1.1G -rw-rw-r-- 1 orion orion 1.1G Jan  7 14:08 test.tar

# after untarring, the file really uses 1GB; before, it was using 0
$ rm test
$ tar xvf test.tar
$ ls -lsh test
1.0G -rw-rw-r-- 1 orion orion 1.0G Jan  7 14:07 test

With the -S option, tar is smarter and the file stays sparse.

# create a sparse file of size 1GB
$ truncate -s +1G test

# The first column shows 0, which is the block-based (allocated) size
$ ls -lsh test
0 -rw-rw-r-- 1 orion orion 1.0G Jan  7 14:07 test

# create tar file with -S
$ tar -S -cvf test.tar test

# the allocated size of test.tar is now only 12 blocks (ls -s counts 1K blocks by default)
$ ls -ls test.tar
12 -rw-rw-r-- 1 orion orion      10240 Jan  7 14:19 test.tar

# untarring now shows that the file is still sparse
$ rm test
$ tar xvf test.tar
$ ls -ls test
0 -rw-rw-r-- 1 orion orion 1073741824 Jan  7 14:19 test

When we run ‘stat’ on this sparse file, we see that it takes up no space in terms of blocks.

$ stat test
  File: `test'
  Size: 1073741824	Blocks: 0          IO Block: 4096   regular file
Device: fd07h/64775d	Inode: 1046835     Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  500/   orion)   Gid: (  500/   orion)
Access: 2015-01-07 14:19:53.957911258 -0800
Modify: 2015-01-07 14:17:58.000000000 -0800
Change: 2015-01-07 14:19:53.957623281 -0800
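
The same fields can be read in a script. The sketch below is one possible check, assuming GNU stat: it compares the allocated bytes (blocks times block unit) with the apparent size. A file can also show fewer allocated bytes for other reasons, such as filesystem compression, so the result is a hint rather than proof.

```shell
# Work in a scratch directory so nothing existing is touched.
workdir=$(mktemp -d)
cd "$workdir"
truncate -s 1G test

# %s = apparent size in bytes, %b = allocated blocks, %B = bytes per block counted by %b
read -r size blocks blksz <<EOF
$(stat -c '%s %b %B' test)
EOF
allocated=$((blocks * blksz))

if [ "$allocated" -lt "$size" ]; then
    echo "test looks sparse: $allocated of $size bytes allocated"
else
    echo "test is fully allocated"
fi
```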

We can also measure the extents used.

$ filefrag test
test: 0 extents found
