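Sparse files are files whose metadata reports one size while the file itself takes up less space on the filesystem, because ranges that were never written (holes) have no blocks allocated and read back as zeros.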
Sparse files are a common way to use disk space efficiently. They can be created using the ‘truncate’ command.
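Or you can create them programmatically: open a file, seek to an offset, write at least one byte there and close the file. Seeking alone does not change the file's length, so something has to be written (or the file extended with ftruncate) for the hole before the offset to exist.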
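The seek approach can be tried from the shell without writing a program. This is a minimal sketch, assuming GNU coreutils (dd, stat -c) on Linux; the file name sparse.bin and the 1 MiB offset are arbitrary choices for illustration. dd seeks past the start of the output and writes a single byte, since a seek on its own does not change a file's length.

```shell
#!/bin/sh
# Sketch: create a sparse file by writing one byte at a 1 MiB offset.
workdir=$(mktemp -d)
f="$workdir/sparse.bin"   # hypothetical file name

# With bs=1, seek=1048576 skips 1 MiB of the output before writing one byte.
# The skipped range becomes a hole on filesystems that support sparse files.
dd if=/dev/zero of="$f" bs=1 count=1 seek=1048576 2>/dev/null

apparent=$(stat -c %s "$f")   # length in bytes
blocks=$(stat -c %b "$f")     # allocated blocks, in units of %B bytes
echo "apparent=$apparent blocks=$blocks"

rm -rf "$workdir"
```

The apparent size is the offset plus the one byte written. The block count depends on the filesystem, so the sketch prints it rather than assuming a value.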
ls -l reports the length of the file, so a 2 byte file is reported as 2 bytes.
ls -s reports the size based on allocated blocks, so for a 2 byte file on a filesystem with a 4K block size, ls -s reports 4 (its default unit is 1K; with -h it prints 4.0K).
du reports size based on blocks being used. For instance, if a file is 2 bytes, and the block size is 4096, du will report the file being 4K.
du -b reports the same size for a file as ls -l, since -b means apparent size in bytes (in GNU du it is shorthand for --apparent-size --block-size=1).
Neither ls -l nor du -b takes sparse files into account. If a file is sparse, du -b and ls -l report it as though it were not sparse.
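To see the different views side by side, the following sketch creates a 10 MB sparse file and collects the numbers each command reports. It assumes GNU coreutils on Linux; the file name demo is made up for the example. ls -ln is used instead of ls -l only so that the size is reliably the fifth field.

```shell
#!/bin/sh
# Sketch: compare apparent size and allocated size of a sparse file.
workdir=$(mktemp -d)
f="$workdir/demo"   # hypothetical file name

truncate -s 10M "$f"

ls_len=$(ls -ln "$f" | awk '{print $5}')   # length in bytes, as in ls -l
du_apparent=$(du -b "$f" | cut -f1)        # apparent size in bytes
du_alloc=$(du -k "$f" | cut -f1)           # allocated space in KiB

echo "ls -l: $ls_len  du -b: $du_apparent  du -k: ${du_alloc}K"

rm -rf "$workdir"
```

On a filesystem that supports sparse files, the first two numbers should both be 10485760 while the allocated figure stays at or near zero.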
When copying with the ‘cp’ command, pass the ‘--sparse=always’ option to keep sparse files sparse.
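A quick way to check the effect of the option is to copy a sparse file and compare block counts before and after. This is a sketch assuming GNU cp and stat; the names src.img and dst.img are placeholders.

```shell
#!/bin/sh
# Sketch: copy a sparse file with cp --sparse=always and compare the results.
workdir=$(mktemp -d)
src="$workdir/src.img"   # hypothetical names
dst="$workdir/dst.img"

truncate -s 5M "$src"
cp --sparse=always "$src" "$dst"

src_len=$(stat -c %s "$src")
dst_len=$(stat -c %s "$dst")
src_blocks=$(stat -c %b "$src")
dst_blocks=$(stat -c %b "$dst")

# cmp -s exits 0 when the two files have identical contents.
if cmp -s "$src" "$dst"; then same=yes; else same=no; fi

echo "length: $src_len -> $dst_len  blocks: $src_blocks -> $dst_blocks  identical: $same"

rm -rf "$workdir"
```

The lengths and contents match in every case; what the option influences is the block count of the copy, which should stay as low as the source's on a filesystem that supports holes.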
‘scp’ is not sparse aware: if you use scp to copy a sparse file, the holes are written out as zeros and the file takes up more room on the destination host. If you use rsync with the -S option instead, sparse files stay sparse.
tar is not sparse aware by default. If you tar a sparse file, the sparse areas are filled with zeros both in the tar file itself and in the file you get back when you untar it, so more disk blocks are used in both places. Use the ‘-S’ option with tar to make it sparse aware.
    # create a sparse file of size 1GB
    $ truncate -s +1G test

    # The first number shows 0, which is block based size
    $ ls -lsh test
    total 1.G
    0 -rw-rw-r-- 1 orion orion 1.0G Jan 7 14:07 test

    # create tar file
    $ tar -cvf test.tar test

    # test.tar now really takes up 1GB
    $ ls -ls test.tar
    1.1G -rw-rw-r-- 1 orion orion 1.1G Jan 7 14:08 test.tar

    # untarring now shows that the file is now using 1GB, before it was using 0GB
    $ rm test
    $ tar xvf test.tar
    $ ls -lsh test
    1.0G -rw-rw-r-- 1 orion orion 1.0G Jan 7 14:07 test
With the -S option, tar is smarter and the file stays sparse.

    # create a sparse file of size 1GB
    $ truncate -s +1G test

    # The first number shows 0, which is block based size
    $ ls -lsh test
    total 1.G
    0 -rw-rw-r-- 1 orion orion 1.0G Jan 7 14:07 test

    # create tar file with -S
    $ tar -S -cvf test.tar test

    # test.tar allocated size based on blocks is now 12
    $ ls -ls test.tar
    12 -rw-rw-r-- 1 orion orion 10240 Jan 7 14:19 test.tar

    # untarring now shows that the file is still sparse
    $ rm test
    $ tar xvf test.tar
    $ ls -lsh test
    0 -rw-rw-r-- 1 orion orion 1073741824 Jan 7 14:19 test
When we do a ‘stat’ on a sparse file, we see it taking up no space in terms of blocks.

    $ stat test
      File: `test'
      Size: 1073741824  Blocks: 0  IO Block: 4096  regular file
    Device: fd07h/64775d  Inode: 1046835  Links: 1
    Access: (0664/-rw-rw-r--)  Uid: ( 500/ orion)  Gid: ( 500/ orion)
    Access: 2015-01-07 14:19:53.957911258 -0800
    Modify: 2015-01-07 14:17:58.000000000 -0800
    Change: 2015-01-07 14:19:53.957623281 -0800
We can also measure the extents used.

    $ filefrag test
    test: 0 extents found
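The size and block count that stat reports can be turned into a rough sparseness check. The helper below is a sketch, not a standard tool: is_sparse is a made-up name, and it assumes GNU stat, where %s is the length in bytes, %b the number of allocated blocks and %B the size of each of those blocks. It treats a file as sparse when fewer bytes are allocated than its length, which can also be true for other reasons (for example filesystem compression), so read the result as a hint.

```shell
#!/bin/sh
# is_sparse FILE: exit 0 if FILE has fewer bytes allocated than its length,
# 1 if not, 2 if stat fails. Hypothetical helper, for illustration only.
is_sparse() {
    info=$(stat -c '%s %b %B' "$1") || return 2
    # Re-use the positional parameters: $1=length, $2=blocks, $3=bytes per block
    set -- $info
    [ $(( $2 * $3 )) -lt "$1" ]
}

workdir=$(mktemp -d)
: > "$workdir/empty"              # zero-length file: nothing to be sparse about
truncate -s 1M "$workdir/hole"    # 1 MiB file that is one big hole

if is_sparse "$workdir/empty"; then empty_status=sparse; else empty_status=dense; fi
if is_sparse "$workdir/hole"; then hole_status=sparse; else hole_status=dense; fi

echo "empty: $empty_status  hole: $hole_status"

rm -rf "$workdir"
```

On a filesystem that supports sparse files, the second file should be reported as sparse, while the empty file never is because its length is zero.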