Copy files more efficiently with the cp command by taking advantage of the BTRFS CoW mechanism, without duplicating data unnecessarily!
If you're using the BTRFS file system, note that it offers an alternative way to copy files. A conventional copy duplicates the data in its entirety and therefore occupies twice the space. With copy-on-write (CoW), the copy can share the original's data blocks instead, which is far more space-efficient.
Copy-on-write
BTRFS uses copy-on-write data management for all files by default. When a file is modified, the original data blocks are not overwritten in place as they are in traditional file systems. Instead, the modified data is written to a new location, and the metadata is then updated to point to it. Because the old data stays intact until the metadata update, this enhances data integrity.
Reflink
A reflink is a shallow copy of a file's data: the copy shares the original's data blocks, but otherwise the two files are independent, and changing one does not affect the other. This builds on the underlying copy-on-write mechanism.
A reflink effectively creates only new metadata pointing to the shared blocks, which is usually much faster than a deep copy of every block.
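To see that a reflinked copy behaves like an independent file, the following sketch makes a copy, overwrites one byte of it in place and reads both files back. It assumes GNU coreutils, and it uses --reflink=auto rather than always so that it also runs outside BTRFS, where cp quietly makes a full copy instead (the visible result is the same; only the space usage differs).

```shell
#!/usr/bin/env bash
# Sketch: a reflinked copy stays independent of its source.
# --reflink=auto is used so this also runs on file systems without reflink
# support (cp then makes an ordinary full copy); on BTRFS the blocks are shared.

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'original contents\n' > "$workdir/source"
cp --reflink=auto "$workdir/source" "$workdir/copy"

# Overwrite the first byte of the copy in place (no truncation). On BTRFS the
# modified block is written to a new location and only the copy's metadata
# is updated to point at it; the source still points at the old block.
printf 'X' | dd of="$workdir/copy" bs=1 conv=notrunc status=none

cat "$workdir/source"   # original contents
cat "$workdir/copy"     # Xriginal contents
```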
Requirements
- The storage drive must be formatted with the BTRFS file system or another copy-on-write file system.
- Linux kernel 5.18 or above (check with uname -r)
- Copy-on-write must be enabled on the file system. It is enabled by default unless the file system is mounted with the nodatacow option
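The requirements above can be checked from a script. This is a minimal sketch assuming GNU coreutils (stat, sort -V) and, optionally, util-linux's findmnt; the helper names version_ge and check_reflink_requirements are hypothetical, and the 5.18 threshold simply mirrors the list above. It prints what it finds and returns non-zero if any requirement appears unmet.

```shell
#!/usr/bin/env bash
# Sketch: check the listed requirements for a directory (default: current one).
# version_ge and check_reflink_requirements are hypothetical helper names.

# Succeed if version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

check_reflink_requirements() {
    local dir=${1:-.} rc=0 fstype kver opts

    # 1. File system type of the directory, e.g. "btrfs".
    fstype=$(stat -f -c %T "$dir")
    echo "file system: $fstype"
    [ "$fstype" = btrfs ] || rc=1

    # 2. Kernel release without the distribution suffix (6.1.0-13-amd64 -> 6.1.0).
    kver=$(uname -r)
    kver=${kver%%-*}
    echo "kernel: $kver"
    version_ge "$kver" 5.18 || rc=1

    # 3. CoW must not be disabled with the nodatacow mount option.
    if command -v findmnt >/dev/null 2>&1; then
        opts=$(findmnt -n -o OPTIONS --target "$dir" 2>/dev/null)
        case $opts in
            *nodatacow*) echo "mounted with nodatacow"; rc=1 ;;
        esac
    fi

    return $rc
}
```

For example, check_reflink_requirements /mnt/data reports the file system type and kernel version for /mnt/data and exits with status 0 only if all checks pass.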
Copy files with Reflink using cp
Syntax
cp --reflink=always source target
With --reflink=always, cp performs a shallow copy: the data blocks are shared and only copied when they are modified. If a reflink is not possible, the copy fails.
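Because --reflink=always fails instead of silently duplicating the data, a script can detect the failure through cp's exit status. The sketch below uses a hypothetical helper, reflink_or_copy, that falls back to an ordinary copy. GNU cp's --reflink=auto performs the same fallback on its own; the explicit version only adds a report of which path was taken.

```shell
#!/usr/bin/env bash
# Sketch: prefer a reflink, fall back to a full copy when it is not possible.
# reflink_or_copy is a hypothetical helper name.

reflink_or_copy() {
    local src=$1 dst=$2

    # --reflink=always makes cp exit non-zero instead of copying the data.
    if cp --reflink=always "$src" "$dst" 2>/dev/null; then
        echo "reflink: $dst"
    else
        # Reflink not possible (e.g. not BTRFS, or a different file system).
        cp "$src" "$dst" || return 1
        echo "full copy: $dst"
    fi
}
```

For example, reflink_or_copy big.img big-copy.img should print reflink: big-copy.img when both paths are on the same BTRFS file system.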
Example
I created a 4GB file and made three copies of it using reflink. The data was not actually duplicated: each copy's metadata points to the same data blocks as the original file, so the data is shared among all four files, saving space.
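A similar experiment can be run with a small script. This is a sketch, not the exact commands used above: the helper make_reflink_copies and its arguments are hypothetical, the file is filled with random data via GNU dd, and btrfs filesystem du (from btrfs-progs) is used, when available, to show how much data is shared.

```shell
#!/usr/bin/env bash
# Sketch: one large file plus three reflinked copies of it.
# make_reflink_copies is a hypothetical helper.
# Arguments: existing directory, size in MiB, cp reflink mode (default: always).

make_reflink_copies() {
    local dir=$1 size_mb=$2 mode=${3:-always}
    local i

    # Fill the original with random data.
    dd if=/dev/urandom of="$dir/original" bs=1M count="$size_mb" \
        iflag=fullblock status=none || return 1

    # Three shallow copies; with mode=always cp fails rather than
    # falling back to a full copy.
    for i in 1 2 3; do
        cp --reflink="$mode" "$dir/original" "$dir/copy$i" || return 1
    done

    # ls reports the full apparent size for every file...
    ls -l "$dir"

    # ...while on BTRFS "Total" should be about four times the file size,
    # and "Set shared" should stay close to the size of a single file.
    if command -v btrfs >/dev/null 2>&1 &&
        [ "$(stat -f -c %T "$dir")" = btrfs ]; then
        btrfs filesystem du -s "$dir"
    fi
}
```

To mirror the 4GB example, run something like make_reflink_copies /mnt/data/test 4096 in an existing directory on a BTRFS volume.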
Cons
Reflinks cannot cross file systems: two different file systems have no data blocks in common, so there is nothing for the copy to share.