The Overlay Filesystem

The overlay filesystem (formerly known as overlayfs) was merged into the mainline Linux kernel in version 3.18, released in December 2014. Whilst similar union mount filesystems have been around for many years (notably aufs), overlay is the first to be integrated into the mainline kernel.

An overlay sits on top of an existing filesystem, and combines an upper and a lower directory tree (which can be from different filesystems) to present a unified view of both. Where objects with the same name exist in both directory trees, their treatment depends on the object type:

  • File: the object in the upper directory tree appears in the overlay, whilst the object in the lower directory tree is hidden
  • Directory: the contents of each directory object are merged to create a combined directory object in the overlay
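
These two rules can be modelled in user space without mounting anything. The following is a minimal sketch only, not how the kernel implements the lookup; the function names overlay_resolve and overlay_list are hypothetical, and the model ignores whiteouts.

```shell
# overlay_resolve LOWER UPPER RELPATH
# Prints the path that would back a file named RELPATH in the overlay:
# the upper copy if there is one, otherwise the lower copy.
# Fails if the name exists in neither tree.
overlay_resolve() (
    lower=$1 upper=$2 rel=$3
    if [ -e "$upper/$rel" ]; then
        printf '%s\n' "$upper/$rel"
    elif [ -e "$lower/$rel" ]; then
        printf '%s\n' "$lower/$rel"
    else
        exit 1    # the body is a subshell, so this only ends the function
    fi
)

# overlay_list LOWER UPPER RELDIR
# Prints the names a merged directory would contain, one per line:
# the union of the entries found in the upper and lower directories.
overlay_list() (
    lower=$1 upper=$2 rel=$3
    {
        [ -d "$upper/$rel" ] && ls -A "$upper/$rel"
        [ -d "$lower/$rel" ] && ls -A "$lower/$rel"
    } | sort -u
)
```

With a fruit directory present in both dir1 and dir2, overlay_list dir1 dir2 fruit prints each name once, and overlay_resolve dir1 dir2 fruit/apple prints the dir2 path whenever dir2 holds a copy of apple.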

The lower directory can be read-only, and could be an overlay itself, whilst the upper directory is normally writeable. In order to create an overlay of two directories, dir1 and dir2, we can use the following mount command:

mount -t overlay -o lowerdir=./dir1,upperdir=./dir2,workdir=./work overlay ./dir3

A union of the two directories is created as an overlay at the dir3 directory. The workdir option is required; it names an empty directory, on the same filesystem as the upperdir, in which files are prepared before they are switched to their destination in the overlay in an atomic action. The following illustrates a simple example of the overlay mount above:
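
A minimal sketch of preparing for such a mount, assuming GNU stat is available (its %d format prints the ID of the device that holds a file). The function names same_filesystem and overlay_mount_cmd are hypothetical. The mount command is printed rather than executed, since mounting requires root privileges.

```shell
# same_filesystem PATH_A PATH_B
# Succeeds if both paths are on the same filesystem, judged by comparing
# the device IDs reported by GNU stat.
same_filesystem() (
    a=$(stat -c %d "$1") || exit 1
    b=$(stat -c %d "$2") || exit 1
    [ "$a" = "$b" ]
)

# overlay_mount_cmd LOWER UPPER WORK TARGET
# Creates any missing directories, checks that the workdir shares a
# filesystem with the upperdir, and prints the mount command to run.
overlay_mount_cmd() (
    lower=$1 upper=$2 work=$3 target=$4
    mkdir -p "$lower" "$upper" "$work" "$target" || exit 1
    if ! same_filesystem "$upper" "$work"; then
        echo "workdir must be on the same filesystem as upperdir" >&2
        exit 1
    fi
    echo "mount -t overlay -o lowerdir=$lower,upperdir=$upper,workdir=$work overlay $target"
)
```

Calling overlay_mount_cmd ./dir1 ./dir2 ./work ./dir3 prints the mount command used above, which can then be run as root. Paths containing commas or colons would need escaping in the option string, which this sketch does not attempt.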


When a file or directory that originates in the upper directory is removed from the overlay, it's also removed from the upper directory. If a file or directory that originates in the lower directory is removed from the overlay, it remains in the lower directory, but a 'whiteout' is created in the upper directory. A whiteout takes the form of a character device with device number 0/0, and a name identical to that of the removed object. As a result, the object in the lower directory is hidden, whilst the whiteout itself is not visible in the overlay. The following illustrates the creation of a whiteout in the upperdir on removal of the file mango:

$ ls -l ./dir3/fruit
total 72
-rw-rw-r-- 1 bill bill  1320 May 20 12:39 apple
-rw-rw-r-- 1 bill bill    92 May 20 11:53 grape
-rw-rw-r-- 1 bill bill 63456 May 20 11:53 mango
$ rm ./dir3/fruit/mango
$ ls -l ./dir3/fruit
total 8
-rw-rw-r-- 1 bill bill  1320 May 20 12:39 apple
-rw-rw-r-- 1 bill bill    92 May 20 11:53 grape
$ ls -l ./dir2/fruit
total 4
-rw-rw-r-- 1 bill bill 1320 May 20 12:39 apple
c--------- 1 bill bill 0, 0 May 20 17:38 mango
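
Whiteouts can also be found programmatically by looking for character devices with device number 0/0 in the upper directory. This is a minimal sketch, assuming GNU stat (whose %t and %T formats print the major and minor device numbers in hexadecimal); the function names is_whiteout and list_whiteouts are hypothetical.

```shell
# is_whiteout PATH
# Succeeds if PATH is a character device whose device number is 0/0.
is_whiteout() (
    [ -c "$1" ] || exit 1
    [ "$(stat -c '%t:%T' "$1")" = "0:0" ]
)

# list_whiteouts UPPERDIR
# Prints every whiteout found beneath UPPERDIR, one path per line.
list_whiteouts() (
    find "$1" -type c | while IFS= read -r path; do
        if is_whiteout "$path"; then
            printf '%s\n' "$path"
        fi
    done
)
```

Run against the upper directory in the listing above, list_whiteouts ./dir2 would be expected to report ./dir2/fruit/mango. An ordinary character device such as /dev/null is rejected, because its device number is not 0/0.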

Linux kernel 4.0 extends the overlay capabilities further, enabling multiple lower directories to be specified, separated by colons (:), with the rightmost lower directory at the bottom of the union and the leftmost at the top. For example:

mount -t overlay -o lowerdir=./dir3:./dir2:./dir1 overlay ./dir4

In this extended version, the upperdir option is optional. If it is omitted, the workdir option is not needed either, and is ignored if supplied. In this scenario, the overlay is read-only.
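
The ordering of the lower directories determines which copy of a file is visible. The following is a minimal user-space sketch of that precedence rule, not the kernel's lookup code; the function name lower_resolve is hypothetical, and the model ignores whiteouts.

```shell
# lower_resolve LOWERDIR_LIST RELPATH
# LOWERDIR_LIST is colon-separated with the top layer first, exactly as it
# is written in the lowerdir mount option. Prints the path in the topmost
# layer that contains RELPATH, or fails if no layer contains it.
lower_resolve() (
    layers=$1 rel=$2
    set -f      # no filename globbing while splitting the list
    IFS=:       # split the list on colons
    for layer in $layers; do
        if [ -e "$layer/$rel" ]; then
            printf '%s\n' "$layer/$rel"
            exit 0
        fi
    done
    exit 1
)
```

For the mount above, lower_resolve ./dir3:./dir2:./dir1 fruit/apple walks the layers from left to right, so a copy in dir3 hides any copies in dir2 and dir1.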

At the time of writing, Linux kernel version 4.0 is very new, and will not have found its way into many Linux distributions.

Use Cases

Union filesystems are often used for Live CD creation, where a read-only image is augmented with a writeable layer in tmpfs, thereby enabling a dynamic, but ephemeral session.

Effectively, this is 'copy-on-write', where read-only data is used until such time as the data requires changing, whereupon it is copied and altered in the read-write layer. This copy-on-write mechanism is used in the creation of filesystems for Linux containers, by container runtime environments like Docker or rkt. It's not the only option for assembling container filesystems, but it is one of the more performant, because it allows pages in the kernel's page cache to be shared between containers. That option is not available with block device copy-on-write mechanisms, such as the device mapper framework with the thin provisioning (thinp) target, or the btrfs filesystem.

Docker's overlay graphdriver is currently the last in the queue for automatic selection, behind the aufs, btrfs and devicemapper graphdrivers (vfs is used for testing), but as the remaining issues are closed out, I expect it to become the default.
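
The copy-on-write behaviour described above can be approximated in user space. This is a minimal sketch under stated assumptions: the function name cow_write is hypothetical, and a plain cp stands in for the atomic, workdir-based copy that the overlay performs.

```shell
# cow_write LOWER UPPER RELPATH TEXT
# Appends TEXT to RELPATH as a union of LOWER and UPPER would see it,
# without ever modifying anything beneath LOWER.
cow_write() (
    lower=$1 upper=$2 rel=$3 text=$4
    mkdir -p "$(dirname "$upper/$rel")" || exit 1
    if [ ! -e "$upper/$rel" ] && [ -e "$lower/$rel" ]; then
        # First write to a lower-only file: copy it into the writeable
        # layer, preserving mode and timestamps.
        cp -p "$lower/$rel" "$upper/$rel" || exit 1
    fi
    # All changes land in the upper copy.
    printf '%s\n' "$text" >> "$upper/$rel"
)
```

After cow_write lower upper fruit/apple "new line", the file under lower is unchanged, while upper holds a full copy with the extra line appended. Later writes reuse the upper copy, so the cost of the copy is paid only once per file.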