In the past months I have been working on a new project: casync. casync takes
inspiration from the popular rsync file
synchronization tool as well as the probably even more popular
git revision control system. It combines the
idea of the
rsync algorithm with the idea of
content-addressable file systems, and creates a new system for
efficiently storing and delivering file system images, optimized for
high-frequency update cycles over the Internet. Its current focus is
on delivering IoT, container, VM, application, portable service or OS
images, but I hope to extend it later in a generic fashion to become
useful for backups and home directory synchronization as well (but
more about that later).
The basic technological building blocks
casync is built from are
neither new nor particularly innovative (at least not anymore);
however, the way
casync combines them is different from existing tools,
and that's what makes it useful for a variety of use-cases that other
tools can't cover that well.
I created casync after studying how today's popular tools store and
deliver file system images. To briefly name a few: Docker has a
layered tarball approach,
OSTree serves the
individual files directly via HTTP and maintains packed deltas to
speed up updates, while other systems operate on the block layer and
serve raw squashfs images (or other archival file systems, such as
ISO9660) for download on HTTP shares (in the better cases combined
with zsync data).
Neither of these approaches appeared fully convincing to me when used in high-frequency update cycle systems. In such systems, it is important to optimize towards a couple of goals:
- Most importantly, make updates cheap traffic-wise (for this most tools use image deltas of some form)
- Put boundaries on disk space usage on servers (keeping deltas between all version combinations clients might want to update between would mean storing a quadratically growing number of deltas on servers)
- Put boundaries on disk space usage on clients
- Be friendly to Content Delivery Networks (CDNs), i.e. serve neither too many small nor too many overly large files, and only require the most basic form of HTTP. Provide the repository administrator with high-level knobs to tune the average file size delivered.
- Be simple to use for users, repository administrators and developers
I don't think any of the tools mentioned above are really good on more than a small subset of these points.
Specifically: Docker's layered tarball approach dumps the "delta" question at the feet of the image creators: the best way to minimize image downloads is to base your work on an existing image clients might already have, inheriting its resources and maintaining full history. Here, revision control (a tool for the developer) is intermingled with update management (a concept for optimizing production delivery). As container histories grow, individual deltas are likely to stay small, but on the other hand a brand-new deployment usually requires downloading the full history onto the deployment system, even though there's no use for it there, and hence likely requires substantially more disk space and download traffic.
OSTree's serving of individual files is unfriendly to CDNs (as many small files in file trees cause an explosion of HTTP GET requests). To counter that, OSTree supports placing pre-calculated delta images between selected revisions on the delivery servers, which requires a certain amount of revision management that leaks into the clients.
The approach of serving raw squashfs (or other file system) images is almost
beautifully simple, but of course means every update requires a full
download of the newest image, which is bad for both disk usage and
generated traffic. Enhancing it with
zsync makes this a much better
option, as it can reduce generated traffic substantially at very
little cost in history/meta-data (no explicit deltas between a large
number of versions need to be prepared server-side). On the other hand,
the server requirements in disk space and functionality (HTTP Range
requests) are minus points for the use-case I am interested in.
(Note: all the mentioned systems have great properties, and it's not my intention to badmouth them. The only point I am trying to make is that for the use case I care about — file system image delivery with high-frequency update cycles — each system comes with certain drawbacks.)
Security & Reproducibility
Besides the issues pointed out above I wasn't happy with the security
and reproducibility properties of these systems. In today's world
where security breaches involving hacking and breaking into connected
systems happen every day, an image delivery system that cannot make
strong guarantees regarding data integrity is out of
date. Specifically, the tarball format is famously nondeterministic:
the very same file tree can result in any number of different
valid serializations depending on the tool used, its version and the
underlying OS and file system. Some
tar implementations attempt to
correct that by guaranteeing that each file tree maps to exactly
one valid serialization, but such a property is always only specific
to the tool used. I strongly believe that any good update system must
guarantee on every single link of the chain that there's only one
valid representation of the data to deliver, one that can easily be verified.
What casync Is
So much about the background why I created
casync. Now, let's have a
look at what casync actually is like, and what it does. Here's the brief version:
Encoding: Let's take a large linear data stream, split it into variable-sized chunks (the size of each being a function of the chunk's contents), and store these chunks in individual, compressed files in some directory, each file named after a strong hash value of its contents, so that the hash value may be used as the key for retrieving the full chunk data. Let's call this directory a "chunk store". At the same time, generate a "chunk index" file that lists these chunk hash values plus their respective chunk sizes in a simple linear array. The chunking algorithm is supposed to create variable, but similarly sized chunks from the data stream, and do so in a way that the same data results in the same chunks even if placed at varying offsets. For more information see this blog story.
Decoding: Let's take the chunk index file, and reassemble the large linear data stream by concatenating the uncompressed chunks retrieved from the chunk store, keyed by the listed chunk hash values.
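The encode/decode steps above can be sketched in a few lines of Python. To be clear, this is a toy illustration of the scheme, not casync's actual code or on-disk format (casync uses buzhash, SHA-256 and xz); here a windowed CRC stands in for the rolling hash and zlib for the compressor, and the chunk store is just a dict:

```python
import hashlib
import zlib

WINDOW = 16     # rolling-hash window in bytes
MASK = 0xFF     # boundary condition: ~256-byte average chunks in this toy

def chunk(data: bytes):
    """Cut `data` into variable-sized chunks at content-defined boundaries:
    wherever a hash of the trailing WINDOW bytes matches MASK.  Because the
    boundary depends only on nearby content, the same data yields the same
    chunks even when shifted to different offsets."""
    chunks, start = [], 0
    for i in range(WINDOW, len(data)):
        if zlib.crc32(data[i - WINDOW:i]) & MASK == MASK:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

def encode(data: bytes, store: dict):
    """Fill the chunk store (hash -> compressed chunk) and return the
    chunk index: an ordered list of (hash, size) pairs."""
    index = []
    for c in chunk(data):
        digest = hashlib.sha256(c).hexdigest()   # chunk is named after its hash
        store.setdefault(digest, zlib.compress(c))
        index.append((digest, len(c)))
    return index

def decode(index, store: dict) -> bytes:
    """Reassemble the stream by concatenating chunks looked up by hash."""
    return b"".join(zlib.decompress(store[d]) for d, _ in index)
```

A round trip `decode(encode(data, store), store)` returns the original stream bit for bit, and the same store can be shared between many chunk indexes.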
As an extra twist, we introduce a well-defined, reproducible,
random-access serialization format for file trees (think: a more
modern tar), to permit efficient, stable storage of complete file
trees in the system, simply by serializing them and then passing them
into the encoding step explained above.
Finally, let's put all this on the network: for each image you want to deliver, generate a chunk index file and place it on an HTTP server. Do the same with the chunk store, and share it between the various index files you intend to deliver.
Why bother with all of this? Streams with similar contents will result in mostly the same chunk files in the chunk store. This means it is very efficient to store many related versions of a data stream in the same chunk store, thus minimizing disk usage. Moreover, when transferring linear data streams chunks already known on the receiving side can be made use of, thus minimizing network traffic.
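This sharing effect can be demonstrated with a small, hedged toy experiment (plain Python, not casync's actual algorithm): chunk two versions of a stream that differ by a small insertion and count how many chunks they share. Because boundaries are content-defined, the chunking re-synchronizes shortly after the edit:

```python
import hashlib
import zlib

WINDOW, MASK = 16, 0xFF   # toy parameters: ~256-byte average chunks

def chunk_digests(data: bytes):
    """Content-defined chunking; return the set of chunk hashes."""
    digests, start = set(), 0
    for i in range(WINDOW, len(data)):
        if zlib.crc32(data[i - WINDOW:i]) & MASK == MASK:
            digests.add(hashlib.sha256(data[start:i]).hexdigest())
            start = i
    digests.add(hashlib.sha256(data[start:]).hexdigest())
    return digests

def shared_fraction(old: bytes, new: bytes) -> float:
    """Fraction of the new stream's chunks already present in the old one,
    i.e. chunks that need neither storage nor transfer again."""
    a, b = chunk_digests(old), chunk_digests(new)
    return len(a & b) / len(b)
```

With a 100 KiB stream and a small insertion near the front, the two versions typically share well over 90% of their chunks; a fixed-offset chunking would share almost nothing, since everything after the insertion shifts.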
Why is this different from
rsync or OSTree, or similar tools? Well,
one major difference between
casync and those tools is that we
remove file boundaries before chunking things up. This means that
small files are lumped together with their siblings and large files
are chopped into pieces, which permits us to recognize similarities in
files and directories beyond file boundaries, and makes sure our chunk
sizes are pretty evenly distributed, without the file boundaries
affecting them.
The "chunking" algorithm is based on the buzhash rolling hash function. SHA256 is used as the strong hash function to generate digests of the chunks, and xz is used to compress the individual chunks.
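For the curious, here is a hedged sketch of the buzhash idea (a cyclic-polynomial rolling hash): each byte is mapped through a table of random values, and the hash of a sliding window can be updated in O(1) per byte instead of rehashing the whole window. The table seed and window width here are arbitrary choices for illustration, not casync's:

```python
import random

BITS = 32
random.seed(42)
TABLE = [random.getrandbits(BITS) for _ in range(256)]  # fixed random per-byte values

def rotl(v: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= BITS
    return ((v << n) | (v >> (BITS - n))) & 0xFFFFFFFF

def buzhash(window: bytes) -> int:
    """Hash of a whole window, computed from scratch: each byte's table
    value is rotated once per later byte and all values are XORed."""
    h = 0
    for b in window:
        h = rotl(h, 1) ^ TABLE[b]
    return h

def roll(h: int, dropped: int, added: int, width: int) -> int:
    """Slide the window one byte to the right in O(1): rotate, cancel the
    dropped byte's (now width-times-rotated) contribution, mix in the new byte."""
    return rotl(h, 1) ^ rotl(TABLE[dropped], width) ^ TABLE[added]
```

A chunker then declares a chunk boundary whenever the rolling hash satisfies some condition, e.g. `h & mask == mask`, which yields content-defined cut points with a tunable average spacing.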
Here's a diagram that hopefully explains a bit how the encoding process works, my crappy drawing skills notwithstanding:
The diagram shows the encoding process from top to bottom. It starts with a block device or a file tree, which is then serialized and chunked up into variable sized blocks. The compressed chunks are then placed in the chunk store, while a chunk index file is written listing the chunk hashes in order. (The original SVG of this graphic may be found here.)
casync operates on two different layers, depending on the
use-case of the user:
You may use it on the block layer. In this case the raw block data on disk is taken as-is, read directly from the block device, split into chunks as described above, compressed, stored and delivered.
You may use it on the file system layer. In this case, the file tree serialization format mentioned above comes into play: the file tree is serialized depth-first (much like
tar would do it) and then split into chunks, compressed, stored and delivered.
The fact that it may be used on both the block and file system layer opens it up for a variety of different use-cases. In the VM and IoT ecosystems shipping images as block-level serializations is more common, while in the container and application world file-system-level serializations are more typically used.
Chunk index files referring to block-layer serializations carry the .caibx suffix, while chunk index files referring to file system serializations carry the .caidx suffix. Note that you may also use casync as a direct tar replacement, i.e. without the chunking, just generating the plain linear file tree serialization. Such files carry the .catar suffix. Internally, .caibx files are identical to .caidx files, the only difference is semantical: .caidx files describe a .catar file, while .caibx files may describe any other blob. Finally, chunk stores are directories carrying the .castr suffix.
Here are a couple of other features casync has:
When downloading a new image you may use casync's --seed= feature: each block device, file, or directory specified is processed using the same chunking logic described above, and is used as a preferred source when putting together the downloaded image locally, avoiding network transfer of it. This of course is useful whenever updating an image: simply specify one or more old versions as seeds and only download the chunks that truly changed since then. Note that using seeds requires no history relationship between the seed and the new image to download. This has major benefits: you can even use it to speed up downloads of relatively foreign and unrelated data. For example, when downloading a container image built using Ubuntu you can use your Fedora host OS tree in /usr as seed, and casync will automatically use whatever it can from that tree, for example timezone and locale data that tends to be identical between distributions. Example: casync extract http://example.com/myimage.caibx --seed=/dev/sda1 /dev/sda2. This will place the block-layer image described by the indicated URL in the /dev/sda2 partition, using the existing /dev/sda1 data as seeding source. An invocation like this could typically be used by IoT systems with an A/B partition setup. Example 2: casync extract http://example.com/mycontainer-v3.caidx --seed=/srv/container-v1 --seed=/srv/container-v2 /srv/container-v3 is very similar but operates on the file system layer, and uses two old container versions to seed the new version.
When operating on the file system level, the user has fine-grained control over the meta-data included in the serialization. This is relevant since different use-cases tend to require a different set of saved/restored meta-data. For example, when shipping OS images, file access bits/ACLs and ownership matter, while file modification times hurt. When doing personal backups OTOH file ownership matters little but file modification times are important. Moreover, different backing file systems support different feature sets, and storing more information than necessary might make it impossible to validate a tree against an image if the meta-data cannot be replayed in full. Due to this, casync provides a set of --without= parameters that allow fine-grained control of the data stored in the file tree serialization, including the granularity of modification times and more. The precise set of selected meta-data features is also always part of the serialization, so that seeding can work correctly and automatically.
casync tries to be as accurate as possible when storing file system meta-data. This means that besides the usual baseline of file meta-data (file ownership and access bits), and more advanced features (extended attributes, ACLs, file capabilities), a number of more exotic types of data are stored as well, including Linux chattr(1) file attributes, as well as FAT file attributes (you may wonder why the latter? — EFI is FAT, and /efi is part of the comprehensive serialization of any host). In the future I intend to extend this further, for example storing btrfs sub-volume information where available. Note that, as described above, every single type of meta-data may be turned off and on individually, hence if you don't need FAT file bits (and I figure it's pretty likely you don't), then they won't be stored.
The user creating .caibx or .caidx files may control the desired average chunk length (before compression) freely, using the --chunk-size= parameter. Smaller chunks increase the number of generated files in the chunk store and increase HTTP GET load on the server, but also ensure that sharing between similar images is improved, as identical patterns in the images stored are more likely to be recognized. By default casync will use a 64K average chunk size. Tweaking this can be particularly useful when adapting the system to specific CDNs, or when delivering compressed disk images.
Emphasis is placed on making all invocations reproducible, well-defined and strictly deterministic. As mentioned above, this is a requirement to reach the intended security guarantees, but is also useful for many other use-cases. For example, the casync digest command may be used to calculate a hash value identifying a specific directory in all desired detail (use --without= to pick the desired detail). Moreover, the casync mtree command may be used to generate a BSD mtree(5) compatible manifest of a directory tree.
The file system serialization format is nicely composable. By this I mean that the serialization of a file tree is the concatenation of the serializations of all files and file sub-trees located at the top of the tree, with zero meta-data references from any of these serializations into the others. This property is essential to ensure maximum reuse of chunks when similar trees are serialized.
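The composability property can be illustrated with a hedged toy serializer (not casync's actual .catar format): directories are dicts, files are byte strings, entries are emitted depth-first in sorted order, and no entry stores offsets into another:

```python
def serialize(tree) -> bytes:
    """Deterministic depth-first serialization of a toy file tree.
    A dict value is a directory, a bytes value is a file's contents."""
    out = b""
    for name in sorted(tree):          # sorted => independent of insertion order
        node = tree[name]
        if isinstance(node, dict):
            # a directory entry wraps its children's serialization verbatim
            out += b"D" + name.encode() + b"\x00" + serialize(node) + b"E"
        else:
            out += (b"F" + name.encode() + b"\x00"
                    + len(node).to_bytes(8, "big") + node)
    return out
```

Because the directory wrapper adds bytes only before and after the child serialization, `serialize(tree["usr"])` is a contiguous substring of `serialize(tree)`: exactly the property that lets two images sharing a sub-tree also share that sub-tree's chunks.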
When extracting file trees or disk image files, casync will automatically create reflinks from any specified seeds if the underlying file system supports it (such as btrfs, ocfs, and future xfs). After all, instead of copying the desired data from the seed, we can just tell the file system to link up the relevant blocks. This works both when extracting .caidx and .caibx files — the latter of course only when the extracted disk image is placed in a regular raw image file on disk, rather than directly on a plain block device, as plain block devices do not know the concept of reflinks.
Optionally, when extracting file trees, casync can create traditional UNIX hard-links for identical files in specified seeds (--hardlink=yes). This works on all UNIX file systems, and can save substantial amounts of disk space. However, this only works for very specific use-cases where disk images are considered read-only after extraction, as any changes made to one tree will propagate to all other trees sharing the same hard-linked files, as that's the nature of hard-links. In this mode, casync exposes OSTree-like behavior, which is built heavily around read-only hard-link trees.
casync tries to be smart when choosing what to include in file system images. Implicitly, file systems such as procfs and sysfs are excluded from serialization, as they expose API objects, not real files. Moreover, the "nodump" (+d) chattr(1) flag is honored by default, permitting users to mark files to exclude from serialization.
When creating and extracting file trees, casync may apply an automatic or explicit UID/GID shift. This is particularly useful when transferring container images for use with Linux user name-spacing.
In addition to local operation, casync currently supports HTTP, HTTPS, FTP and ssh natively for downloading chunk index files and chunks (the ssh mode requires installing casync on the remote host, though an sftp mode not requiring that should be easy to add). When creating index files or chunks, only ssh is supported as a remote back-end.
When operating on block-layer images, you may expose locally or remotely stored images as local block devices. Example: casync mkdev http://example.com/myimage.caibx exposes the disk image described by the indicated URL as a local block device in /dev, which you then may use the usual block device tools on, such as mount or fdisk (only read-only though). Chunks are downloaded with high priority on access, and in the background at low priority when idle. Note that in this mode, casync also plays a role similar to "dm-verity", as all blocks are validated against the strong digests in the chunk index file before passing them on to the kernel's block layer. This feature is implemented through Linux' NBD kernel facility.
Similarly, when operating on file-system-layer images, you may mount locally or remotely stored images as regular file systems. Example: casync mount http://example.com/mytree.caidx /srv/mytree mounts the file tree image described by the indicated URL as a local directory /srv/mytree. This feature is implemented through Linux' FUSE kernel facility. Note that special care is taken that the images exposed this way can be packed up again with casync make and are guaranteed to return the bit-by-bit exact same serialization they were mounted from. No data is lost or changed while passing things through FUSE (OK, strictly speaking this is a lie, we do lose ACLs, but that's hopefully just a temporary gap to be fixed soon).
In IoT A/B fixed-size partition setups, the file systems placed in the two partitions are usually much smaller than the partition size, in order to keep some room for later, larger updates. casync is able to analyze the super-block of a number of common file systems in order to determine the actual size of a file system stored on a block device, so that writing a file system to such a partition and reading it back again will result in reproducible data. Moreover, this speeds up the seeding process, as there's little point in seeding the empty space after the file system within the partition.
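To give a feel for what such super-block analysis looks like, here is a hedged sketch for ext2/3/4 only (casync supports a number of file systems; this is illustrative code, not casync's). The classic ext superblock sits 1024 bytes into the partition, and the file system size is its block count times its block size. For simplicity the sketch reads only the low 32 bits of the block count:

```python
import struct

SUPERBLOCK_OFFSET = 1024   # ext2/3/4 superblock starts 1 KiB into the device
EXT_MAGIC = 0xEF53

def ext_fs_size(device: bytes) -> int:
    """Size in bytes of the ext2/3/4 file system at the start of `device`;
    the partition itself may be much larger than this."""
    sb = device[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    blocks, = struct.unpack_from("<I", sb, 4)            # s_blocks_count_lo
    log_block_size, = struct.unpack_from("<I", sb, 24)   # s_log_block_size
    magic, = struct.unpack_from("<H", sb, 56)            # s_magic
    if magic != EXT_MAGIC:
        raise ValueError("no ext2/3/4 superblock found")
    return blocks * (1024 << log_block_size)             # block size = 1024 << log
```

Everything in the partition past `ext_fs_size(...)` can then be skipped when seeding or serializing.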
Example Command Lines
Here's how to use
casync, explained with a few examples:
$ casync make foobar.caidx /some/directory
This will create a chunk index file
foobar.caidx in the local
directory, and populate the chunk store directory default.castr
located next to it with the chunks of the serialization (you can
change the name of the store directory with
--store= if you
like). This command operates on the file-system level. A similar
command operating on the block level:
$ casync make foobar.caibx /dev/sda1
This command creates a chunk index file
foobar.caibx in the local
directory describing the current contents of the /dev/sda1 block
device, and populates
default.castr in the same way as above. Note
that you may as well read a raw disk image from a file instead of a
block device:
$ casync make foobar.caibx myimage.raw
To reconstruct the original file tree from the
.caidx file and
the chunk store of the first command, use:
$ casync extract foobar.caidx /some/other/directory
And similar for the block-layer version:
$ casync extract foobar.caibx /dev/sdb1
or, to extract the block-layer version into a raw disk image:
$ casync extract foobar.caibx myotherimage.raw
The above are the most basic commands, operating on local data only. Now let's make this more interesting, and reference remote resources:
$ casync extract http://example.com/images/foobar.caidx /some/other/directory
This extracts the specified
.caidx onto a local directory. This of
course assumes that
foobar.caidx was uploaded to the HTTP server in
the first place, along with the chunk store. You can use any command
you like to accomplish that, for example
rsync. Alternatively, you can let
casync do this directly when
generating the chunk index:
$ casync make ssh.example.com:images/foobar.caidx /some/directory
This will use ssh to connect to the
ssh.example.com server, and then
place the .caidx file and the chunks on it. Note that this mode of
operation is "smart": this scheme will only upload chunks currently
missing on the server side, and not re-transmit what is already
available there.
Note that you can always configure the precise path or URL of the
chunk store via the
--store= option. If you do not do that, then the
store path is automatically derived from the path or URL: the last
component of the path or URL is replaced by default.castr.
Of course, when extracting
.caibx or .caidx files from remote sources,
using a local seed is advisable:
$ casync extract http://example.com/images/foobar.caidx --seed=/some/existing/directory /some/other/directory
Or on the block layer:
$ casync extract http://example.com/images/foobar.caibx --seed=/dev/sda1 /dev/sdb2
When creating chunk indexes on the file system layer
casync will by
default store meta-data as accurately as possible. Let's create a chunk
index with reduced meta-data:
$ casync make foobar.caidx --with=sec-time --with=symlinks --with=read-only /some/dir
This command will create a chunk index for a file tree serialization that has three features above the absolute baseline supported: 1s granularity time-stamps, symbolic links and a single read-only bit. In this mode, all the other meta-data bits are not stored, including nanosecond time-stamps, full UNIX permission bits, file ownership or even ACLs or extended attributes.
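The effect of such feature selection on digests can be sketched with a hedged toy model (the field names and hashing layout below are made up for illustration, not casync's scheme). Only the selected meta-data fields enter the hash, and the selection itself is mixed in, so two digests are comparable only when taken with the same feature set:

```python
import hashlib

def tree_digest(entries: dict, features: set) -> str:
    """Digest over a toy tree: `entries` maps path -> meta-data dict.
    Only fields named in `features` are hashed; the feature set itself
    is hashed too, mirroring how the selected set of meta-data features
    is part of the serialization."""
    h = hashlib.sha256()
    h.update(",".join(sorted(features)).encode() + b"\x00")
    for path in sorted(entries):
        h.update(path.encode() + b"\x00")
        for f in sorted(features):
            h.update(repr(entries[path].get(f)).encode() + b"\x00")
    return h.hexdigest()
```

With this model, changing a file's modification time does not change a digest taken without time-stamps, but does change one taken with them, which is the behavior that makes reduced-meta-data digests stable across checkouts.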
Now let's make a
.caidx file available locally as a mounted file
system, without extracting it:
$ casync mount http://example.com/images/foobar.caidx /mnt/foobar
And similar, let's make a
.caibx file available locally as a block device:
$ casync mkdev http://example.com/images/foobar.caibx
This will create a block device in
/dev and print the used device
node path to STDOUT.
casync is big about reproducibility. Let's make use of
that to calculate a digest identifying a very specific version of
a file tree:
$ casync digest .
This digest will include all meta-data bits
casync and the underlying
file system know about. Usually, to make this useful you want to
configure exactly what meta-data to include:
$ casync digest --with=unix .
This makes use of the
--with=unix shortcut for selecting meta-data:
--with=unix selects all meta-data that
traditional UNIX file systems support. It is a shortcut for writing out:
--with=16bit-uids --with=permissions --with=sec-time --with=symlinks
--with=device-nodes --with=fifos --with=sockets.
Note that when calculating digests or creating chunk indexes you may
also use the negative
--without= option to remove specific features,
but start from the most precise set:
$ casync digest --without=flag-immutable
This generates a digest with the most accurate meta-data, but leaves
one feature out: chattr(1)'s immutable (+i) file flag.
To list the contents of a
.caidx file use a command like the following:
$ casync list http://example.com/images/foobar.caidx
$ casync mtree http://example.com/images/foobar.caidx
The former command will generate a brief list of files and
directories, not too different from
tar t or
ls -al in its
output. The latter command will generate a BSD mtree(5) compatible
manifest. Note that
casync actually stores substantially more file
meta-data than mtree files can express, though.
What casync isn't
casync is not an attempt to minimize serialization and downloaded deltas to the extreme. Instead, the tool is supposed to find a good middle ground that is good on traffic and disk space, but not at the price of convenience or requiring explicit revision control. If you care about updates that are absolutely minimal, there are binary delta systems around that might be an option for you, such as Google's Courgette.
casync is not a replacement for rsync, or zsync or anything like that. They have very different use-cases and semantics. For example, rsync permits you to directly synchronize two file trees remotely. casync just cannot do that, and it is unlikely it ever will.
casync is supposed to be a generic synchronization tool. Its primary
focus for now is delivery of OS images, but I'd like to make it useful
for a couple of other use-cases, too. Specifically:
Right now, if you want to deploy casync in real life, you still need to validate the downloaded .caibx file yourself, for example with some gpg signature. It is my intention to integrate with gpg in a minimal way so that signing and verifying chunk index files is done automatically.
In the longer run, I'd like to build an automatic synchronizer for $HOME between systems from this. Each $HOME instance would be stored automatically in regular intervals in the cloud using casync, and conflicts would be resolved locally.
casync is written in a shared library style, but it is not yet built as one. Specifically, this means that almost all of casync's functionality is supposed to be available as a C API soon, so that applications can process casync files on every level. It is my intention to make this library useful enough so that it will be easy to write a module for GNOME's gvfs subsystem in order to make remote or local .caidx files directly available to applications (as an alternative to casync mount). In fact, the idea is to make this all flexible enough that even the remoting back-ends can be replaced easily, for example to replace casync's default HTTP/HTTPS back-ends built on CURL with GNOME's own HTTP implementation, in order to share cookies, certificates, … There's also an alternative method to integrate with casync in place already: simply invoke casync as a sub-process. casync will inform you about a certain set of state changes using a mechanism compatible with sd_notify(3). In the future it will also propagate progress data this way and more.
I intend to add a new seeding back-end that sources chunks from the local network. After downloading the new .caidx file off the Internet, casync would then search for the listed chunks on the local network first before retrieving them from the Internet. This should speed things up on all installations that have multiple similar systems deployed in the same network.
Further plans are listed tersely in the TODO file.
Is this a systemd project? — casync is hosted under the github systemd umbrella, and the projects share the same coding style. However, the code-bases are distinct and without interdependencies, and casync works fine both on systemd systems and systems without it.
Is casync portable? — At the moment: no. I only run Linux and that's what I code for. That said, I am open to accepting portability patches (unlike for systemd, which doesn't really make sense on non-Linux systems), as long as they don't interfere too much with the way casync works. Specifically, this means that I am not too enthusiastic about merging portability patches for OSes lacking the openat(2) family of APIs.
Does casync require reflink-capable file systems to work, such as btrfs? — No it doesn't. The reflink magic in casync is employed when the file system permits it, and it's good to have it, but it's not a requirement, and casync will implicitly fall back to copying when it isn't available. Note that casync supports a number of file system features on a variety of file systems that aren't available everywhere, for example FAT's system/hidden file flags.
Is casync stable? — I just tagged the first, initial release. While I have been working on it for quite some time and it is quite featureful, this is the first time I advertise it publicly, and it hence received very little testing outside of its own test suite. I am also not fully ready to commit to the stability of the current serialization or chunk index format. I don't see any breakages coming for it though.
casync is pretty light on documentation right now, and does not even have a man page. I also intend to correct that soon.
Are the chunk index and .catar file formats open and documented? — casync is Open Source, so if you want to know the precise format, have a look at the sources for now. It's definitely my intention to add comprehensive docs for both formats however. Don't forget this is just the initial version right now.
casync is just like $SOMEOTHERTOOL! Why are you reinventing the wheel (again)? — Well, because casync isn't "just like" some other tool. I am pretty sure I did my homework, and that there is no tool just like casync right now. The tools coming closest are probably rsync, zsync, tarsnap and restic, but they are quite different beasts each.
Why did you invent your own serialization format for file trees? Why don't you just use tar? — That's a good question, and other systems — most prominently tarsnap — do that. However, as mentioned above, tar doesn't enforce reproducibility. It also doesn't really do random access: if you want to access some specific file you need to read every single byte stored before it in the tar archive to find it, which is of course very expensive. The serialization casync implements places a focus on reproducibility, random access, and meta-data control. Much like traditional tar it can still be generated and extracted in a stream fashion though.
Does casync save/restore SELinux/SMACK file labels? — At the moment not. That's not because I wouldn't want it to, but simply because I am not a guru of either of these systems, and didn't want to implement something I do not fully grok nor can test. If you look at the sources you'll find that there are already some definitions in place that keep room for them though. I'd be delighted to accept a patch implementing this fully.
What about delivering squashfs images? How well does chunking work on compressed serializations? — That's a very good point! Usually, if you apply a chunking algorithm to a compressed data stream (let's say a tar.gz file), then changing a single bit at the front will propagate into the entire remainder of the file, so that minimal changes will explode into major changes. Thankfully this doesn't apply that strictly to squashfs images, as squashfs provides random access to files and directories and thus breaks up the compression streams in regular intervals to make seeking easy. This fact is beneficial for systems employing chunking, such as casync, as this means single bit changes might affect their vicinity but will not explode in an unbounded fashion. In order to achieve best results when delivering squashfs images in casync, the block sizes of squashfs and the chunk sizes of casync should be matched up (using the --chunk-size= option). How precisely to choose both values is left as a research subject for the user, for now.
What does the name casync mean? — It's a synchronizing tool, hence the "-sync" suffix, following rsync's naming. It makes use of the content-addressable concept of git, hence the "ca-" prefix.
Where can I get this stuff? Is it already packaged? — Check out the sources on GitHub. I just tagged the first version. Martin Pitt has packaged casync for Ubuntu. There is also an ArchLinux package. Zbigniew Jędrzejewski-Szmek has prepared a Fedora RPM that hopefully will soon be included in the distribution.
Should you care? Is this a tool for you?
Well, that's up to you really. If you are involved with projects that need to deliver IoT, VM, container, application or OS images, then maybe this is a great tool for you — but other options exist, some of which are linked above.
casync is an Open Source project: if it doesn't do exactly
what you need, prepare a patch that adds what you need, and we'll
consider it.
If you are interested in the project and would like to talk about this
in person, I'll be presenting
casync soon at Kinvolk's Linux Technologies Meetup
in Berlin, Germany. You are invited. I also intend to talk about it at
All Systems Go!, also in Berlin.