Copyright 2003-2006 Lennart Poettering <mzflerc (at) 0pointer (dot) de>
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
Version 0.9 released; Changes include: ported to Win32/Cygwin (patches from Bohdan Futerko); compatibility with Berkeley DB 4.4
Version 0.8 released; Changes include: when resuming a canceled merge operation, try harder not to loose any files; when merging and a file already exists, and the user chooses not to replace it, don't treat this as an error; when updating an existing snapshot locally which originates from a different host, update the origin field
Version 0.7 released; Changes include: fix trash cleanup; fix long standing file copying bug; don't save device info data in the MD cache any longer, use the new option --check-dev to reenable this feature; update to Berkeley DB 4.3
Version 0.6 released; Changes include: fix an ugly bug which made snapshots where --forget was used unusable; add debian/ directory for easily making Debian packages
Version 0.5 released; Changes include: optionally show sizes of file on --diff, implement new command --forget, check for extended attribute user.syrep on --update on file systems that support it.
Version 0.4 released; Changes include: fix annonoying SIGBUS failure when working on files >= 100 MB, update to Berkeley DB 4.2, use madvise() to improve file copying throughput on newer kernels, minor other fixes
Version 0.3 released; Changes include: new options --sort, --check-md, --always-copy; implemented direct bi-directory merges, documentation updates, build system updates, assorted fixes.
Version 0.2 released; Fixes include: documentation update, --diff output improved, --merge output fixed.
Version 0.1 released.
syrep is a generic file repository synchronization tool. It may be used to synchronize large file hierarchies bidirectionally by exchanging patch files. Syrep is truely peer-to-peer, no central servers are involved. Synchronizations between more than two repositories are supported. The patch files may be transferred via offline media, e.g. removable hard disks or compact discs.
Files are tracked by their message digests, currently MD5. The following file operations are tracked in the snapshot files: creation, deletion, modification, creation of new hard or symbolic links, renaming. (The latter is nothing more than a new hard link and removal of the old file). syrep doesn't distuinguish between soft and hard links. In fact even copies of files are treated as the same. Currently, syrep doesn't synchronize file attributes like access modes or modification times.
syrep was written to facilitate the synchronization of two large digital music repositories without direct network connection. Patch files of several gigabytes are common in this situation.
syrep is able to cope with 64 bit file sizes. (LFS)
syrep is optimized for speed. It may make use of a message digest cache to accelerate the calculation of digests of a whole directory hierarchy.
syrep is kind of a bidirectional rsync, but stores and makes use of a file hierarchy history. Synchronization with syrep is based on patch files, and doesn't require a direct connection between the synchronizing peers.
syrep has many things in common with version control systems like CVS or SVN: it stores a history and has operations similar to update and commit. However: the history doesn't contain file contents and is not line based, it stores the MD5 digest and some meta data only. There is no central server, instead all peers have the same role. There is no distinction between repositories and checkouts. In fact checkout and repository are identical.
syrep has even more things in common with arch/tla and BitKeeper. All three are patch based and are "peer-to-peer". However, there are certain differences: syrep doesn't differentiate between repositories and checkouts. syrep doesn't keep a file contents history of any kind.
syrep resembles diff/patch or xdelta2 in some way. While the latter work on file contents, syrep works on file hierarchies.
In contrast two most of the software mentioned above, syrep is capable of synchronizing repositories of several 100GB of size, with only a very small overhead. (i.e. 4 MB of control data for half a year history for 100 GB of user data)
Version 0.9 is more or less stable and fulfills its purpose.
Have a look on the man page syrep(1). (A XSLT capable browser is required)
Syrep's operation relies on "snapshots" of a file repository. A snapshot contains information about all files existent in the hierarchy combined with a limited history log of file operations. Snapshots may be compared, files missing or deleted on one of both sides may be detected this way. Based on this knowledge patch files containing all missing files may be created and merged.
To keep the file operation log in a sensible state it is crucial to update the snapshot frequently, probably by adding a new cron job.
Fred and Karl want to synchronize their digital music libraries by exchanging an USB hard disk with patch files. As first step, both initialize their repositories for usage with syrep:
fred$ syrep -zp --update ~/mp3/ ... karl$ syrep -zp --update ~/mp3/
Depending on the size of the repositories this takes a lot of time, since a message digest is calculated for every file. Since exact tracking of all file operations on the repository is crucial for effetive synchronization, they both use the time passing to add a new entry to their crontab:
1 3 * * * syrep -z --update ~/mp3/
When the snapshot creation finished, they send the newly created patch files ~/mp3/.syrep/curent.syrep to each other. As these snapshots are only about 400K of size for a 80GB repository they do that via email:
fred$ mutt -a ~/mp3/.syrep/current.syrep -s "The current snapshot of fred" karl ... karl$ mutt -a ~/mp3/.syrep/current.syrep -s "The current snapshot of karl" fred
When the mails arrive they both detach the snapshot and create a patch on their USB harddisk containing all local files not existing on the other siede:
fred$ mount /mnt/usb fred$ syrep -p -o /mnt/usb/patch-for-karl --makepatch ~/mp3/ ~/karls-current.syrep fred$ umount /mnt/usb ... karl$ mount /mnt/usb karl$ syrep -p -o /mnt/usb/patch-for-fred --makepatch ~/mp3/ ~/freds-current.syrep karl$ umount /mnt/usb
As next step they exchange their harddisks. Back at home they merge the newly acquired patch into their own repository:
fred$ mount /mnt/usb fred$ syrep -pT --merge /mnt/usb/patch-for-fred ~/mp3/ fred$ umount /mnt/usb ... karl$ mount /mnt/usb karl$ syrep -pT --merge /mnt/usb/patch-for-karl ~/mp3/ karl$ umount /mnt/usb
At this moment both have the same file hierarchy. To update the local snapshot log with the newly merged files they both should run --update now. This update run should be much quicker since the message digests of all unchanged files are read from a message cache created and update each time --update runs:
fred$ syrep -zp --update ~/mp3/ ... karl$ syrep -zp --update ~/mp3/
Some time later Fred got plenty of new music files, while Karl didn't change anything on his repository. Thus, Fred is able to use the old snapshot he recieved from Karl to generate a new patch for him. He does it exactly the same way he did the last time, see above.
And now, several iterations of the story described above follow.
That's the end of the story.
OK, not quite. Sometimes a conflict happens, e.g. at the same time both created a file foo.mp3 with different contents. When this happens the local copy is always copied into the patch and the user may decide during merge which file version he wants to have locally. Because of that merging is an interactive task and cannot be automated completely.
There is no need that the synchronization operations happen in such a "symmetric" way as described above.
syrep requires installed development versions of zlib and Berkeley DB 4.3. If you want build syrep with support for extended attributes (currently supported on Linux only) you have to install libattr and a kernel that supports it.
syrep was developed and tested on Debian GNU/Linux "testing" from September 2003, it should work on most other Linux distributions and may be POSIX implementations since it uses GNU autoconf for source code configuration.
Some support for for big endian architectures is included, however, it is incomplete. You're welcome to send me patches.
If the syrep build system detects Oliver Kurth's xmltoman the man page is rebuilt. Otherwise the pre-compiled versions shipped with syrep are used.
As this package is made with the GNU autotools you should run ./configure inside the distribution directory for configuring the source tree. After that you should run make for compilation and make install (as root) for installation of syrep.
This software includes an implementation of the MD5 algorithm by L. Peter Deutsch. Thanks to him for this.
Bohdan Futerko for porting syrep to Win32/Cygwin.
The newest release is always available from http://0pointer.de/lennart/projects/syrep/
The current release is 0.9.
Win32/x86 binaries courtesy of Bohdan Futerko
Get syrep's development sources from the Subversion repository (viewcvs):
svn checkout svn://svn.0pointer.net/syrep/trunk syrep
You may find an up-to-date Debian package (i386) of syrep in my local Debian package repository.
If you want to be notified whenever I release a new version of this software use the subscription feature of Freshmeat.