On the Brokenness of File Locking

It's amazing how far Linux has come without providing proper file locking that works and is usable from userspace. A brief overview of why file locking is still in a very sad state:

To begin with, there's a plethora of APIs, and all of them are awful:

POSIX file locking as available with fcntl(F_SETLK): the POSIX locking API is the most portable one and in theory works across NFS. It can do byte-range locking. So much for the good side. On the bad side there is a lot more, however: locks are bound to processes, not file descriptors. That means that this logic cannot be used in threaded environments unless combined with a process-local mutex. This is hard to get right, especially in libraries that do not know the environment they are run in, i.e. whether they are used in threaded environments or not. The worst part, however, is that POSIX locks are automatically released if a process calls close() on any (!) of its open file descriptors for that file. That means that when one part of a program locks a file and another by coincidence accesses it too for a short time, the first part's lock will be broken and it won't be notified about that. Modern software tends to load big frameworks (such as Gtk+ or Qt) into memory as well as arbitrary modules via mechanisms such as NSS, PAM, gvfs, GTK_MODULES, Apache modules, GStreamer modules, where one module seldom can control what another module in the same process does or accesses. The effect of this is that POSIX locks are unusable in any non-trivial program where it cannot be ensured that a file that is locked is never accessed by any other part of the process at the same time. Example: a user-managing daemon wants to write /etc/passwd and locks the file for that. At the same time in another thread (or from a stack frame further down) something calls getpwuid(), which internally accesses /etc/passwd and causes the lock to be released, without the first thread (or stack frame) knowing about it. Furthermore, should two threads use the locking fcntl()s on the same file, they will interfere with each other's locks and reset each other's locking ranges and flags. On top of that, locking cannot be used on any file that is publicly accessible (i.e. has the R bit set for groups/others, i.e. more access bits on than 0600), because that would otherwise effectively give arbitrary users a way to indefinitely block execution of any process (regardless of the UID it is running under) that wants to access and lock the file. This is generally not an acceptable security risk. Finally, while POSIX file locks are supposedly NFS-safe, they are not always so in practice, as there are still many NFS implementations around where locking is not properly implemented, and NFS tends to be used in heterogeneous networks. The biggest problem with this is that there is no way to properly detect whether file locking works on a specific NFS mount (or any mount) or not.
The other API for POSIX file locks: lockf() is another API for the same mechanism and suffers from the same problems. One wonders why there are two APIs for the same messed-up interface.
BSD locking based on flock(). The semantics of this kind of locking are much nicer than for POSIX locking: locks are bound to file descriptors, not processes. This kind of locking can hence be used safely between threads and can even be inherited across fork() and exec(). Locks are only automatically broken on the close() call for the one file descriptor they were created with (or the last duplicate of it). On the other hand, this kind of locking does not offer byte-range locking, suffers from the same security problems as POSIX locking, and works in even fewer cases on NFS than POSIX locking (i.e. on BSD and Linux < 2.6.12 they were NOPs returning success). And since BSD locking is not as portable as POSIX locking, this is sometimes an unsafe choice. Some OSes even find it funny to make flock() and fcntl(F_SETLK) control the same locks. Linux treats them independently -- except for the cases where it doesn't: on Linux NFS they are now transparently converted to POSIX locks, too. What chaos!
Mandatory locking is available too. It's based on the POSIX locking API but not portable in itself. It's dangerous business and should generally be avoided in cleanly written software.
Traditional lock file based file locking. This is how things were done traditionally, based around known atomicity guarantees of certain basic file system operations. It's a cumbersome thing, and requires polling of the file system to get notifications when a lock is released. Also, on Linux NFS < 2.6.5 it doesn't work properly, since O_EXCL isn't atomic there. And of course the client cannot really know what the server is running, so again this brokenness is not detectable.
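
The lock file approach from the last item can be sketched roughly as follows. This is only an illustration in Python, assuming a local file system where O_CREAT | O_EXCL is atomic; the function names, the timeout and the polling interval are made up for the example. Note that it has to poll, exactly as complained about above.

```python
import os
import time


def acquire_lock_file(path, timeout=5.0, poll_interval=0.05):
    """Try to create `path` exclusively; poll until success or timeout.

    Hypothetical helper: returns True if we created the lock file,
    False if somebody else still held it when the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails with EEXIST if the file already exists,
            # so on a local file system only one caller can create it.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False
            # There is no notification when the lock goes away, so we poll.
            time.sleep(poll_interval)
            continue
        with os.fdopen(fd, "w") as f:
            f.write(f"{os.getpid()}\n")  # record the owner for debugging
        return True


def release_lock_file(path):
    """Release the lock by removing the lock file."""
    os.unlink(path)
```

A stale lock file left behind by a crashed process blocks everybody until someone cleans it up; this sketch makes no attempt to handle that.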

The Disappointing Summary

File locking on Linux is just broken. The broken semantics of POSIX locking show that the designers of this API apparently never tried to actually use it in real software. It smells a lot like an interface that kernel people thought made sense but that in reality doesn't when you try to use it from userspace.
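
The close() problem described above is easy to reproduce. The following is a small sketch in Python, whose fcntl.lockf() is a wrapper around the fcntl() locking calls; the helper names are invented for this example. It uses a forked child to probe the lock, because POSIX locks never conflict within the process that owns them.

```python
import fcntl
import os
import tempfile


def lock_is_free(path):
    """Fork a child that tries a non-blocking POSIX write lock on `path`.

    Returns True if the child got the lock, i.e. no other process holds one.
    """
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            status = 0
        except OSError:
            pass  # lock is held elsewhere (or open failed)
        finally:
            os._exit(status)  # never return into the parent's code path
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0


def demo_lock_lost_on_close():
    """Return (held_before, held_after) around an unrelated open()/close()."""
    fd, path = tempfile.mkstemp()
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # take a POSIX write lock
        held_before = not lock_is_free(path)
        # Some unrelated part of the program briefly looks at the same file.
        other = os.open(path, os.O_RDONLY)
        os.close(other)  # this silently drops the lock taken via `fd`
        held_after = not lock_is_free(path)
        return held_before, held_after
    finally:
        os.close(fd)
        os.unlink(path)
```

On a local Linux file system, demo_lock_lost_on_close() is expected to return (True, False): the lock is gone although the descriptor it was taken on is still open and nobody asked for it to be released.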

Here's a list of places where you shouldn't use file locking due to the problems shown above: If you want to lock a file in $HOME, forget about it, as $HOME might be on NFS and locks generally are not reliable there. The same applies to every other file system that might be shared across the network. If the file you want to lock is accessible to more than your own user (i.e. an access mode > 0700), forget about locking; it would allow others to block your application indefinitely. If your program is non-trivial or threaded or uses a framework such as Gtk+ or Qt or any of the module-based APIs such as NSS, PAM, ... forget about POSIX locking. If you care about portability, don't use file locking.

Or to turn this around, the only case where it is kind of safe to use file locking is in trivial applications where portability is not key, using BSD locking on a file system that you can rely on being local and on files inaccessible to others. Of course, that doesn't leave much, except for private files in /tmp for trivial user applications.
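
For that one remaining case, a minimal sketch in Python might look like the following. It assumes the path is on a local file system and in a directory other users cannot write to; the helper names are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def private_flock(path):
    """Hold an exclusive BSD lock on `path` for the duration of the block."""
    # Mode 0600 keeps other users from opening the file and locking it.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is ours
        yield fd
    finally:
        os.close(fd)  # closing this descriptor releases the lock


def try_flock(path):
    """Return True if a non-blocking exclusive flock() on a fresh fd succeeds."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)
```

Because the lock belongs to the descriptor it was taken on, a second open() of the same file conflicts with it even inside the same process, and closing that second descriptor does not drop the first lock. While a `with private_flock(path):` block is active, try_flock(path) should therefore keep returning False.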

Or in one sentence: in its current state Linux file locking is unusable.

And that is a shame.

Update: Check out the follow-up story on this topic.
