Posted on Do 19 November 2015

Introducing sd-event

The Event Loop API of libsystemd

When we began working on systemd we built it around a hand-written ad-hoc event loop, wrapping Linux epoll. The more our project grew the more we realized the limitations of using raw epoll:

  • As we used timerfd for our timer events, each event source cost one file descriptor and we had many of them! File descriptors are a scarce resource on UNIX, as RLIMIT_NOFILE is typically set to 1024 or similar, limiting the number of available file descriptors per process to 1021, which isn't particularly a lot.

  • Ordering of event dispatching became a nightmare. In many cases, we wanted to make sure that a certain kind of event would always be dispatched before another kind of event, if both happen at the same time. For example, when the last process of a service dies, we might be notified about that via a SIGCHLD signal, via an sd_notify() "STATUS=" message, and via a control group notification. We wanted to get these events in the right order, to know when it's safe to process and subsequently release the runtime data systemd keeps about the service or process: it shouldn't be done if there are still events about it pending.

  • For each program we added to the systemd project we noticed we were adding similar code, over and over again, to work with epoll's complex interfaces. For example, finding the right file descriptor and callback function to dispatch an epoll event to, without running into invalidated pointer issues is outright difficult and requires non-trivial code.

  • Integrating child process watching into our event loops was much more complex than one could hope, and even more so if child process events should be ordered against each other and unrelated kinds of events.

Eventually, we started working on sd-bus. At the same time we decided to seize the opportunity, put together a proper event loop API in C, and then not only port sd-bus on top of it, but also the rest of systemd. The result of this is sd-event. After almost two years of development we declared sd-event stable in systemd version 221, and published it as official API of libsystemd.

Why?

sd-event.h, of course, is not the first event loop API around, and it doesn't implement any really novel concepts. When we started working on it we tried to do our homework, and checked the various existing event loop APIs, maybe looking for candidates to adopt instead of doing our own, and to learn about the strengths and weaknesses of the various implementations existing. Ultimately, we found no implementation that could deliver what we needed, or where it would be easy to add the missing bits: as usual in the systemd project, we wanted something that allows us access to all the Linux-specific bits, instead of limiting itself to the least common denominator of UNIX. We weren't looking for an abstraction API, but simply one that makes epoll usable in system code.

With this blog story I'd like to take the opportunity to introduce you to sd-event, and explain why it might be a good candidate to adopt as event loop implementation in your project, too.

So, here are some features it provides:

  • I/O event sources, based on epoll's file descriptor watching, including edge triggered events (EPOLLET). See sd_event_add_io(3).

  • Timer event sources, based on timerfd_create(), supporting the CLOCK_MONOTONIC, CLOCK_REALTIME, CLOCK_BOOTIME clocks, as well as the CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM clocks that can resume the system from suspend. When creating timer events a required accuracy parameter may be specified which allows coalescing of timer events to minimize power consumption. For each clock only a single timer file descriptor is kept, and all timer events are multiplexed with a priority queue. See sd_event_add_time(3).

  • UNIX process signal events, based on signalfd(2), including full support for real-time signals, and queued parameters. See sd_event_add_signal(3).

  • Child process state change events, based on waitid(2). See sd_event_add_child(3).

  • Static event sources, of three types: defer, post and exit, for invoking calls in each event loop, after other event sources or at event loop termination. See sd_event_add_defer(3).

  • Event sources may be assigned a 64bit priority value, that controls the order in which event sources are dispatched if multiple are pending simultanously. See sd_event_source_set_priority(3).

  • The event loop may automatically send watchdog notification messages to the service manager. See sd_event_set_watchdog(3).

  • The event loop may be integrated into foreign event loops, such as the GLib one. The event loop API is hence composable, the same way the underlying epoll logic is. See sd_event_get_fd(3) for an example.

  • The API is fully OOM safe.

  • A complete set of documentation in UNIX man page format is available, with sd-event(3) as the entry page.

  • It's pretty widely available, and requires no extra dependencies. Since systemd is built on it, most major distributions ship the library in their default install set.

  • After two years of development, and after being used in all of systemd's components, it has received a fair share of testing already, even though we only recently decided to declare it stable and turned it into a public API.

Note that sd-event has some potential drawbacks too:

  • If portability is essential to you, sd-event is not your best option. sd-event is a wrapper around Linux-specific APIs, and that's visible in the API. For example: our event callbacks receive structures defined by Linux-specific APIs such as signalfd.

  • It's a low-level C API, and it doesn't isolate you from the OS underpinnings. While I like to think that it is relatively nice and easy to use from C, it doesn't compromise on exposing the low-level functionality. It just fills the gaps in what's missing between epoll, timerfd, signalfd and related concepts, and it does not hide that away.

Either way, I believe that sd-event is a great choice when looking for an event loop API, in particular if you work on system-level software and embedded, where functionality like timer coalescing or watchdog support matter.

Getting Started

Here's a short example how to use sd-event in a simple daemon. In this example, we'll not just use sd-event.h, but also sd-daemon.h to implement a system service.

#include <alloca.h>
#include <endian.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

#include <systemd/sd-daemon.h>
#include <systemd/sd-event.h>

static int io_handler(sd_event_source *es, int fd, uint32_t revents, void *userdata) {
        void *buffer;
        ssize_t n;
        int sz;

        /* UDP enforces a somewhat reasonable maximum datagram size of 64K, we can just allocate the buffer on the stack */
        if (ioctl(fd, FIONREAD, &sz) < 0)
                return -errno;
        buffer = alloca(sz);

        n = recv(fd, buffer, sz, 0);
        if (n < 0) {
                if (errno == EAGAIN)
                        return 0;

                return -errno;
        }

        if (n == 5 && memcmp(buffer, "EXIT\n", 5) == 0) {
                /* Request a clean exit */
                sd_event_exit(sd_event_source_get_event(es), 0);
                return 0;
        }

        fwrite(buffer, 1, n, stdout);
        fflush(stdout);
        return 0;
}

int main(int argc, char *argv[]) {
        union {
                struct sockaddr_in in;
                struct sockaddr sa;
        } sa;
        sd_event_source *event_source = NULL;
        sd_event *event = NULL;
        int fd = -1, r;
        sigset_t ss;

        r = sd_event_default(&event);
        if (r < 0)
                goto finish;

        if (sigemptyset(&ss) < 0 ||
            sigaddset(&ss, SIGTERM) < 0 ||
            sigaddset(&ss, SIGINT) < 0) {
                r = -errno;
                goto finish;
        }

        /* Block SIGTERM first, so that the event loop can handle it */
        if (sigprocmask(SIG_BLOCK, &ss, NULL) < 0) {
                r = -errno;
                goto finish;
        }

        /* Let's make use of the default handler and "floating" reference features of sd_event_add_signal() */
        r = sd_event_add_signal(event, NULL, SIGTERM, NULL, NULL);
        if (r < 0)
                goto finish;
        r = sd_event_add_signal(event, NULL, SIGINT, NULL, NULL);
        if (r < 0)
                goto finish;

        /* Enable automatic service watchdog support */
        r = sd_event_set_watchdog(event, true);
        if (r < 0)
                goto finish;

        fd = socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0);
        if (fd < 0) {
                r = -errno;
                goto finish;
        }

        sa.in = (struct sockaddr_in) {
                .sin_family = AF_INET,
                .sin_port = htobe16(7777),
        };
        if (bind(fd, &sa.sa, sizeof(sa)) < 0) {
                r = -errno;
                goto finish;
        }

        r = sd_event_add_io(event, &event_source, fd, EPOLLIN, io_handler, NULL);
        if (r < 0)
                goto finish;

        (void) sd_notifyf(false,
                          "READY=1\n"
                          "STATUS=Daemon startup completed, processing events.");

        r = sd_event_loop(event);

finish:
        event_source = sd_event_source_unref(event_source);
        event = sd_event_unref(event);

        if (fd >= 0)
                (void) close(fd);

        if (r < 0)
                fprintf(stderr, "Failure: %s\n", strerror(-r));

        return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}

The example above shows how to write a minimal UDP/IP server, that listens on port 7777. Whenever a datagram is received it outputs its contents to STDOUT, unless it is precisely the string EXIT\n in which case the service exits. The service will react to SIGTERM and SIGINT and do a clean exit then. It also notifies the service manager about its completed startup, if it runs under a service manager. Finally, it sends watchdog keep-alive messages to the service manager if it asked for that, and if it runs under a service manager.

When run as systemd service this service's STDOUT will be connected to the logging framework of course, which means the service can act as a minimal UDP-based remote logging service.

To compile and link this example, save it as event-example.c, then run:

$ gcc event-example.c -o event-example `pkg-config --cflags --libs libsystemd`

For a first test, simply run the resulting binary from the command line, and test it against the following netcat command line:

$ nc -u localhost 7777

For the sake of brevity error checking is minimal, and in a real-world application should, of course, be more comprehensive. However, it hopefully gets the idea across how to write a daemon that reacts to external events with sd-event.

For further details on the functions used in the example above, please consult the manual pages: sd-event(3), sd_event_exit(3), sd_event_source_get_event(3), sd_event_default(3), sd_event_add_signal(3), sd_event_set_watchdog(3), sd_event_add_io(3), sd_notifyf(3), sd_event_loop(3), sd_event_source_unref(3), sd_event_unref(3).

Conclusion

So, is this the event loop to end all other event loops? Certainly not. I actually believe in "event loop plurality". There are many reasons for that, but most importantly: sd-event is supposed to be an event loop suitable for writing a wide range of applications, but it's definitely not going to solve all event loop problems. For example, while the priority logic is important for many usecase it comes with drawbacks for others: if not used carefully high-priority event sources can easily starve low-priority event sources. Also, in order to implement the priority logic, sd-event needs to linearly iterate through the event structures returned by epoll_wait(2) to sort the events by their priority, resulting in worst case O(n*log(n)) complexity on each event loop wakeup (for n = number of file descriptors). Then, to implement priorities fully, sd-event only dispatches a single event before going back to the kernel and asking for new events. sd-event will hence not provide the theoretically possible best scalability to huge numbers of file descriptors. Of course, this could be optimized, by improving epoll, and making it support how todays's event loops actually work (after, all, this is the problem set all event loops that implement priorities -- including GLib's -- have to deal with), but even then: the design of sd-event is focussed on running one event loop per thread, and it dispatches events strictly ordered. In many other important usecases a very different design is preferable: one where events are distributed to a set of worker threads and are dispatched out-of-order.

Hence, don't mistake sd-event for what it isn't. It's not supposed to unify everybody on a single event loop. It's just supposed to be a very good implementation of an event loop suitable for a large part of the typical usecases.

Note that our APIs, including sd-bus, integrate nicely into sd-event event loops, but do not require it, and may be integrated into other event loops too, as long as they support watching for time and I/O events.

And that's all for now. If you are considering using sd-event for your project and need help or have questions, please direct them to the systemd mailing list.

© Lennart Poettering. Built using Pelican. Theme by Giulio Fidente on github. .