The Event Loop API of libsystemd
When we began working on systemd we built it around a hand-written
ad-hoc event loop, wrapping Linux epoll. The more our project grew,
the more we realized the limitations of using raw epoll:
-
As we used timerfd for our timer events, each event source cost one
file descriptor, and we had many of them! File descriptors are a
scarce resource on UNIX, as RLIMIT_NOFILE is typically set to 1024 or
similar, limiting the number of available file descriptors per process
to 1021, which isn't a lot.
-
Ordering of event dispatching became a nightmare. In many cases, we
wanted to make sure that a certain kind of event would always be
dispatched before another kind of event, if both happen at the same
time. For example, when the last process of a service dies, we might
be notified about that via a SIGCHLD signal, via an sd_notify()
"STATUS=" message, and via a control group notification. We wanted to get
these events in the right order, to know when it's safe to process
and subsequently release the runtime data systemd keeps about the
service or process: it shouldn't be done if there are still events
about it pending.
-
For each program we added to the systemd project we noticed we were
adding similar code, over and over again, to work with epoll's
complex interfaces. For example, finding the right file descriptor
and callback function to dispatch an epoll event to, without running
into invalidated pointer issues is outright difficult and requires
non-trivial code.
-
Integrating child process watching into our event loops was much
more complex than one could hope, and even more so if child process
events should be ordered against each other and unrelated kinds of
events.
Eventually, we started working on sd-bus. At the same time we decided
to seize the opportunity, put together a proper event loop API in C,
and then port not only sd-bus on top of it, but also the rest of
systemd. The result of this is sd-event. After almost two years of
development we declared sd-event stable in systemd version 221, and
published it as an official API of libsystemd.
Why?
sd-event.h, of course, is not the first event loop API around, and it
doesn't implement any really novel concepts. When we started working
on it we tried to do our homework and checked the various existing
event loop APIs, partly looking for candidates to adopt instead of
writing our own, and partly to learn about the strengths and
weaknesses of the existing implementations. Ultimately, we found no
implementation that could deliver what we needed, or where it would be
easy to add the missing bits: as usual in the systemd project, we
wanted something that gives us access to all the Linux-specific bits,
instead of limiting itself to the least common denominator of UNIX. We
weren't looking for an abstraction API, but simply one that makes
epoll usable in system code.
With this blog story I'd like to take the opportunity to introduce you
to sd-event, and explain why it might be a good candidate to adopt as
event loop implementation in your project, too.
So, here are some features it provides:
-
I/O event sources, based on epoll's file descriptor watching,
including edge-triggered events (EPOLLET). See sd_event_add_io(3).
-
Timer event sources, based on timerfd_create(), supporting the
CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME clocks, as well as
the CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM clocks that can
resume the system from suspend. When creating timer events, a required
accuracy parameter may be specified, which allows coalescing of timer
events to minimize power consumption. For each clock only a single
timer file descriptor is kept, and all timer events are multiplexed
with a priority queue. See sd_event_add_time(3).
-
UNIX process signal events, based on signalfd(2), including full
support for real-time signals and queued parameters. See
sd_event_add_signal(3).
-
Child process state change events, based on waitid(2). See
sd_event_add_child(3).
-
Static event sources of three types: defer, post and exit, for
invoking calls in each event loop iteration, after other event
sources, or at event loop termination. See sd_event_add_defer(3).
-
Event sources may be assigned a 64-bit priority value that controls
the order in which event sources are dispatched if multiple are
pending simultaneously. See sd_event_source_set_priority(3).
-
The event loop may automatically send watchdog notification messages
to the service manager. See sd_event_set_watchdog(3).
-
The event loop may be integrated into foreign event loops, such as
the GLib one. The event loop API is hence composable, the same way
the underlying epoll logic is. See sd_event_get_fd(3) for an example.
-
The API is fully OOM safe.
-
A complete set of documentation in UNIX man page format is available,
with sd-event(3) as the entry page.
-
It's pretty widely available, and requires no extra
dependencies. Since systemd is built on it, most major distributions
ship the library in their default install set.
-
After two years of development, and after being used in all of
systemd's components, it has received a fair share of testing already,
even though we only recently decided to declare it stable and turned
it into a public API.
Note that sd-event has some potential drawbacks too:
-
If portability is essential to you, sd-event is not your best
option. sd-event is a wrapper around Linux-specific APIs, and that's
visible in the API. For example: our event callbacks receive
structures defined by Linux-specific APIs such as signalfd.
-
It's a low-level C API, and it doesn't isolate you from the OS
underpinnings. While I like to think that it is relatively nice and
easy to use from C, it doesn't compromise on exposing the low-level
functionality. It just fills the gaps in what's missing between
epoll, timerfd, signalfd and related concepts, and it does not hide
that away.
Either way, I believe that sd-event is a great choice when looking for
an event loop API, in particular if you work on system-level software
and embedded systems, where functionality like timer coalescing or
watchdog support matters.
Getting Started
Here's a short example of how to use sd-event in a simple daemon. In
this example, we'll use not just sd-event.h, but also sd-daemon.h to
implement a system service.
#include <alloca.h>
#include <endian.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

#include <systemd/sd-daemon.h>
#include <systemd/sd-event.h>

static int io_handler(sd_event_source *es, int fd, uint32_t revents, void *userdata) {
        void *buffer;
        ssize_t n;
        int sz;

        /* UDP enforces a somewhat reasonable maximum datagram size of 64K, we can just allocate the buffer on the stack */
        if (ioctl(fd, FIONREAD, &sz) < 0)
                return -errno;
        buffer = alloca(sz);

        n = recv(fd, buffer, sz, 0);
        if (n < 0) {
                if (errno == EAGAIN)
                        return 0;

                return -errno;
        }

        if (n == 5 && memcmp(buffer, "EXIT\n", 5) == 0) {
                /* Request a clean exit */
                sd_event_exit(sd_event_source_get_event(es), 0);
                return 0;
        }

        fwrite(buffer, 1, n, stdout);
        fflush(stdout);
        return 0;
}

int main(int argc, char *argv[]) {
        union {
                struct sockaddr_in in;
                struct sockaddr sa;
        } sa;
        sd_event_source *event_source = NULL;
        sd_event *event = NULL;
        int fd = -1, r;
        sigset_t ss;

        r = sd_event_default(&event);
        if (r < 0)
                goto finish;

        if (sigemptyset(&ss) < 0 ||
            sigaddset(&ss, SIGTERM) < 0 ||
            sigaddset(&ss, SIGINT) < 0) {
                r = -errno;
                goto finish;
        }

        /* Block SIGTERM and SIGINT first, so that the event loop can handle them */
        if (sigprocmask(SIG_BLOCK, &ss, NULL) < 0) {
                r = -errno;
                goto finish;
        }

        /* Let's make use of the default handler and "floating" reference features of sd_event_add_signal() */
        r = sd_event_add_signal(event, NULL, SIGTERM, NULL, NULL);
        if (r < 0)
                goto finish;
        r = sd_event_add_signal(event, NULL, SIGINT, NULL, NULL);
        if (r < 0)
                goto finish;

        /* Enable automatic service watchdog support */
        r = sd_event_set_watchdog(event, true);
        if (r < 0)
                goto finish;

        fd = socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0);
        if (fd < 0) {
                r = -errno;
                goto finish;
        }

        sa.in = (struct sockaddr_in) {
                .sin_family = AF_INET,
                .sin_port = htobe16(7777),
        };
        if (bind(fd, &sa.sa, sizeof(sa.in)) < 0) {
                r = -errno;
                goto finish;
        }

        r = sd_event_add_io(event, &event_source, fd, EPOLLIN, io_handler, NULL);
        if (r < 0)
                goto finish;

        (void) sd_notifyf(false,
                          "READY=1\n"
                          "STATUS=Daemon startup completed, processing events.");

        r = sd_event_loop(event);

finish:
        event_source = sd_event_source_unref(event_source);
        event = sd_event_unref(event);

        if (fd >= 0)
                (void) close(fd);

        if (r < 0)
                fprintf(stderr, "Failure: %s\n", strerror(-r));

        return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}
The example above shows how to write a minimal UDP/IP server that
listens on port 7777. Whenever a datagram is received, it outputs its
contents to STDOUT, unless it is precisely the string EXIT\n, in which
case the service exits. The service will react to SIGTERM and SIGINT
and do a clean exit then. It also notifies the service manager about
its completed startup, if it runs under a service manager. Finally, it
sends watchdog keep-alive messages to the service manager, if it runs
under one and the service manager asked for that.
When run as a systemd service, this service's STDOUT will be connected
to the logging framework, of course, which means the service can act
as a minimal UDP-based remote logging service.
To compile and link this example, save it as event-example.c, then run:
$ gcc event-example.c -o event-example `pkg-config --cflags --libs libsystemd`
For a first test, simply run the resulting binary from the command
line, and test it against the following netcat command line:
$ nc -u localhost 7777
For the sake of brevity, error checking is minimal; in a real-world
application it should, of course, be more comprehensive. However, the
example hopefully gets across the idea of how to write a daemon that
reacts to external events with sd-event.
For further details on the functions used in the example above, please
consult the manual pages:
sd-event(3),
sd_event_exit(3),
sd_event_source_get_event(3),
sd_event_default(3),
sd_event_add_signal(3),
sd_event_set_watchdog(3),
sd_event_add_io(3),
sd_notifyf(3),
sd_event_loop(3),
sd_event_source_unref(3),
sd_event_unref(3).
Conclusion
So, is this the event loop to end all other event loops? Certainly
not. I actually believe in "event loop plurality". There are many
reasons for that, but most importantly: sd-event is supposed to be an
event loop suitable for writing a wide range of applications, but it's
definitely not going to solve all event loop problems. For example,
while the priority logic is important for many use cases, it comes
with drawbacks for others: if not used carefully, high-priority event
sources can easily starve low-priority event sources. Also, in order
to implement the priority logic, sd-event needs to linearly iterate
through the event structures returned by epoll_wait(2) to sort the
events by their priority, resulting in worst case O(n*log(n))
complexity on each event loop wakeup (for n = number of file
descriptors). Then, to implement priorities fully, sd-event only
dispatches a single event before going back to the kernel and asking
for new events. sd-event will hence not provide the theoretically
possible best scalability to huge numbers of file descriptors. Of
course, this could be optimized by improving epoll and making it
support how today's event loops actually work (after all, this is the
problem set all event loops that implement priorities, including
GLib's, have to deal with), but even then: the design of sd-event is
focused on running one event loop per thread, and it dispatches events
strictly ordered. In many other important use cases a very different
design is preferable: one where events are distributed to a set of
worker threads and are dispatched out of order.
Hence, don't mistake sd-event for what it isn't. It's not supposed to
unify everybody on a single event loop. It's just supposed to be a
very good implementation of an event loop suitable for a large part of
the typical use cases.
Note that our APIs, including sd-bus, integrate nicely into sd-event
event loops, but do not require it, and may be integrated into other
event loops too, as long as they support watching for time and I/O
events.
And that's all for now. If you are considering using sd-event for your
project and need help or have questions, please direct them to the
systemd mailing list.