The Paper Committee of the Desktop Summit 2011, in Berlin, Germany is happy to announce that the conference programme is now published.
And yes, it is an absolutely rocking programme.
See you in Berlin!
The Linux Plumbers Conference 2011 in Santa Rosa, CA, USA is coming nearer (Sep. 7-9). Together with Kay Sievers I am running the Boot&Init track, and together with Mark Brown the Audio track.
For both tracks we still need proposals. So if you haven't submitted anything yet, please consider doing so -- and quickly. If you can arrange for it, last Sunday would be best, since that was actually the final deadline. However, the submission form is still open, so if you submit something really, really quickly we'll ignore the absence of time travel and the calendar for a bit. So, go, submit something. Now.
What are we looking for? Well, here's what I just posted on the audio related mailing lists:
So, please consider submitting something if you haven't done so yet. We are looking for all kinds of technical talks covering everything related to audio plumbing: audio drivers, audio APIs, sound servers, pro audio, consumer audio. If you can propose something audio related -- like talks on media controller routing, or on audio for ASoC/embedded -- submit something! If you care about low-latency audio, submit something. If you care about the Linux audio stack in general, submit something. LPC is probably the most relevant technical conference on the general Linux platform, so be sure that if you want your project, your work, your ideas to be heard then this is the right forum for everything related to the Linux stack. And the Audio track covers everything in our audio stack, regardless of whether it is pro or consumer audio.
And here's what I posted to the init related lists:
So, please consider submitting something if you haven't done so yet. We are looking for all kinds of technical talks covering everything from the BIOS (e.g. CoreBoot and friends), through boot loaders (e.g. GRUB and friends), to initramfs (e.g. Dracut and friends) and init systems (e.g. systemd and friends). If you have something smart to say about any of these areas, or maybe about related tools (e.g. you wrote a fancy new tool to measure boot performance), or fancy boot schemes in your favourite Linux-based OS (e.g. the new MeeGo zero second boot ;-)), then don't hesitate to submit something on the LPC web site, in the Boot&Init track!
And now, quickly, go to the LPC website and post your session proposal in the Audio or Boot&Init track, respectively! Thank you!
As some of you might know, Fedora 15 went Gold a couple of days ago. The first big distribution based on systemd will be released on 2011-05-24. Mark the date!
In little over a year systemd went from nowhere to becoming a core piece of Fedora. This would not have been possible without the numerous folks who worked with us on getting systemd right, supplied patches, chased bugs, tested releases, posted comments and generally made sure everything was in shape for the big release.
At this point we'd like to thank everybody who contributed and a few folks in particular:
A. Costa Adrian Spinu Alexey Shabalin Andreas Jaeger Andrew Edmunds Andrey Borzenkov Bill Nottingham Brandon Philips Brendan Jones Brett Witherspoon Chris E Ferron Christian Ruppert Conrad Meyer Daniel J Walsh Dave Reisner Eric Paris Fabian Henze Fabiano Fidêncio Florian Kriener Franz Dietrich Greg Kroah-Hartman Gustavo Sverzut Barbieri Harald Hoyer James Laska Jan Engelhardt Jeff Mahoney Jesse Zhang Jóhann B. Guðmundsson Karel Zak Koen Kooi Lucas De Marchi Ludwig Nussel Luis Felipe Strano Moraes Maarten Lankhorst Malcolm Studd Marc-Antoine Perennou Martin Mikkelsen Matthew Miller Matthias Clasen Matthias Schiffer Michael Biebl Michael Olbrich Michael Tremer Michał Piotrowski Michal Schmidt Mike Kazantsev Mike Kelly Miklos Vajna Milan Broz Ozan Çağlayan Paul Menzel Pavol Rusnak Rahul Sundaram Rainer Gerhards Ran Benita Ray Strode Robert Gerus Sedat Dilek Tero Roponen Thierry Reding Tollef Fog Heen Tomasz Torcz Tom Callaway Tom Gundersen Toshio Kuratomi William Jon McCann Wulf C. Krueger Zbigniew Jędrzejewski-Szmek
And everybody else who I (or git shortlog) forgot.
Thank you!
Lennart and Kay
BTW, the interface stability promise is valid now.
systemd not only brings improvements for administrators and users, it also brings a (small) number of new APIs with it. In this blog story (which might become the first of a series) I hope to shed some light on one of the most important new APIs in systemd:
In the original blog story about systemd I tried to explain why socket activation is a wonderful technology to spawn services. Let's reiterate the background here a bit.
The basic idea of socket activation is not new. The inetd superserver has been a standard component of most Linux and Unix systems since time began: instead of spawning all local Internet services at boot, the superserver would listen on behalf of the services, and whenever a connection came in an instance of the respective service would be spawned. This allowed relatively weak machines with few resources to offer a big variety of services at the same time. However, it quickly got a reputation for being somewhat slow: since daemons were spawned for each incoming connection, a lot of time was spent on forking and initializing the services -- once for each connection, instead of once for all of them.
Spawning one instance per connection was how inetd was primarily used, even though inetd actually understood another mode: on the first incoming connection it would notice this via poll() (or select()) and spawn a single instance for all future connections. (This was controllable with the wait/nowait options.) That way the first connection would be slow to set up, but subsequent ones would be as fast as with a standalone service. In this mode inetd would work in a true on-demand mode: a service would be made available lazily when it was required.
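For reference, the mode was chosen per service in inetd.conf. Here is an illustrative sketch of such a fragment (the service choices and binary paths are just examples, they vary per system):

```
# one short-lived instance per incoming connection ("nowait"):
ftp   stream  tcp  nowait  root  /usr/sbin/in.ftpd   in.ftpd
# a single instance inherits the listening socket and serves
# all future traffic ("wait"):
talk  dgram   udp  wait    root  /usr/sbin/in.talkd  in.talkd
```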
inetd's focus was clearly on AF_INET (i.e. Internet) sockets. As time progressed and Linux/Unix left the server niche and became increasingly relevant on desktops, mobile and embedded environments, inetd was somewhat lost in the sands of time. Its reputation for being slow, and the fact that Linux's focus shifted away from pure Internet servers, made a Linux machine running inetd (or one of its newer implementations, like xinetd) the exception, not the rule.
When Apple engineers worked on optimizing the MacOS boot time they found a new way to make use of the idea of socket activation: they shifted the focus away from AF_INET sockets towards AF_UNIX sockets. And they noticed that on-demand socket activation was only part of the story: socket activation is much more powerful when used for all local services, including those which need to be started at boot anyway. They implemented these ideas in launchd, a central building block of modern MacOS X systems, and probably the main reason why MacOS boots so fast.
But, before we continue, let's have a closer look at what the benefits of socket activation are in detail for non-on-demand, non-Internet services. Consider the four services Syslog, D-Bus, Avahi and the Bluetooth daemon. D-Bus logs to Syslog, hence on traditional Linux systems it would get started after Syslog. Similarly, Avahi requires Syslog and D-Bus, hence would get started after both. Finally, Bluetooth is similar to Avahi and also requires Syslog and D-Bus, but does not interface at all with Avahi. Since on a traditional SysV-based system only one service can be in the process of getting started at a time, the following serialization of startup would take place: Syslog → D-Bus → Avahi → Bluetooth. (Of course, Avahi and Bluetooth could be started in the opposite order too, but we have to pick one here, so let's simply go alphabetically.) To illustrate this, here's a plot showing the order of startup beginning with system startup (at the top).
Certain distributions tried to improve on this strictly serialized start-up: since Avahi and Bluetooth are independent of each other, they can be started simultaneously. The parallelization is increased and the overall startup time slightly reduced. (This is visualized in the middle part of the plot.)
Socket activation makes it possible to start all four services completely simultaneously, without any kind of ordering. Since the creation of the listening sockets is moved outside of the daemons themselves we can start them all at the same time, and they are able to connect to each other's sockets right away. I.e. in a single step the /dev/log and /run/dbus/system_bus_socket sockets are created, and in the next step all four services are spawned simultaneously. When D-Bus then wants to log to syslog, it just writes its messages to /dev/log. As long as the socket buffer does not fill up it can immediately continue with whatever else it needs to do for initialization. As soon as the syslog service catches up it will process the queued messages. And if the socket buffer does fill up, the logging client will block temporarily until the socket is writable again, and continue the moment it can write its log messages. That means the scheduling of our services is entirely done by the kernel: from the userspace perspective all services run at the same time, and when one service cannot keep up, the others needing it will temporarily block on their requests but continue as soon as those requests are dispatched. All of this is completely automatic and invisible to userspace.

Socket activation hence allows us to drastically parallelize start-up, enabling simultaneous start-up of services which previously were thought to strictly require serialization. Most Linux services use sockets as their communication channel; socket activation allows starting the clients and servers of these channels at the same time.
But it's not just about parallelization. It offers a number of other benefits:
For another explanation of this idea consult the original blog story about systemd.
Socket activation has been available in systemd since its inception. On Fedora 15 a number of services have been modified to implement socket activation, including Avahi, D-Bus and rsyslog (to continue with the example above).
systemd's socket activation is quite comprehensive. Not only are classic sockets supported, but related technologies as well:
A service capable of socket activation must be able to receive its preinitialized sockets from systemd, instead of creating them internally. For most services this requires (minimal) patching. However, since systemd actually provides inetd compatibility a service working with inetd will also work with systemd -- which is quite useful for services like sshd for example.
So much for the background of socket activation; let's now have a look at how to patch a service to make it socket-activatable. Let's start with a theoretical service, foobard. (In a later blog post we'll focus on a real-life example.)
Our little (theoretical) service includes code like the following for creating its sockets (most services include code like this in one way or another):
```c
/* Source Code Example #1: ORIGINAL, NOT SOCKET-ACTIVATABLE SERVICE */

...
union {
        struct sockaddr sa;
        struct sockaddr_un un;
} sa;
int fd;

fd = socket(AF_UNIX, SOCK_STREAM, 0);
if (fd < 0) {
        fprintf(stderr, "socket(): %m\n");
        exit(1);
}

memset(&sa, 0, sizeof(sa));
sa.un.sun_family = AF_UNIX;
strncpy(sa.un.sun_path, "/run/foobar.sk", sizeof(sa.un.sun_path));

if (bind(fd, &sa.sa, sizeof(sa)) < 0) {
        fprintf(stderr, "bind(): %m\n");
        exit(1);
}

if (listen(fd, SOMAXCONN) < 0) {
        fprintf(stderr, "listen(): %m\n");
        exit(1);
}
...
```
A socket activatable service may use the following code instead:
```c
/* Source Code Example #2: UPDATED, SOCKET-ACTIVATABLE SERVICE */

...
#include "sd-daemon.h"
...
int fd;

if (sd_listen_fds(0) != 1) {
        fprintf(stderr, "No or too many file descriptors received.\n");
        exit(1);
}

fd = SD_LISTEN_FDS_START + 0;
...
```
systemd might pass you more than one socket (based on configuration, see below). In this example we are interested in one only. sd_listen_fds() returns how many file descriptors are passed. We simply compare that with 1, and fail if we got more or less. The file descriptors systemd passes to us are inherited one after the other beginning with fd #3. (SD_LISTEN_FDS_START is a macro defined to 3). Our code hence just takes possession of fd #3.
As you can see this code is actually much shorter than the original. This of course comes at the price that our little service with this change will no longer work in a non-socket-activation environment. With minimal changes we can adapt our example to work nicely both with and without socket activation:
```c
/* Source Code Example #3: UPDATED, SOCKET-ACTIVATABLE SERVICE WITH COMPATIBILITY */

...
#include "sd-daemon.h"
...
int fd, n;

n = sd_listen_fds(0);
if (n > 1) {
        fprintf(stderr, "Too many file descriptors received.\n");
        exit(1);
} else if (n == 1)
        fd = SD_LISTEN_FDS_START + 0;
else {
        union {
                struct sockaddr sa;
                struct sockaddr_un un;
        } sa;

        fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) {
                fprintf(stderr, "socket(): %m\n");
                exit(1);
        }

        memset(&sa, 0, sizeof(sa));
        sa.un.sun_family = AF_UNIX;
        strncpy(sa.un.sun_path, "/run/foobar.sk", sizeof(sa.un.sun_path));

        if (bind(fd, &sa.sa, sizeof(sa)) < 0) {
                fprintf(stderr, "bind(): %m\n");
                exit(1);
        }

        if (listen(fd, SOMAXCONN) < 0) {
                fprintf(stderr, "listen(): %m\n");
                exit(1);
        }
}
...
```
With this simple change our service can now make use of socket activation but still works unmodified in classic environments. Now, let's see how we can enable this service in systemd. For this we have to write two systemd unit files: one describing the socket, the other describing the service. First, here's foobar.socket:
```
[Socket]
ListenStream=/run/foobar.sk

[Install]
WantedBy=sockets.target
```
And here's the matching service file foobar.service:
```
[Service]
ExecStart=/usr/bin/foobard
```
If we place these two files in /etc/systemd/system we can enable and start them:
```
# systemctl enable foobar.socket
# systemctl start foobar.socket
```
Now our little socket is listening, but our service is not running yet. If we connect to /run/foobar.sk the service will be spawned automatically: on-demand service start-up. With a small modification to foobar.service we can instead start the service at boot as well, using socket activation purely for parallelization rather than for on-demand auto-spawning:
```
[Service]
ExecStart=/usr/bin/foobard

[Install]
WantedBy=multi-user.target
```
And now let's enable this too:
```
# systemctl enable foobar.service
# systemctl start foobar.service
```
Now our little daemon will be started at boot or on demand, whichever comes first. It can be started fully in parallel with its clients, and when it dies it will be automatically restarted the next time it is used.
A single .socket file can include multiple ListenXXX stanzas, which is useful for services that listen on more than one socket. In this case all configured sockets will be passed to the service in the exact order they are configured in the socket unit file. Also, you may configure various socket settings in the .socket files.
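For example, a hypothetical socket unit that listens both on a local socket and a TCP port (the port number here is made up for the example) would pass two descriptors, in this order, to the service:

```
[Socket]
# becomes fd #3 (SD_LISTEN_FDS_START) in the service
ListenStream=/run/foobar.sk
# becomes fd #4
ListenStream=6666

[Install]
WantedBy=sockets.target
```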
In real life it's a good idea to include description strings in these unit files; to keep things simple we leave them out of our example. Speaking of real life: our next installment will cover an actual real-life example. We'll add socket activation to the CUPS printing server.
The sd_listen_fds() function call is defined in sd-daemon.h and sd-daemon.c. These two files are currently drop-in .c sources which projects should simply copy into their source tree. Eventually we plan to turn this into a proper shared library; however, using the drop-in files allows you to compile your project in a way that is compatible with socket activation even without any compile-time dependencies on systemd. sd-daemon.c is liberally licensed, should compile fine on the most exotic Unixes, and the algorithms are trivial enough to be reimplemented with very little code if the license should nonetheless be a problem for your project. Besides sd_listen_fds(), sd-daemon.c contains a couple of other API functions that are useful when implementing socket activation in a project. For example, there's sd_is_socket(), which can be used to distinguish and identify particular sockets when a service gets passed more than one.
Let me point out that the interfaces used here are in no way bound directly to systemd. They are generic enough to be implemented in other systems as well. We deliberately designed them to be as simple and minimal as possible to make it easy for others to adopt similar schemes.
Stay tuned for the next installment. As mentioned, it will cover a real-life example of turning an existing daemon into a socket-activatable one: the CUPS printing service. However, I hope this blog story might already be enough to get you started if you plan to convert an existing service into a socket activatable one. We invite everybody to convert upstream projects to this scheme. If you have any questions join us on #systemd on freenode.
Pablo Hess has been posting a series of articles on systemd on IBM DeveloperWorks Brasil. So, if you speak Portuguese, head over there and have a look!
D. Jansen has put up a blog story including some power saving results when running PulseAudio on modern HDA drivers. This shows off some work Pierre-Louis Bossart from Intel did on the HDA drivers which now enables the timer-based scheduling code in PulseAudio I added quite some time ago to come to its full potential. You can save half a Watt and reduce wakeups while playing audio to 1 wakeup/s.
Previously there was little public profiling data available about the benefits PA brings to low-power devices. Thanks to Dennis' data there's now public data available that hopefully explains why PA is the best choice for low-power devices as well as desktops. Hopefully this clears up some misconceptions.
Pierre-Louis, thanks for your work!
Sankarasivasubramanian Pasupathilingam has put together a PDF of my ongoing systemd for Administrators series. This might be handy for reading on an ebook reader or similar.
Enjoy!
systemd is still a young project, but it is not a baby anymore. The initial announcement was posted precisely a year ago. Since then most of the big distributions have decided to adopt it in one way or another, and many smaller distributions have already switched. The first big distribution with systemd by default will be Fedora 15, due at the end of May. It is expected that the others will follow the lead a bit later (with one exception). Many embedded developers have already adopted it too, and there's even a company specializing in engineering and consulting services for systemd. In short: within one year systemd became a really successful project.
However, there are still folks we haven't won over yet. If you fall into one of the following categories, please have a look at the comparison of init systems below:
And even if you don't fall into any of these categories, you might still find the comparison interesting.
We'll be comparing the three most relevant init systems for Linux: sysvinit, Upstart and systemd. Of course there are other init systems in existence, but they play virtually no role in the big picture. Unless you run Android (which is a completely different beast anyway), you'll almost definitely run one of these three init systems on your Linux kernel. (OK, or busybox, but then you are basically not running any init system at all.) Unless you have a soft spot for exotic init systems there's little need to look further. Also, I am kinda lazy, and don't want to spend the time on analyzing those other systems in enough detail to be completely fair to them.
Speaking of fairness: I am of course one of the creators of systemd. I will try my best to be fair to the other two contenders, but in the end, take it with a grain of salt. I am sure though that should I be grossly unfair or otherwise incorrect somebody will point it out in the comments of this story, so consider having a look at those before you put too much trust in what I say.
We'll look at the currently implemented features in a released version. Grand plans don't count.
| | sysvinit | Upstart | systemd |
|---|---|---|---|
| Interfacing via D-Bus | no | yes | yes |
| Shell-free bootup | no | no | yes |
| Modular C coded early boot services included | no | no | yes |
| Read-Ahead | no | no[1] | yes |
| Socket-based Activation | no | no[2] | yes |
| Socket-based Activation: inetd compatibility | no | no[2] | yes |
| Bus-based Activation | no | no[3] | yes |
| Device-based Activation | no | no[4] | yes |
| Configuration of device dependencies with udev rules | no | no | yes |
| Path-based Activation (inotify) | no | no | yes |
| Timer-based Activation | no | no | yes |
| Mount handling | no | no[5] | yes |
| fsck handling | no | no[5] | yes |
| Quota handling | no | no | yes |
| Automount handling | no | no | yes |
| Swap handling | no | no | yes |
| Snapshotting of system state | no | no | yes |
| XDG_RUNTIME_DIR Support | no | no | yes |
| Optionally kills remaining processes of users logging out | no | no | yes |
| Linux Control Groups Integration | no | no | yes |
| Audit record generation for started services | no | no | yes |
| SELinux integration | no | no | yes |
| PAM integration | no | no | yes |
| Encrypted hard disk handling (LUKS) | no | no | yes |
| SSL Certificate/LUKS Password handling, including Plymouth, Console, wall(1), TTY and GNOME agents | no | no | yes |
| Network Loopback device handling | no | no | yes |
| binfmt_misc handling | no | no | yes |
| System-wide locale handling | no | no | yes |
| Console and keyboard setup | no | no | yes |
| Infrastructure for creating, removing, cleaning up of temporary and volatile files | no | no | yes |
| Handling for /proc/sys sysctl | no | no | yes |
| Plymouth integration | no | yes | yes |
| Save/restore random seed | no | no | yes |
| Static loading of kernel modules | no | no | yes |
| Automatic serial console handling | no | no | yes |
| Unique Machine ID handling | no | no | yes |
| Dynamic host name and machine meta data handling | no | no | yes |
| Reliable termination of services | no | no | yes |
| Early boot /dev/log logging | no | no | yes |
| Minimal kmsg-based syslog daemon for embedded use | no | no | yes |
| Respawning on service crash without losing connectivity | no | no | yes |
| Gapless service upgrades | no | no | yes |
| Graphical UI | no | no | yes |
| Built-In Profiling and Tools | no | no | yes |
| Instantiated services | no | yes | yes |
| PolicyKit integration | no | no | yes |
| Remote access/Cluster support built into client tools | no | no | yes |
| Can list all processes of a service | no | no | yes |
| Can identify service of a process | no | no | yes |
| Automatic per-service CPU cgroups to even out CPU usage between them | no | no | yes |
| Automatic per-user cgroups | no | no | yes |
| SysV compatibility | yes | yes | yes |
| SysV services controllable like native services | yes | no | yes |
| SysV-compatible /dev/initctl | yes | no | yes |
| Reexecution with full serialization of state | yes | no | yes |
| Interactive boot-up | no[6] | no[6] | yes |
| Container support (as advanced chroot() replacement) | no | no | yes |
| Dependency-based bootup | no[7] | no | yes |
| Disabling of services without editing files | yes | no | yes |
| Masking of services without editing files | no | no | yes |
| Robust system shutdown within PID 1 | no | no | yes |
| Built-in kexec support | no | no | yes |
| Dynamic service generation | no | no | yes |
| Upstream support in various other OS components | yes | no | yes |
| Service files compatible between distributions | no | no | yes |
| Signal delivery to services | no | no | yes |
| Reliable termination of user sessions before shutdown | no | no | yes |
| utmp/wtmp support | yes | yes | yes |
| Easily writable, extensible and parseable service files, suitable for manipulation with enterprise management tools | no | no | yes |
[1] Read-Ahead implementation for Upstart available in separate package ureadahead, requires non-standard kernel patch.
[2] Socket activation implementation for Upstart available as preview, lacks parallelization support hence entirely misses the point of socket activation.
[3] Bus activation implementation for Upstart posted as patch, not merged.
[4] udev device event bridge implementation for Upstart available as preview, forwards entire udev database into Upstart, not practical.
[5] Mount handling utility mountall for Upstart available in separate package, covers only boot-time mounts, very limited dependency system.
[6] Some distributions offer this implemented in shell.
[7] LSB init scripts support this, if they are used.
| | sysvinit | Upstart | systemd |
|---|---|---|---|
| OOM Adjustment | no | yes[1] | yes |
| Working Directory | no | yes | yes |
| Root Directory (chroot()) | no | yes | yes |
| Environment Variables | no | yes | yes |
| Environment Variables from external file | no | no | yes |
| Resource Limits | no | some[2] | yes |
| umask | no | yes | yes |
| User/Group/Supplementary Groups | no | no | yes |
| IO Scheduling Class/Priority | no | no | yes |
| CPU Scheduling Nice Value | no | yes | yes |
| CPU Scheduling Policy/Priority | no | no | yes |
| CPU Scheduling Reset on fork() control | no | no | yes |
| CPU affinity | no | no | yes |
| Timer Slack | no | no | yes |
| Capabilities Control | no | no | yes |
| Secure Bits Control | no | no | yes |
| Control Group Control | no | no | yes |
| High-level file system namespace control: making directories inaccessible | no | no | yes |
| High-level file system namespace control: making directories read-only | no | no | yes |
| High-level file system namespace control: private /tmp | no | no | yes |
| High-level file system namespace control: mount inheritance | no | no | yes |
| Input on Console | yes | yes | yes |
| Output on Syslog | no | no | yes |
| Output on kmsg/dmesg | no | no | yes |
| Output on arbitrary TTY | no | no | yes |
| Kill signal control | no | no | yes |
| Conditional execution: by identified CPU virtualization/container | no | no | yes |
| Conditional execution: by file existence | no | no | yes |
| Conditional execution: by security framework | no | no | yes |
| Conditional execution: by kernel command line | no | no | yes |
[1] Upstart supports only the deprecated oom_adj mechanism, not the current oom_score_adj logic.
[2] Upstart lacks support for RLIMIT_RTTIME and RLIMIT_RTPRIO.
Note that some of these options are relatively easy to add to SysV init scripts by editing the shell sources. The table above focuses on easily accessible options that do not require source code editing.
| | sysvinit | Upstart | systemd |
|---|---|---|---|
| Maturity | > 15 years | 6 years | 1 year |
| Specialized professional consulting and engineering services available | no | no | yes |
| SCM | Subversion | Bazaar | git |
| Copyright-assignment-free contributing | yes | no | yes |
As the tables above hopefully make clear, systemd has left both sysvinit and Upstart behind in almost every respect. With the exception of the project's age/maturity, systemd wins in every category. At this point in time it will be very hard for sysvinit and Upstart to catch up with the features systemd provides today. In one year we managed to push systemd much further than Upstart has been pushed in six.
It is our intention to drive forward the development of the Linux platform with systemd. In the next release cycle we will focus more strongly on providing the same features and speed improvements we already offer for the system to the user login session. This will bring much closer integration with the other parts of the OS and applications, making the most of the features the service manager provides, and making them available to login sessions. Certain components such as ConsoleKit will be made redundant by these upgrades, and services relying on them will be updated. The burden of maintaining these then-obsolete components will be passed on to the vendors who plan to continue relying on them.
If you are wondering whether or not to adopt systemd: when it comes to mere features, systemd obviously wins. Of course that should not be the only aspect to keep in mind. In the long run, sticking with the existing infrastructure (such as ConsoleKit) comes at a price: porting work needs to take place, and additional maintenance work for bitrotting code needs to be done. Going it alone means increased workload.
That said, adopting systemd is also not free. Especially if you made investments in the other two solutions adopting systemd means work. The basic work to adopt systemd is relatively minimal for porting over SysV systems (since compatibility is provided), but can mean substantial work when coming from Upstart. If you plan to go for a 100% systemd system without any SysV compatibility (recommended for embedded, long run goal for the big distributions) you need to be willing to invest some work to rewrite init scripts as simple systemd unit files.
systemd is in the process of becoming a comprehensive, integrated and modular platform providing everything needed to bootstrap and maintain an operating system's userspace. It includes C rewrites of all basic early boot init scripts that are shipped with the various distributions. Especially for the embedded case adopting systemd provides you in one step with almost everything you need, and you can pick the modules you want. The other two init systems are singular individual components, which to be useful need a great number of additional components with differing interfaces. The emphasis of systemd to provide a platform instead of just a component allows for closer integration, and cleaner APIs. Sooner or later this will trickle up to the applications. Already, there are accepted XDG specifications (e.g. XDG basedir spec, more specifically XDG_RUNTIME_DIR) that are not supported on the other init systems.
systemd is also a big opportunity for Linux standardization. Since it standardizes many interfaces of the system that previously differed on every distribution and every implementation, adopting it helps to work against the balkanization of the Linux interfaces. Choosing systemd means redefining more closely what the Linux platform is about. This improves the lives of programmers, users and administrators alike.
I believe that momentum is clearly with systemd. We invite you to join our community and be part of that momentum.
Another episode of my ongoing series on systemd for Administrators:
One of the formidable new features of systemd is that it comes with a complete set of modular early-boot services that are written in simple, fast, parallelizable and robust C, replacing the shell "novels" the various distributions featured before. Our little Project Zero Shell[1] has been a full success. We currently cover pretty much everything most desktop and embedded distributions should need, plus a big part of the server needs:
On a standard Fedora 15 install, only a few legacy and storage services still require shell scripts during early boot. If you don't need those, you can easily disable them and enjoy your shell-free boot (like I do every day). The shell-less boot systemd offers is a unique feature on Linux.
Many of these small components are configured via configuration files in /etc. Some of these are fairly standardized among distributions and hence supporting them in the C implementations was easy and obvious. Examples include: /etc/fstab, /etc/crypttab or /etc/sysctl.conf. However, for others no standardized file or directory existed, which forced us to add #ifdef orgies to our sources to deal with the different places the distributions we want to support store these things. All these configuration files have in common that they are dead simple and there is simply no good reason for distributions to distinguish themselves with them: they all do the very same thing, just a bit differently.
To improve the situation and benefit from the unifying force that systemd is, we thus decided to read the per-distribution configuration files only as fallbacks -- and to introduce new configuration files as the primary source of configuration wherever applicable. Of course, where possible these standardized configuration files should not be new inventions but rather just standardizations of the best distribution-specific configuration files previously used. Here's a little overview of these new common configuration files systemd supports on all distributions:
It is our definite intention to convince you to use these new configuration files in your configuration tools: if your configuration frontend writes these files instead of the old ones, it automatically becomes more portable between Linux distributions, and you are helping to standardize Linux. This makes things simpler to understand and more obvious for users and administrators. Of course, right now, only systemd-based distributions read these files, but that already covers all important distributions in one way or another, except for one. And it's a bit of a chicken-and-egg problem: a standard becomes a standard by being used. In order to gently push everybody to standardize on these files we also want to make clear that sooner or later we plan to drop the fallback support for the old configuration files from systemd. That means adoption of this new scheme can happen slowly and piece by piece. But the final goal of having only one set of configuration files must be clear.
Many of these configuration files are relevant not only for configuration tools but also (and sometimes even primarily) for upstream projects. For example, we invite projects like Mono, Java, or WINE to install a drop-in file in /etc/binfmt.d/ from their upstream build systems. Per-distribution downstream support for binary formats would then no longer be necessary and your platform would work the same on all distributions. Something similar applies to all software which needs creation/cleaning of certain runtime files and directories at boot, for example beneath the /run hierarchy (i.e. /var/run as it used to be known). These projects should just drop configuration files into /etc/tmpfiles.d, also from their upstream build systems. This also helps speed up the boot process, as separate per-project SysV shell scripts which implement trivial things like registering a binary format or removing/creating temporary/volatile files at boot are no longer necessary. Or another example, where upstream support would be fantastic: projects like X11 could probably benefit from reading the default keyboard mapping for their displays from /etc/vconsole.conf.
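As a rough sketch of what such drop-ins could look like: the file names and the exact registration line below are illustrative, and the files are written to a scratch directory so the example is safe to run (the real files would live in /etc/binfmt.d/ and /etc/tmpfiles.d/):

```shell
# Illustrative drop-in files; the real ones would go to /etc/binfmt.d/
# and /etc/tmpfiles.d/. We use a scratch directory to stay safe.
confdir=$(mktemp -d)

# binfmt.d files use the kernel's binfmt_misc registration syntax:
# :name:type:offset:magic:mask:interpreter:flags
cat > "$confdir/mono.conf" <<'EOF'
:CLI:M::MZ::/usr/bin/mono:
EOF

# tmpfiles.d lines are: type path mode user group age
# "d" asks for a directory to be created at boot:
cat > "$confdir/myproject.conf" <<'EOF'
d /run/myproject 0755 root root -
EOF

cat "$confdir"/*.conf
```

With files like these shipped upstream, no boot-time shell script is needed to register the binary format or create the runtime directory.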
Of course, I have no doubt that not everybody is happy with our choice of names (and formats) for these configuration files. In the end we had to pick something, and from all the choices these appeared to be the most convincing. The file formats are as simple as they can be, and usually easily written and read even from shell scripts. That said, /etc/bikeshed.conf could of course also have been a fantastic configuration file name!
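To illustrate that shell-friendliness: many of these files are plain KEY=value assignments, so a shell script can read them by simply sourcing them. A little sketch using the keys of /etc/vconsole.conf, run against a scratch copy rather than the real file:

```shell
# /etc/vconsole.conf-style file: plain KEY=value lines, so sourcing
# it from shell is all it takes to read it. Scratch copy for safety:
conf=$(mktemp)
cat > "$conf" <<'EOF'
KEYMAP=de-latin1
FONT=latarcyrheb-sun16
EOF

# Source the file; the assignments become shell variables:
. "$conf"
echo "keymap=$KEYMAP font=$FONT"
```

This prints `keymap=de-latin1 font=latarcyrheb-sun16` -- no parser beyond the shell itself is required.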
So, help us standardize Linux! Use the new configuration files! Adopt them upstream, adopt them downstream, adopt them all across the distributions!
Oh, and in case you are wondering: yes, all of these files were discussed in one way or another with various folks from the various distributions. And there has even been some push towards supporting some of these files even outside of systemd systems.
Footnotes
[1] Our slogan: "The only shell that should get started during boot is gnome-shell!" -- Yes, the slogan needs a bit of work, but you get the idea.
Here's yet another installment of my ongoing series on systemd for Administrators:
Fedora 15[1] is the first Fedora release to sport systemd. Our primary goal for F15 was to get everything integrated and working well. One focus for Fedora 16 will be to further polish and speed up what we have in the distribution now. To prepare for this cycle we have implemented a few tools (which are already available in F15), which can help us pinpoint where exactly the biggest problems in our boot-up remain. With this blog story I hope to shed some light on how to figure out what to blame for your slow boot-up, and what to do about it. We want to allow you to put the blame where the blame belongs: on the system component responsible.
The first utility is a very simple one: when it has finished booting up, systemd automatically writes a log message with the startup time to syslog/kmsg:
systemd[1]: Startup finished in 2s 65ms 924us (kernel) + 2s 828ms 195us (initrd) + 11s 900ms 471us (userspace) = 16s 794ms 590us.
And here's how you read this: 2s have been spent on kernel initialization, up to the point where the initial RAM disk (initrd, i.e. dracut) was started. A bit less than 3s have then been spent in the initrd. Finally, a bit less than 12s have been spent after the actual system init daemon (systemd) was invoked by the initrd to bring up userspace. Summing this up: the time that passed from the boot loader jumping into the kernel code until systemd had finished doing everything it needed to do at boot was a bit less than 17s. This number is nice and simple to understand -- and also easy to misunderstand: it does not include the time that is spent initializing your GNOME session, as that is outside the scope of the init system. Also, in many cases this is just when systemd finished doing everything it needed to do. Very likely some daemons are still busy doing whatever they need to do to finish startup when this time has elapsed. Hence: while the time logged here is a good indication of the general boot speed, it is not the time the user might feel the boot actually takes.
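Since the message goes to syslog/kmsg, it is easy to fish out and dissect with standard shell tools. A small sketch, using the sample line quoted above as input (on a live system something like `dmesg | grep "Startup finished"` would find it):

```shell
# The startup summary as captured from syslog/kmsg (sample from above):
line='systemd[1]: Startup finished in 2s 65ms 924us (kernel) + 2s 828ms 195us (initrd) + 11s 900ms 471us (userspace) = 16s 794ms 590us.'

# Everything after the "=" is the total wall-clock boot time:
total=${line##*= }
echo "total boot time: ${total%.}"
```

This prints `total boot time: 16s 794ms 590us`, the same total figure discussed above.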
Also, it is a pretty superficial value: it gives no insight into which system component systemd was waiting for all that time. To break this up, we introduced the tool systemd-analyze blame:
$ systemd-analyze blame
  6207ms udev-settle.service
  5228ms cryptsetup@luks\x2d9899b85d\x2df790\x2d4d2a\x2da650\x2d8b7d2fb92cc3.service
   735ms NetworkManager.service
   642ms avahi-daemon.service
   600ms abrtd.service
   517ms rtkit-daemon.service
   478ms fedora-storage-init.service
   396ms dbus.service
   390ms rpcidmapd.service
   346ms systemd-tmpfiles-setup.service
   322ms fedora-sysinit-unhack.service
   316ms cups.service
   310ms console-kit-log-system-start.service
   309ms libvirtd.service
   303ms rpcbind.service
   298ms ksmtuned.service
   288ms lvm2-monitor.service
   281ms rpcgssd.service
   277ms sshd.service
   276ms livesys.service
   267ms iscsid.service
   236ms mdmonitor.service
   234ms nfslock.service
   223ms ksm.service
   218ms mcelog.service
...
This tool lists which systemd unit needed how much time to finish initialization at boot, the worst offenders listed first. What we can see here is that on this boot two services required more than 1s of boot time: udev-settle.service and cryptsetup@luks\x2d9899b85d\x2df790\x2d4d2a\x2da650\x2d8b7d2fb92cc3.service. This tool's output is easily misunderstood as well: it does not shed any light on why the services in question actually needed this much time, it just determines that they did. Also note that the times listed here may have been spent "in parallel", i.e. two services might have been initializing at the same time, so that the time spent initializing them both is much less than the sum of the two individual times.
Let's have a closer look at the worst offender on this boot: a service by the name of udev-settle.service. So why does it take that much time to initialize, and what can we do about it? This service actually does very little: it just waits for the device probing done by udev to finish and then exits. Device probing can be slow. In this instance, for example, the reason the device probing takes more than 6s is the 3G modem built into the machine, which, when no SIM card is inserted, takes this long to respond to software probe requests. The software probing is part of the logic that makes ModemManager work and enables NetworkManager to offer easy 3G setup. An obvious reflex might now be to blame ModemManager for having such a slow prober. But that's ill-directed: hardware probing quite frequently is this slow, and in the case of ModemManager it's a simple fact that the 3G hardware takes this long. It is an essential requirement for a proper hardware probing solution that individual probers may take this much time to finish probing. The actual culprit is something else: the fact that we actually wait for the probing, in other words: that udev-settle.service is part of our boot process at all.
So, why is udev-settle.service part of our boot process? Well, it actually doesn't need to be. It is pulled in by the storage setup logic of Fedora: to be precise, by the LVM, RAID and Multipath setup scripts. These storage services have not been implemented for the way hardware detection and probing work today: they expect to be initialized at a point in time when "all devices have been probed", so that they can simply iterate through the list of available disks and do their work on it. However, on modern machinery this is not how things actually work: hardware can come and hardware can go all the time, during boot and during runtime. For some technologies it is not even possible to know when device enumeration is complete (examples: USB, or iSCSI), thus waiting for all storage devices to show up and be probed must necessarily include a fixed delay, after which it is assumed that all devices that can show up have shown up and been probed. In this case all this shows very negatively in the boot time: the storage scripts force us to delay bootup until all potential devices have shown up and all the devices that did have been probed -- and all that even though we don't actually need most devices for anything. In particular since this machine actually does not make use of LVM, RAID or Multipath![2]
Knowing what we know now we can go and disable udev-settle.service for the next boots: since neither LVM, RAID nor Multipath is used we can mask the services in question and thus speed up our boot a little:
# ln -s /dev/null /etc/systemd/system/udev-settle.service
# ln -s /dev/null /etc/systemd/system/fedora-wait-storage.service
# ln -s /dev/null /etc/systemd/system/fedora-storage-init.service
# systemctl daemon-reload
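To double-check that a unit is properly masked this way, verify that its symlink resolves to /dev/null. A sketch against a scratch directory (on the real system the links live in /etc/systemd/system/):

```shell
# Masking = symlinking the unit file to /dev/null. Demonstrated in a
# scratch directory instead of the real /etc/systemd/system/:
dir=$(mktemp -d)
ln -s /dev/null "$dir/udev-settle.service"

# A masked unit resolves to /dev/null:
readlink "$dir/udev-settle.service"
```

This prints `/dev/null`; systemd treats such a unit file as empty and will not start it.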
After restarting we can measure that the boot is now about 1s faster. Why just 1s? Well, the second worst offender is cryptsetup: the machine in question has an encrypted /home directory. For testing purposes I have stored the passphrase in a file on disk, so that the boot-up is not delayed because I as the user am a slow typer. The cryptsetup tool unfortunately still takes more than 5s to set up the encrypted partition. Being lazy, instead of trying to fix cryptsetup[3] we'll just tape over it here[4]: systemd will normally wait for all file systems not marked with the noauto option in /etc/fstab to show up, to be fscked and to be mounted before proceeding with bootup and starting the usual system services. In the case of /home (unlike, for example, /var) we know that it is needed only very late (i.e. when the user actually logs in). An easy fix is hence to make the mount point available already during boot, but not actually wait until cryptsetup, fsck and mount have finished running for it. You ask how we can make a mount point available before actually mounting the file system behind it? Well, systemd possesses magic powers, in the form of the comment=systemd.automount mount option in /etc/fstab. If you specify it, systemd will create an automount point at /home, and if at the time of the first access the file system still isn't mounted there, systemd will wait for the device, fsck and mount it.
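Here is a sketch of what such an /etc/fstab line might look like; the device name and file system type are placeholders standing in for this machine's encrypted /home:

```
# /etc/fstab -- device and fs type are illustrative placeholders:
/dev/mapper/luks-home  /home  ext4  defaults,comment=systemd.automount  0  2
```

With this entry, boot proceeds immediately; cryptsetup, fsck and mount only run (and only block) on the first access to /home.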
And here's the result with this change to /etc/fstab made:
systemd[1]: Startup finished in 2s 47ms 112us (kernel) + 2s 663ms 942us (initrd) + 5s 540ms 522us (userspace) = 10s 251ms 576us.
Nice! With a few fixes we took almost 7s off our boot time. And these two changes are only fixes for the two most superficial problems. With a bit of love and detail work there's a lot of additional room for improvement. In fact, on a different machine, an X300 laptop that is more than two years old (and even back then wasn't the fastest machine on earth), with a bit of decrufting we now have boot times of around 4s (total), with a reasonably complete GNOME system. And there's still a lot of room left.
systemd-analyze blame is a nice and simple tool for tracking down slow services. However, it suffers from one big problem: it does not visualize how the parallel execution of the services actually diminishes the price one pays for slow-starting services. For that we have prepared systemd-analyze plot for you. Use it like this:
$ systemd-analyze plot > plot.svg
$ eog plot.svg
It creates pretty graphs, showing the time services spent to start up in relation to the other services. It currently doesn't visualize explicitly which services wait for which ones, but with a bit of guess work this is easily seen nonetheless.
To see the effect of our two little optimizations here are two graphs generated with systemd-analyze plot, the first before and the other after our change:
(For the sake of completeness, here are the two complete outputs of systemd-analyze blame for these two boots: before and after.)
The well-informed reader probably wonders how this relates to Michael Meeks' bootchart. It is true that this plot and bootchart show similar graphs. Bootchart is by far the more powerful tool: it plots in all detail what is happening during the boot, and how much CPU and IO is used. systemd-analyze plot shows more high-level data: which service took how much time to initialize, and what needed to wait for it. If you use them both together you'll have a wonderful toolset for figuring out why your boot is not as fast as it could be.
Now, before you take these tools and start filing bugs against the worst boot-up time offenders on your system: think twice. These tools give you raw data; don't misread it. As my optimization example above hopefully shows, the blame for the slow bootup was not actually with udev-settle.service, and not with the ModemManager prober run by it either. It is with the subsystem that pulled this service in in the first place. And that's where the problem needs to be fixed. So, file the bugs at the right places. Put the blame where the blame belongs.
As mentioned, these three utilities are available on your Fedora 15 system out-of-the-box.
And here's what to take home from this little blog story:
And that's all for now. Thank you for your interest.
Footnotes
[1] Also known as the greatest Free Software OS release ever.
[2] The right fix here is to improve the services in question to actively listen to hotplug events via libudev or similar and act on devices as they show up, so that we can continue with the bootup the instant everything we really need to go on has shown up. To get a quick bootup we should wait for what we actually need to proceed, not for everything. Also note that the storage services are not the only services which do not cope well with modern dynamic hardware and assume that the device list is static and stays unchanged. For example, in this example the reason the initrd is as slow as it is comes down mostly to the fact that Plymouth expects to be executed only after all video devices have shown up and have been probed. For an unknown reason (at least unknown to me) loading the video kernel modules for my Intel graphics card takes multiple seconds, and hence the entire boot is delayed unnecessarily. (Here too I'd not put the blame on the probing but on the fact that we wait for it to complete before going on.)
[3] Well, to be precise, I actually did try to get this fixed. Most of the delay of cryptsetup stems from the -- in my eyes -- unnecessarily high default value for --iter-time in cryptsetup. I tried to convince our cryptsetup maintainers that 100ms as a default here is not really less secure than 1s, but well, I failed.
[4] Of course, it's usually not our style to just tape over problems instead of fixing them, but this is such a nice occasion to show off yet another cool systemd feature...