systemd for Administrators, Part V

It has been a while since the last installment of my systemd for Administrators series, but now with the release of Fedora 15 based on systemd looming, here's a new episode:

The Three Levels of "Off"

In systemd, there are three levels of turning off a service (or other unit). Let's have a look at what those are:

You can stop a service. That simply terminates the running instance of the service and does little else. If, due to some form of activation (such as manual activation, socket activation, bus activation, activation by system boot or activation by hardware plug), the service is requested again afterwards, it will be started. Stopping a service is hence a very simple, temporary and superficial operation. Here's an example of how to do this for the NTP service:
```
$ systemctl stop ntpd.service
```
This is roughly equivalent to the following traditional command which is available on most SysV inspired systems:
```
$ service ntpd stop
```
In fact, on Fedora 15, if you execute the latter command it will be transparently converted to the former.
You can disable a service. This unhooks a service from its activation triggers. That means that, depending on your service, it will no longer be activated on boot, by socket or bus activation or by hardware plug (or any other trigger that applies to it). However, you can still start it manually if you wish. If there is already a started instance, disabling a service will not stop it. Here's an example of how to disable a service:
```
$ systemctl disable ntpd.service
```
On traditional Fedora systems, this is roughly equivalent to the following command:
```
$ chkconfig ntpd off
```
And here too, on Fedora 15, the latter command will be transparently converted to the former, if necessary.

Often you want to combine stopping and disabling a service, to get rid of the current instance and make sure it is not started again (except when manually triggered):
```
$ systemctl disable ntpd.service
$ systemctl stop ntpd.service
```
Commands like this are for example used during package deinstallation of systemd services on Fedora.

Disabling a service is a permanent change; it persists, even across reboots, until you undo it.
You can mask a service. This is like disabling a service, but on steroids. It not only makes sure that the service is no longer started automatically, but also ensures that it cannot even be started manually anymore. This is a bit of a hidden feature in systemd, since it is not commonly useful and might confuse the user. But here's how you do it:
```
$ ln -s /dev/null /etc/systemd/system/ntpd.service
$ systemctl daemon-reload
```
By symlinking a service file to /dev/null you tell systemd to never start the service in question and completely block its execution. Unit files stored in /etc/systemd/system override those from /lib/systemd/system that carry the same name. The former directory is administrator territory, the latter the territory of your package manager. By installing your symlink in /etc/systemd/system/ntpd.service you hence make sure that systemd will never read the upstream-shipped service file /lib/systemd/system/ntpd.service.

systemd will recognize units symlinked to /dev/null and show them as masked. If you try to start such a service manually (via systemctl start for example) this will fail with an error.
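
To check by hand whether a unit has been masked this way, it is enough to look at where its file in /etc/systemd/system points. Here is a minimal sketch, assuming a POSIX-like shell with readlink available. The function name is_masked is made up for illustration and is not part of systemd:

```shell
#!/bin/sh
# is_masked UNIT_FILE
# Succeeds (exit status 0) if UNIT_FILE is a symlink pointing at /dev/null,
# which is how a masked unit looks on disk.
# Hypothetical helper for illustration, not a systemd tool.
is_masked() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "/dev/null" ]
}
```

For example, `is_masked /etc/systemd/system/ntpd.service && echo masked` would print "masked" after the ln command shown above.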

A similar trick does not (officially) exist on SysV systems. There are a few unofficial hacks, such as editing the init script and placing an exit 0 at the top, or removing its execution bit. However, these solutions have various drawbacks; for example, they interfere with the package manager.

Masking a service is a permanent change, much like disabling a service.

Now that we have learned how to turn off services on three levels, there's only one question left: how do we turn them on again? Well, it's quite symmetric. Use systemctl start to undo systemctl stop. Use systemctl enable to undo systemctl disable, and use rm to undo ln.
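
Since rm is the least forgiving of the three, a slightly more careful way to undo a mask might look like the following sketch. It assumes a POSIX-like shell with readlink; the function name unmask_unit is hypothetical and not something systemd ships:

```shell
#!/bin/sh
# unmask_unit UNIT_FILE
# Removes UNIT_FILE only if it is a symlink to /dev/null, so that a real,
# administrator-written unit file is never deleted by accident.
# Hypothetical helper for illustration, not a systemd tool.
unmask_unit() {
    if [ -L "$1" ] && [ "$(readlink "$1")" = "/dev/null" ]; then
        rm -- "$1"
    else
        echo "not masked: $1" >&2
        return 1
    fi
}
```

After something like `unmask_unit /etc/systemd/system/ntpd.service` you would presumably run systemctl daemon-reload again, just as when the mask was created.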

And that's all for now. Thank you for your attention!

Desktop Summit 2011 Call For Participation

In case you haven't noticed yet: the Call For Participation for the Desktop Summit 2011 (aka GUADEC 2011, aka Akademy 2011) in Berlin, Germany has been open since yesterday. Submissions will be accepted until March 25th, so make sure to submit your proposals quickly.

FOSDEM Talk on Video

If you have already watched the presentation on systemd I gave at linux.conf.au 2011, then this video of my talk on the same topic, which I gave at FOSDEM 2011 in Brussels, Belgium, will probably not be all new to you, but the questions from the audience (and hopefully my responses) might answer a question or two you might still have. So do watch it:

Hmm, seems p.g.o strips the video from the blog post. So either read the original blog story or watch it directly on YouTube.

Oh, and FOSDEM rocked, like every year!

LCA Talk on Video

I won't spare you the video of my talk about systemd at linux.conf.au 2011 in Brisbane, Australia last week:

Hmm, seems p.g.o strips the video from the blog post. So either read the original blog story or watch it directly on blip.tv.

LCA was fantastic and especially impressive given the circumstances of the recent floodings in Queensland. Really good conference, and congratulations to the organizers!

FOSDEM Interview with Yours Truly

The FOSDEM organizers just published a brief interview with yours truly regarding the presentation about systemd I will be giving there on Sat. Feb. 5th, 3pm. If you come to Brussels make sure to drop by! And even if you don't, have a look at the interview!

If you don't make it to Brussels, there are two more stops in my little systemd World Tour in the next weeks: today (Wed. Jan. 26th, 2:30pm) I will be speaking at linux.conf.au in Brisbane, Australia. And on Fri. Feb. 11th, 1:20pm I'll be speaking at the Red Hat Developer Conference in Brno, Czech Republic.

systemd for Administrators, Part IV

Here's the fourth installment of my ongoing series about systemd for administrators.

Killing Services

Killing a system daemon is easy, right? Or is it?

Sure, as long as your daemon consists only of a single process this might actually be somewhat true. You type killall rsyslogd and the syslog daemon is gone. However, it is a bit dirty to do it like that, given that this will kill all processes which happen to be called like this, including those an unlucky user might have named that way by accident. A slightly more correct version would be to read the .pid file, i.e. kill `cat /var/run/syslogd.pid`. That already gets us much further, but still, is this really what we want?

More often than not it actually isn't. Consider a service like Apache, or crond, or atd, which as part of their usual operation spawn child processes. Arbitrary, user configurable child processes, such as cron or at jobs, or CGI scripts, even full application servers. If you kill the main apache/crond/atd process this might or might not pull down the child processes too, and it's up to those processes whether they want to stay around or go down as well. Basically that means that terminating Apache might very well cause its CGI scripts to stay around, reassigned to be children of init, and difficult to track down.

systemd to the rescue: With systemctl kill you can easily send a signal to all processes of a service. Example:

```
# systemctl kill crond.service
```

This will ensure that SIGTERM is delivered to all processes of the crond service, not just the main process. Of course, you can also send a different signal if you wish. For example, if you are bad-ass you might want to go for SIGKILL right away:

```
# systemctl kill -s SIGKILL crond.service
```

And there you go, the service will be brutally slaughtered in its entirety, regardless of how many times it forked, or whether it tried to escape supervision by double forking or fork bombing.

Sometimes all you need is to send a specific signal to the main process of a service, maybe because you want to trigger a reload via SIGHUP. Instead of going via the PID file, here's an easier way to do this:

```
# systemctl kill -s HUP --kill-who=main crond.service
```

So again, what is so new and fancy about killing services in systemd? Well, for the first time on Linux we can actually do that properly. Previous solutions always depended on the daemons actually cooperating to bring down everything they spawned when they themselves terminate. However, usually if you want to use SIGTERM or SIGKILL you are doing that because they actually do not cooperate properly with you.

How does this relate to systemctl stop? kill directly sends a signal to every process in the group, whereas stop goes through the officially configured way to shut down a service, i.e. invokes the stop command configured with ExecStop= in the service file. Usually stop should be sufficient. kill is the tougher version, for cases where you either don't want the official shutdown command of a service to run, or when the service is hosed and hung in other ways.
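
To illustrate the idea of signalling "every process in the group", here is a rough shell sketch. It is not how systemctl is implemented. It merely assumes that the PIDs belonging to a service are available one per line in some file, the way the kernel lists the members of a control group. The function name signal_all is made up:

```shell
#!/bin/sh
# signal_all SIGNAL PID_LIST_FILE
# Sends SIGNAL to every PID listed (one per line) in PID_LIST_FILE.
# Hypothetical helper for illustration only; a real implementation also
# has to cope with processes that fork while the list is being walked.
signal_all() {
    _sig=$1
    _file=$2
    while read -r _pid; do
        if [ -n "$_pid" ]; then
            # Ignore PIDs that have already exited in the meantime.
            kill -s "$_sig" "$_pid" 2>/dev/null || true
        fi
    done < "$_file"
    return 0
}
```

The point of the comparison: a PID file names one process, while a list like this names all of them, including children the main process never told anyone about.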

(BTW, it's up to you whether you specify signal names with or without the SIG prefix on the -s switch. Both work.)

It's a bit surprising that we have come so far on Linux without even being able to properly kill services. systemd for the first time enables you to do this properly.

systemd Status Update

It has been a while since my last status update on systemd. Here's another short, incomplete status update on what we worked on for systemd since then.

Fedora F15 (Rawhide) now includes a split up /etc/init.d/rc.sysinit (Bill Nottingham). This allows us to keep only a minimal compatibility set of shell scripts around, and otherwise boot a system without any shell scripts at all. In fact, shell scripts during early boot are only used in exceptional cases, i.e. when you enabled autoswapping (bad idea anyway), when a full SELinux relabel is necessary, during the first boot after initialization, if you have static kernel modules to load (which are not configured via the systemd-native way to do that), if you boot from a read-only NFS server, or when you rely on LVM/RAID/Multipath. If none of this applies to you, you can easily disable these parts of early boot and save several seconds on boot. How to do this I will describe in a later blog story.
We have a fully C coded shutdown logic that kills all remaining processes, unmounts all remaining file systems, detaches all loop devices and DM volumes and does that in the right way to ensure that all these things are properly torn down even if they depend on each other in arbitrary ways. This is not only considerably faster than the traditional shell hackery for this, but also a lot safer, since we try to unmount/remount the remaining file systems with a little bit of brains. This feature is available to the administrator via systemctl --force poweroff. The --force switch controls whether the usual shutdown of all services is run or whether this is skipped and we immediately enter this final C shutdown logic. Using --force hence is a much safer replacement for the old /sbin/reboot -f and does not leave dirty file systems behind. (Thanks to Fabiano Fidencio and his colleagues from ProFUSION for this).
systemd now includes a minimalistic readahead implementation, based on fanotify(), fadvise() and mincore(). It supports btrfs defragmentation and both SSD and HDD disks. While the effect on boots that are anyway fast (such as most stuff involving SSD) is minimal, slower and older machines benefit from this more substantially.
We now control fsck and quota during early boot with a C tool that ensures maximum parallelization but properly implements the necessary high-level administration logic.
Every service, every user and every user session now gets its own cgroup in the 'cpu' hierarchy thus creating better fairness between the logged in users and their sessions.
We now provide /dev/log logging from early boot to late shutdown. If no syslog daemon is running the output is passed on to kmsg. As soon as a proper syslog daemon starts up the kmsg buffer is flushed to syslog, and hence we will have complete log coverage in syslog even for early boot.
systemctl kill was introduced, an easy command to send a signal to all processes of a service. Expect a blog story with more details about this shortly.
systemd gained the ability to load the SELinux policy if necessary, thus supporting non-initrd boots and initrd boots from the same binary with no duplicate work. This is in fact (and surprisingly) a first among Linux init systems.
We now initialize and set the system locale inside PID 1 to be inherited by all services and users.
systemd has native support for /etc/crypttab and can activate encrypted LUKS/dm-crypt disks both at boot-up and during runtime. A minimal password querying infrastructure is available, where multiple agents can be used to present the password to the user. During boot the password is queried either via Plymouth or directly on the console. If a system crypto disk is plugged in after boot you are queried for the password via a GNOME agent, or a wall(1) agent. Finally, while you run systemctl start (or a similar command) a minimal TTY password agent is available which asks you for passwords right away if this is necessary. The password querying logic is very simple, and additional agents can be implemented in a trivial amount of code (Yupp, KDE folks, you can add an agent for this, too). Note that the password querying logic in systemd is only for non-user passwords, i.e. passwords that have no relation to a specific user, but rather to specific hardware or system software. In future we hope to extend this so that this can be used to query the password of SSL certificates when Apache or other servers start.
We offer a minimal interface that external projects can use to extend the dependency graph systemd manages. In fact, the cryptsetup logic mentioned above is implemented via this 'plugin'-like system. Since we did not want to add code that deals with cryptographic disks into the systemd process itself we introduced this interface (after all cryptographic volumes are not an essential feature of a minimal OS, and unnecessary on most embedded uses; also the future might bring us STC which might make this at least partially obsolete). Simply by dropping a generator binary into /lib/systemd/system-generators, which should write out systemd unit files into a temporary directory, third-party packages may extend the systemd dependency tree dynamically. This could be useful for example to automatically create a systemd service for each KVM machine or LXC container. With that in place those containers/machines could be managed and supervised with the same tools as the usual system services.
We integrated automatic clean-up of directories such as /tmp into the tmpfiles logic we already had in place that recreates files and directories on volatile file systems such as /var/run, /var/lock or /tmp.
We now always measure the system startup time and write it to the log files, broken down into how much time was spent in the kernel, the initrd and the initialization of userspace.
We now safely destroy all user sessions before going down. This is a feature long missing on Linux: since user processes were not killed until the very last moment, the unhealthy situation where user code was running at a time when no other daemon remained was a normal part of shutdown.
systemd now understands an 'extreme' form of disabling a service: if you symlink a service name in /etc/systemd/system to /dev/null then systemd will mark it as masked and completely refuse starting it, regardless of whether this is requested manually or automatically. Normally it should be sufficient to simply call systemctl disable to disable a service, which still allows manual activation but no automatic activation. Masking a service goes one step further.
There's now a simple condition syntax in place which allows skipping or enabling units depending on the existence of a file, whether a directory is empty or whether a kernel command line option is set.
In addition to normal shutdowns for reboot, halt or poweroff we now similarly support a kexec reboot, which reboots the machine without going through the BIOS code again.
We have bash completion support for systemctl. (Ran Benita)
Andrew Edmunds contributed basic support to boot Ubuntu with systemd.
Michael Biebl and Tollef Fog Heen have worked on the systemd integration into Debian to a level that it is now possible to boot a system without having the old initscripts package installed. For more details see the Debian Wiki. Michael even tested this integration on an Ubuntu Natty system and as it turns out this works almost equally well on Ubuntu already. If you are interested in playing around with this, ping Michael.
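
To give an idea of what the generator interface mentioned in the list above (the item about /lib/systemd/system-generators) could look like from the outside, here is a rough shell sketch. It assumes the generator is told the output directory as its first argument. The unit contents, the container names and /usr/bin/start-container are all placeholders invented for illustration:

```shell
#!/bin/sh
# generate_units OUT_DIR NAME...
# Writes one hypothetical "container-NAME.service" unit file per NAME
# into OUT_DIR. Unit contents and /usr/bin/start-container are
# placeholders, not real tools.
generate_units() {
    _out_dir=$1
    shift
    for _name in "$@"; do
        cat > "$_out_dir/container-$_name.service" <<EOF
[Unit]
Description=Container $_name (generated)

[Service]
ExecStart=/usr/bin/start-container $_name
EOF
    done
}
```

A real generator would find out the list of containers or machines itself (for example by scanning a configuration directory) and call something like `generate_units "$1" web db` with the directory it was handed.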

And that's it for now. There's a lot of other stuff in the git commits, but most of it is smaller and I will thus spare you.

We have come quite far in the last year. systemd is about a year old now, and we are now able to boot a system without legacy shell scripts remaining, something that appeared to be a task for the distant future.

All of this is available in systemd 13 and in F15/Rawhide as I type this. If you want to play around with this then consider installing Rawhide (it's fun!).

27C3 Fudfest

I really wonder why on earth the 27C3 accepted a nonsensical paper like this into their programme. So .. stupid. You read half the proposal and it's already kinda obvious that the presenter has no idea what he is talking about. Fundamental errors, obvious misinterpretations, outdated issues: this is just FUD.

And apparently this talk even is anonymous? Such a coward! FUDing around anonymously is acceptable at the CCC?

Linux Plumbers Conference/Gnome Summit Recap

Last week LPC and GS 2010 took place in Cambridge, MA. As in previous years, LPC showed again that -- at least for me -- it is one of the most relevant Linux conferences in existence, if not the single most relevant one.

Here's a terse, incomplete report of the different discussions I took part in with various folks at the conference, in no particular order:

The Boot and Init track led by Kay Sievers (Suse) was a great success. We had exciting talks which I think helped quite a bit in clearing a few things up, and hopefully helps us in consolidating the full Linux boot process among all the components involved. We had talks covering everything from the BIOS boot, to initrds, graphical boot splashes and systemd. Kay Sievers and I spoke about systemd, also covering the state of it in the Fedora and openSUSE distributions. Gustavo Barbieri (ProFUSION, Gentoo) and Michael Biebl (Debian) gave interesting talks about systemd adoption in their respective distributions. I was particularly interested in the various statistics Michael showed about SysV/LSB init script usage in Debian, because this gives an idea how much work we have in front of us in the long run. A longer discussion about the future of initrds and the logic necessary to find the root file system on boot was quite enlightening. I think this track was helpful to increase the unification and consolidation of the way Linux systems boot up and are maintained during runtime.

Kay and I and some other folks sat down with Arjan van de Ven (Intel), to talk about the prospects of systemd in Meego. The discussions were very positive. In particular Arjan had some great suggestions regarding use of the Simple Boot Flag in systemd (expect this in one of the next versions) and readahead. Before systemd can find adoption in Meego we'd have to add a small number of features to systemd first, most of which should be easy to add.

Similarly, I sat down with Martin Pitt and James Hunt (both Canonical) and discussed systemd in relation to Ubuntu. I think we managed to clear a lot of things up, and have a good chance to improve cooperation between Ubuntu and systemd in relation to APIs and maybe even more.

We talked to Thomas Gleixner regarding userspace notifications when the wallclock time jumps relative to the monotonic clock. This is important to systemd so that we can schedule calendar jobs similar to cron, but without having to wake up periodically to check whether the wallclock time changed relative to the monotonic clock so that we can recalculate the next point in time a calendar event is triggered. There has been previous work in this area in the kernel world, but nothing got merged. Thomas' suggestion how to add this facility should be much easier than anything proposed so far.

I also tried to talk Andreas Grünbacher into supporting file system user extended attributes in various virtual file systems such as procfs, cgroupfs, sysfs and tmpfs. I hope I convinced him that this would be a good idea, since this would allow setting externally accessible attributes on all kinds of kernel objects, such as processes and devices. This would not only have uses in systemd (where we could easily store all meta information systemd needs to know about a service in the cgroupfs via xattrs, so that systemd could even crash or go away at any time and we still can read all runtime information necessary beyond mere cgrouping from the file system when systemd comes to life again) but also in the desktop environments, so that we could for example attach the human readable application name, an icon or a desktop file to the processes currently running, in a simple way where the data we attach follows the lifecycle of the process itself.

The Audio track went really well, too. I was particularly excited about Pierre-Louis Bossart's (Intel) plans regarding AC3 (and other codecs) support in PulseAudio, and the simplicity of his approach. Also great was hearing about Laurent Pinchart's project to expose audio and video device routing to userspace. Finally, I really enjoyed David Henningsson's and Luke Yelavich's (both Canonical) talk regarding tracking down audio bugs on Ubuntu. I was really impressed by the elaborate tools they created to test audio drivers on users' machines. Pretty cool stuff. Maybe this can be extended into a test suite for driver writers, because the current approach for driver writers (i.e. "If PulseAudio works correctly, your driver is correct") doesn't really scale (although I like the idea and take it as a compliment...). I also liked the timechart profiling results Pierre showed me that he generated for PulseAudio. Seems PulseAudio is behaving quite nicely these days.

Together with Harald Hoyer I got a demo of David Zeuthen's disk assembly daemon (stc), which makes RAID/MD/LVM assembly more dynamic. Great stuff, and I think we convinced him to leave actual mounting of file systems to systemd instead of doing it himself.

Harald and I also hashed out a few things to make integration between dracut and systemd nicer (i.e. passing along profiling information between the two, and information regarding the root fsck).

I also hope I convinced Ray Strode to make Plymouth actively listen to udev for notifications about DRM devices, so that further synchronization between udev and plymouth won't be necessary, which both makes things more robust and a little bit faster.

Kay and I talked to Greg Kroah-Hartman regarding the brokenness of VT_WAITEVENT in the kernel TTY layer, and discussed what to do about this. After returning from the US Kay did the necessary hacking work to provide a minimal sysfs based solution that allows userspace to query which TTYs /dev/console and /dev/tty0 currently point to, and to get notifications when this changes. This should allow us to greatly simplify ConsoleKit and make it possible to add console-triggered activation to systemd (think: getty gets started the moment you switch to its virtual terminal, not already at boot).

I also spent some time discussing the upcoming deadline scheduling kernel logic with Dario, Dhaval and Tommaso regarding its possible use in PulseAudio. I believe deadline scheduling is a useful tool to hand out real-time scheduling to applications securely. As an easy path to supporting deadline scheduling in PulseAudio I suggested patching RealtimeKit to optionally use deadline scheduling for its clients. This would magically teach PA (and other clients) to use deadline scheduling without further patching in the clients.

At GNOME Summit I sat down with Ryan Lortie and Will Thompson to discuss the future of the D-Bus session bus and how we can move to a machine/user bus instead in a nice way. We managed to come to a nice agreement here, and this should enable us to introduce systemd for session management soonish. Now we only need to convince the other folks having stakes in D-Bus that what we discussed is actually a good idea; expect more about this soon on dbus-devel. Ryan and I also hashed out our remaining differences regarding the exact semantics of XDG_RUNTIME_DIR, the result of which you can already see on the XDG mailing list. Ryan already did the GLib work to introduce XDG_RUNTIME_DIR, and systemd has unofficially supported this for a few versions already.

I quite appreciate how Michael Meeks quoted me in his final keynote. ;-)

There was a lot of other stuff going on at the conference, and what I wrote above is in no way complete. And of course, besides all the technical stuff, it was great meeting all the good Linux folks again, especially my colleagues from Red Hat.

I am still amazed how systemd is received so positively and with open arms all across the board. It's particularly amazing that systemd at this point in time has already been adopted by various companies in the automotive and aviation industry.

Off to LPC 2010, Boston

Later this week the Linux Plumbers Conference 2010 will take place at the Hyatt Regency in Cambridge.

Together with Mark Brown I'll be running the conference track about Audio, and I believe we managed to put together quite a nice schedule with various interesting talks covering many areas of what Audio on Linux is about.

I'll also be around at the Boot and Init Systems track which Kay Sievers is running. Together with Kay I'll do a session about systemd, everybody's favourite system and session manager. We also managed to convince a number of distribution maintainers of systemd to do short presentations about the state of systemd adoption in their respective distributions: Michael Biebl from Debian, Gustavo Barbieri from Gentoo, Kay for openSUSE and yours truly for Fedora.

Because there never can be enough systemd coverage at a conference I'll do another talk about systemd, in Vincent Untz' Desktop track, this time focussing less on how to boot and maintain a system, but more on doing the same for desktop sessions, in particular GNOME.

I'll also stick around for the first two days of the GNOME Boston Summit.

See you in Cambridge!