
systemd for Administrators, Part XVI

And, yes, here's now the sixteenth installment of my ongoing series on systemd for Administrators:

Gettys on Serial Consoles (and Elsewhere)

TL;DR: To make use of a serial console, just use console=ttyS0 on the kernel command line, and systemd will automatically start a getty on it for you.

While physical RS232 serial ports have become exotic in today's PCs, they still play an important role in modern servers and embedded hardware. They provide a relatively robust and minimalistic way to access the console of your device, one that works even when the network is hosed or the primary UI is unresponsive. VMs frequently emulate a serial port as well.

Of course, Linux has always had good support for serial consoles, but with systemd we tried to make serial console support even simpler to use. In the following text I'll try to give an overview of how serial console gettys work on systemd, and how TTYs of any kind are handled.

Let's start with the key take-away: in most cases, to get a login prompt on your serial console you don't need to do anything. systemd checks the kernel configuration for the selected kernel console and will simply spawn a serial getty on it. That way it is entirely sufficient to configure your kernel console properly (for example, by adding console=ttyS0 to the kernel command line) and that's it. But let's have a look at the details:

In systemd, two template units are responsible for bringing up a login prompt on text consoles:

  1. getty@.service is responsible for virtual terminal (VT) login prompts, i.e. those on your VGA screen as exposed in /dev/tty1 and similar devices.
  2. serial-getty@.service is responsible for all other terminals, including serial ports such as /dev/ttyS0. It differs in a couple of ways from getty@.service: among other things the $TERM environment variable is set to vt102 (hopefully a good default for most serial terminals) rather than linux (which is the right choice for VTs only), and the special logic that clears the VT scrollback buffer (and only works on VTs) is skipped.

Virtual Terminals

Let's have a closer look at how getty@.service is started, i.e. how login prompts on virtual terminals (i.e. non-serial TTYs) work. Traditionally, the init system on Linux machines was configured to spawn a fixed number of login prompts at boot. In most cases six instances of the getty program were spawned, on the first six VTs, tty1 to tty6.

In a systemd world we made this more dynamic: in order to make things more efficient login prompts are now started on demand only. As you switch to the VTs the getty service is instantiated to getty@tty2.service, getty@tty5.service and so on. Since we don't have to unconditionally start the getty processes anymore this allows us to save a bit of resources, and makes start-up a bit faster. This behaviour is mostly transparent to the user: if the user activates a VT the getty is started right away, so that he will hardly notice that it wasn't running all the time. If he then logs in and types ps he'll notice however that getty instances are only running for the VTs he has switched to so far.

By default this automatic spawning is done for the VTs up to VT6 only (in order to be close to the traditional default configuration of Linux systems)[1]. Note that the auto-spawning of gettys is only attempted if no other subsystem took possession of the VTs yet. More specifically, if a user makes frequent use of fast user switching via GNOME he'll get his X sessions on the first six VTs, too, since the lowest available VT is allocated for each session.

Two VTs are handled specially by the auto-spawning logic. Firstly, tty1 gets special treatment: if we boot into graphical mode the display manager takes possession of this VT. If we boot into multi-user (text) mode a getty is started on it -- unconditionally, without any on-demand logic[2].

Secondly, tty6 is specially reserved for auto-spawned gettys and unavailable to other subsystems such as X[3]. This is done in order to ensure that there's always a way to get a text login, even if due to fast user switching X took possession of more than 5 VTs.
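
For reference, both of the knobs mentioned in the footnotes live in /etc/systemd/logind.conf. Here's a minimal sketch showing the stock defaults described above (the values are merely illustrative):

[Login]
NAutoVTs=6
ReserveVT=6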

Serial Terminals

Handling of login prompts on serial terminals (and all other kinds of non-VT terminals) is different from that of VTs. By default systemd will instantiate one serial-getty@.service on the main kernel[4] console, if it is not a virtual terminal. The kernel console is where the kernel outputs its own log messages and is usually configured on the kernel command line in the boot loader via an argument such as console=ttyS0[5]. This logic ensures that when the user asks the kernel to redirect its output onto a certain serial terminal, he will automatically also get a login prompt on it as the boot completes[6]. systemd will also spawn a login prompt on the first special VM console (that's /dev/hvc0, /dev/xvc0 or /dev/hvsi0), if the system is run in a VM that provides these devices. This logic is implemented in a generator called systemd-getty-generator that is run early at boot and pulls in the necessary services depending on the execution environment.
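
Incidentally, if you want to check which console(s) the kernel selected -- and hence where systemd-getty-generator will place the serial getty -- have a look at the file mentioned in footnote [4]:

$ cat /sys/class/tty/console/active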

In many cases, this automatic logic should already suffice to get you a login prompt when you need one, without any specific configuration of systemd. However, sometimes there's the need to manually configure a serial getty, for example, if more than one serial login prompt is needed or the kernel console should be redirected to a different terminal than the login prompt. To facilitate this it is sufficient to instantiate serial-getty@.service once for each serial port you want it to run on[7]:

# systemctl enable serial-getty@ttyS2.service
# systemctl start serial-getty@ttyS2.service

And that's it. This will make sure you get the login prompt on the chosen port on all subsequent boots, and starts it right away, too.

Sometimes, there's the need to configure the login prompt in even more detail: for example, the default baud rate configured by the kernel might not be correct, or other agetty parameters might need to be changed. In such a case simply copy the default unit template to /etc/systemd/system and edit it there:

# cp /usr/lib/systemd/system/serial-getty@.service /etc/systemd/system/serial-getty@ttyS2.service
# vi /etc/systemd/system/serial-getty@ttyS2.service
 .... now make your changes to the agetty command line ...
# ln -s /etc/systemd/system/serial-getty@ttyS2.service /etc/systemd/system/getty.target.wants/
# systemctl daemon-reload
# systemctl start serial-getty@ttyS2.service

This creates a unit file that is specific to serial port ttyS2, so that you can make specific changes to this port and this port only.
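
To illustrate the kind of edit meant here: the line you'd typically touch is the ExecStart= line invoking agetty. The stock line differs between systemd versions, so treat the following purely as a hedged sketch of agetty's positional port/baud/term syntax, forcing a baud rate of 57600 and a vt102 terminal type:

ExecStart=-/sbin/agetty %I 57600 vt102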

And this is pretty much all there is to say about serial ports, VTs and login prompts on them. I hope this was interesting, and please come back soon for the next installment of this series!

Footnotes

[1] You can easily modify this by changing NAutoVTs= in logind.conf.

[2] Note that whether the getty on VT1 is started on-demand or not hardly makes a difference, since VT1 is the default active VT anyway, so the demand for it is there at boot in any case.

[3] You can easily change this special reserved VT by modifying ReserveVT= in logind.conf.

[4] If multiple kernel consoles are used simultaneously, the main console is the one listed first in /sys/class/tty/console/active, which is the last one listed on the kernel command line.

[5] See kernel-parameters.txt for more information on this kernel command line option.

[6] Note that agetty -s is used here so that the baud rate configured on the kernel command line is not altered and continues to be used by the login prompt.

[7] Note that this systemctl enable syntax only works with systemd 188 and newer (i.e. F18). On older versions use ln -s /usr/lib/systemd/system/serial-getty@.service /etc/systemd/system/getty.target.wants/serial-getty@ttyS2.service ; systemctl daemon-reload instead.


Berlin Open Source Meetup

Chris Kühl and I are organizing a Berlin Open Source Meetup on Aug 19th at the Prater Biergarten in Prenzlauer Berg. If you live in Berlin (or are passing by) and are involved in or interested in Open Source then you are invited!

There's also a Google+ event for the meetup.

It's a public event, so everybody is welcome, and please feel free to invite others!

See you at the Prater!


foss.in 2012 CFP Ends in a Few Hours

foss.in 2012 in Bangalore takes place again after a hiatus of some years. It has always been a fantastic conference, and a great opportunity to visit Bangalore and India. I just submitted my talk proposals, so, hurry up, and submit yours!


systemd for Administrators, Part XV

Quickly following the previous iteration, here's now the fifteenth installment of my ongoing series on systemd for Administrators:

Watchdogs

There are three big target audiences we try to cover with systemd: the embedded/mobile folks, the desktop people and the server folks. While the systems used by embedded/mobile tend to be underpowered and have few resources available, desktops tend to be much more powerful machines -- but still much less well-equipped than servers. Nonetheless there are surprisingly many features that matter to both extremes of this axis (embedded and servers), but not to the center (desktops). One of them is support for watchdogs in hardware and software.

Embedded devices frequently rely on watchdog hardware that resets them automatically if software stops responding (more specifically, stops signalling the hardware in fixed intervals that it is still alive). This is required to increase reliability and make sure that, regardless of what happens, the best possible effort is made to get the system working again. Functionality like this makes little sense on the desktop[1]. However, on high-availability servers watchdogs are again frequently used.

Starting with version 183 systemd provides full support for hardware watchdogs (as exposed in /dev/watchdog to userspace), as well as supervisor (software) watchdog support for individual system services. The basic idea is the following: if enabled, systemd will regularly ping the watchdog hardware. If systemd or the kernel hang, this ping will not happen anymore and the hardware will automatically reset the system. This way systemd and the kernel are protected from boundless hangs -- by the hardware. To make the chain complete, systemd then exposes a software watchdog interface for individual services so that they can also be restarted (or some other action taken) if they begin to hang. This software watchdog logic can be configured individually for each service, in both the ping frequency and the action to take. Putting both parts together (i.e. hardware watchdogs supervising systemd and the kernel, as well as systemd supervising all other services) we have a reliable way to watchdog every single component of the system.

To make use of the hardware watchdog it is sufficient to set the RuntimeWatchdogSec= option in /etc/systemd/system.conf. It defaults to 0 (i.e. no hardware watchdog use). Set it to a value like 20s and the watchdog is enabled. After 20s of no keep-alive pings the hardware will reset itself. Note that systemd will send a ping to the hardware at half the specified interval, i.e. every 10s. And that's already all there is to it. By enabling this single, simple option you have turned on supervision by the hardware of systemd and the kernel beneath it.[2]

Note that the hardware watchdog device (/dev/watchdog) is single-user only. That means that you can either enable this functionality in systemd, or use a separate external watchdog daemon, such as the aptly named watchdog.

ShutdownWatchdogSec= is another option that can be configured in /etc/systemd/system.conf. It controls the watchdog interval to use during reboots. It defaults to 10min, and adds extra reliability to the system reboot logic: if a clean reboot is not possible and shutdown hangs, we rely on the watchdog hardware to reset the system abruptly, as an extra safety net.
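
Put together, enabling the hardware watchdog with a 20s timeout while keeping the default reboot watchdog amounts to two lines in /etc/systemd/system.conf (the values here are just examples):

[Manager]
RuntimeWatchdogSec=20s
ShutdownWatchdogSec=10min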

So much for the hardware watchdog logic. These two options are really everything that is necessary to make use of hardware watchdogs. Now, let's have a look at how to add watchdog logic to individual services.

First of all, to make software watchdog-supervisable it needs to be patched to send out "I am alive" signals at regular intervals in its event loop. Patching this is relatively easy. First, a daemon needs to read the WATCHDOG_USEC= environment variable. If it is set, it will contain the watchdog interval in µs, formatted as an ASCII text string, as it is configured for the service. The daemon should then issue sd_notify("WATCHDOG=1") calls every half of that interval. A daemon patched this way transparently supports watchdog functionality by checking whether the environment variable is set and honouring the value it is set to.
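
To make the pattern a bit more concrete, here's a minimal sketch in C of such an event loop, assuming the daemon is built against the sd-daemon API that provides sd_notify(); do_work() is merely a placeholder for the daemon's real event processing:

/* Minimal sketch of watchdog support in a daemon's main loop.
 * Assumes linking against the sd-daemon library providing sd_notify();
 * do_work() stands in for the daemon's real event processing. */
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <systemd/sd-daemon.h>

static void do_work(void) {
        /* ... process one batch of events here ... */
}

int main(void) {
        uint64_t watchdog_usec = 0;
        const char *e = getenv("WATCHDOG_USEC");

        /* WATCHDOG_USEC= is only set if WatchdogSec= is configured for the service */
        if (e)
                watchdog_usec = strtoull(e, NULL, 10);

        for (;;) {
                do_work();

                if (watchdog_usec > 0) {
                        /* Tell PID 1 we are still alive... */
                        sd_notify(0, "WATCHDOG=1");

                        /* ...and do so roughly every half interval. In a real daemon
                         * this would be folded into the event loop's timeout rather
                         * than being a blocking sleep. */
                        struct timespec ts = {
                                .tv_sec  = (time_t)(watchdog_usec / 2 / 1000000),
                                .tv_nsec = (long)(watchdog_usec / 2 % 1000000) * 1000,
                        };
                        nanosleep(&ts, NULL);
                }
        }
}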

To enable the software watchdog logic for a service (which has been patched to support the logic pointed out above) it is sufficient to set WatchdogSec= to the desired failure latency. See systemd.service(5) for details on this setting. This causes WATCHDOG_USEC= to be set for the service's processes and will cause the service to enter a failure state as soon as no keep-alive ping is received within the configured interval.

Having a service enter a failure state as soon as the watchdog logic detects a hang is, on its own, hardly sufficient to build a reliable system. The next step is to configure whether the service shall be restarted and how often, and what to do if it then still fails. To enable automatic service restarts on failure set Restart=on-failure for the service. To configure how many times a service shall be attempted to be restarted use the combination of StartLimitBurst= and StartLimitInterval= which allow you to configure how often a service may restart within a time interval. If that limit is reached, a special action can be taken. This action is configured with StartLimitAction=. The default is none, i.e. no further action is taken and the service simply remains in the failure state without any further attempted restarts. The other three possible values are reboot, reboot-force and reboot-immediate. reboot attempts a clean reboot, going through the usual, clean shutdown logic. reboot-force is more abrupt: it will not actually try to cleanly shut down any services, but immediately kills all remaining services and unmounts all file systems and then forcibly reboots (this way all file systems will be clean but the reboot will still be very fast). Finally, reboot-immediate does not attempt to kill any process or unmount any file systems. Instead it just hard reboots the machine without delay. reboot-immediate hence comes closest to a reboot triggered by a hardware watchdog. All these settings are documented in systemd.service(5).

Putting this all together we now have pretty flexible options to watchdog-supervise a specific service and configure automatic restarts of the service if it hangs, plus take ultimate action if that doesn't help.

Here's an example unit file:

[Unit]
Description=My Little Daemon
Documentation=man:mylittled(8)

[Service]
ExecStart=/usr/bin/mylittled
WatchdogSec=30s
Restart=on-failure
StartLimitInterval=5min
StartLimitBurst=4
StartLimitAction=reboot-force

This service will automatically be restarted if it hasn't pinged the system manager for longer than 30s or if it fails otherwise. If it is restarted this way more often than 4 times in 5min action is taken and the system quickly rebooted, with all file systems being clean when it comes up again.

And that's already all I wanted to tell you about! With hardware watchdog support right in PID 1, as well as supervisor watchdog support for individual services, we should provide everything you need for most watchdog use cases. Regardless of whether you are building an embedded or mobile appliance, or working with high-availability servers, please give this a try!

(Oh, and if you wonder why in heaven PID 1 needs to deal with /dev/watchdog, and why this shouldn't be kept in a separate daemon, then please read this again and try to understand that this is all about the supervisor chain we are building here, where the hardware watchdog supervises systemd, and systemd supervises the individual services. Also, we believe that a service not responding should be treated in a similar way as any other service error. Finally, pinging /dev/watchdog is one of the most trivial operations in the OS (basically little more than an ioctl() call), so the support for this is no more than a handful of lines of code. Maintaining this externally with complex IPC between PID 1 (and the daemons) and such a watchdog daemon would be drastically more complex, error-prone and resource intensive.)

Note that the built-in hardware watchdog support of systemd does not conflict with other watchdog software by default. systemd does not make use of /dev/watchdog by default, and you are welcome to use external watchdog daemons in conjunction with systemd, if this better suits your needs.

And one last thing: if you wonder whether your hardware has a watchdog, then the answer is: almost definitely yes -- if it is anything more recent than a few years old. If you want to verify this, try the wdctl tool from recent util-linux, which shows you everything you need to know about your watchdog hardware.

I'd like to thank the great folks from Pengutronix for contributing most of the watchdog logic. Thank you!

Footnotes

[1] Though actually most desktops tend to include watchdog hardware these days too, as this is cheap to build and available in most modern PC chipsets.

[2] So, here's a free tip for you if you hack on the core OS: don't enable this feature while you hack. Otherwise your system might suddenly reboot if, while tracing through PID 1 with gdb, you cause it to be stopped for a moment, so that no hardware ping can be done...


systemd for Administrators, Part XIV

And here's the fourteenth installment of my ongoing series on systemd for Administrators:

The Self-Explanatory Boot

One complaint we often hear about systemd is that its boot process is hard to understand, even incomprehensible. In general I can only disagree with this sentiment; I even believe quite the opposite: in comparison to what we had before -- where to even remotely understand what was going on you had to have a decent comprehension of the programming language that is Bourne Shell[1] -- understanding systemd's boot process is substantially easier. However, as with many complaints there is some truth in this frequently heard discomfort: for a seasoned Unix administrator there indeed is a bit of learning to do when the switch to systemd is made. And as systemd developers it is our duty to make the learning curve shallow, introduce as few surprises as we can, and provide good documentation where that is not possible.

systemd has always had a huge body of documentation as manual pages (nearly 100 individual pages now!), in the Wiki and in the various blog stories I posted. However, any amount of documentation alone is not enough to make software easily understood. In fact, thick manuals sometimes appear intimidating and make the reader wonder where to start reading, if all he is interested in is one simple concept of the whole system.

Acknowledging all this we have now added a new, neat, little feature to systemd: the self-explanatory boot process. What do we mean by that? Simply that each and every single component of our boot comes with documentation and that this documentation is closely linked to its component, so that it is easy to find.

More specifically, all units in systemd (which are what encapsulate the components of the boot) now include references to their documentation, the documentation of their configuration files and further applicable manuals. A user who is trying to understand the purpose of a unit, how it fits into the boot process and how to configure it can now easily look up this documentation with the well-known systemctl status command. Here's an example how this looks for systemd-logind.service:

$ systemctl status systemd-logind.service
systemd-logind.service - Login Service
	  Loaded: loaded (/usr/lib/systemd/system/systemd-logind.service; static)
	  Active: active (running) since Mon, 25 Jun 2012 22:39:24 +0200; 1 day and 18h ago
	    Docs: man:systemd-logind.service(7)
	          man:logind.conf(5)
	          http://www.freedesktop.org/wiki/Software/systemd/multiseat
	Main PID: 562 (systemd-logind)
	  CGroup: name=systemd:/system/systemd-logind.service
		  └ 562 /usr/lib/systemd/systemd-logind

Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event2 (Power Button)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event6 (Video Bus)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event0 (Lid Switch)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event1 (Sleep Button)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event7 (ThinkPad Extra Buttons)
Jun 25 22:39:25 epsilon systemd-logind[562]: New session 1 of user gdm.
Jun 25 22:39:25 epsilon systemd-logind[562]: Linked /tmp/.X11-unix/X0 to /run/user/42/X11-display.
Jun 25 22:39:32 epsilon systemd-logind[562]: New session 2 of user lennart.
Jun 25 22:39:32 epsilon systemd-logind[562]: Linked /tmp/.X11-unix/X0 to /run/user/500/X11-display.
Jun 25 22:39:54 epsilon systemd-logind[562]: Removed session 1.

At first glance this output has changed very little. If you look closer however you will find that it now includes one new field: Docs lists references to the documentation of this service. In this case there are two man page URIs and one web URL specified. The man pages describe the purpose and configuration of this service, the web URL includes an introduction to the basic concepts of this service.

If the user uses a recent graphical terminal implementation it is sufficient to click on the URIs shown to get the respective documentation[2]. In other words: it has never been this easy to figure out what a specific component of our boot is about: just use systemctl status to get more information about it and click on the links shown to find the documentation.

Over the past days I have written man pages and added these references for every single unit we ship with systemd. This means, with systemctl status you now have a very easy way to find out more about every single service of the core OS.

If you are not using a graphical terminal (where you can just click on URIs), a man page URI in the middle of the output of systemctl status is not the most useful thing to have. To make reading the referenced man pages easier we have also added a new command:

systemctl help systemd-logind.service

This will open the listed man pages right away, without the need to click anything or copy/paste a URI.

The URIs are in the formats documented by the uri(7) man page. Units may reference http and https URLs, as well as man and info pages.

Of course all this doesn't make everything self-explanatory, simply because the user still has to find out about systemctl status (and about systemctl in the first place, so that he even knows what units there are); however, with this basic knowledge further help on specific units is within easy reach.

We hope that this kind of interlinking of runtime behaviour and the matching documentation is a big step forward toward making our boot easier to understand.

This functionality is partially already available in Fedora 17, and will show up in complete form in Fedora 18.

That all said, credit where credit is due: this kind of reference to documentation within the service descriptions is not new; Solaris' SMF has had similar functionality for quite some time. However, we believe this new systemd feature is certainly a novelty on Linux, and with systemd we now offer you the best documented and most self-explanatory init system.

Of course, if you are writing unit files for your own packages, please consider also including references to the documentation of your services and their configuration. This is really easy to do; just list the URIs in the new Documentation= field in the [Unit] section of your unit files. For details see systemd.unit(5). The more comprehensively we include links to documentation in our OS services, the easier the work of administrators becomes. (To make sure Fedora makes comprehensive use of this functionality I filed a bug on FPC.)
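
For example, picking up the hypothetical service from the watchdog installment above, such a unit might carry references like this (the URL is just a placeholder):

[Unit]
Description=My Little Daemon
Documentation=man:mylittled(8) http://www.example.com/mylittled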

Oh, and BTW: if you are looking for a rough overview of systemd's boot process here's another new man page we recently added, which includes a pretty ASCII flow chart of the boot process and the units involved.

Footnotes

[1] Which TBH is a pretty crufty, strange one on top.

[2] Well, a terminal where this bug is fixed (used together with a help browser where this one is fixed).


Presentation in Warsaw

I recently had the chance to speak about systemd and other projects, as well as the politics behind them, at a Bar Camp in Warsaw organized by the fine people of OSEC. The presentation has been recorded and has now been posted online. It's a very long recording (1:43h), but it's quite interesting (as I'd like to believe) and contains a bit of background on where we are coming from and where we are going. Anyway, please have a look. Enjoy!

I'd like to thank the organizers for this great event and for publishing the recording online.


systemd for Administrators, Part XIII

Here's the thirteenth installment of my ongoing series on systemd for Administrators:

Log and Service Status

This one is a short episode. One of the most commonly used commands on a systemd system is systemctl status, which may be used to determine the status of a service (or other unit). It has always been a valuable tool for figuring out the processes, runtime information and other metadata of a daemon running on the system.

With Fedora 17 we introduced the journal, our new logging scheme that provides structured, indexed and reliable logging on systemd systems, while providing a certain degree of compatibility with classic syslog implementations. The original reason we started to work on the journal was one specific feature idea, one that to the outsider might appear simple but is difficult and inefficient to implement without the journal: along with the output of systemctl status we wanted to show the last 10 log messages of the daemon. Log data is some of the most essential information we have on the status of a service. Hence it is an obvious choice to show it next to the general status of the service.

And now to make it short: at the same time as we integrated the journal into systemd and Fedora we also hooked up systemctl with it. Here's an example output:

$ systemctl status avahi-daemon.service
avahi-daemon.service - Avahi mDNS/DNS-SD Stack
	  Loaded: loaded (/usr/lib/systemd/system/avahi-daemon.service; enabled)
	  Active: active (running) since Fri, 18 May 2012 12:27:37 +0200; 14s ago
	Main PID: 8216 (avahi-daemon)
	  Status: "avahi-daemon 0.6.30 starting up."
	  CGroup: name=systemd:/system/avahi-daemon.service
		  ├ 8216 avahi-daemon: running [omega.local]
		  └ 8217 avahi-daemon: chroot helper

May 18 12:27:37 omega avahi-daemon[8216]: Joining mDNS multicast group on interface eth1.IPv4 with address 172.31.0.52.
May 18 12:27:37 omega avahi-daemon[8216]: New relevant interface eth1.IPv4 for mDNS.
May 18 12:27:37 omega avahi-daemon[8216]: Network interface enumeration completed.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for fd00::e269:95ff:fe87:e282 on eth1.*.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for 172.31.0.52 on eth1.IPv4.
May 18 12:27:37 omega avahi-daemon[8216]: Registering HINFO record with values 'X86_64'/'LINUX'.
May 18 12:27:38 omega avahi-daemon[8216]: Server startup complete. Host name is omega.local. Local service cookie is 3555095952.
May 18 12:27:38 omega avahi-daemon[8216]: Service "omega" (/services/ssh.service) successfully established.
May 18 12:27:38 omega avahi-daemon[8216]: Service "omega" (/services/sftp-ssh.service) successfully established.

This, of course, shows the status of everybody's favourite mDNS/DNS-SD daemon with a list of its processes, along with -- as promised -- the 10 most recent log lines. Mission accomplished!

There are a couple of switches available to alter the output slightly and adjust it to your needs. The two most interesting switches are -f to enable follow mode (as in tail -f) and -n to change the number of lines to show (you guessed it, as in tail -n).
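
For example, to show the last 20 log lines and keep following new output as it arrives, an invocation might look like this:

$ systemctl status -n 20 -f avahi-daemon.service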

The log data shown comes from three sources: everything any of the daemon's processes logged with libc's syslog() call, everything submitted using the native Journal API, plus everything any of the daemon's processes logged to STDOUT or STDERR. In short: everything the daemon generates as log data is collected, properly interleaved and shown in the same format.

And that's it already for today. It's a very simple feature, but an immensely useful one for every administrator. One of the kind that makes you wonder, "Why didn't we already do this 15 years ago?".

Stay tuned for the next installment!


Boot & Base OS Miniconf at Linux Plumbers Conference 2012, San Diego

We are working on putting together a miniconf on the topic of Boot & Base OS for the Linux Plumbers Conference 2012 in San Diego (Aug 29-31). And we need your submission!

Are you working on some exciting project related to Boot and Base OS and would like to present your work? Then please submit something following these guidelines, but please CC Kay Sievers and Lennart Poettering.

I hope that at this point the Linux Plumbers Conference needs little introduction, so I will spare you any further prose on how great and useful it is for everybody who works on the plumbing layer of Linux. However, there's one conference that will be co-located with LPC that is still little known, because it happens for the first time: The C Conference, organized by Brandon Philips and friends. It covers all things C, and they are still looking for more topics, in a reverse CFP. Please consider submitting a proposal and registering for the conference!


The Most Awesome, Least-Advertised Fedora 17 Feature

There's one feature in the upcoming Fedora 17 release that is immensely useful but very little known, since its feature page 'ckremoval' does not explicitly refer to it in its name: true automatic multi-seat support for Linux.

A multi-seat computer is a system that offers not just one local seat for a user, but multiple, at the same time. A seat refers to a combination of a screen, a set of input devices (such as mice and keyboards), and maybe an audio card or webcam, as an individual local workplace for a user. A multi-seat computer can drive an entire class room of seats with only a fraction of the cost in hardware, energy, administration and space: you only have one PC, which usually has more than enough CPU power to drive 10 or more workplaces. (In fact, even a netbook is fast enough to drive a couple of seats!) Automatic multi-seat refers to an entirely automatically managed seat setup: whenever a new seat is plugged in a new login screen immediately appears -- without any manual configuration --, and when the seat is unplugged all user sessions on it are removed without delay.

In Fedora 17 we added this functionality to the low-level user and device tracking of systemd, replacing the previous ConsoleKit logic that lacked support for automatic multi-seat. With all the ground work done in systemd, udev and the other components of our plumbing layer the last remaining bits were surprisingly easy to add.

Currently, the automatic multi-seat logic works best with the USB multi-seat hardware from Plugable you can buy cheaply on Amazon (US). These devices require exactly zero configuration with the new scheme implemented in Fedora 17: just plug them in at any time, login screens pop up on them, and you have your additional seats. Alternatively you can also assemble your seat manually with a few easy loginctl attach commands, from any kind of hardware you might have lying around. To get a full seat you need multiple graphics cards, keyboards and mice: one set for each seat. (Later on we'll probably have a graphical setup utility for additional seats, but that's not a pressing issue we believe, as the plug-n-play multi-seat support with the Plugable devices is so awesomely nice.)
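
For the manual route just mentioned, here is a hedged sketch of the loginctl calls involved -- the seat name and the sysfs path below are placeholders you would look up for your own hardware, for example with the first command:

# loginctl seat-status seat0
# loginctl attach seat1 /sys/devices/.../usb2/2-1.1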

Plugable provided us with hardware for testing multi-seat free of charge. They are also involved with the upstream development of the USB DisplayLink driver for Linux. Due to their positive involvement with Linux we can only recommend buying their hardware. They are good guys, and support Free Software the way all hardware vendors should! (And besides that, their hardware is also nicely put together. For example, in contrast to most similar vendors they actually assign proper vendor/product IDs to their USB hardware so that we can easily recognize their hardware when it is plugged in, in order to set up automatic seats.)

Currently, all this magic is only implemented in the GNOME stack with the biggest component getting updated being the GNOME Display Manager. On the Plugable USB hardware you get a full GNOME Shell session with all the usual graphical gimmicks, the same way as on any other hardware. (Yes, GNOME 3 works perfectly fine on simpler graphics cards such as these USB devices!) If you are hacking on a different desktop environment, or on a different display manager, please have a look at the multi-seat documentation we put together, and particularly at our short piece about writing display managers which are multi-seat capable.

If you work on a major desktop environment or display manager and would like to implement multi-seat support for it, but lack the aforementioned Plugable hardware, we might be able to provide you with the hardware for free. Please contact us directly, and we might be able to send you a device. Note that we don't have unlimited devices available, hence we'll probably not be able to pass hardware to everybody who asks, and we will pass the hardware preferably to people who work on well-known software or otherwise have contributed good code to the community already. Anyway, if in doubt, ping us, and explain to us why you should get the hardware, and we'll consider you! (Oh, and this not only applies to display managers, if you hack on some other software where multi-seat awareness would be truly useful, then don't hesitate and ping us!)

Phoronix has this story about this new multi-seat support which is quite interesting and full of pictures. Please have a look.

Plugable started a pledge drive to lower the price of the Plugable USB multi-seat terminals further. It's full of pictures (and a video showing all this in action!), and uses the code we now make available in Fedora 17 as its base. Please consider pledging a few bucks.

Recently David Zeuthen added multi-seat support to udisks as well. With this in place, a user logged in on a specific seat can only see the USB storage plugged into his individual seat, but does not see any USB storage plugged into any other local seat. With that, we closed the last missing bit of multi-seat support in our desktop stack.

With this code in Fedora 17 we cover the big use cases of multi-seat already: internet cafes, class rooms and similar installations can provide PC workplaces cheaply and easily without any manual configuration. Later on we want to build on this and make it useful for other uses too: for example, the ability to get a login screen as easily as plugging in a USB connector makes this useful not only for saving money in setups for many people, but also in embedded environments (consider monitoring/debugging screens made available via this hotplug logic) or on servers (get trivially quick local access to your otherwise headless server). To be truly useful in these areas we need one more thing though: the ability to run a simple getty (i.e. text login) on the seat, without necessarily involving a graphical UI.

The well-known X successor Wayland already comes out of the box with multi-seat support based on this logic.

Oh, and BTW, as Ubuntu appears to be "focussing" on "clarity" in the "cloud" now ;-), and chose Upstart instead of systemd, this feature won't be available in Ubuntu any time soon. That's (one detail of) the price Ubuntu has to pay for choosing to maintain its own (largely legacy, such as ConsoleKit) plumbing stack.

Multi-seat has a long history on Unix. Since the earliest days Unix systems could be accessed by multiple local terminals at the same time. Since then, local terminal support (and hence multi-seat) gradually moved out of view in computing. Very few machines these days have more than one seat; the concept of terminals survived almost exclusively in the context of PTYs (i.e. fully virtualized API objects, disconnected from any real hardware seat) and VCs (i.e. a single virtualized local seat), but hardly in any other way (well, server setups still use serial terminals for emergency remote access, but they almost never have more than one serial terminal). Everything we do in systemd is based on the ideas originally brought forward in Unix; with systemd we now try to bring back a number of the good ideas of Unix that were lost along the way since the old days. For example, in true Unix style we already started to expose the concept of a service in the file system (in /sys/fs/cgroup/systemd/system/), something where on Linux the (often misunderstood) "everything is a file" mantra previously fell short. With automatic multi-seat support we bring back support for terminals, but updated with all the features of today's desktops: plug and play, zero configuration, full graphics, and not limited to input devices and screens, but extending to all kinds of devices, such as audio, webcams or USB memory sticks.

Anyway, this is all for now; I'd like to thank everybody who was involved with making multi-seat work so nicely and natively on the Linux platform. You know who you are! Thanks a ton!


systemd Status Update

It has been way too long since my last status update on systemd. Here's another short, far-from-comprehensive status update on what we have worked on for systemd since then.

We have been working hard to turn systemd into the most viable set of components to build operating systems, appliances and devices from, and make it the best choice for servers, for desktops and for embedded environments alike. I think we have a really convincing set of features now, but we are actively working on making it even better.

Here's a list of some more and some less interesting features, in no particular order:

  1. We added an automatic pager to systemctl (and related tools), similar to how git has it.
  2. systemctl learnt a new switch --failed, to show only failed services.
  3. You may now start services immediately, overriding all dependency logic, by passing --ignore-dependencies to systemctl. This is mostly a debugging tool and nothing people should use in real life.
  4. Sending SIGKILL as final part of the implicit shutdown logic of services is now optional and may be configured with the SendSIGKILL= option individually for each service.
  5. We split off the Vala/Gtk tools into their own project, systemd-ui.
  6. systemd-tmpfiles learnt file globbing and creating FIFO special files as well as character and block device nodes, and symlinks. It also is capable of relabelling certain directories at boot now (in the SELinux sense).
  7. Immediately before shutting down we will now invoke all binaries found in /lib/systemd/system-shutdown/, which is useful for debugging late shutdown.
  8. You may now globally control where STDOUT/STDERR of services goes (unless individual service configuration overrides it).
  9. There's a new ConditionVirtualization= option that makes systemd skip a specific service if a certain virtualization technology is found or not found. Similarly, we now have a new option to detect whether a certain security technology (such as SELinux) is available, called ConditionSecurity=. There's also ConditionCapability= to check whether a certain process capability is in the capability bounding set of the system. There are also new ConditionFileIsExecutable=, ConditionPathIsMountPoint=, ConditionPathIsReadWrite= and ConditionPathIsSymbolicLink= directives.
  10. The file system condition directives now support globbing.
  11. Service conditions may now be "triggering" and "mandatory", meaning that they can be a necessary requirement to hold for a service to start, or simply one trigger among many.
  12. At boot time we now print warnings if: /usr is on a split-off partition but not already mounted by an initrd; if /etc/mtab is not a symlink to /proc/mounts; or if CONFIG_CGROUPS is not enabled in the kernel. We'll also expose this as a tainted flag on the bus.
  13. You may now boot the same OS image on a bare metal machine and in Linux namespace containers and will get a clean boot in both cases. This is more complicated than it sounds since device management with udev or write access to /sys, /proc/sys or things like /dev/kmsg is not available in a container. This makes systemd a first-class choice for managing thin container setups. This is all tested with systemd's own systemd-nspawn tool but should work fine in LXC setups, too. Basically this means that you do not have to adjust your OS manually to make it work in a container environment; it will just work out of the box. It also makes it easier to convert real systems into containers.
  14. We now automatically spawn gettys on HVC ttys when booting in VMs.
  15. We introduced /etc/machine-id as a generalization of D-Bus machine ID logic. See this blog story for more information. On stateless/read-only systems the machine ID is initialized randomly at boot. In virtualized environments it may be passed in from the machine manager (with qemu's -uuid switch, or via the container interface).
  16. All of the systemd-specific /etc/fstab mount options are now in the x-systemd-xyz format.
  17. To make it easy to find non-converted services we will now implicitly prefix all LSB and SysV init script descriptions with the strings "LSB:" and "SYSV:", respectively.
  18. We introduced /run and made it a hard dependency of systemd. This directory is now widely accepted and implemented on all relevant Linux distributions.
  19. systemctl can now execute all its operations remotely too (-H switch).
  20. We now ship systemd-nspawn, a really powerful tool that can be used to start containers for debugging, building and testing, much like chroot(1). It is useful to just get a shell inside a build tree, but is good enough to boot up a full system in it, too.
  21. If we query the user for a hard disk password at boot he may hit TAB to hide the asterisks we normally show for each key that is entered, for extra paranoia.
  22. We don't enable udev-settle.service anymore, which is only required for certain legacy software that still hasn't been updated to follow devices coming and going cleanly.
  23. We now include a tool that can plot boot speed graphs, similar to bootchartd, called systemd-analyze.
  24. At boot, we now initialize the kernel's binfmt_misc logic with the data from /etc/binfmt.d.
  25. systemctl now recognizes if it is run in a chroot() environment and will work accordingly (i.e. apply changes to the tree it is run in, instead of talking to the actual PID 1 for this). It also has a new --root= switch to work on an OS tree from outside of it.
  26. There's a new unit dependency type OnFailureIsolate= that allows entering a different target whenever a certain unit fails. For example, this is interesting to enter emergency mode if file system checks of crucial file systems failed.
  27. Socket units may now listen on Netlink sockets, special files from /proc and POSIX message queues, too.
  28. There's a new IgnoreOnIsolate= flag which may be used to ensure certain units are left untouched by isolation requests. There's a new IgnoreOnSnapshot= flag which may be used to exclude certain units from snapshot units when they are created.
  29. There are now small mechanism services for changing the local hostname and other host metadata, changing the system locale and console settings, and the system clock.
  30. We now limit the capability bounding set for a number of our internal services by default.
  31. Plymouth may now be disabled globally with plymouth.enable=0 on the kernel command line.
  32. We now deallocate VTs when a getty finishes running (and optionally when other tools that ran on VTs finish). This adds extra security since it clears the scrollback buffer so that subsequent users cannot get access to a previous user's session output.
  33. In socket units there are now options to control the IP_TRANSPARENT, SO_BROADCAST, SO_PASSCRED, SO_PASSSEC socket options.
  34. The receive and send buffers of socket units may now be set larger than the default system settings if needed by using SO_{RCV,SND}BUFFORCE.
  35. We now set the hardware timezone as one of the first things in PID 1, in order to avoid time jumps during normal userspace operation, and to guarantee sensible times on all generated logs. We also no longer save the system clock to the RTC on shutdown, assuming that this is done by the clock control tool when the user modifies the time, or automatically by the kernel if NTP is enabled.
  36. The SELinux directory got moved from /selinux to /sys/fs/selinux.
  37. We added a small service, systemd-logind, that keeps track of logged-in users and their sessions. It creates control groups for them, implements the XDG_RUNTIME_DIR specification for them, maintains seats and device node ACLs and implements shutdown/idle inhibiting for clients. It auto-spawns gettys on all local VTs when the user switches to them (instead of starting six of them unconditionally), thus reducing the resource footprint by default. It has a D-Bus interface as well as a simple synchronous library interface. This mechanism obsoletes ConsoleKit, which is now deprecated and should no longer be used.
  38. There's now full, automatic multi-seat support, and this is enabled in GNOME 3.4. Just by plugging in new seat hardware you get a new login screen on your seat's screen.
  39. There is now an option ControlGroupModify= to allow services to change the properties of their control groups dynamically, and one to make control groups persistent in the tree (ControlGroupPersistent=) so that they can be created and maintained by external tools.
  40. We now jump back into the initrd in shutdown, so that it can detach the root file system and the storage devices backing it. This allows (for the first time!) to reliably undo complex storage setups on shutdown and leave them in a clean state.
  41. systemctl now supports presets, a way for distributions and administrators to define their own policies on whether services should be enabled or disabled by default on package installation.
  42. systemctl now has high-level verbs for masking/unmasking units. There's also a new command (systemctl list-unit-files) for determining the list of all installed unit files and whether they are enabled or not.
  43. We now apply sysctl variables to each new network device, as it appears. This makes /etc/sysctl.d compatible with hot-plug network devices.
  44. There's limited profiling for SELinux start-up performance built into PID 1.
  45. There's a new switch PrivateNetwork= to turn off any network access for a specific service.
  46. Service units may now include configuration for control group parameters. A few (such as MemoryLimit=) are exposed with high-level options, and all others are available via the generic ControlGroupAttribute= setting.
  47. There's now the option to mount certain cgroup controllers jointly at boot. We do this now for cpu and cpuacct by default.
  48. We added the journal and turned it on by default.
  49. All service output is now written to the Journal by default, regardless of whether it is sent via syslog or simply written to stdout/stderr. Both message streams end up in the same location and are interleaved the way they should be. All log messages, even those from the kernel and from early boot, end up in the journal. Now, no service output goes unnoticed, and everything is saved and indexed in the same location.
  50. systemctl status will now show the last 10 log lines for each service, directly from the journal.
  51. We now show the progress of fsck at boot on the console, again. We also show the much loved colorful [ OK ] status messages at boot again, as known from most SysV implementations.
  52. We merged udev into systemd.
  53. We implemented and documented interfaces to container managers and initrds for passing execution data to systemd. We also implemented and documented an interface for storage daemons that are required to back the root file system.
  54. There are two new options in service files to propagate reload requests between several units.
  55. systemd-cgls won't show kernel threads by default anymore, or show empty control groups.
  56. We added a new tool, systemd-cgtop, that shows resource usage of whole services in a top(1)-like fashion.
  57. systemd may now supervise services in watchdog style. If enabled for a service, the daemon has to ping PID 1 in regular intervals or is otherwise considered failed (which might then result in restarting it, or even rebooting the machine, as configured). Also, PID 1 is capable of pinging a hardware watchdog. Putting this together, the hardware watchdog supervises PID 1, and PID 1 in turn watchdogs specific services. This is highly useful for high-availability servers as well as embedded machines. Since watchdog hardware is nowadays built into all modern chipsets (including desktop chipsets), this should hopefully help to make this a more widely used functionality.
  58. We added support for a new kernel command line option systemd.setenv= to set an environment variable system-wide.
  59. By default services which are started by systemd will have SIGPIPE set to ignored. The Unix SIGPIPE logic is used to reliably implement shell pipelines and when left enabled in services is usually just a source of bugs and problems.
  60. You may now configure the rate limiting that is applied to restarts of specific services. Previously the rate limiting parameters were hard-coded (similar to SysV).
  61. There's now support for loading the IMA integrity policy into the kernel early in PID 1, similar to how we already did it with the SELinux policy.
  62. There's now an official API to schedule and query scheduled shutdowns.
  63. We changed the license from GPL2+ to LGPL2.1+.
  64. We made systemd-detect-virt an official tool in the tool set. Since we already had code to detect certain VM and container environments we now added an official tool for administrators to make use of in shell scripts and suchlike.
  65. We documented numerous interfaces systemd introduced.

Much of the stuff above is already available in Fedora 15 and 16, or will be made available in the upcoming Fedora 17.

And that's it for now. There's a lot of other stuff in the git commits, but most of it is smaller and I will thus spare you the details.

I'd like to thank everybody who contributed to systemd over the past years.

Thanks for your interest!
