Container Integration
For a while now, containers have been one of the hot topics on Linux. Container managers such as libvirt-lxc, LXC or Docker are widely known and used these days. In this blog story I want to shed some light on systemd's integration points with container managers, which allow seamless management of services across container boundaries.
We'll focus on OS containers here, i.e. the case where an init system runs inside the container, so that the container in most ways appears like an independent system of its own. Much of what I describe here is available with pretty much any container manager that implements the logic described below, including libvirt-lxc. However, to make things easy we'll focus on systemd-nspawn, the mini-container manager that is shipped with systemd itself. systemd-nspawn uses the same kernel interfaces as the other container managers, but is less flexible, as it is designed to be a container manager that is as simple to use as possible and "just works", rather than a generic tool you can configure in every low-level detail. We use systemd-nspawn extensively when developing systemd.
Anyway, so let's get started with our run-through. We begin by creating a Fedora container tree in a subdirectory:
# yum -y --releasever=20 --nogpg --installroot=/srv/mycontainer --disablerepo='*' --enablerepo=fedora install systemd passwd yum fedora-release vim-minimal
This downloads a minimal Fedora system and installs it in /srv/mycontainer. This command line is Fedora-specific, but most distributions provide similar functionality in one way or another. The examples section in the systemd-nspawn(1) man page contains a list of the various command lines for other distributions.
We now have the new container installed, let's set an initial root password:
# systemd-nspawn -D /srv/mycontainer
Spawning container mycontainer on /srv/mycontainer
Press ^] three times within 1s to kill container.
-bash-4.2# passwd
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
-bash-4.2# ^D
Container mycontainer exited successfully.
#
We use systemd-nspawn here to get a shell in the container, and then use passwd to set the root password. After that the initial setup is done, hence let's boot it up and log in as root with our new password:
$ systemd-nspawn -D /srv/mycontainer -b
Spawning container mycontainer on /srv/mycontainer.
Press ^] three times within 1s to kill container.
systemd 208 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ)
Detected virtualization 'systemd-nspawn'.
Welcome to Fedora 20 (Heisenbug)!
[ OK ] Reached target Remote File Systems.
[ OK ] Created slice Root Slice.
[ OK ] Created slice User and Session Slice.
[ OK ] Created slice System Slice.
[ OK ] Created slice system-getty.slice.
[ OK ] Reached target Slices.
[ OK ] Listening on Delayed Shutdown Socket.
[ OK ] Listening on /dev/initctl Compatibility Named Pipe.
[ OK ] Listening on Journal Socket.
Starting Journal Service...
[ OK ] Started Journal Service.
[ OK ] Reached target Paths.
Mounting Debug File System...
Mounting Configuration File System...
Mounting FUSE Control File System...
Starting Create static device nodes in /dev...
Mounting POSIX Message Queue File System...
Mounting Huge Pages File System...
[ OK ] Reached target Encrypted Volumes.
[ OK ] Reached target Swap.
Mounting Temporary Directory...
Starting Load/Save Random Seed...
[ OK ] Mounted Configuration File System.
[ OK ] Mounted FUSE Control File System.
[ OK ] Mounted Temporary Directory.
[ OK ] Mounted POSIX Message Queue File System.
[ OK ] Mounted Debug File System.
[ OK ] Mounted Huge Pages File System.
[ OK ] Started Load/Save Random Seed.
[ OK ] Started Create static device nodes in /dev.
[ OK ] Reached target Local File Systems (Pre).
[ OK ] Reached target Local File Systems.
Starting Trigger Flushing of Journal to Persistent Storage...
Starting Recreate Volatile Files and Directories...
[ OK ] Started Recreate Volatile Files and Directories.
Starting Update UTMP about System Reboot/Shutdown...
[ OK ] Started Trigger Flushing of Journal to Persistent Storage.
[ OK ] Started Update UTMP about System Reboot/Shutdown.
[ OK ] Reached target System Initialization.
[ OK ] Reached target Timers.
[ OK ] Listening on D-Bus System Message Bus Socket.
[ OK ] Reached target Sockets.
[ OK ] Reached target Basic System.
Starting Login Service...
Starting Permit User Sessions...
Starting D-Bus System Message Bus...
[ OK ] Started D-Bus System Message Bus.
Starting Cleanup of Temporary Directories...
[ OK ] Started Cleanup of Temporary Directories.
[ OK ] Started Permit User Sessions.
Starting Console Getty...
[ OK ] Started Console Getty.
[ OK ] Reached target Login Prompts.
[ OK ] Started Login Service.
[ OK ] Reached target Multi-User System.
[ OK ] Reached target Graphical Interface.
Fedora release 20 (Heisenbug)
Kernel 3.18.0-0.rc4.git0.1.fc22.x86_64 on an x86_64 (console)
mycontainer login: root
Password:
-bash-4.2#
Now we have everything ready to play around with the container integration of systemd. Let's have a look at the first tool, machinectl. When run without parameters it shows a list of all locally running containers:
$ machinectl
MACHINE CONTAINER SERVICE
mycontainer container nspawn
1 machines listed.
The "status" subcommand shows details about the container:
$ machinectl status mycontainer
mycontainer:
Since: Mi 2014-11-12 16:47:19 CET; 51s ago
Leader: 5374 (systemd)
Service: nspawn; class container
Root: /srv/mycontainer
Address: 192.168.178.38
10.36.6.162
fd00::523f:56ff:fe00:4994
fe80::523f:56ff:fe00:4994
OS: Fedora 20 (Heisenbug)
Unit: machine-mycontainer.scope
├─5374 /usr/lib/systemd/systemd
└─system.slice
├─dbus.service
│ └─5414 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-act...
├─systemd-journald.service
│ └─5383 /usr/lib/systemd/systemd-journald
├─systemd-logind.service
│ └─5411 /usr/lib/systemd/systemd-logind
└─console-getty.service
└─5416 /sbin/agetty --noclear -s console 115200 38400 9600
With this we see some interesting information about the container, including its control group tree (with processes), IP addresses and root directory.
The "login" subcommand gets us a new login shell in the container:
# machinectl login mycontainer
Connected to container mycontainer. Press ^] three times within 1s to exit session.
Fedora release 20 (Heisenbug)
Kernel 3.18.0-0.rc4.git0.1.fc22.x86_64 on an x86_64 (pts/0)
mycontainer login:
The "reboot" subcommand reboots the container:
# machinectl reboot mycontainer
The "poweroff" subcommand powers the container off:
# machinectl poweroff mycontainer
So much for the machinectl tool. It knows a couple more commands; please check the man page for details. Note again that even though we use systemd-nspawn as container manager here, the concepts apply to any container manager that implements the logic described here, including libvirt-lxc for example.
machinectl is not the only tool that is useful in conjunction with containers. Many of systemd's own tools have been updated to explicitly support containers too! Let's try this (after first starting the container up again with the systemd-nspawn command from above):
# hostnamectl -M mycontainer set-hostname "wuff"
This uses hostnamectl(1) on the local container and sets its hostname.
Similarly, many other tools have been updated for connecting to local containers. Here's systemctl(1)'s -M switch in action:
# systemctl -M mycontainer
UNIT LOAD ACTIVE SUB DESCRIPTION
-.mount loaded active mounted /
dev-hugepages.mount loaded active mounted Huge Pages File System
dev-mqueue.mount loaded active mounted POSIX Message Queue File System
proc-sys-kernel-random-boot_id.mount loaded active mounted /proc/sys/kernel/random/boot_id
[...]
time-sync.target loaded active active System Time Synchronized
timers.target loaded active active Timers
systemd-tmpfiles-clean.timer loaded active waiting Daily Cleanup of Temporary Directories
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
49 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
As expected, this shows the list of active units on the specified container, not the host. (Output is shortened here; the blog story is already getting too long.)
Let's use this to restart a service within our container:
# systemctl -M mycontainer restart systemd-resolved.service
systemctl has more container support than just the -M switch, though. With the -r switch it shows the units running on the host, plus all units of all local, running containers:
# systemctl -r
UNIT LOAD ACTIVE SUB DESCRIPTION
boot.automount loaded active waiting EFI System Partition Automount
proc-sys-fs-binfmt_misc.automount loaded active waiting Arbitrary Executable File Formats File Syst
sys-devices-pci0000:00-0000:00:02.0-drm-card0-card0\x2dLVDS\x2d1-intel_backlight.device loaded active plugged /sys/devices/pci0000:00/0000:00:02.0/drm/ca
[...]
timers.target loaded active active Timers
mandb.timer loaded active waiting Daily man-db cache update
systemd-tmpfiles-clean.timer loaded active waiting Daily Cleanup of Temporary Directories
mycontainer:-.mount loaded active mounted /
mycontainer:dev-hugepages.mount loaded active mounted Huge Pages File System
mycontainer:dev-mqueue.mount loaded active mounted POSIX Message Queue File System
[...]
mycontainer:time-sync.target loaded active active System Time Synchronized
mycontainer:timers.target loaded active active Timers
mycontainer:systemd-tmpfiles-clean.timer loaded active waiting Daily Cleanup of Temporary Directories
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
191 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
Here we first see the units of the host, followed by the units of the one container we currently have running. The units of the containers are prefixed with the container name and a colon (":"). (The output is shortened again for brevity's sake.)
The list-machines subcommand of systemctl shows a list of all running containers, inquiring the system managers within the containers about system state and health. More specifically it shows if containers are properly booted up, or if there are any failed services:
# systemctl list-machines
NAME STATE FAILED JOBS
delta (host) running 0 0
mycontainer running 0 0
miau degraded 1 0
waldi running 0 0
4 machines listed.
To make things more interesting we have started two more containers in parallel. One of them has a failed service, which results in the machine state being degraded.
Let's have a look at journalctl(1)'s container support. It too supports -M to show the logs of a specific container:
# journalctl -M mycontainer -n 8
Nov 12 16:51:13 wuff systemd[1]: Starting Graphical Interface.
Nov 12 16:51:13 wuff systemd[1]: Reached target Graphical Interface.
Nov 12 16:51:13 wuff systemd[1]: Starting Update UTMP about System Runlevel Changes...
Nov 12 16:51:13 wuff systemd[1]: Started Stop Read-Ahead Data Collection 10s After Completed Startup.
Nov 12 16:51:13 wuff systemd[1]: Started Update UTMP about System Runlevel Changes.
Nov 12 16:51:13 wuff systemd[1]: Startup finished in 399ms.
Nov 12 16:51:13 wuff sshd[35]: Server listening on 0.0.0.0 port 24.
Nov 12 16:51:13 wuff sshd[35]: Server listening on :: port 24.
However, it also supports -m to show the combined log stream of the host and all local containers:
# journalctl -m -e
(Let's skip the output here completely, I figure you can extrapolate how this looks.)
But it's not only systemd's own tools that understand container support these days, procps sports support for it, too:
# ps -eo pid,machine,args
PID MACHINE COMMAND
1 - /usr/lib/systemd/systemd --switched-root --system --deserialize 20
[...]
2915 - emacs contents/projects/containers.md
3403 - [kworker/u16:7]
3415 - [kworker/u16:9]
4501 - /usr/libexec/nm-vpnc-service
4519 - /usr/sbin/vpnc --non-inter --no-detach --pid-file /var/run/NetworkManager/nm-vpnc-bfda8671-f025-4812-a66b-362eb12e7f13.pid -
4749 - /usr/libexec/dconf-service
4980 - /usr/lib/systemd/systemd-resolved
5006 - /usr/lib64/firefox/firefox
5168 - [kworker/u16:0]
5192 - [kworker/u16:4]
5193 - [kworker/u16:5]
5497 - [kworker/u16:1]
5591 - [kworker/u16:8]
5711 - sudo -s
5715 - /bin/bash
5749 - /home/lennart/projects/systemd/systemd-nspawn -D /srv/mycontainer -b
5750 mycontainer /usr/lib/systemd/systemd
5799 mycontainer /usr/lib/systemd/systemd-journald
5862 mycontainer /usr/lib/systemd/systemd-logind
5863 mycontainer /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
5868 mycontainer /sbin/agetty --noclear --keep-baud console 115200 38400 9600 vt102
5871 mycontainer /usr/sbin/sshd -D
6527 mycontainer /usr/lib/systemd/systemd-resolved
[...]
This shows a process list (shortened). The second column shows the container a process belongs to. All processes shown with "-" belong to the host itself.
But it doesn't stop there. The new "sd-bus" D-Bus client library we have been preparing in the systemd/kdbus context knows containers too. While you use sd_bus_open_system() to connect to your local host's system bus, sd_bus_open_system_container() may be used to connect to the system bus of any local container, so that you can execute bus methods on it.
sd-login.h and machined's bus interface provide a number of APIs to add container support to other programs too. They support enumeration of containers as well as retrieving the machine name from a PID and similar.
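As a rough sketch of the sd-login.h side, the following enumerates all running machines with sd_get_machine_names() and maps a PID back to the machine it belongs to with sd_pid_get_machine_name(). The PID 5750 is simply the container's systemd PID from the ps output further up; substitute whatever PID you are interested in.

/* Minimal sketch: enumerate machines and map a PID to its machine.
 * Compile e.g. with: gcc sdlogin-example.c $(pkg-config --cflags --libs libsystemd) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <systemd/sd-login.h>

int main(void) {
        char **machines = NULL;
        char *name = NULL;
        int n, i, r;

        /* List all currently running machines (containers and VMs) */
        n = sd_get_machine_names(&machines);
        if (n < 0) {
                fprintf(stderr, "Failed to enumerate machines: %s\n", strerror(-n));
                return 1;
        }

        for (i = 0; i < n; i++)
                printf("machine: %s\n", machines[i]);

        /* Map a PID to the machine it runs in; 5750 is the container's
         * systemd PID from the ps output above, used here as an example */
        r = sd_pid_get_machine_name(5750, &name);
        if (r >= 0)
                printf("PID 5750 belongs to machine %s\n", name);

        free(name);
        for (i = 0; i < n; i++)
                free(machines[i]);
        free(machines);
        return 0;
}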
systemd-networkd also has support for containers. When run inside a container it will by default run a DHCP client and IPv4LL on any veth network interface named host0 (this interface is special under the logic described here). When run on the host, networkd will by default provide a DHCP server and IPv4LL on any veth network interface whose name is ve- followed by a container name.
Let's have a look at one last facet of systemd's container integration: the hook-up with the name service switch. Recent systemd versions contain a new NSS module, nss-mymachines, that makes the names of all local containers resolvable via gethostbyname() and getaddrinfo(). This only applies to containers that run within their own network namespace. With the systemd-nspawn command shown above the container shares the network configuration with the host, however; hence let's restart the container, this time with a virtual veth network link between host and container:
# machinectl poweroff mycontainer
# systemd-nspawn -D /srv/mycontainer --network-veth -b
Now (assuming that networkd is used both in the container and on the host), we can already ping the container using its name, due to the simple magic of nss-mymachines:
# ping mycontainer
PING mycontainer (10.0.0.2) 56(84) bytes of data.
64 bytes from mycontainer (10.0.0.2): icmp_seq=1 ttl=64 time=0.124 ms
64 bytes from mycontainer (10.0.0.2): icmp_seq=2 ttl=64 time=0.078 ms
Of course, name resolution not only works with ping, it works with all other tools that use libc gethostbyname() or getaddrinfo() too, among them the venerable ssh.
And this is pretty much all I want to cover for now. We briefly touched a variety of integration points, and there's a lot more still if you look closely. We are working on even more container integration all the time, so expect more new features in this area with every systemd release.
Note that the whole machine concept is actually not limited to containers, but covers VMs too to a certain degree. However, the integration is not as close, since access to a VM's internals is not as easy as it is for containers: it usually requires a network transport instead of allowing direct syscall access.
Anyway, I hope this is useful. For further details, please have a look at the linked man pages and other documentation.