TL;DR: systemd can now do per-service IP traffic accounting, as well as access control for IP address ranges.
Last Friday we released systemd 235. I already blogged about its Dynamic User feature in detail, but there's one more piece of new functionality that I think deserves special attention: IP accounting and access control.
Before v235 systemd already provided per-unit resource management hooks for a number of different kinds of resources: consumed CPU time, disk I/O, memory usage and number of tasks. With v235 another kind of resource can be controlled per-unit with systemd: network traffic (specifically IP).
Three new unit file settings have been added in this context:
IPAccounting= is a boolean setting. If enabled for a unit, all IP traffic sent and received by processes associated with it is counted both in terms of bytes and of packets.
IPAddressDeny= takes an IP address prefix (that means: an IP address with a network mask). All traffic from and to this address will be prohibited for processes of the service.
IPAddressAllow= is the matching positive counterpart to IPAddressDeny=. All traffic matching this IP address/network mask combination will be allowed, even if otherwise listed in IPAddressDeny=.
The three options are thin wrappers around kernel functionality
introduced with Linux 4.11: the control group eBPF hooks. The actual
work is done by the kernel; systemd just provides a number of new
settings to configure this facet of it. Note that cgroup/eBPF is
unrelated to classic Linux firewalling, i.e. NetFilter/iptables. It's
up to you whether you use one or the other, or both in combination
(or of course neither).
Let's have a closer look at the IP accounting logic mentioned
above. Let's write a simple unit
/etc/systemd/system/ip-accounting-test.service:
[Service]
ExecStart=/usr/bin/ping 8.8.8.8
IPAccounting=yes
This simple unit invokes the ping(8) command to send a series of
ICMP/IP ping packets to the IP address 8.8.8.8 (which is the Google
DNS server IP; we use it for testing here, since it's easy to
remember, reachable everywhere and known to react to ICMP pings; any
other IP address responding to pings would be fine to use, too). The
IPAccounting= option is used to turn on IP accounting for the unit.
Let's start this service after writing the file. Let's then have a
look at the status output of systemctl:
# systemctl daemon-reload
# systemctl start ip-accounting-test
# systemctl status ip-accounting-test
● ip-accounting-test.service
   Loaded: loaded (/etc/systemd/system/ip-accounting-test.service; static; vendor preset: disabled)
   Active: active (running) since Mon 2017-10-09 18:05:47 CEST; 1s ago
 Main PID: 32152 (ping)
       IP: 168B in, 168B out
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/ip-accounting-test.service
           └─32152 /usr/bin/ping 8.8.8.8

Okt 09 18:05:47 sigma systemd: Started ip-accounting-test.service.
Okt 09 18:05:47 sigma ping: PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
Okt 09 18:05:47 sigma ping: 64 bytes from 8.8.8.8: icmp_seq=1 ttl=59 time=29.2 ms
Okt 09 18:05:48 sigma ping: 64 bytes from 8.8.8.8: icmp_seq=2 ttl=59 time=28.0 ms
This shows the ping command running: it's currently at its second
ping cycle, as we can see in the logs at the end of the output. More
interesting however is the IP: line further up showing the current
IP byte counters. It currently shows 168 bytes have been received, and
168 bytes have been sent. That the two counters are at the same value
is not surprising: ICMP ping requests and responses are supposed to
have the same size. Note that this line is shown only if
IPAccounting= is turned on for the service, as only then this data
is collected.
Let's wait a bit, and invoke systemctl status again:
# systemctl status ip-accounting-test
● ip-accounting-test.service
   Loaded: loaded (/etc/systemd/system/ip-accounting-test.service; static; vendor preset: disabled)
   Active: active (running) since Mon 2017-10-09 18:05:47 CEST; 4min 28s ago
 Main PID: 32152 (ping)
       IP: 22.2K in, 22.2K out
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/ip-accounting-test.service
           └─32152 /usr/bin/ping 8.8.8.8

Okt 09 18:10:07 sigma ping: 64 bytes from 8.8.8.8: icmp_seq=260 ttl=59 time=27.7 ms
Okt 09 18:10:08 sigma ping: 64 bytes from 8.8.8.8: icmp_seq=261 ttl=59 time=28.0 ms
Okt 09 18:10:09 sigma ping: 64 bytes from 8.8.8.8: icmp_seq=262 ttl=59 time=33.8 ms
Okt 09 18:10:10 sigma ping: 64 bytes from 8.8.8.8: icmp_seq=263 ttl=59 time=48.9 ms
Okt 09 18:10:11 sigma ping: 64 bytes from 8.8.8.8: icmp_seq=264 ttl=59 time=27.2 ms
Okt 09 18:10:12 sigma ping: 64 bytes from 8.8.8.8: icmp_seq=265 ttl=59 time=27.0 ms
Okt 09 18:10:13 sigma ping: 64 bytes from 8.8.8.8: icmp_seq=266 ttl=59 time=26.8 ms
Okt 09 18:10:14 sigma ping: 64 bytes from 8.8.8.8: icmp_seq=267 ttl=59 time=27.4 ms
Okt 09 18:10:15 sigma ping: 64 bytes from 8.8.8.8: icmp_seq=268 ttl=59 time=29.7 ms
Okt 09 18:10:16 sigma ping: 64 bytes from 8.8.8.8: icmp_seq=269 ttl=59 time=27.6 ms
As we can see, after 269 pings the counters are much higher: at 22K.
Note that while systemctl status shows only the byte counters,
packet counters are kept as well. Use the low-level systemctl show
command to query the current raw values of the in and out packet and
byte counters:
# systemctl show ip-accounting-test -p IPIngressBytes -p IPIngressPackets -p IPEgressBytes -p IPEgressPackets
IPIngressBytes=37776
IPIngressPackets=449
IPEgressBytes=37776
IPEgressPackets=449
Of course, the same information is also available via the D-Bus
APIs. If you want to process this data further, consider talking proper
D-Bus, rather than scraping the output of systemctl show.
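As a rough sketch of what talking D-Bus could look like from a shell, the counters can be read with busctl get-property from the org.freedesktop.systemd1.Service interface of the unit's bus object. The helper names unit_object_path and unit_ip_counter are made up for this illustration, and the escaping rule used here (every byte that is not an ASCII letter or a non-leading digit becomes _xx in hex) is my understanding of how systemd encodes unit names into object paths, not something taken from the text above.

```shell
# Hypothetical helper: map a unit name to its systemd D-Bus object path.
# Assumption: non-alphanumeric bytes (and a leading digit) are escaped as _xx.
unit_object_path() {
    printf '%s\n' "$1" | LC_ALL=C awk '
        BEGIN { for (i = 1; i < 128; i++) ord[sprintf("%c", i)] = i }
        {
            out = ""
            for (i = 1; i <= length($0); i++) {
                c = substr($0, i, 1)
                if (c ~ /^[A-Za-z]$/ || (i > 1 && c ~ /^[0-9]$/))
                    out = out c
                else
                    out = out sprintf("_%02x", ord[c])
            }
            print "/org/freedesktop/systemd1/unit/" out
        }'
}

# Hypothetical helper: read one IP counter property of a service over D-Bus.
# $1 is the unit name, $2 a property such as IPIngressBytes.
unit_ip_counter() {
    busctl get-property org.freedesktop.systemd1 \
        "$(unit_object_path "$1")" \
        org.freedesktop.systemd1.Service "$2"
}

# Usage (needs a running systemd with the test unit started):
#   unit_ip_counter ip-accounting-test.service IPIngressBytes
```

Only the path helper is pure shell; the second function needs a live system bus, so treat it as a starting point rather than a finished tool.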
Now, let's stop the service again:
# systemctl stop ip-accounting-test
When a service with such accounting turned on terminates, a log line
about all its consumed resources is written to the logs. Let's check
with journalctl:
# journalctl -u ip-accounting-test -n 5
-- Logs begin at Thu 2016-08-18 23:09:37 CEST, end at Mon 2017-10-09 18:17:02 CEST. --
Okt 09 18:15:50 sigma ping: 64 bytes from 8.8.8.8: icmp_seq=603 ttl=59 time=26.9 ms
Okt 09 18:15:51 sigma ping: 64 bytes from 8.8.8.8: icmp_seq=604 ttl=59 time=27.2 ms
Okt 09 18:15:52 sigma systemd: Stopping ip-accounting-test.service...
Okt 09 18:15:52 sigma systemd: Stopped ip-accounting-test.service.
Okt 09 18:15:52 sigma systemd: ip-accounting-test.service: Received 49.5K IP traffic, sent 49.5K IP traffic
The last line shown is the interesting one: it shows the accounting data. It's actually a structured log message, and among its metadata fields it contains the more comprehensive raw data:
# journalctl -u ip-accounting-test -n 1 -o verbose
-- Logs begin at Thu 2016-08-18 23:09:37 CEST, end at Mon 2017-10-09 18:18:50 CEST. --
Mon 2017-10-09 18:15:52.649028 CEST [s=89a2cc877fdf4dafb2269a7631afedad;i=14d7;b=4c7e7adcba0c45b69d612857270716d3;m=137592e75e;t=55b1f81298605;x=c3c9b57b28c9490e]
    PRIORITY=6
    _BOOT_ID=4c7e7adcba0c45b69d612857270716d3
    _MACHINE_ID=e87bfd866aea4ae4b761aff06c9c3cb3
    _HOSTNAME=sigma
    SYSLOG_FACILITY=3
    SYSLOG_IDENTIFIER=systemd
    _UID=0
    _GID=0
    _TRANSPORT=journal
    _PID=1
    _COMM=systemd
    _EXE=/usr/lib/systemd/systemd
    _CAP_EFFECTIVE=3fffffffff
    _SYSTEMD_CGROUP=/init.scope
    _SYSTEMD_UNIT=init.scope
    _SYSTEMD_SLICE=-.slice
    CODE_FILE=../src/core/unit.c
    _CMDLINE=/usr/lib/systemd/systemd --switched-root --system --deserialize 25
    _SELINUX_CONTEXT=system_u:system_r:init_t:s0
    UNIT=ip-accounting-test.service
    CODE_LINE=2115
    CODE_FUNC=unit_log_resources
    MESSAGE_ID=ae8f7b866b0347b9af31fe1c80b127c0
    INVOCATION_ID=98a6e756fa9d421d8dfc82b6df06a9c3
    IP_METRIC_INGRESS_BYTES=50880
    IP_METRIC_INGRESS_PACKETS=605
    IP_METRIC_EGRESS_BYTES=50880
    IP_METRIC_EGRESS_PACKETS=605
    MESSAGE=ip-accounting-test.service: Received 49.6K IP traffic, sent 49.6K IP traffic
    _SOURCE_REALTIME_TIMESTAMP=1507565752649028
The interesting fields of this log message are of course
IP_METRIC_INGRESS_BYTES=, IP_METRIC_INGRESS_PACKETS=,
IP_METRIC_EGRESS_BYTES= and IP_METRIC_EGRESS_PACKETS=, which show the
raw byte and packet counters.
The log message carries a message ID
(ae8f7b866b0347b9af31fe1c80b127c0) that may be used to quickly search
for all such resource log messages. We can combine a search term for
messages of this ID with journalctl's -u switch to quickly find
out about the resource usage of any invocation of a specific
service. Let's try:
# journalctl -u ip-accounting-test MESSAGE_ID=ae8f7b866b0347b9af31fe1c80b127c0
-- Logs begin at Thu 2016-08-18 23:09:37 CEST, end at Mon 2017-10-09 18:25:27 CEST. --
Okt 09 18:15:52 sigma systemd: ip-accounting-test.service: Received 49.6K IP traffic, sent 49.6K IP traffic
Of course, the output above shows only one message at the moment, since we started the service only once, but a new one will appear every time you start and stop it again.
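Since every invocation leaves one such message behind, the per-invocation counters can be added up over time. The following is only a sketch: sum_journal_field is a hypothetical helper name, and it assumes the entries are fed in via journalctl's -o export output mode, where text fields appear as plain KEY=VALUE lines.

```shell
# Hypothetical helper: sum one numeric journal field over all entries read
# from stdin. Assumes "journalctl -o export" style input (KEY=VALUE lines).
sum_journal_field() {
    awk -v key="$1=" '
        # Only lines that start with "FIELD=" contribute to the total.
        index($0, key) == 1 { total += substr($0, length(key) + 1) }
        END { printf "%.0f\n", total }'
}

# Usage (needs the journal of the test service from above):
#   journalctl -u ip-accounting-test \
#       MESSAGE_ID=ae8f7b866b0347b9af31fe1c80b127c0 -o export |
#       sum_journal_field IP_METRIC_INGRESS_BYTES
```

The helper only reads stdin, so it can be tried on saved journal exports as well.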
The IP accounting logic is also hooked up with systemd-run,
which is useful for transiently running a command as a systemd service
with IP accounting turned on. Let's try it:
# systemd-run -p IPAccounting=yes --wait wget https://cfp.all-systems-go.io/en/ASG2017/public/schedule/2.pdf
Running as unit: run-u2761.service
Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 878ms
IP traffic received: 231.0K
IP traffic sent: 3.7K
This uses wget to download the PDF version of the 2nd day's schedule
of everybody's favorite Linux user-space conference All Systems Go!
2017 (BTW, have you already booked your ticket? We are very close to
selling out, be quick!). The IP traffic this command generated was
231K ingress and 4K egress. In the systemd-run command line two
parameters are important. First of all, we use -p IPAccounting=yes
to turn on IP accounting for the transient service (as above). And
secondly we use --wait to tell systemd-run to wait for the service
to exit. If --wait is used, systemd-run will also show you various
statistics about the service that just ran and terminated, including
the IP statistics you are seeing if IP accounting has been turned on.
It's fun to combine this sort of IP accounting with interactive transient units. Let's try that:
# systemd-run -p IPAccounting=1 -t /bin/sh
Running as unit: run-u2779.service
Press ^] three times within 1s to disconnect TTY.
sh-4.4# dnf update
…
sh-4.4# dnf install firefox
…
sh-4.4# exit
Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 5.297s
IP traffic received: …B
IP traffic sent: …B
This uses systemd-run's --pty switch (or short: -t), which opens
an interactive pseudo-TTY connection to the invoked service process,
which is a Bourne shell in this case. Doing this means we have a full,
comprehensive shell with job control and everything. Since the shell
is running as part of a service with IP accounting turned on, all IP
traffic we generate or receive will be accounted for. And as soon as
we exit the shell, we'll see what it consumed. (For the sake of
brevity I actually didn't paste the whole output above, but truncated
core parts. Try it out for yourself, if you want to see the output in
full.)
Sometimes it might make sense to turn on IP accounting for a unit that
is already running. For that, use systemctl set-property
foobar.service IPAccounting=yes, which will instantly turn on
accounting for it. Note that it won't count retroactively though: only
the traffic sent/received after the point in time you turned it on
will be collected. You may turn off accounting for the unit with the
same command line, but with IPAccounting=no.
Of course, sometimes it's interesting to collect IP accounting data
for all services, and turning on
IPAccounting=yes in every single
unit is cumbersome. To deal with that there's a global option
available which can be set in
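As far as I know, the setting meant here is DefaultIPAccounting= in the [Manager] section of systemd's system-wide configuration, documented in systemd-system.conf(5). A minimal sketch of enabling it through a drop-in follows; the helper name write_default_ip_accounting, the drop-in file name, and the default directory are assumptions made for this example.

```shell
# Hypothetical helper: write a drop-in that turns on IP accounting for all
# units by default. $1 is the drop-in directory (the default is an assumption).
write_default_ip_accounting() {
    dir=${1:-/etc/systemd/system.conf.d}
    mkdir -p "$dir" || return 1
    cat > "$dir/50-ip-accounting.conf" <<'EOF'
[Manager]
DefaultIPAccounting=yes
EOF
}

# Usage (as root); the manager has to re-read its configuration afterwards,
# for example by re-executing it:
#   write_default_ip_accounting && systemctl daemon-reexec
```

Individual units can presumably still opt out with IPAccounting=no in their own unit files.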
IP Access Lists
So much about IP accounting. Let's now have a look at IP access
control with systemd 235. As mentioned above, the two new unit file
settings IPAddressAllow= and IPAddressDeny= may be used for
that. They operate in the following way:
If the source address of an incoming packet or the destination address of an outgoing packet matches one of the IP addresses/network masks in the relevant unit's IPAddressAllow= setting then it will be allowed to go through.
Otherwise, if a packet matches an IPAddressDeny= entry configured for the service it is dropped.
If the packet matches neither of the above it is allowed to go through.
Or in other words,
IPAddressDeny= implements a blacklist, but
IPAddressAllow= takes precedence.
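The evaluation order can be mimicked in a few lines of shell. This is purely an illustration of the three rules, not how systemd does it (the real matching happens in the kernel's eBPF hooks); the function names are invented, only IPv4 and the any shortcut are handled, and the lists are passed as space-separated strings.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip4_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if address $1 lies inside prefix $2 ("ADDR", "ADDR/LEN" or "any").
ip4_in_prefix() {
    if [ "$2" = any ]; then return 0; fi
    net=${2%/*}
    len=32                                   # no mask given: /32 is implied
    case $2 in */*) len=${2#*/} ;; esac
    if [ "$len" -eq 0 ]; then return 0; fi
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    [ $(( $(ip4_to_int "$1") & mask )) -eq $(( $(ip4_to_int "$net") & mask )) ]
}

# Print "allow" or "drop" for address $1, allow list $2 and deny list $3.
ip_verdict() {
    for p in $2; do                          # rule 1: allow list wins
        if ip4_in_prefix "$1" "$p"; then echo allow; return; fi
    done
    for p in $3; do                          # rule 2: otherwise deny list
        if ip4_in_prefix "$1" "$p"; then echo drop; return; fi
    done
    echo allow                               # rule 3: no match, let it through
}

# Usage: ip_verdict 127.0.0.2 "127.0.0.0/8" "any"
```

Passing an empty deny list shows the third rule: without any match the packet goes through.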
Let's try that out. Let's modify our last example above in order to get a transient service running an interactive shell which has such an access list set:
# systemd-run -p IPAddressDeny=any -p IPAddressAllow=8.8.8.8 -p IPAddressAllow=127.0.0.0/8 -t /bin/sh
Running as unit: run-u2850.service
Press ^] three times within 1s to disconnect TTY.
sh-4.4# ping 8.8.8.8 -c1
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=59 time=27.9 ms

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 27.957/27.957/27.957/0.000 ms
sh-4.4# ping 8.8.4.4 -c1
PING 8.8.4.4 (8.8.4.4) 56(84) bytes of data.
ping: sendmsg: Operation not permitted
^C
--- 8.8.4.4 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
sh-4.4# ping 127.0.0.2 -c1
PING 127.0.0.2 (127.0.0.2) 56(84) bytes of data.
64 bytes from 127.0.0.2: icmp_seq=1 ttl=64 time=0.116 ms

--- 127.0.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.116/0.116/0.116/0.000 ms
sh-4.4# exit
The access list we set up uses IPAddressDeny=any in order to define
an IP white-list: all traffic will be prohibited for the session,
except for what is explicitly white-listed. In this command line, we
white-listed two address prefixes: 8.8.8.8 (with no explicit network
mask, which means the mask with all bits turned on is implied,
i.e. /32), and 127.0.0.0/8. Thus, the service can communicate with
Google's DNS server and everything on the local loop-back, but nothing
else. The commands run in this interactive shell show this: First we
try pinging 8.8.8.8, which happily responds. Then, we try to ping
8.8.4.4 (that's Google's other DNS server, but excluded from this
white-list), and as we see it is immediately refused with an Operation
not permitted error. As a last step we ping 127.0.0.2 (which is on the
local loop-back), and we see it works fine again, as expected.
In the example above we used IPAddressDeny=any. The any
identifier is a shortcut for writing 0.0.0.0/0 ::/0, i.e. it's a
shortcut for everything, on both IPv4 and IPv6. A number of other
such shortcuts exist. For example, instead of spelling out
127.0.0.0/8 we could also have used the more descriptive shortcut
localhost, which is expanded to 127.0.0.0/8 ::1/128, i.e. everything
on the local loopback device, on both IPv4 and IPv6.
Being able to configure IP access lists individually for each unit is
pretty nice already. However, typically one wants to configure this
comprehensively, not just for individual units, but for a set of units
in one go or even the system as a whole. In systemd, that's possible
by making use of slice units (for those who don't know systemd that
well, slice units are a concept for organizing services in a
hierarchical tree for the purpose of resource management): the IP
access list in effect for a unit is the combination of the individual
IP access lists configured for the unit itself and those of all slice
units it is contained in.
By default, system services are assigned to system.slice,
which in turn is a child of the root slice -.slice. Either
of these two slice units is hence suitable for locking down all
system services at once. If an access list is configured on
system.slice it will only apply to system services. If it is
configured on -.slice, however, it will apply to all user processes
of the system, including all user session processes (which are by
default assigned to user.slice, which is a child of -.slice) in
addition to the system services.
Let's make use of this:
# systemctl set-property system.slice IPAddressDeny=any IPAddressAllow=localhost
# systemctl set-property apache.service IPAddressAllow=10.0.0.0/8
The two commands above are a very powerful way to first turn off all IP communication for all system services (with the exception of loop-back traffic), followed by an explicit white-listing of 10.0.0.0/8 (which could refer to the local company network, you get the idea) but only for the Apache service.
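The same policy can also be expressed statically as unit drop-ins instead of via set-property. The sketch below is one way to do that under a few assumptions: the helper name write_slice_lockdown and the drop-in file names are invented, and the base directory defaults to /etc/systemd/system.

```shell
# Hypothetical helper: write drop-ins that mirror the two set-property
# calls above. $1 is the unit directory (the default is an assumption).
write_slice_lockdown() {
    base=${1:-/etc/systemd/system}
    mkdir -p "$base/system.slice.d" "$base/apache.service.d" || return 1

    # All system services: loop-back traffic only.
    cat > "$base/system.slice.d/50-ip-lockdown.conf" <<'EOF'
[Slice]
IPAddressDeny=any
IPAddressAllow=localhost
EOF

    # Apache additionally gets the (example) company network.
    cat > "$base/apache.service.d/50-ip-allow.conf" <<'EOF'
[Service]
IPAddressAllow=10.0.0.0/8
EOF
}

# Usage (as root): write_slice_lockdown && systemctl daemon-reload
```

Because the lists of a unit and its slices are combined, Apache ends up with localhost plus 10.0.0.0/8 allowed and everything else denied.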
After playing around a bit with this, let's talk about use-cases. Here are a few ideas:
The IP access list logic can in many ways provide a more modern replacement for the venerable TCP Wrapper, but unlike it, it applies to all IP sockets of a service unconditionally and requires no explicit support in the service's code: no patching required. On the other hand, TCP Wrapper has a number of features this scheme cannot cover. Most importantly, systemd's IP access lists operate solely on the level of IP addresses and network masks; there is no way to configure access by DNS name (though quite frankly, that is a very dubious feature anyway, as doing networking, unsecured networking even, in order to restrict networking sounds quite questionable, at least to me).
It can also replace (or augment) some facets of IP firewalling, i.e. Linux NetFilter/iptables. Right now, systemd's access lists are of course a lot more minimal than NetFilter, but they have one major benefit: they understand the service concept, and thus are a lot more context-aware than NetFilter. Classic firewalls, such as NetFilter, derive most service context from the IP port number alone, but we live in a world where IP port numbers are a lot more dynamic than they used to be. As one example, a BitTorrent client or server may use any IP port it likes for its file transfer, and writing IP firewalling rules matching that precisely is hence hard. With the systemd IP access list implementing this is easy: just set the list for your BitTorrent service unit, and all is good.
Let me stress though that you should be careful when comparing NetFilter with systemd's IP address list logic, it's really like comparing apples and oranges: to start with, the IP address list logic has a clearly local focus, it only knows what a local service is and manages access of it. NetFilter on the other hand may run on border gateways, at a point where the traffic flowing through is pure IP, carrying no information about a systemd unit concept or anything like that.
It's a simple way to lock down distribution/vendor supplied system services by default. For example, if you ship a service that you know never needs to access the network, then simply set IPAddressDeny=any (possibly combined with IPAddressAllow=localhost) for it, and it will live in a very tight networking sand-box it cannot escape from. systemd itself makes use of this for a number of its services by default now. For example, the logging service systemd-journald.service, the login manager systemd-logind or the core-dump processing unit systemd-coredump@.service all have such a rule set out-of-the-box, because we know that none of these services should be able to access the network, under any circumstances.
Because the IP access list logic can be combined with transient units, it can be used to quickly and effectively sandbox arbitrary commands, and even include them in shell pipelines and such. For example, let's say we don't trust our curl implementation (maybe it got modified locally by a hacker, and phones home?), but want to use it anyway to download the slides of my most recent casync talk in order to print them, and want to make sure it doesn't connect anywhere except where we tell it to (and to make this even more fun, let's minimize privileges further, by setting DynamicUser=yes):
# systemd-resolve 0pointer.de
0pointer.de: 18.104.22.168
             2a01:238:43ed:c300:10c3:bcf3:3266:da74

-- Information acquired via protocol DNS in 2.8ms.
-- Data is authenticated: no
# systemd-run --pipe -p IPAddressDeny=any \
              -p IPAddressAllow=22.214.171.124 \
              -p IPAddressAllow=2a01:238:43ed:c300:10c3:bcf3:3266:da74 \
              -p DynamicUser=yes \
              curl http://0pointer.de/public/casync-kinvolk2017.pdf | lp
So much about use-cases. This is by no means a comprehensive list of what you can do with it; after all, both IP accounting and IP access lists are very generic concepts. But I do hope the above sparks your imagination.
What does that mean for packagers?
IP accounting and IP access control are primarily concepts for the
local administrator. However, as suggested above, it's a very good
idea to ship services that by design have no network-facing
functionality with an access list of IPAddressDeny=any (and possibly
IPAddressAllow=localhost), in order to improve the out-of-the-box
security of our systems.
An option for security-minded distributions might be a more radical
approach: ship the system with
IPAddressDeny=any by default, and ask the administrator to punch
holes into that for each network-facing service with systemctl
set-property … IPAddressAllow=…. But of course, that's only an
option for distributions willing to break compatibility with what was
there before.
A couple of additional notes:
IP accounting and access lists may be mixed with socket activation. In this case, it's a good idea to configure access lists and accounting for both the socket unit that activates and the service unit that is activated, as both units maintain fully separate settings. Note that IP accounting and access lists configured on the socket unit apply to all sockets created on behalf of that unit, and even if these sockets are passed on to the activated services, they will still remain in effect and belong to the socket unit. This also means that IP traffic done on such sockets will be accounted to the socket unit, not the service unit. The fact that IP access lists are maintained separately for the kernel sockets created on behalf of the socket unit and for the kernel sockets created by the service code itself enables some interesting uses. For example, it's possible to set a relatively open access list on the socket unit, but a very restrictive access list on the service unit, thus making the sockets configured through the socket unit the only way in and out of the service.
systemd's IP accounting and access lists apply to IP sockets only, not to sockets of any other address families. That also means that AF_PACKET (i.e. raw) sockets are not covered. This means it's a good idea to combine IP access lists with RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6 in order to lock this down.
You may wonder if the per-unit resource log message and systemd-run --wait may also show you details about other types of resources consumed by a service. The answer is yes: if you turn on CPUAccounting= for a service, you'll also see a summary of consumed CPU time in the log message and the command output. And we are planning to hook up IOAccounting= the same way too, soon.
Note that IP accounting and access lists aren't entirely free. systemd inserts an eBPF program into the IP pipeline to make this functionality work. However, eBPF execution has already been optimized for speed in recent kernel versions, and given that it currently is in the focus of interest of many, I'd expect it to be optimized even further, so that the cost for enabling these features will be negligible, if it isn't already.
IP accounting is currently not recursive. That means you cannot use a slice unit to join the accounting of multiple units into one. This is something we definitely want to add, but requires some more kernel work first.
You might wonder how the PrivateNetwork= setting relates to IPAddressDeny=any. Superficially they have similar effects: they make the network unavailable to services. However, looking more closely there are a number of differences. PrivateNetwork= is implemented using Linux network name-spaces. As such it entirely detaches all networking of a service from the host, including non-IP networking. It does so by creating a private little environment the service lives in, where communication with itself is still allowed though. In addition, using the JoinsNamespaceOf= dependency, additional services may be added to the same environment, thus permitting communication with each other but not with anything outside of this group. IPAddressAllow= and IPAddressDeny= are much less invasive. First of all they apply to IP networking only, and can match against specific IP addresses. A service running with PrivateNetwork= turned off but IPAddressDeny=any turned on may enumerate the network interfaces and their configured IP addresses even though it cannot actually do any IP communication. On the other hand, if you turn on PrivateNetwork= all network interfaces besides lo disappear. Long story short: depending on your use-case one, the other, both or neither might be suitable for sand-boxing of your service. If possible I'd always turn on both, for best security, and that's what we do for all of systemd's own long-running services.
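Pulling the notes above together, a service that never needs the network could be sandboxed with a drop-in that turns on both mechanisms and restricts the address families. This is only a sketch: the helper name write_no_network_dropin, the drop-in file name, and the example unit name used in the usage line are assumptions.

```shell
# Hypothetical helper: write a drop-in that cuts a service off the network.
# $1 is the unit name, $2 the unit directory (the default is an assumption).
write_no_network_dropin() {
    unit=$1
    dir=${2:-/etc/systemd/system}/$unit.d
    mkdir -p "$dir" || return 1
    cat > "$dir/50-no-network.conf" <<'EOF'
[Service]
# Private network name-space: only a private loop-back device remains.
PrivateNetwork=yes
# Drop all IP traffic, even if name-spacing were somehow not in effect.
IPAddressDeny=any
# Keep other socket families such as AF_PACKET out of reach.
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
EOF
}

# Usage (as root), with a made-up unit name:
#   write_no_network_dropin foobar.service && systemctl daemon-reload
```

Whether all three settings suit a given service depends on what it does; a service that talks to peers on the loop-back device of the host, for instance, cannot use PrivateNetwork=.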
And that's all for now. Have fun with per-unit IP accounting and access lists!