Posted on Fr 18 April 2008

Finally, Secure Real-Time on the Desktop

Finally, secure real-time scheduling on the Linux desktop can become a reality. Linux 2.6.25 gained Real-Time Group Scheduling, a feature which allows limiting the amount of CPU time that real-time processes and threads may consume.

Traditionally on Linux, real-time scheduling was limited to privileged processes, because RT processes can lock up the machine if they enter a busy loop. Scheduling is effectively disabled for them -- they can do whatever they want and are (almost) never preempted by the kernel in what they are doing. In 2.6.12 RLIMIT_RTPRIO was introduced. It's a resource limit which opened up real-time scheduling for normal user processes. However, it did nothing about the ability of RT processes to lock up the machine. If you use /etc/security/limits.conf to raise this limit for specific users, those users gain the ability to lock up your machine.

Because of this, raising the limit is a task that is left to the administrator on all current distros. Shipping a distro with the limit raised by default means shipping a distro where local users can easily freeze their machines.
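As a small illustration of the resource limit discussed above, here is a minimal Python sketch that reads RLIMIT_RTPRIO for the current process and checks whether a given RT priority would fit under it. The helper names rt_priority_ceiling and may_request are made up for this example, and the check deliberately ignores privileged processes (which are not bound by the limit), so treat it as a rough approximation of what the kernel does rather than the real rule.

```python
import resource


def rt_priority_ceiling():
    """Return the soft RLIMIT_RTPRIO of the current process.

    For an unprivileged process this is the highest real-time priority
    it may request. On a stock setup it is usually 0, i.e. no RT at all.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_RTPRIO)
    return soft


def may_request(priority, ceiling):
    """Hypothetical helper: would this RT priority fit under the ceiling?

    Real-time priorities on Linux range from 1 to 99. Privileged
    processes are not covered by this simplified check.
    """
    if not 1 <= priority <= 99:
        return False
    if ceiling == resource.RLIM_INFINITY:
        return True  # no ceiling configured
    return priority <= ceiling


if __name__ == "__main__":
    ceiling = rt_priority_ceiling()
    print("RLIMIT_RTPRIO (soft):", ceiling)
    print("may request RT priority 5:", may_request(5, ceiling))
```

If the first line printed shows 0, the administrator has not raised the limit for your user, which is the default situation described above.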

It was always possible to write "watchdog" tools that supervise RT processes by running at a higher RT priority and checking the CPU load the process imposes on the system. However, until now there was no way to do this that would actually be secure (so that processes cannot escape the watchdog by forking), that wouldn't require lots of work in the watchdog (which is a bad idea since it runs at a very high RT priority, so while it is doing its work it blocks the important RT processes from running), and that wouldn't be totally ugly.

Real-Time Group Scheduling solves the problem. It is now possible to create a cgroup for the processes to supervise. The processes cannot escape the cgroup by forking. Then, by manipulating the cpu.rt_runtime_us property of the cgroup, a certain amount of RT CPU time can be assigned to the cgroup -- processes in the group cannot spend more RT CPU time than this limit per period. (The period length can be controlled globally via /proc/sys/kernel/sched_rt_period_us.)
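To make the relationship between the period and the runtime a bit more concrete, here is a rough Python sketch that turns a CPU percentage into a cpu.rt_runtime_us value. The function names are invented for this example, and the cgroup directory passed to set_group_rt_runtime is an assumption: it depends on where the cpu controller is mounted on your system, and writing there requires root. The fallback of 1000000 µs is the kernel's default period.

```python
from pathlib import Path

RT_PERIOD_FILE = Path("/proc/sys/kernel/sched_rt_period_us")
DEFAULT_PERIOD_US = 1_000_000  # kernel default: one second


def read_rt_period_us(path=RT_PERIOD_FILE):
    """Read the global RT period, falling back to the kernel default."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return DEFAULT_PERIOD_US


def runtime_us_for_percent(percent, period_us):
    """Microseconds of RT CPU time per period for a given percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return period_us * percent // 100


def set_group_rt_runtime(cgroup_dir, runtime_us):
    """Write the budget into the cgroup's cpu.rt_runtime_us file.

    cgroup_dir is a hypothetical path to an already created cgroup
    directory; it is not a fixed location.
    """
    Path(cgroup_dir, "cpu.rt_runtime_us").write_text(f"{runtime_us}\n")


if __name__ == "__main__":
    period = read_rt_period_us()
    print("period:", period, "us")
    print("5% budget:", runtime_us_for_percent(5, period), "us")
```

With the default period of one second, a 5% budget works out to 50000 µs of RT CPU time per period for the whole group.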

To demonstrate this I wrote a tool, rtwatch, which implements this technique: a watchdog tool that is SUID root, creates a cgroup, and forks off a user-defined process inside it, with raised RLIMIT_RTPRIO but normal user privileges. The child process can then acquire RT scheduling but never consume more CPU than allowed by the cgroup, so it no longer has any way to lock up the machine.

How to use this?

$ rtwatch 5 rtcpuhogger

This will start the process rtcpuhogger and grant it 5% of the available CPU time. To make sure that this is not misused by the user, rtwatch will refuse to assign more than 50% CPU time to a single child. Since RT scheduling is all about determinism, it is not possible to assign more than 100% CPU time (globally, in sum) to all RT processes this way. Also, rtwatch will always make sure that 5% is left for other tasks.
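The limits just described boil down to a small admission check. The following Python sketch is only my reading of those rules, not rtwatch's actual code: the name can_grant is made up, and the sketch assumes that "5% left for other tasks" means the sum of all RT budgets may not exceed 95%.

```python
MAX_PER_CHILD_PERCENT = 50  # a single child never gets more than this
RESERVED_PERCENT = 5        # always left over for non-RT tasks (assumed reading)


def can_grant(requested_percent, already_granted_percent):
    """Return True if a new child may be given the requested RT budget.

    already_granted_percent is the sum of the budgets handed out to all
    other supervised children so far.
    """
    if requested_percent <= 0:
        return False
    if requested_percent > MAX_PER_CHILD_PERCENT:
        return False
    total = already_granted_percent + requested_percent
    return total <= 100 - RESERVED_PERCENT


if __name__ == "__main__":
    print(can_grant(5, 0))    # small request on an idle system
    print(can_grant(60, 0))   # exceeds the per-child cap
    print(can_grant(50, 50))  # would eat into the reserved share
```

Keeping both a per-child cap and a global reserve means that neither a single greedy process nor many modest ones together can starve the rest of the system.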

To work, rtwatch needs to run on Linux 2.6.25 with CONFIG_RT_GROUP_SCHED enabled. Unfortunately the Fedora kernel is not compiled this way, yet.

Why is all this so great? Those who attended my talk Practical Real-Time Programming in Userspace at 2008 (or watched the video) will know that, besides the fact that I'd love to enable RT support for PulseAudio in Fedora by default in coming releases, I'd also love to see RT programming used more often in desktop applications. Wherever the CPU time spent on a specific process should depend not on the overall system load but solely on the time constraints of the job itself and on what the process needs, RT scheduling should be enabled. Examples of this are music or movie playback (the movie player should have enough time to decode one frame every 25th of a second, regardless of what else is running on the system), fancy animations, and quick reactions to user actions (e.g. updating the mouse cursor). All this for a machine that is snappier and more responsive, with shorter latencies, regardless of what else happens on the machine.

The day before yesterday, when Linux 2.6.25 was released, we came a big step closer to this goal.

© Lennart Poettering.