Category: projects

FOMS 2009 CFP

And here's a another conference CFP, this time for Foundations of Open Media Software 2009 (FOMS). It's simply the best conference about multimedia on free systems. Period.

It's the third iteration now, and the first two were plain awesome, so don't miss this one. It happens in Hobart, Tasmania, next to linux.conf.au 2009.

Send in your paper! Attend! Spread the word!

The Call for Papers for the Linux Plumbers Conference in September in Portland is out now. It's a conference about the core infrastructure of Linux systems: the part of the system where userspace and the kernel interface. It's the first conference where the focus is specifically on getting together the kernel people who work on the userspace interfaces and the userspace people who have to deal with kernel interfaces. It's supposed to be a place where all the people doing infrastructure work sit down and talk, so that each other understands better what the requirements and needs of the other are, and where we can work towards fixing the major problems we currently have with our lower-level APIs.

I am running the Audio microconf of the Plumbers Conference. Audio infrastructure on Linux is still heavily fragmented. Pro, desktop and embedded worlds are almost completely seperate worlds. While we have quite good driver support the user experience is far from perfect, mostly due because our infrastructure is so balkanized. Join us at the Plumbers Conference and help to fix this! If you are doing audio infrastructure work on Linux, make sure to attend or -- even better -- submit a paper!

See you in Portland!

How to convert a GIT SVN mirror into GIT upstream

Yesterday I did the final steps to convert all my SVN repositories to GIT (including Avahi and PulseAudio). I had been running hot GIT mirrors of the SVN repositories for quite a while now. The last step was the switch to make them canonical upstream, and to disable the SVN repos.

For future Google reference, here are the steps that are necessary to make an SVN GIT mirror into a proper GIT repo:

# On the client:
$ git clone ssh://..../git/foobar foobar
$ cd foobar
$ git checkout trunk
$ git branch -m master
$ git push origin master
# This is a good time to edit the HEAD file on the server and replace its contents "ref: refs/heads/trunk" by "ref: refs/heads/master"
$ git push origin :trunk

This will basically replace 'trunk' by 'master', and make it the default when clients clone the repository. This will however not rename tags from the git-svn style to the GIT style. (Which I personally think would be a bad idea anyway, BTW)

Removing the origin from the server's config file is a good idea, too, since the repo is now canonical upstream.

Of course, afterwards you still need to create proper .gitignore files for the repositories. Just taking the value of the old svn:ignore property is a bad idea BTW, because .gitignore lists patterns that are used for the directory it is placed in and everything beneath, while svn:ignore is not applied recursively.

And finally you need to remove all those $Id$ lines and suchlike from all source files since they are kind of pointless on GIT. It is left as an excercise to the user to craft a good sed or perl script to do this automatically and recursively for an entire tree.

Lazyweb, do you have a good idea how to integrate mutt and git-am best? I want a key in mutt I can press which will ask me for a GIT directory and then call git-am --interactive for the currently selected email. Anyone got a good idea? Right now I am piping the mail from mutt to git-am. But that sucks, because --interactive refuses to work called like that and because I cannot specify the git repo to apply this to.

K lovers and event sounds

OK, before more people complain that I didn't keep the KDE in the loop about all that fancy event sound infrastructure work. The complaint is only partially valid: stuff like the sound specs have been seen before by the KDE guys. And for the rest it's just better to have something concrete to discuss about first instead of just starting an unfocussed discussion about all the grand plans we might have without ever having looked into actually implementing them.

Shortly after I posted that last blog story of mine I informed the KDE guys about this, and asked for their comments and suggestions. And this is my summary of those dicussions.

A Sixfold Announcement

Let's have a small poll here: what is the most annoying feature of a modern GNOME desktop? You got three options to choose from:

Event sounds, if they are enabled
Event sounds, if they are enabled
Event sounds, if they are enabled

Difficult choice, right?

In my pursuit to make this choice a little bit less difficult, I'd like to draw your attention to the following six announcements:

Announcement Number One: The XDG Sound Theming Specification

Following closely the mechanisms of the XDG Icon Theme Specification I may now announce you the XDG Sound Theme Specification which will hopefully be established as the future standard for better event sound theming for free desktops. This project was started by Patryk Zawadzki and is now maintained by Marc-André Lureau.

Announcement Number Two: The XDG Sound Naming Specification

If we have a Sound Theming Specification, then we also need an XDG Sound Naming Specification, again drawing heavily from the original XDG Icon Naming Specification. It's based on some older Bango work (which seems to be a defunct project these days), and is also maintained by Monsieur Lureau. The list of defined sounds is hopefully much more complete than any previous work in this area for free desktops.

Announcement Number Three: The freedesktop Sound Theme

Of course, what would the mentioned two standards be worth if there wasn't a single implementation of them? So here I may now announce you the first (rubbish) version of the XDG freedesktop Sound Theme.. It's basically just a tarball with a number of symlinks linking to the old gnome-audio event sounds. It's only a very small subset of the entire list of XDG sound names. My hope is that this initial release will spark community contributions for a better, higher quality default sound theme for free desktops. If you are some kind of musician or audio technician I am happy to take your submissions!

Announcement Number Four: The libcanberra Event Sound API

Ok, we now have those two specs, and an example theme, what else is missing to make this stuff a success? Absolutely right, an actual implementation of the sound theming logic! And this is what libcanberra is. It is a very small and lean implementation of the specification. However, it is also very powerful, and can be used in a much more elaborate way than previous APIs. It's all about the central function called ca_context_play() which takes a NULL terminated list of string properties for the sound you want to generate. How this looks like?

{
	ca_context *c = NULL;

	/* Create a context for the event sounds for your application */
	ca_context_create(&c);

	/* Set a few application-global properties */
	ca_context_change_props(c,
	                        CA_PROP_APPLICATION_NAME, "An example",
				CA_PROP_APPLICATION_ID, "org.freedesktop.libcanberra.Test",
				CA_PROP_APPLICATION_ICON_NAME, "libcanberra-test",
			        NULL);

	/* ... */

	/* Trigger an event sound */
	ca_context_play(c, 0,
			CA_PROP_EVENT_ID, "button-pressed", /* The XDG sound name */
			CA_PROP_MEDIA_NAME, "The user pressed the button foobar",
			CA_PROP_EVENT_MOUSE_X, "555",
			CA_PROP_EVENT_MOUSE_Y, "666",
			CA_PROP_WINDOW_NAME, "Foobar Dialog",
			CA_PROP_WINDOW_ICON_NAME, "libcanberra-test-foobar-dialog",
			CA_PROP_WINDOW_X11_DISPLAY, ":0",
			CA_PROP_WINDOW_X11_XID, "4711",
			NULL);

	/* ... */

	ca_context_destroy(&c);
}

So, the idea is pretty simple, it's all built around those sound event properties. A few you initialize globally for your application, and some you pass each time you actually want to trigger a sound. The properties listed above are only a subset of the default ones that are defined. They can be extended at any time. Why is it good to attach all this information to those event sounds? First, for a11y reasons, where visual feedback in addition of audible feedback might be advisable. And then, if the underlying sound system knows which window triggered the event it can take per-window volumes or other settings into account. If we know that the sound event was triggered by a mouse event, then the sound system could position the sound in space: i.e. if you click a button on the left side of the screen, the event sound will come more out of your left speaker, and if you click on the right, it will be positioned nearer to the right speaker. The more information the underlying audio system has about the event sound the fancier 'earcandy' it can do to enhance your user experience with all kinds of audio effects.

The library is thread-safe, brings no dependencies besides OGG Vorbis (and of course a Libc), and whatever the used backend requires. The library can support multiple different backends. Either you can compile a single one directly into the libcanberra.so library, or you can bind them at runtime via shared objects. Right now, libcanberra supports ALSA, PulseAudio and a null backend. The library is designed to be portable, however only supports Linux right now. The idea is to translate the XDG sound names into the sounds that are native the local platform (i.e. to whatever API Windows or MacOS use natively for sound events).

Besides all that fancy property stuff it also can do implicit on-demand cacheing of samples in the sound server, cancel currently playing sounds, notify an application when a sound finished to play and other features.

My hope is that this piece of core desktop technology can be shared by both GNOME and the KDE world.

Check out the (complete!) documentation!

Download libcanberra 0.1 now!

Read the README now!

Announcement Number Five: The libcanberra-gtk Sound Event Binding for Gtk+

If you compile libcanberra with Gtk+ support (optional), than you'll get an additional library libcanberra-gtk which provides a couple of functions to simplify event sound generation from Gtk+ programs. It will maintain a global libcanberra context, and provides a few functions that will automatically fill in quite a few properties for you, so that you don't have to fill them in manually. How does that look like? Deadly simple:

{
	/* Trigger an event sound from a GtkWidget, will automaticall fill in CA_PROP_WINDOW_xxx */
	ca_gtk_play_for_widget(GTK_WIDGET(w), 0,
                               CA_PROP_EVENT_ID, "foobar-event",
			       CA_PROP_EVENT_DESCRIPTION, "foobar event happened",
			       NULL);

	/* Alternatively, triggger an event sound from a GdkEvent, will also fill in CA_PROP_EVENT_MOUSE_xxx  */
	ca_gtk_play_for_event(gtk_get_current_event(), 0
                              CA_PROP_EVENT_ID, "waldo-event",
			      CA_PROP_EVENT_DESCRIPTION, "waldo event happened",
			      NULL);
}

Simple? Yes, deadly simple.

Check out the (complete!) documentation!

Announcement Number Five: the libcanberra-gtk-module Gtk+ Module

Okey, the example code for libcanberra-gtk is already very simple. Can we do it even shorter? Yes!

If you compile libcanberra with Gtk+ support, then you will also get a ne GtkModule which will automatically hook into all kinds of events inside a Gtk+ program and generate sound events from them. You can have sounds when you press a button, when you popup a menu or window, or when you select an item from a list box. It's all done automatically, no further change in the program is necessary. It works very similar to the old sound event code in libgnomeui, but is far less ugly, much more complete, and most importantly, works for all Gtk+ programs, not just those which link against libgnomeui. To activate this feature $GTK_MODULES=libcanberra-gtk-module must be set. So, just for completeness sake, here's how the example code for using this feature in your program looks like:

{
}

Yes, indeed. No code changes necessary. You get all those fancy UI sounds for free. Awesome? Awesome!

Of course, if you use custom widgets, or need more than just the simplest audio feedback for input you should link against libcanberra-gtk yourself, and add ca_gtk_play_for_widget() and ca_gtk_play_for_event() calls to your code, at the right places.

Announcement Number Six: My GUADEC talk

You want to know more about all this fancy new sound event world order? Then make sure to attend my talk at GUADEC 2008 in Istanbul!

Ok, that't enough announcements for now. If you want to discuss or contribute to the two specs, then please join the XDG mailing list. If you want to contribute to libcanberra, you are invited to join the libcanberra mailing list.

Of course these six announcements won't add a happy end to the GNOME sound event story just like that. We still need better sounds, and better integration into applications. But just think of how high quality the sound events on e.g. MacOS X are, and you can see (or hear) what I hope to get for the free desktops as well. Also my hope is that since we now have a decent localization infrastructure for our sounds in place, we can make speech sound events more popular, and thus sound events much more useful. i.e. have a nice girl's voice telling you "You disc finished burning!" instead of some annoying nobody-knows-what-it-means bing sound. I am one of those who usually have there event sounds disabled all the time. My hope is that in a few months time I won't have any reason more to do so.

GSoC 2008

I am happy that two GSoC projects got accepted that are related to projects I maintain:

LLMNR Protocol Integration in Avahi, by Sunil Kumar Ghai, mentored by Trent Lloyd. The GNOME project generously allowed this application to happen under its umbrella. LLMNR support is a big improvement for Avahi. We will then integrate into newer Windows networks as seamless as we already integrate into MacOS X networks.
Integration of the Bluetooth Audio service with PulseAudio, by João Paulo Rechi Vita, mentored by Luiz Augusto von Dentz. Made possible through the BlueZ project.

I'd like to thank the GNOME and BlueZ projects for making these GSoC applications a reality.

Finally, Secure Real-Time on the Desktop

Finally, secure real-time scheduling on the Linux desktop can be become a reality. Linux 2.6.25 gained Real-Time Group Scheduling, a feature which allows to limit the amount of CPU time real-time processes and threads may consume.

Traditionally on Linux real-time scheduling was limited to priviliged processes, because RT processes can lock up the machine if they enter a busy loop. Scheduling is effectively disabled for them -- they can do whatever they want and are (almost) never preempted by the kernel in what they are doing. In 2.6.12 RLIMIT_RTPRIO was introduced. It's a resource limit which opened up real-time scheduling for normal user processes. However the ability to lock up the machine for RT processes was not touched by this. When using /usr/security/limits.conf to raise this limit for specific users they'd gain the ability to lock up your machine.

Due to this raising this limit is a task that is left to the administrator on all current distros. Shipping a distro with the limit raised by default is shipping a distro where local users can easily freeze their machines.

It was always possible to write "watchdog" tools that could supervise RT processes by running on a higher RT priority and checking the CPU load imposed by the process on the system. However, to this point it was not possible in any way that would actually be secure (so that processes cannot escape the watchdog by forking), that wouldn't require lots of work in the watchdog (which is a bad idea since it runs at a very high RT priority, thus while it doing its stuff it will block the important RT processes from running), or that wouldn't be totally ugly.

Real-Time Group Scheduling solves the problem. It is now possible to create a cgroup for the processes to supervise. The processes cannot escape the cgroup by forking. Then, by manipulating the cpu.rt_runtime_us property of the cgroup a certain amount of RT CPU time can be assigned to the cgroup -- processes in the group cannot spend more time than this limit per one period of time. (The period length can be controlled globally via /proc/sys/kernel/sched_rt_period_us).

To demonstrate this I wrote a tool rtwatch which implements this technique in a watchdog tool that is SUID root, creates a cgroup, and forks off a user defined process inside, it with raised RLIMIT_PTPRIO but normal user priviliges. The child process can then acquire RT scheduling but never consume more CPU than allowed by the cgroup, with no option to lock up the machine anymore.

How to use this?

$ rtwatch 5 rtcpuhogger

This will start the process rtcpuhogger and grant it 5% of the available CPU time. To make sure that this is not misused by the user rtwatch will refuse to assign more than 50% CPU time to a single child. Since RT scheduling is all about determinism it is not possible to assign more than 100% CPU time (globally in sum) to all RT processes this way. Also, rtwatch will always make sure that 5% will be left for other tasks.

To work, rtwatch needs to run on Linux 2.6.25 with CONFIG_RT_GROUP_SCHED enabled. Unfortunately the Fedora kernel is not compiled this way, yet.

Why is all this so great? Those who attended my talk Practical Real-Time Programming in Userspace at Linux.conf.au 2008 (or watched the video) will know that besides the fact that I'd love to enable RT support for PulseAudio in Fedora in coming releases by default I'd also love to see RT programming more often used in desktop applications. Everywhere were the CPU time spent on a specific process should not depend on the overall system load, but solely on the time constraints of the job itself and what is process needs RT scheduling should be enabled. Examples for this are music or movie playback (the movie player should have enough time to decode one frame every 25th of a second, regardless what else is running on the system), fancy animations, quick reactions to user actions (i.e. updating the mouse cursor). All this for a machine that is snappier and more responsive with shorter latencies, regardless what else happens on the machine.

The day before yesterday, when Linux 2.6.25 was released, we came a big step closer to this goal.

Respect $LC_MESSAGES!

<rant>

I really dislike if software ignores my setting of $LC_MESSAGES=C and shows me its UI in German, just because I set $LANG=de_DE. I hate that. I don't want no UI strings in German, the translations are mediocre. I want everything else in German (paper sizes, ...), but no strings please. That's why I configured my locale settings this way. I don't want those settings ignored.

Please, developers, read through locale(7) and related man pages before you hack up i18n support. Thank you.

The offenders that pissed me off right now are Firefox and Fedora's man.

</rant>

What's Cooking in PulseAudio's glitch-free Branch

A while ago I started development of special branch of PulseAudio which is called glitch-free. In a few days I will merge it back to PulseAudio trunk, and eventually release it as 0.9.11. I think it's time to explain a little what all this "glitch-freeness" is about, what made it so tricky to implement, and why this is totally awesome technology. So, here we go:

Traditional Playback Model

Traditionally on most operating systems audio is scheduled via sound card interrupts (IRQs). When an application opens a sound card for playback it configures it for a fixed size playback buffer. Then it fills this buffer with digital PCM sample data. And after that it tells the hardware to start playback. Then, the hardware reads the samples from the buffer, one at a time, and passes it on to the DAC so that eventually it reaches the speakers.

After a certain number of samples played the sound hardware generates an interrupt. This interrupt is forwarded to the application. On Linux/Unix this is done via poll()/select(), which the application uses to sleep on the sound card file descriptor. When the application is notified via this interrupt it overwrites the samples that were just played by the hardware with new data and goes to sleep again. When the next interrupt arrives the next block of samples is overwritten, and so on and so on. When the hardware reaches the end of the hardware buffer it starts from its beginning again, in a true ring buffer fashion. This goes on and on and on.

The number of samples after which an interrupt is generated is usually called a fragment (ALSA likes to call the same thing a period for some reason). The number of fragments the entire playback buffer is split into is usually integral and usually a power of two, 2 and 4 being the most frequently used values.

Image 1: Schematic overview of the playback buffer in the traditional playback model, in the best way the author can visualize this with his limited drawing abilities.

If the application is not quick enough to fill up the hardware buffer again after an interrupt we get a buffer underrun ("drop-out"). An underrun is clearly hearable by the user as a discontinuity in audio which is something we clearly don't want. We thus have to carefully make sure that the buffer and fragment sizes are chosen in a way that the software has enough time to calculate the data that needs to be played, and the OS has enough time to forward the interrupt from the hardware to the userspace software and the write request back to the hardware.

Depending on the requirements of the application the size of the playback buffer is chosen. It can be as small as 4ms for low-latency applications (such as music synthesizers), or as long as 2s for applications where latency doesn't matter (such as music players). The hardware buffer size directly translates to the latency that the playback adds to the system. The smaller the fragment sizes the application configures, the more time the application has to fill up the playback buffer again.

Let's formalize this a bit: Let BUF_SIZE be the size of the hardware playback buffer in samples, FRAG_SIZE the size of one fragment in samples, and NFRAGS the number of fragments the buffer is split into (equivalent to BUF_SIZE divided by FRAG_SIZE), RATE the sampling rate in samples per second. Then, the overall latency is identical to BUF_SIZE/RATE. An interrupt is generated every FRAG_SIZE/RATE. Every time one of those interrupts is generated the application should fill up one fragment again, if it missed one interrupt this might become more than one. If it doesn't miss any interrupt it has (NFRAGS-1)*FRAG_SIZE/RATE time to fulfill the request. If it needs more time than this we'll get an underrun. The fill level of the playback buffer should thus usually oscillate between BUF_SIZE and (NFRAGS-1)*FRAG_SIZE. In case of missed interrupts it might however fall considerably lower, in the worst case to 0 which is, again, an underrun.

It is difficult to choose the buffer and fragment sizes in an optimal way for an application:

The buffer size should be as large as possible to minimize the risk of drop-outs.
The buffer size should be as small as possible to guarantee minimal latencies.
The fragment size should be as large as possible to minimize the number of interrupts, and thus the required CPU time used, to maximize the time the CPU can sleep for between interrupts and thus the battery lifetime (i.e. the fewer interrupts are generated the lower your audio app will show up in powertop, and that's what all is about, right?)
The fragment size should be as small as possible to give the application as much time as possible to fill up the playback buffer, to minimize drop-outs.

As you can easily see it is impossible to choose buffering metrics in a way that they are optimal on all four requirements.

This traditional model has major drawbacks:

The buffering metrics are highly dependant on what the sound hardware can provide. Portable software needs to be able to deal with hardware that can only provide a very limited set of buffer and fragment sizes.
The buffer metrics are configured only once, when the device is opened, they usually cannot be reconfigured during playback without major discontinuities in audio. This is problematic if more than one application wants to output audio at the same time via a sound server (or dmix) and they have different requirements on latency. For these sound servers/dmix the fragment metrics are configured statically in a configuration file, and are the same during the whole lifetime. If a client connects that needs lower latencies, it basically lost. If a client connects that doesn't need as low latencies, we will continouisly burn more CPU/battery than necessary.
It is practically impossible to choose the buffer metrics optimal for your application -- there are too many variables in the equation: you can't know anything about the IRQ/scheduling latencies of the OS/machine your software will be running on; you cannot know how much time it will actually take to produce the audio data that shall be pushed to the audio device (unless you start counting cycles, which is a good way to make your code unportable); the scheduling latencies are hugely dependant on the system load on most current OSes (unless you have an RT system, which we generally do not have). As said, for sound servers/dmix it is impossible to know in advance what the requirements on latency are that the applications that might eventually connect will have.
Since the number of fragments is integral and at least 2 on almost all existing hardware we will generate at least two interrupts on each buffer iteration. If we fix the buffer size to 2s then we will generate an interrupt at least every 1s. We'd then have 1s to fill up the buffer again -- on all modern systems this is far more than we'd ever need. It would be much better if we could fix the fragment size to 1.9s, which still gives us 100ms to fill up the playback buffer again, still more than necessary on most systems.

Due to the limitations of this model most current (Linux/Unix) software uses buffer metrics that turned out to "work most of the time", very often they are chosen without much thinking, by copying other people's code, or totally at random.

PulseAudio <= 0.9.10 uses a fragment size of 25ms by default, with four fragments. That means that right now, unless you reconfigure your PulseAudio manually clients will not get latencies lower than 100ms whatever you try, and as long as music is playing you will get 40 interrupts/s. (The relevant configuration options for PulseAudio are default-fragments= and default-fragment-size-msec= in daemon.conf)

dmix uses 16 fragments by default with a size of 21 ms each (on my system at least -- this varies, depending on your hardware). You can't get less than 47 interrupts/s. (You can change the parameters in .asoundrc)

So much about the traditional model and its limitations. Now, we'll have a peek on how the new glitch-free branch of PulseAudio does its things. The technology is not really new. It's inspired by what Vista does these days and what Apple CoreAudio has already been doing for quite a while. However, on Linux this technology is new, we have been lagging behind quite a bit. Also I claim that what PA does now goes beyond what Vista/MacOS does in many ways, though of course, they provide much more than we provide in many other ways. The name glitch-free is inspired by the term Microsoft uses to call this model, however I must admit that I am not sure that my definition of this term and theirs actually is the same.

Glitch-Free Playback Model

The first basic idea of the glitch-free playback model (a better, less marketingy name is probably timer-based audio scheduling which is the term I internally use in the PA codebase) is to no longer depend on sound card interrupts to schedule audio but use system timers instead. System timers are far more flexible then the fragment-based sound card timers. They can be reconfigured at any time, and have a granularity that is independant from any buffer metrics of the sound card. The second basic idea is to use playback buffers that are as large as possible, up to a limit of 2s or 5s. The third basic idea is to allow rewriting of the hardware buffer at any time. This allows instant reaction on user-input (i.e. pause/seek requests in your music player, or instant event sounds) although the huge latency imposed by the hardware playback buffer would suggest otherwise.

PA configures the audio hardware to the largest playback buffer size possible, up to 2s. The sound card interrupts are disabled as far as possible (most of the time this means to simply lower NFRAGS to the minimal value supported by the hardware. It would be great if ALSA would allow us to disable sound card interrupts entirely). Then, PA constantly determines what the minimal latency requirement of all connected clients is. If no client specified any requirements we fill up the whole buffer all the time, i.e. have an actual latency of 2s. However, if some applications specified requirements, we take the lowest one and only use as much of the configured hardware buffer as this value allows us. In practice, this means we only partially fill the buffer each time we wake up. Then, we configure a system timer to wake us up 10ms before the buffer would run empty and fill it up again then. If the overall latency is configured to less than 10ms we wakeup after half the latency requested.

If the sleep time turns out to be too long (i.e. it took more than 10ms to fill up the hardware buffer) we will get an underrun. If this happens we can double the time we wake up before the buffer would run empty, to 20ms, and so on. If we notice that we only used much less than the time we estimated, we can halve this value again. This adaptive scheme makes sure that in the unlikely event of a buffer underrun it will happen most likely only once and never again.

When a new client connects or an existing client disconnects, or when a client wants to rewrite what it already wrote, or the user wants to change the volume of one of the streams, then PA will resample its data passed by the client, convert it to the proper hardware sample type, and remix it with the data of the other clients. This of course makes it necessary to keep a "history" of data of all clients around so that if one client requests a rewrite we have the necessary data around to remix what already was mixed before.

The benefits of this model are manyfold:

We minimize the overall number of interrupts, down to what the latency requirements of the connected clients allow us. i.e. we save power, don't show up in powertop anymore for normal music playback.
We maximize drop-out safety, because we buffer up to 2s in the usual cases. Only with operating systems which have scheduling latencies > 2s we can still get drop-outs. Thankfully no operating system is that bad.
In the event of an underrun we don't get stuck in it, but instead are able to recover quickly and can make sure it doesn't happen again.
We provide "zero-latency". Each client can rewrite its playback buffer at any time, and this is forwarded to the hardware, even if this means that the sample currently being played needs to be rewritten. This means much quicker reaction to user input, a more responsive user experience.
We become much less dependant on what the sound hardware provides us with. We can configure wakeup times that are independant from the fragment settings that the hardware actually supports.
We can provide almost any latency a client might request, dynamically without reconfiguration, without discontinuities in audio.

Of course, this scheme also comes with major complications:

System timers and sound card timers deviate. On many sound cards by quite a bit. Also, not all sound cards allow the user to query the playback frame index at any time, but only shortly after each IRQ. To compensate for this deviation PA contains a non-trivial algorithm which tries to estimate and follow the deviation over time. If this doesn't work properly it might happen that an underrun happens much earlier than we expected.
System timers on Unix are not very high precision. On traditional Linux with HZ=100 sleep times for timers are rounded up to multiples of 10ms. Only very recent Linux kernels with hrtimers can provide something better, but only on x86 and x86-64 until now. This makes the whole scheme unusable for low latency setups unless you run the very latest Linux. Also, hrtimers are not (yet) exposed in poll()/select(). It requires major jumping through loops to work around this limitation.
We need to keep a history of sample data for each stream around, thus increasing the memory footprint and potentially increased cache pressure. PA tries to work against the increased memory footprint and cache pressure this might cause by doing zero-copy memory management.
We're still dependant on the maximum playback buffer size the sound hardware supports. Many sound cards don't even support 2s, but only 300ms or suchlike.
The rewriting of the client buffers causing rewriting of the hardware buffer complicates the resampling/converting step immensly. In general the code to implement this model is more complex than for the traditional model. Also, ALSA has not really been designed with this design in mind, which makes some things very hard to get right and suboptimal.
Generally, this works reliably only on newest ALSA, newest kernel, newest everything. It has pretty steep requirements on software and sometimes even on hardware. To stay comptible with systems that don't fulfill these requirements we need to carry around code for the traditional playback model as well, increasing the code base by far.

The advantages of the scheme clearly outweigh the complexities it causes. Especially the power-saving features of glitch-free PA should be enough reason for the embedded Linux people to adopt it quickly. Make PA disappear from powertop even if you play music!

The code in the glitch-free is still rough and sometimes incomplete. I will merge it shortly into trunk and then upload a snapshot to Rawhide.

I hope this text also explains to the few remaining PA haters a little better why PA is a good thing, and why everyone should have it on his Linux desktop. Of course these changes are not visible on the surface, my hope with this blog story is to explain a bit better why infrastructure matters, and counter misconceptions what PA actually is and what it gives you on top of ALSA.

Updated PulseAudio Plugin for SDL

Quick update for all game kiddies: apply this patch to SDL and enjoy PulseAudio in your favourite SDL based game without buffering issues. It's basically just fixes the bogus buffer metrics of Stephan's original patch.