Category: projects

All About Fragments

In my ongoing series Writing Better Audio Applications for Linux, here's another installment: a little explanation of how fragments/periods and buffer sizes should be chosen when doing audio playback with traditional audio APIs such as ALSA and OSS. This originates from some emails I exchanged with the Ekiga folks. In the last few weeks I kept copying this explanation to various other folks, so I guess it would make sense to post it on my blog here too to reach a wider audience. So here it is, mostly unedited:

Yes. You shouldn't misuse the fragments logic of sound devices. It's
like this:

   The latency is defined by the buffer size.
   The wakeup interval is defined by the fragment size.

The buffer fill level will oscillate between 'full buffer' and 'full
buffer minus 1x fragment size minus OS scheduling latency'. Setting
smaller fragment sizes will increase the CPU load and decrease battery
time since you force the CPU to wake up more often. OTOH it increases
drop-out safety, since you fill up the playback buffer earlier. Choosing
the fragment size hence means balancing your needs between power
consumption and drop-out safety. With modern processors and a good OS
scheduler like the Linux one, setting the fragment size to anything
other than half the buffer size does not make much sense.
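
To make that trade-off concrete, here's a quick back-of-the-envelope sketch. The sample rate, sample format and buffer size below are made-up example numbers, nothing ALSA mandates:

```python
# Hypothetical stream: 44.1 kHz stereo, 16-bit samples, i.e. 4 bytes per frame.
RATE = 44100          # frames per second
FRAME_BYTES = 4       # 2 channels x 2 bytes

def ms(nbytes):
    """Convert a byte count at this stream format into milliseconds."""
    return 1000.0 * nbytes / (RATE * FRAME_BYTES)

buffer_size = 65536               # total playback buffer, in bytes
fragment_size = buffer_size // 2  # the 'half the buffer' recommendation

latency = ms(buffer_size)          # the latency is defined by the buffer size
wakeup_interval = ms(fragment_size)  # the wakeup interval by the fragment size
min_fill = buffer_size - fragment_size  # lowest buffer fill level
                                        # (ignoring OS scheduling latency)
```

With these numbers the latency comes out around 370 ms and the device wakes you up roughly every 185 ms; halving the fragment size would double the wakeup rate without changing the latency at all.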

Your [Ekiga's ptlib driver, that is] ALSA output is configured
to set the fragment size to the size of your codec audio
frames. And that's a bad idea, because the codec frame size has not
been chosen based on power consumption or drop-out safety
reasoning. It has been chosen by the codec designers based on
different reasoning, such as latency.

You probably configured your backend this way because the ALSA
library docs say that it is recommended to write to the sound card in
multiples of the fragment size. However, deducing from this that you
should hence configure the fragment size to the codec frame size is
wrong!

The best way to implement playback these days for ALSA is to write as
much as snd_pcm_avail() tells you to each time you wake up due to
POLLOUT on the sound card. If that is not a multiple of your codec
frame size then you need to buffer the remainder of the decoded
data yourself in system memory.
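
That strategy can be sketched like this, with the ALSA calls mocked out as plain callables: `pcm_avail` stands in for snd_pcm_avail(), `pcm_write` for the actual device write, and `decode_frame` for the codec. The point is the remainder handling, whatever doesn't fit the device right now stays in `pending`:

```python
pending = bytearray()  # decoded audio not yet accepted by the device

def on_pollout(pcm_avail, pcm_write, decode_frame):
    """Called whenever poll() signals POLLOUT on the sound device."""
    avail = pcm_avail()                  # how many bytes the device accepts now
    while len(pending) < avail:          # decode until we can satisfy that
        pending.extend(decode_frame())
    pcm_write(bytes(pending[:avail]))    # write exactly what was asked for
    del pending[:avail]                  # keep the codec-frame remainder
```

For example, with 441-byte codec frames and an avail of 1000 bytes, three frames get decoded, 1000 bytes are written, and 323 bytes stay buffered for the next wakeup.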

You should normally set the ALSA fragment size as large as possible
given your latency constraints, while making sure that your buffer
always contains at least two fragments.

I hope this explains a bit how frag_size/buffer_size should be
chosen. If you have questions, just ask.

(Oh, ALSA uses the term 'period' for what I call 'fragment'
above. The two are synonymous.)

GNOME now esound-free

Andre Klapper just informed me that GNOME is now officially esound-free: all modules have been ported over to libcanberra for event sounds or GStreamer/PulseAudio for everything else. It's time to celebrate!

It's the end of an era. The oldest version of esound in GNOME CVS is 0.2.1, committed on May 11th 1998. It has been shipped with every GNOME release since 1.0 back in 1999. (esound outside of GNOME dates back even further, probably to some time in the year 1997 or so.) After almost 11 years in GNOME it's all over now. Oh, those were the good times.

If you maintain a module that is not part of GNOME that still uses esound, hurry and update yours as well!


What YOU need to know about Practical Real-Time Programming

Eduardo Lima just added a couple more videos from one of the best conferences in existence to the OpenBOSSA channel at blip.tv. Humble as I am, I'd like to ask everyone who is interested in real-time and/or audio/video/animation programming to have a peek at this particular one.

That's all.


Device Reservation Spec

The JACK folks and I have agreed on a little specification for device reservation that allows clean hand-over of audio device access from PulseAudio to JACK and back. The specification is generic enough to allow locking/hand-over of other device types as well, not just audio cards. So, in case someone needs to implement a similar kind of locking/handover for any kind of resource here's some prior art you can base your work on. Given that HAL is supposed to go away pretty soon this might be an option for a replacement for HAL's current device locking. The logic is as simple as it can get. Whoever owns a certain service name on the D-Bus session bus owns the device access. For further details, read the spec.
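
Here's a toy in-process model of that hand-over logic, with a dict standing in for the session bus's name registry. The service name used in the test follows the org.freedesktop.ReserveDevice1.* pattern as I remember it from the spec, and the release callback plays the role of the spec's request-release call; consult the spec itself for the authoritative details:

```python
class FakeBus:
    """Simulates 'whoever owns the well-known name owns the device'."""

    def __init__(self):
        # well-known name -> (owner, priority, release_cb)
        self.owners = {}

    def acquire(self, name, owner, priority=0, release_cb=lambda p: False):
        current = self.owners.get(name)
        if current is not None:
            cur_owner, cur_prio, cur_release = current
            # Politely ask the current owner to hand over the device;
            # it may refuse, in which case the acquisition fails.
            if priority <= cur_prio or not cur_release(priority):
                return False
        self.owners[name] = (owner, priority, release_cb)
        return True
```

So when JACK starts with a higher priority, PulseAudio's release callback lets go of the device, and the name (and with it the device) changes hands cleanly.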

There's even a reference implementation available, which both JACK2 and PulseAudio have now integrated.

Also known as PAX SOUND SERVERIS.


Having fun with bzr

#nocomments y

So I wanted to hack proper channel mapping query support into libsndfile, something I have had on my TODO list for years. The first step was to find the source code repository for it. That was easy. Alas, the VCS used is bzr. There are some very vocal folks on the Internet who claim that the bzr user interface is stupendously easy to use in contrast to git, which apparently is the very definition of complexity. And if it is stated on the Internet it must be true. I think I mastered git quite well, so yeah, checking out the sources with bzr can't be that difficult for my limited brain capacity.

So let's do what Erik suggests for checking out the sources:

$ bzr get http://www.mega-nerd.com/Bzr/libsndfile-pub/

Calling this I get a nice percentage counter that starts at 0% and ends at, ... uh, 0%. That gives me a real feeling of progress. It takes a while, and then I get an error:

bzr: ERROR: Not a branch: "http://www.mega-nerd.com/Bzr/libsndfile-pub/".

Now that's a useful error message. They even include an all-caps word! I guess that error message is right -- it's not a branch, it is a repository. Or is it not?

So what do we do about this? Maybe get is not actually the right verb. Let's try to play around a bit. Let's use the verb I use to get sources with in git:

$ bzr clone http://www.mega-nerd.com/Bzr/libsndfile-pub/

Hmm, this results in exactly the same 0% to 0% progress counter, and the same useless error message.

Now I remember that bzr is actually more inspired by Subversion's UI than by git's, so let's try it the SVN way.

$ bzr checkout http://www.mega-nerd.com/Bzr/libsndfile-pub/

Hmm, and of course, I get exactly the same results again. A counter that counts from 0% to 0% and the same useless error message.

Ok, maybe that error is bzr's standard reply? Let's check this out:

$ bzr waldo http://www.mega-nerd.com/Bzr/libsndfile-pub/
bzr: ERROR: unknown command "waldo"

Apparently not. bzr actually knows more than one error message.

Ok, I admit doing this by trial-and-error is a rather lame approach. RTFM! So let's try this.

$ man bzr-get
No manual entry for bzr-get

Ouch. No man page? How awesome. Ah, wait, maybe they have only a single unreadable mega man page for everything. Let's try this:

$ man bzr

Wow, this actually worked. Seems to list all commands. Now let's look for the help on bzr get:

/bzr get
Pattern not found  (press RETURN)

Hmm, no documentation for their most important command? That's weird! Ok, let's try it again with our git vocabulary:

/bzr clone
Pattern not found  (press RETURN)

Ok, this is not funny anymore. Apparently the verbs are listed in alphabetical order, so let's browse to the letter g as in get. However, it doesn't exist. There's bzr export, and then the next entry is bzr help (Oh, irony!) -- but no get in-between.

Ok, enough of this shit. Maybe the message wants to tell us that the repo actually doesn't exist (even though it confusingly calls it a "branch"). Let's go back to the original page at Erik's site and read things again. Aha, the "main archive can be found at (yes, the directory looks empty, but it isn't): http://www.mega-nerd.com/Bzr/libsndfile-pub/". Hmm, indeed -- that URL looks very empty when it is accessed. How weird though that in bzr a repo is an empty directory!

And at this point I gave up and downloaded the tarball to make my patches against. I have still not managed to check out the sources from the repo. Somehow I get the feeling the actual repo really isn't available anymore under that address.

So why am I blogging about this? Not so much to start another flamefest or to nourish the fanboys, nor because it is so much fun to bash other people's work or simply to piss people off. It's more for two reasons:

Firstly, simply to make the point that folks can claim a thousand times that git's UI sucks and bzr's UI is awesome. It's simply not true. From what I experienced it is not the tiniest bit better. The error messages are useless, the documentation incomplete, the interfaces surprising and exactly as redundant as git's. The only effective difference I noticed is that it takes a bit longer to show those error messages with bzr -- the Python tax. To summarize this more positively: git excels as much as bzr does. Both tools' documentation, error messages and user interfaces are the best in their class. And they have all the best chances for future improvement.

And the second reason of course is that I'd still like to know what the correct way to get the sources is. But for that I should probably ask Erik himself.


Generating Copyright Headers from git History

Here's a little tool I wrote that automatically generates copyright headers for source files in a git repository, based on the git history.

Run it like this:

~/projects/pulseaudio$ copyright.py src/pulsecore/sink.c src/pulsecore/core-util.c

And it will give you this:

File: src/pulsecore/sink.c
	Copyright 2004, 2006-2009 Lennart Poettering
	Copyright 2006-2007 Pierre Ossman
	Copyright 2008-2009 Marc-Andre Lureau
File: src/pulsecore/core-util.c
	Copyright 2004, 2006-2009 Lennart Poettering
	Copyright 2006-2007 Pierre Ossman
	Copyright 2008 Stelian Ionescu
	Copyright 2009 Jared D. McNeill
	Copyright 2009 Marc-Andre Lureau

This little script could use love from a friendly soul to make it crawl entire source trees and patch in appropriate copyright headers. Anyone up for it?
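
For anyone picking this up: the year lists in the output above collapse consecutive years into ranges. The core of that formatting could look roughly like this (a sketch, not the actual copyright.py):

```python
def collapse_years(years):
    """Collapse years into the 'YYYY, YYYY-YYYY' form used in the output
    above, e.g. [2004, 2006, 2007, 2008, 2009] -> '2004, 2006-2009'."""
    runs = []
    for y in sorted(set(years)):
        if runs and y == runs[-1][1] + 1:
            runs[-1][1] = y          # extend the current run of years
        else:
            runs.append([y, y])      # start a new run
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)
```

The per-author year sets themselves would come from walking the git history of each file, e.g. via git log's author and date output.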


Tagging Audio Streams

So you are hacking an audio application and the audio data you are generating might eventually end up in PulseAudio before it is played. If that's the case then please make sure to read this!

Here's the quick summary for Gtk+ developers:

PulseAudio can enforce all kinds of policy on sounds. For example, starting with 0.9.15, we will automatically pause your media player while a phone call is going on. To implement this, however, we need to know how the stream you are sending to PulseAudio should be categorized: is it music? Is it a movie? Is it game sounds? Is it a phone call stream?

Also, PulseAudio would like to show a nice icon and an application name next to each stream in the volume control. That requires it to be able to deduce this data from the stream.

And here's where you come into the game: please add three lines like the following near the beginning of your main() function in your Gtk+ application:

...
g_set_application_name(_("Totem Movie Player"));
gtk_window_set_default_icon_name("totem");
g_setenv("PULSE_PROP_media.role", "video", TRUE);
...

If you do this then the PulseAudio client libraries will be able to figure out the rest for you.

There is more meta information (aka "properties") you can set for your application or for your streams that is useful to PulseAudio. In case you want to know more about them or you are looking for equivalent code to the above example for non-Gtk+ applications, make sure to read the mentioned page.
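
For non-Gtk+ applications, one simple option is to set the corresponding PULSE_PROP_* environment variables directly before any audio output starts; the PulseAudio client library attaches them as properties to every stream the process creates. The application name and icon below are made-up examples:

```python
import os

# These must be set before the first stream is created, since the client
# library reads them at stream setup time. The property names mirror what
# the Gtk+ calls above set.
os.environ["PULSE_PROP_application.name"] = "My Media Player"
os.environ["PULSE_PROP_application.icon_name"] = "my-media-player"
os.environ["PULSE_PROP_media.role"] = "video"
```

The same trick works from shell wrappers, too, since child processes inherit the environment.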

Thank you!

Oh, and even if your app doesn't do audio, calling g_set_application_name() and gtk_window_set_default_icon_name() is always a good idea!


How to Version D-Bus Interfaces Properly and Why Using / as Service Entry Point Sucks

So you are designing a D-Bus interface and want to make it future-proof. Of course, you thought about versioning your stuff. But you wonder how to do that best. Here are a few things I learned about versioning D-Bus APIs which might be of general interest:

Version your interfaces! This one is pretty obvious. No explanation needed. Simply include the interface version in the interface name as a suffix, i.e. the initial release should use org.foobar.AwesomeStuff1, and if you make changes you should introduce org.foobar.AwesomeStuff2, and so on, possibly dropping the old interface.

When should you bump the interface version? Generally, I'd recommend only bumping when doing incompatible changes, such as function call signature changes. This of course requires clients to handle the org.freedesktop.DBus.Error.UnknownMethod error properly for each function you add to an existing interface. That said, in a few cases it might make sense to bump the interface version even without breaking compatibility of the calls. (e.g. in case you add something to an interface that is not directly visible in the introspection data)

Version your services! This one is almost as obvious. When you completely rework your D-Bus API, introducing a new service name might be a good idea. The best way to do this is by simply versioning the service name: call your service org.foobar.AwesomeService1 right from the beginning and then bump the version if you reinvent the wheel. And don't forget that you can acquire more than one well-known service name on the bus, so even if you rework everything you can keep compatibility. (Example: BlueZ 3 to BlueZ 4 switch)

Version your 'entry point' object paths! This one is far from obvious. The reasons why object paths should be versioned are purely technical, not philosophical: for signals sent from a service, D-Bus overwrites the originating service name with the unique name (e.g. :1.42), even if you fill in a well-known name (e.g. org.foobar.AwesomeService1). Now, let's say your application registers two well-known service names, say two versions of the same service, versioned as mentioned above. And you have two objects, one on each of the two service names, that implement a generic interface and share the same object path: for the client there will be no way to figure out to which service name the signals sent from this object path belong. And that's why you should make sure to use versioned and hence different paths for both objects, i.e. start with /org/foobar/AwesomeStuff1 and then bump to /org/foobar/AwesomeStuff2 and so on. (Also see David's comments about this.)

When should you bump the object path version? Probably only when you bump the service name it belongs to. What's important is to version the 'entry point' object path; objects below that don't need explicit versioning.

In summary: For good D-Bus API design you should version all three: D-Bus interfaces, service names and 'entry point' object paths.
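
As a purely illustrative sketch of the naming scheme (in practice your service and interface will often carry different base names, like AwesomeService1 and AwesomeStuff1 above, and may be versioned independently):

```python
def versioned_api(namespace, name, version):
    """Build the versioned interface name, service name and entry-point
    object path for one generation of a D-Bus API. Illustrative helper,
    not part of any real D-Bus binding."""
    versioned = f"{namespace}.{name}{version}"
    return {
        "service": versioned,
        "interface": versioned,
        # The entry-point path mirrors the versioned name, so two service
        # versions on one connection never share an object path.
        "path": "/" + versioned.replace(".", "/"),
    }
```

Bumping the version then moves all three names in lockstep, which sidesteps the signal-attribution ambiguity described above.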

And don't forget: nobody gets API design right the first time. So even if you think your D-Bus API is perfect: version things right from the beginning because later on it might turn out you were not quite as bright as you thought you were.

A corollary from the reasoning behind versioning object paths as described above is that using / as entry point object path for your service is a bad idea. It makes it very hard to implement more than one service or service version on a single D-Bus connection. Again: Don't use / as entry point object path. Use something like /org/foobar/AwesomeStuff!


Writing Volume Control UIs is Hard

Writing modern volume control UIs (i.e. 'mixer tools') is much harder to get right than it might appear at first. Because that is the way it is, I've put together a rough guide on what to keep in mind when writing them for PulseAudio. Originally just intended as a bit of help for the gnome-volume-control guys, I believe this could be an interesting read for other people as well.

It touches a lot of topics: volumes in general, how to present them, what to present, base volumes, flat volumes, what to do about multichannel volumes, controlling clients, controlling cards, handling default devices, saving/restoring volumes/devices, sound event sliders, how to monitor PCM and more.

So make sure to give it at least a quick peek! If you plan to write a volume control for ncurses or KDE (hint, hint!) even more so, it's a must read.

Maybe this might also help illustrate why I think that abstracting volume control interfaces inside abstraction layers such as Phonon or GStreamer is doomed to fail, and not even worth trying.

And now, without further ado I give you 'Writing Volume Control UIs'.


Oh Nine Fifteen

Last week I released a test version of the upcoming 0.9.15 release of PulseAudio. It's going to be a major one, so here's a little overview of what's new from the user's perspective.

Flat Volumes

Based on code originally contributed by Marc-André Lureau we now support Flat Volumes. The idea behind flat volumes has been inspired by how Windows Vista handles volume control: instead of maintaining one volume control per application stream plus one device volume, we fix the device volume automatically to the "loudest" application stream volume. Sounds confusing? Actually it's quite the contrary: it feels pretty natural and easy to use, and it brings us a big step forward in reducing the number of volume sliders in the entire audio pipeline from the application to what you hear.

The flat volumes logic only applies to devices where we know the actual multiplication factor of the hardware volume slider. That's most devices supported by the ALSA kernel drivers except for a few older devices and some cheap USB hardware that exports invalid dB information.
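
If you wonder what "fixing the device volume to the loudest stream" means mechanically, here's a toy model of the idea; the real implementation of course works per channel and within the hardware's actual dB steps:

```python
def flat_volumes(stream_volumes):
    """Toy model of the flat-volume scheme: pin the device volume to the
    loudest stream and express every stream relative to it, so the loudest
    stream plays at full device volume with no extra attenuation."""
    device = max(stream_volumes.values())
    relative = {name: v / device for name, v in stream_volumes.items()}
    return device, relative
```

So raising one stream's slider past the current maximum raises the device volume itself, while all other streams keep their audible loudness via smaller relative factors.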

On-the-fly Reconfiguration of Devices (aka "S/PDIF Support")

PulseAudio will now automatically probe all possible combinations of configurations of how to use your sound card for playback and capturing, and then allow on-the-fly switching between them. What does that mean? Basically you may now switch between "Analog Stereo", "Digital S/PDIF Stereo", "Analog Surround 5.1" (... and so on) on-the-fly, without having to reconfigure PA on the configuration file level or even having to stop your streams. This fixes a couple of issues PA had previously, adding proper S/PDIF support and per-device configuration of the channel map.

Unfortunately there is no UI for this yet, and hence you need to use pactl/pacmd on the command line to switch between the profiles. Typing list-cards in pacmd will tell you which profiles your card supports.

In a later PA version this functionality will be extended to also allow input connector switching (i.e. microphone vs. line-in) and output connector switching (i.e. internal speakers vs. line-out) on-the-fly.

Native support for 24bit samples

PA now supports 24bit packed samples as well as 24bit stored in the LSBs of 32bit integers natively. Previously these formats were always converted into 32bit MSB samples.
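
To illustrate the packed 3-byte layout, here's a little sketch of converting individual samples back and forth (illustrative helper names, nothing from the PA API; the 24-in-32 variant keeps the same value in the low three bytes of a 32-bit integer):

```python
import struct

def s24le_pack(value):
    """Pack a signed sample (-2**23 .. 2**23-1) into 3 little-endian bytes
    by dropping the top byte of the 32-bit two's-complement form."""
    return struct.pack("<i", value)[:3]

def s24le_unpack(data):
    """Unpack 3 little-endian bytes into a signed sample value,
    sign-extending the top byte before the 32-bit unpack."""
    pad = b"\xff" if data[2] & 0x80 else b"\x00"
    return struct.unpack("<i", data + pad)[0]
```

Handling these formats natively means PA can shuffle such samples around without first widening every one of them the way the old conversion path did.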

Airport Express Support

Colin Guthrie contributed native Airport Express support. This will make the RAOP audio output of ApEx routers appear like local sound devices (unfortunately sound devices with a very long latency), i.e. any application connecting to PulseAudio can output audio to ApEx devices in a similar way to how iTunes can do it on MacOSX.

Before you ask: it is unlikely that we will ever make PulseAudio able to act as an ApEx compatible device that takes connections from iTunes (i.e. becoming a RAOP server instead of just a RAOP client). Apple has an unfriendly attitude of dongling their devices to their applications: normally iTunes has to cryptographically authenticate itself to the device and the device to iTunes. iTunes' key has been recovered by the infamous Jon Lech Johansen, but the device key is still unknown. Without that key it is not realistically possible to disguise PA as an ApEx.

Other stuff

There have been some extensive changes to natively support Bluetooth audio devices well by directly accessing BlueZ. This code was originally contributed by the GSoC student João Paulo Rechi Vita. Initially, 0.9.15 was intended to become the version where BT audio just works. Unfortunately the kernel is not really up to that yet, and I am not sure everything will be in place so that 0.9.15 will ship with well-working BT support.

There have been a lot of internal changes and API additions. Most of these however are not visible to the user.

© Lennart Poettering. Built using Pelican. Theme by Giulio Fidente on github.