Finally, here's my linux.conf.au 2007 and FOMS 2007
recap. Maybe a little bit late, but better late then never.
FOMS was a very well organized conference with a packed schedule
and a lot of high-profile attendees. To my surprise PulseAudio has been accepted by the
attendees without any opposition (at least none was expressed
aloud). After a few "discussions" on a few mailing lists (including
GNOME MLs) and some personal emails I got, I had thought that more
people were in opposition of the idea of having a userspace sound
daemon for the desktop. Apparently, I was overly pessimistic. Good
news, that!
During the FOMS conference we discussed the problems audio on Linux
currently has. One of the major issues still is that we're lacking a
cross-platform PCM audio API everyone agrees on. ALSA is Linux-specific
and complicated to use. The only real contender is PortAudio. However,
PortAudio has its share of problems and hasn't reach wide adoption
yet. Right now most larger software projects implement an audio
abstraction layer of some kind, and mostly in a very dirty, simplistic
and limited fasion. MPlayer does, Xine does it, Flash does
it. Everyone does it, and it sucks. (Note: this is only a very short
overview why audio on Linux sucks right now. For a longer one, please
have a look on the first 15mins of my PulseAudio talk at LCA, linked
below.)
Several people were asking why not to make the PulseAudio API the
new "standard" PCM API for Linux. Due to several reasons that would be a
bad idea. First of all, the PulseAudio API cannot be used on anything
else but PulseAudio. While PulseAudio has been ported to Win32, Vista
already has a userspace desktop sound server, hence running PulseAudio
on top of that doesn't make much sense. Thus the API is not exactly
cross-platform. Secondly, I - as the guy who designed it - am not
happy with the current PulseAudio API. While it is very powerful it is
also very difficult to use and easy to misuse, mostly due to its fully
asynchronous nature. In addition it is also not the exactly smallest
API around.
So, what could be done about this? We agreed on a - maybe -
controversional solution: defining yet another abstracted PCM audio
API. Yes, fixing the problem that we have too many conflicting,
competing sound systems by defining yet another API sounds like a
paradoxon, but I do believe this is the right path to follow. Why?
Because none of the currently available solutions is suitable for all
application areas we have on Linux. Either the current APIs are not
portable, or they are horribly difficult to use properly, or have a
strange license, or are too simple in their functionality. MacOSX
managed to establish a single audio API (CoreAudio) that makes almost
everyone happy on that system - and we should be able to do same for
Linux. Secondly, none of the current APIs has been designed with
network sound servers in mind. However, proper networking support
reflects back into the API, and in a non-trivial way. An API which
works fine in networked environment needs to eliminate roundtrips
where possible, be open for time interpolation and have a flexible
buffering (besides other minor things). Thirdly none of the current
APIs offers enough functionality to properly support all the needs of
modern desktop sound systems, such as per-stream volumes, stream names
and notifications about external state changes.
During FOMS and LCA, Mikko Leppanen (from Nokia), Jean-Marc Valin
(from Xiph) and I sat down and designed a draft API for the
functionality we would like to see in this API. For the time being we
dubbed it libsydney, after the city where we started this
project. I plan to make this the only supported audio API for
PulseAudio, eventually. Thus, if you will code against PulseAudio you
will get cross-platform support for free. In addition, because
PulseAudio is now being integrated into the major distributions (at
least Ubuntu and Fedora), this library will be made available on most
systems through the backdoor.
So, what will this new API offer? Firstly, the buffering model is
much more powerful than of any current sound API. The buffering model
mostly follows PulseAudio's internal buffering model which
(theoretically) can offer zero-latency streaming and has been
pioneered by Jim Gettys' AF sound server. It allows you to seek around
in the playback buffer very flexibly. This is very useful to allow
very fast reaction to the user's playback control commands while still
allowing large buffers, which are good to deal with high network
lag. In addition it is very handy for the programmer, such as when
implementing streaming clients where packets may arrive
out-of-order. The API will emulate this buffering model on top of
traditional audio devices, and when used on top of PulseAudio it will use
its native implementation. The API will also clearly define which
sound formats are guaranteed to be available, thus making it a lot
easier to code without thinking of different hardware supporting
different formats all the time. Of course, the API will be easier to
use than PulseAudio's current API. It will be very portable, scaling
from FPU-less architectures to pro-audio machines with a massive
number of synchronised channels. There are several modes available to
deal with XRUNs semi-automatically, one of them guaranteeing that the
time axis stays linear and monotonical in all events.
The list of features of this new API is much longer, however,
enough of these grand plans! We didn't write any real code for this
yet. To make sure that this project is not another one of those which
are announced grandiosely without ever producing any code I will stop
listing features here now. We will eventually publish a first draft of
our C API for public discussion. Stay tuned.
Side-by-side with libsydney I discussed an abstract API
for desktop event sounds with Mikko (i.e. those annoying "bing" sounds
when you click a button and the like). Dubbed libcanberra
(named after the city which one of the developers visited after
Sydney), this will hopefully be for the PulseAudio sample cache API
what libsydney is for the PulseAudio streaming API: a total
replacement.
As a by-product of the libsydney discussion Jean-Marc
coded a
fast C resampling library supporting both floating point and fixed
point and being licensed under BSD. (In contrast to
libsamplerate which is GPL and floating-point-only, but which
probably has better quality). PulseAudio will make use of this new
library, as will libsydney. And I sincerly hope that ALSA,
GStreamer and other projects replace their crappy home-grown
resamplers with this one!
For PulseAudio I was looking for a CODEC which we could use to
encode audio if we have to transfer it over the network. Such a CODEC
would need to have low CPU requirements and allow low-latency
operation, while providing hifi audio. Compression ratio is not such a
high requirement. Unfortunately, as it seems no such CODEC exists,
especially not a "Free" one. However, the Xiph people recommended to
hack up a special version of FLAC for this task. FLAC is fast, has
(obviously) good quality and if hacked up could provide low-latency
encoding. However, FLAC doesn't compress that well. Current PulseAudio
thin-client installations require 170kB network bandwidth for each
client if hifi audio is used. Encoding this in FLAC this could cut
this in half. Not perfect, but better than nothing.
So, that was FOMS! FOMS is a definitely highly recommended
conference. If you have the chance to attend next year, don't miss it!
I've never been to a more productive, packed conference in my life!
At LCA I met fellow Avahi coder Trent Lloyd for the first time. Our
talk about Avahi went very well. During my flights to and back from
.au I hacked up avahi-ui
which I also announced during that talk. Also, in related news,
tedp started to work on an implementation of NAT-PMP
(aka "reverse firewall piercing"; both client and server) for
inclusion in Avahi. This will hopefully make the upcoming Wide-Area
DNS support in Avahi much more useful.
linux.conf.au was a very exciting conference. As a speaker
you're treated like a rock star, with stuff like the speakers dinner,
the speakers adventure (climbing on top of Sydney's AMP tower) and
the penguin dinner. Heck, the organizers even picked me up at the
airport, something I really didn't expect when I landed in Sydney,
which however is quite nice after a 27h flight.
Two talks I particularly enjoyed at LCA:
And just for the sake of completeness, here are the links to my presentations:
Ok, that's it for now. Thanks go to Silvia Pfeiffer, the rest of
the FOMS team and the Seven Team for organizing these two amazing
conferences!