LWN covers Paul's and my talk at the Audio MC at LPC, Portland. (Subscribers only for now)
Update: Here's a free subscriber link.
Here are some very short notes from the Audio BoF at the Linux Plumbers Conference in Portland two weeks ago. Sorry for the delay!
The biggest issue discussed was audio routing. On embedded devices this gets more complex each day, and there are a lot of open questions on the desktop, too. Different DSP scenarios; how do mixer controls match up with PCM streams and jack sensing? How do we determine which of the volume control sliders in the pipeline we are currently interested in? How does that relate to policy decisions? Which format should audio routing be stored in?
The ALSA scenario subsystem, currently being worked on by Liam Girdwood and the folks at SlimLogic and on its way to being integrated into ALSA proper, will hopefully help us here: it would let us strip a lot of the routing-related complexity from PulseAudio and move it into a lower level which naturally knows more about the hardware's internal routing.
Does it make sense for some apps to bypass the ALSA userspace layer and to talk to the kernel drivers via ioctl()s directly (i.e. thus not depending on ALSA's LISP interpreter, and a lot of other complexities)? Probably yes, but certainly not in the short term future. Salsa? libsydney?
Should the timing deviation estimation/interpolation be moved from PulseAudio into the kernel? Might be a good idea. Particularly interesting when we try to monitor not only the system and audio clocks, but the video output and particularly the video input (i.e. video4linux) clocks, too. A unified kernel-based timing system has advantages in accuracy, allows better handling of (pseudo-) atomic timing snapshots, and would centralize timing handling not only between different applications (PA and JACK) but also between different subsystems. Problem: current timing stuff in PulseAudio might be a bit too homegrown for moving it 1:1 into the kernel. Also, it depends on floating point. Needs someone to push this. Apple does the clock handling in the kernel. How does this relate to ALSA's timer API?
Seems Ubuntu is going to kill OSS pretty soon too, following Fedora's lead. Yay!
And that's all I have. Should be the biggest points raised. Ping me if I forgot something.
An often asked question is how to properly talk to PulseAudio from within applications where latency matters. To answer that question once and for all I've written this guide in our Wiki that should shed some light on the matter. If you are interested in audio latency in PA, or want to know how to minimize CPU usage and power consumption or how to maximize drop-out safety, make sure to read this!
One small note: requiring copyright assignment from contributors, and putting your code in exotic VCSes that only a minority of potential contributors know or are willing to use, is not helpful for attracting contributions -- quite the contrary, it scares them away. Please fix that!
Last week I was at the Linux Plumbers Conference in Portland. Like last year it kicked ass and proved again to be one of the most relevant Linux developer conferences (if not the most relevant one). I ran the Audio MC at the conference, which was very well attended. The slides for our four talks in the track are available online. (My own slides are probably a bit too terse for most readers, the interesting stuff was in the talking, not the reading...) Personally, for me the most interesting part was to see to which degree Nokia actually adopted PulseAudio in the N900. While I was aware that Nokia was using it, I wasn't aware that their use is as comprehensive as it turned out to be. And the industry support from other companies is really impressive too. After the main track we had a BoF session, the notes of which I'll post a bit later. Many thanks to Paul, Jyri, Pierre for their great talks. Unfortunately, Palm, the only manufacturer who is actually already shipping a phone with PulseAudio, didn't send anyone to the conference who wanted to talk about that. Let's hope they'll eventually learn that just throwing code over the wall is not how Open Source works. Maybe they'll send someone to next year's LPC in Boston, where I hope to be able to do the Audio MC again.
Right now I am at the BlueZ Summit in Stuttgart. Among other things we have been discussing how to improve Bluetooth Audio support in PulseAudio. I guess one could say that the Bluetooth support in PulseAudio is already one of its highlights, in fact working better than the support on other OSes (yay, that's an area where Linux Audio really shines!). So up next is better support for allowing PA to receive A2DP audio, i.e. making PA act as if it was a headset or your hifi. Use case: send music from your mobile to your desktop's hifi speakers. (Actually this is already supported in current BlueZ/PA versions, but not easily accessible). Also Bluetooth headsets tend to support AC3 or MP3 decoding natively these days so we should support that in PA too. Codec handling has been on the TODO list for PA for quite some time, for the SPDIF or HDMI cases, and Bluetooth Audio is another reason why we really should have that.
Next week I'll be at the Maemo Summit in Amsterdam. Nokia kindly invited me. Unfortunately I was a bit too late to get a proper talk accepted. That said, I am sure if enough folks are interested we could do a little ad-hoc BoF and find some place at the venue for it. If you have any questions regarding PA just talk to me. The N900 uses PulseAudio for all things audio so I am quite sure we'll have a lot to talk about.
See you in Amsterdam!
One last thing: Check out Colin's work to improve integration of PulseAudio and KDE!
Tomorrow, Thu 24th 10 am, there's going to be an Audio BoF at LPC Portland, Salon E. Don't miss it.
A quick update on Skype: the next Skype version will include native PulseAudio support. And not only that but they even tag their audio streams properly. This enables PulseAudio to do fancy stuff like automatically pausing your audio playback when you have a phone call. Good job!
In some ways they are now doing a better job with integration into the modern audio landscape than some Free Software telephony applications!
Unfortunately they didn't fix the biggest bug though: it's still not Free Software!
Here's a list of quick updates on my mutrace mutex profiler since my initial announcement two weeks ago:
I added some special support for tracking down use of mutexes in realtime threads. It's a very simple extension that, if enabled with --track-rt, checks on each mutex operation whether it is executed by a realtime thread or not. The output of a test run of this you can find in this announcement on LAD. Particularly interesting is that you can use this to track down which mutexes are good candidates for priority inheritance.
The mutrace tarball now also includes a companion tool matrace that can be used to track down memory allocation operations in realtime threads. See the same LAD announcement as above for example output of this tool.
With help from Boudewijn Rempt I added some compatibility code for profiling C++/Qt apps with mutrace, which he already used for some interesting profiling results on krita.
Finally, after my comments on the locking hotspots in glib's type system, Wim Taymans and Edward Hervey worked on turning the mutex-emulated rwlocks into OS native ones with quite positive results, for more information see this bug.
As soon as my review request is fully processed mutrace will be available in rawhide.
A snapshot tarball of mutrace you may find here (despite the name of the tarball that's just a snapshot, not the real release 0.1), for all those folks who are afraid of git, or don't have a current autoconf/automake/libtool installed.
When naively profiling multi-threaded applications the time spent waiting for mutexes is not necessarily visible in the generated output. However lock contention can have a big impact on the runtime behaviour of applications. On Linux valgrind's drd can be used to track down mutex contention. Unfortunately running applications under valgrind/drd slows them down massively, often having the effect of itself generating many of the contentions one is trying to track down. Also due to its slowness it is very time consuming work.
To improve the situation I have now written a mutex profiler called mutrace. In contrast to valgrind/drd it does not virtualize the CPU instruction set, making it a lot faster. In fact, the hooks mutrace relies on to profile mutex operations should only minimally influence application runtime. mutrace is not useful for finding synchronization bugs, it is solely useful for profiling locks.
Now, enough of this introductory blabla. Let's have a look at the data mutrace can generate for you. As an example we'll look at gedit as a bit of a prototypical Gnome application. Gtk+ and the other Gnome libraries are not really known for their heavy use of multi-threading, and the APIs are generally not thread-safe (for a good reason). However, internally subsystems such as gio do use threading quite extensively. And as it turns out there are a few hotspots that can be discovered with mutrace:
$ LD_PRELOAD=/home/lennart/projects/mutrace/libmutrace.so gedit
mutrace: 0.1 sucessfully initialized.
gedit is now running and its mutex use is being profiled. For this example I have now opened a file with it, typed a few letters and then quit the program again without saving. As soon as gedit exits mutrace will print the profiling data it gathered to stderr. The full output you can see here. The most interesting part is at the end of the generated output, a breakdown of the most contended mutexes:
mutrace: 10 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]       Type
      35   368268      407      275      120,822        0,000        0,894     normal
       5   234645      100       21       86,855        0,000        0,494     normal
      26   177324       47        4       98,610        0,001        0,150     normal
      19    55758       53        2       23,931        0,000        0,092     normal
      53      106       73        1        0,769        0,007        0,160     normal
      25    15156       70        1        6,633        0,000        0,019     normal
       4      973       10        1        4,376        0,004        0,174     normal
      75       68       62        0        0,038        0,001        0,004     normal
       9     1663       52        0        1,068        0,001        0,412     normal
       3   136553       41        0       61,408        0,000        0,281     normal
     ...      ...      ...      ...          ...          ...          ...        ...

mutrace: Total runtime 9678,142 ms.
(Sorry, LC_NUMERIC was set to de_DE.UTF-8, so if you can't make sense of all the commas, think s/,/./g!)
For each mutex a line is printed. The 'Locked' column tells us how often the mutex was locked during the entire runtime of about 10s. The 'Changed' column tells us how often the owning thread of the mutex changed. The 'Cont.' column tells us how often the lock was already taken when we tried to take it and we had to wait. The fifth column tells us for how long during the entire runtime the lock was locked, the sixth tells us the average lock time, and the seventh column tells us the longest time the lock was held. Finally, the last column tells us what kind of mutex this is (recursive, normal or otherwise).
The most contended lock in the example above is #35. 275 times during the runtime a thread had to wait until another thread released this mutex. All in all more than 120ms of the entire runtime (about 10s) were spent with this lock taken!
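To put these figures into proportion, here is a small Python sketch. It is not part of mutrace, and the parse_row helper is made up for illustration. It converts the decimal commas of the de_DE output and computes the share of the total runtime during which mutex #35 was held:

```python
def parse_row(line):
    """Parse one row of the 'most contended mutexes' table (hypothetical helper)."""
    fields = line.split()

    def ms(text):
        # The output above was produced with LC_NUMERIC=de_DE.UTF-8,
        # so the decimal separator is a comma.
        return float(text.replace(",", "."))

    return {
        "mutex": int(fields[0]),
        "locked": int(fields[1]),
        "changed": int(fields[2]),
        "contended": int(fields[3]),
        "total_ms": ms(fields[4]),
        "avg_ms": ms(fields[5]),
        "max_ms": ms(fields[6]),
        "type": fields[7],
    }


# Row for mutex #35 and the total runtime, copied from the output above.
row = parse_row("35 368268 407 275 120,822 0,000 0,894 normal")
runtime_ms = float("9678,142".replace(",", "."))

share = row["total_ms"] / runtime_ms
print(f"mutex #{row['mutex']} was held for {share:.1%} of the runtime")
# prints: mutex #35 was held for 1.2% of the runtime
```

Note that this is the time the lock was held, not the time other threads spent waiting for it; the waiting only shows up indirectly, in the 'Cont.' column.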
In the full output we can now look up which mutex #35 actually is:
Mutex #35 (0x0x7f48c7057d28) first referenced by:
        /home/lennart/projects/mutrace/libmutrace.so(pthread_mutex_lock+0x70) [0x7f48c97dc900]
        /lib64/libglib-2.0.so.0(g_static_rw_lock_writer_lock+0x6a) [0x7f48c674a03a]
        /lib64/libgobject-2.0.so.0(g_type_init_with_debug_flags+0x4b) [0x7f48c6e38ddb]
        /usr/lib64/libgdk-x11-2.0.so.0(gdk_pre_parse_libgtk_only+0x8c) [0x7f48c853171c]
        /usr/lib64/libgtk-x11-2.0.so.0(+0x14b31f) [0x7f48c891831f]
        /lib64/libglib-2.0.so.0(g_option_context_parse+0x90) [0x7f48c67308e0]
        /usr/lib64/libgtk-x11-2.0.so.0(gtk_parse_args+0xa1) [0x7f48c8918021]
        /usr/lib64/libgtk-x11-2.0.so.0(gtk_init_check+0x9) [0x7f48c8918079]
        /usr/lib64/libgtk-x11-2.0.so.0(gtk_init+0x9) [0x7f48c89180a9]
        /usr/bin/gedit(main+0x166) [0x427fc6]
        /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f48c5b42b4d]
        /usr/bin/gedit() [0x4276c9]
As it appears in this Gtk+ program the rwlock type_rw_lock (defined in glib's gobject/gtype.c) is a hotspot. GLib's rwlocks are implemented on top of mutexes, so an obvious attempt in improving this could be to actually make them use the operating system's rwlock primitives.
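Why an rwlock built on top of a mutex shows up so prominently in a mutex profile can be illustrated with a minimal Python sketch. This is not GLib's actual implementation; the class MutexRWLock and its method names are invented for illustration. The point is that even pure readers have to take the inner mutex briefly on every lock and unlock operation:

```python
import threading


class MutexRWLock:
    """Reader/writer lock emulated with one mutex and one condition (sketch)."""

    def __init__(self):
        # A single inner mutex guards all state; this is the mutex a
        # profiler would see being locked on every operation.
        self._cond = threading.Condition(threading.Lock())
        self._readers = 0      # number of threads holding the read lock
        self._writer = False   # True while a thread holds the write lock

    def reader_lock(self):
        with self._cond:                 # inner mutex taken, even for readers
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def reader_unlock(self):
        with self._cond:                 # inner mutex taken again
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # let a waiting writer proceed

    def writer_lock(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def writer_unlock(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()      # wake both readers and writers


rw = MutexRWLock()
rw.reader_lock()
rw.reader_lock()      # a second reader is admitted concurrently
rw.reader_unlock()
rw.reader_unlock()
rw.writer_lock()      # succeeds immediately, no readers are left
rw.writer_unlock()
```

In this scheme a reader-heavy workload still locks the inner mutex twice per read section, so a large 'Locked' count with occasional contention is what one would expect to see.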
If a mutex is used often but only ever by the same thread it cannot starve other threads. The 'Changed' column lists how often a specific mutex changed the owning thread. If the number is high this means the risk of contention is also high. The 'Cont.' column tells you about contention that actually took place.
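How such counters can be gathered may be easier to see in code. The following is a rough Python sketch of the idea only, not how mutrace is implemented (mutrace hooks the pthread mutex functions of an unmodified program). ProfiledLock and demo are invented names, and not counting the very first acquisition as an owner change is an assumption:

```python
import threading
import time


class ProfiledLock:
    """A lock that keeps 'Locked', 'Changed' and 'Cont.' style counters."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stats = threading.Lock()   # protects the counters below
        self.n_locked = 0      # successful lock operations
        self.n_changed = 0     # lock taken by a different thread than before
        self.n_contended = 0   # lock was already taken, we had to wait
        self._last_owner = None

    def acquire(self):
        # Try without blocking first; if that fails somebody else holds
        # the lock, which is exactly what counts as contention.
        if not self._lock.acquire(blocking=False):
            with self._stats:
                self.n_contended += 1
            self._lock.acquire()
        me = threading.get_ident()
        with self._stats:
            self.n_locked += 1
            # Assumption: the first acquisition is not an owner change.
            if self._last_owner is not None and self._last_owner != me:
                self.n_changed += 1
            self._last_owner = me

    def release(self):
        self._lock.release()


def demo():
    lock = ProfiledLock()

    # Used often, but only ever by one thread: no change, no contention.
    for _ in range(3):
        lock.acquire()
        lock.release()

    # Now hold the lock while a second thread tries to take it.
    lock.acquire()

    def worker():
        lock.acquire()
        lock.release()

    thread = threading.Thread(target=worker)
    thread.start()
    while lock.n_contended == 0:   # wait until the worker has run into us
        time.sleep(0.001)
    lock.release()
    thread.join()
    return lock.n_locked, lock.n_changed, lock.n_contended


print(demo())  # (5, 1, 1): five locks, one owner change, one contention
```

The first three lock operations raise only the 'Locked' counter. The handover to the second thread adds one owner change, and because the lock was still held at that moment, one actual contention.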
Due to the way mutrace works we cannot profile mutexes that are used internally in glibc, such as those used for synchronizing stdio and suchlike.
mutrace is implemented entirely in userspace. It uses all kinds of exotic GCC, glibc and kernel features, so you might have a hard time compiling and running it on anything but a very recent Linux distribution. I have tested it on Rawhide but it should work on slightly older distributions, too.
Make sure to build your application with -rdynamic to make the backtraces mutrace generates useful.
As of now, mutrace only profiles mutexes. Adding support for rwlocks should be easy though. Patches welcome.
The output mutrace generates can be influenced by various MUTRACE_xxx environment variables. See the sources for more information.
And now, please take mutrace and profile and speed up your application!