When naively profiling multi-threaded applications the time spent waiting for mutexes is not necessarily visible in the generated output. However lock contention can have a big impact on the runtime behaviour of applications. On Linux valgrind's drd can be used to track down mutex contention. Unfortunately running applications under valgrind/drd slows them down massively, often having the effect of itself generating many of the contentions one is trying to track down. Also due to its slowness it is very time consuming work.
To improve the situation if have now written a mutex profiler called mutrace. In contrast to valgrind/drd it does not virtualize the CPU instruction set, making it a lot faster. In fact, the hooks mutrace relies on to profile mutex operations should only minimally influence application runtime. mutrace is not useful for finding synchronizations bugs, it is solely useful for profiling locks.
Now, enough of this introductionary blabla. Let's have a look on the data mutrace can generate for you. As an example we'll look at gedit as a bit of a prototypical Gnome application. Gtk+ and the other Gnome libraries are not really known for their heavy use of multi-threading, and the APIs are generally not thread-safe (for a good reason). However, internally subsytems such as gio do use threading quite extensibly. And as it turns out there are a few hotspots that can be discovered with mutrace:
$ LD_PRELOAD=/home/lennart/projects/mutrace/libmutrace.so gedit mutrace: 0.1 sucessfully initialized.
gedit is now running and its mutex use is being profiled. For this example I have now opened a file with it, typed a few letters and then quit the program again without saving. As soon as gedit exits mutrace will print the profiling data it gathered to stderr. The full output you can see here. The most interesting part is at the end of the generated output, a breakdown of the most contended mutexes:
mutrace: 10 most contended mutexes: Mutex # Locked Changed Cont. tot.Time[ms] avg.Time[ms] max.Time[ms] Type 35 368268 407 275 120,822 0,000 0,894 normal 5 234645 100 21 86,855 0,000 0,494 normal 26 177324 47 4 98,610 0,001 0,150 normal 19 55758 53 2 23,931 0,000 0,092 normal 53 106 73 1 0,769 0,007 0,160 normal 25 15156 70 1 6,633 0,000 0,019 normal 4 973 10 1 4,376 0,004 0,174 normal 75 68 62 0 0,038 0,001 0,004 normal 9 1663 52 0 1,068 0,001 0,412 normal 3 136553 41 0 61,408 0,000 0,281 normal ... ... ... ... ... ... ... ... mutrace: Total runtime 9678,142 ms.
(Sorry, LC_NUMERIC was set to de_DE.UTF-8, so if you can't make sense of all the commas, think s/,/./g!)
For each mutex a line is printed. The 'Locked' column tells how often the mutex was locked during the entire runtime of about 10s. The 'Changed' column tells us how often the owning thread of the mutex changed. The 'Cont.' column tells us how often the lock was already taken when we tried to take it and we had to wait. The fifth column tell us for how long during the entire runtime the lock was locked, the sixth tells us the average lock time, and the seventh column tells us the longest time the lock was held. Finally, the last column tells us what kind of mutex this is (recursive, normal or otherwise).
The most contended lock in the example above is #35. 275 times during the runtime a thread had to wait until another thread released this mutex. All in all more then 120ms of the entire runtime (about 10s) were spent with this lock taken!
In the full output we can now look up which mutex #35 actually is:
Mutex #35 (0x0x7f48c7057d28) first referenced by: /home/lennart/projects/mutrace/libmutrace.so(pthread_mutex_lock+0x70) [0x7f48c97dc900] /lib64/libglib-2.0.so.0(g_static_rw_lock_writer_lock+0x6a) [0x7f48c674a03a] /lib64/libgobject-2.0.so.0(g_type_init_with_debug_flags+0x4b) [0x7f48c6e38ddb] /usr/lib64/libgdk-x11-2.0.so.0(gdk_pre_parse_libgtk_only+0x8c) [0x7f48c853171c] /usr/lib64/libgtk-x11-2.0.so.0(+0x14b31f) [0x7f48c891831f] /lib64/libglib-2.0.so.0(g_option_context_parse+0x90) [0x7f48c67308e0] /usr/lib64/libgtk-x11-2.0.so.0(gtk_parse_args+0xa1) [0x7f48c8918021] /usr/lib64/libgtk-x11-2.0.so.0(gtk_init_check+0x9) [0x7f48c8918079] /usr/lib64/libgtk-x11-2.0.so.0(gtk_init+0x9) [0x7f48c89180a9] /usr/bin/gedit(main+0x166) [0x427fc6] /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f48c5b42b4d] /usr/bin/gedit() [0x4276c9]
As it appears in this Gtk+ program the rwlock type_rw_lock (defined in glib's gobject/gtype.c) is a hotspot. GLib's rwlocks are implemented on top of mutexes, so an obvious attempt in improving this could be to actually make them use the operating system's rwlock primitives.
If a mutex is used often but only ever by the same thread it cannot starve other threads. The 'Changed.' column lists how often a specific mutex changed the owning thread. If the number is high this means the risk of contention is also high. The 'Cont.' column tells you about contention that actually took place.
Due to the way mutrace works we cannot profile mutexes that are used internally in glibc, such as those used for synchronizing stdio and suchlike.
mutrace is implemented entirely in userspace. It uses all kinds of exotic GCC, glibc and kernel features, so you might have a hard time compiling and running it on anything but a very recent Linux distribution. I have tested it on Rawhide but it should work on slightly older distributions, too.
Make sure to build your application with -rdynamic to make the backtraces mutrace generates useful.
As of now, mutrace only profiles mutexes. Adding support for rwlocks should be easy to add though. Patches welcome.
The output mutrace generates can be influenced by various MUTRACE_xxx environment variables. See the sources for more information.
And now, please take mutrace and profile and speed up your application!