When naively profiling multi-threaded applications, the time spent waiting
for mutexes is not necessarily visible in the generated output. However, lock
contention can have a big impact on the runtime behaviour of applications. On
Linux, valgrind's
drd can be used to track down mutex contention. Unfortunately, running
applications under valgrind/drd slows them down massively, which often has the
effect of itself generating many of the contentions one is trying to track
down. And due to this slowness, profiling becomes very time-consuming work.
To improve the situation I have now written a mutex profiler called
mutrace. In contrast to valgrind/drd it does not virtualize the
CPU instruction set, making it a lot faster. In fact, the hooks mutrace
relies on to profile mutex operations should only minimally influence
application runtime. mutrace is not useful for finding
synchronization bugs; it is solely useful for profiling locks.
Now, enough of this introductory blabla. Let's have a look at the data
mutrace can generate for you. As an example we'll look at
gedit as a bit of a prototypical Gnome application. Gtk+ and the other
Gnome libraries are not really known for their heavy use of multi-threading,
and their APIs are generally not thread-safe (for good reason). However,
internally, subsystems such as gio do use threading quite extensively.
And as it turns out there are a few hotspots that can be discovered with
mutrace:
$ LD_PRELOAD=/home/lennart/projects/mutrace/libmutrace.so gedit
mutrace: 0.1 sucessfully initialized.
gedit is now running and its mutex use is being profiled. For this example
I opened a file with it, typed a few letters and then quit the program
again without saving. As soon as gedit exits, mutrace prints the
profiling data it gathered to stderr. You can see the full output
here. The most interesting part is at the end of the generated output, a
breakdown of the most contended mutexes:
mutrace: 10 most contended mutexes:
Mutex # Locked Changed Cont. tot.Time[ms] avg.Time[ms] max.Time[ms] Type
35 368268 407 275 120,822 0,000 0,894 normal
5 234645 100 21 86,855 0,000 0,494 normal
26 177324 47 4 98,610 0,001 0,150 normal
19 55758 53 2 23,931 0,000 0,092 normal
53 106 73 1 0,769 0,007 0,160 normal
25 15156 70 1 6,633 0,000 0,019 normal
4 973 10 1 4,376 0,004 0,174 normal
75 68 62 0 0,038 0,001 0,004 normal
9 1663 52 0 1,068 0,001 0,412 normal
3 136553 41 0 61,408 0,000 0,281 normal
... ... ... ... ... ... ... ...
mutrace: Total runtime 9678,142 ms.
(Sorry, LC_NUMERIC was set to de_DE.UTF-8, so if you can't make sense of
all the commas, think s/,/./g!)
For each mutex a line is printed. The 'Locked' column tells us how often the
mutex was locked during the entire runtime of about 10s. The 'Changed' column
tells us how often the owning thread of the mutex changed. The 'Cont.' column
tells us how often the lock was already taken when we tried to take it and we
had to wait. The fifth column tells us for how long during the entire runtime
the lock was held, the sixth tells us the average lock time, and the seventh
column tells us the longest time the lock was held. Finally, the last column
tells us what kind of mutex this is (recursive, normal or otherwise).
The most contended lock in the example above is #35: 275 times during the
runtime a thread had to wait until another thread released this mutex. All in
all, more than 120ms of the entire runtime (about 10s) were spent with this
lock taken!
In the full output we can now look up which mutex #35 actually is:
Mutex #35 (0x0x7f48c7057d28) first referenced by:
/home/lennart/projects/mutrace/libmutrace.so(pthread_mutex_lock+0x70) [0x7f48c97dc900]
/lib64/libglib-2.0.so.0(g_static_rw_lock_writer_lock+0x6a) [0x7f48c674a03a]
/lib64/libgobject-2.0.so.0(g_type_init_with_debug_flags+0x4b) [0x7f48c6e38ddb]
/usr/lib64/libgdk-x11-2.0.so.0(gdk_pre_parse_libgtk_only+0x8c) [0x7f48c853171c]
/usr/lib64/libgtk-x11-2.0.so.0(+0x14b31f) [0x7f48c891831f]
/lib64/libglib-2.0.so.0(g_option_context_parse+0x90) [0x7f48c67308e0]
/usr/lib64/libgtk-x11-2.0.so.0(gtk_parse_args+0xa1) [0x7f48c8918021]
/usr/lib64/libgtk-x11-2.0.so.0(gtk_init_check+0x9) [0x7f48c8918079]
/usr/lib64/libgtk-x11-2.0.so.0(gtk_init+0x9) [0x7f48c89180a9]
/usr/bin/gedit(main+0x166) [0x427fc6]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x7f48c5b42b4d]
/usr/bin/gedit() [0x4276c9]
As it appears, in this Gtk+ program the rwlock type_rw_lock
(defined in glib's gobject/gtype.c) is a hotspot. GLib's rwlocks are
implemented on top of mutexes, so an obvious way to improve this would
be to make them use the operating system's rwlock primitives instead.
If a mutex is used often but only ever by the same thread, it cannot starve
other threads. The 'Changed' column lists how often the owning thread of a
specific mutex changed. If that number is high, the risk of contention is
also high. The 'Cont.' column tells you about contention that actually took
place.
Due to the way mutrace works we cannot profile mutexes that are
used internally by glibc, such as those synchronizing stdio
and suchlike.
mutrace is implemented entirely in userspace. It
uses all kinds of exotic GCC, glibc and kernel features, so you might have a
hard time compiling and running it on anything but a very recent Linux
distribution. I have tested it on Rawhide but it should work on slightly older
distributions, too.
Make sure to build your application with -rdynamic to make the
backtraces mutrace generates useful.
As of now, mutrace only profiles mutexes. Support for
rwlocks should be easy to add, though. Patches welcome.
The output mutrace generates can be influenced by various
MUTRACE_xxx environment variables. See the sources for more
information.
And now, please take mutrace and profile and speed up your application!
You may find the sources in my
git repository.