In part 2.4.3 of Ulrich Drepper's excellent How To Write Shared
Libraries (which unfortunately is a bit out-of-date these days) Ulrich
suggests replacing arrays of constant strings by a single concatenated string
plus an index lookup table, to avoid unnecessary relocations during startup of
ELF programs. Maintaining this string pool is however troublesome,
it is hard to read and difficult to edit. In appendix B Ulrich
lists an example C excerpt which contains some code for simplifying the
maintaining of such strings pools, after an idea from Bruno Haible. In my
opinion however that suggestion is not that much simpler, and requires
splitting off the actual strings into a seperate source file. Ugly!
Some Free Software uses string pools to speed up relocation, e.g. GTK+.
Some development tools like gperf
contain support for string pools.
All solutions for string pool maintaining I could find on the Internet were not exactly
beautiful. Either they were completely manual, manual plus a validity checking
tool, or very very cumbersome. Googling around I was unable to find a satisfactory tool for this purpose[1].
After Diego Petteno complained about
my heavy use of arrays of constant strings in libatasmart I sat down to
change the situation, and wrote strpool.c,
a simple parser for a very, very minimal subset of C, written in plain ANSI C.
It looks for two special comment markers /* %STRINGPOOLSTART% */ and
/* %STRINGPOOLSTOP% */, moves all immediate strings between those
markers into a common string pool and rewrites the input with the strings
replaced by indexes. Code accessing those strings must use the
special _P() macro. With these minimal changes to a
source file, passing it through strpool.c will automatically rewrite
it to a string-poolized version. The nice thing about this is that the
necessary changes in the source are minimal, and the code stays compilable with
and without passing it through the strpool.c preprocessor.
Here's an example. First the original non-string-poolized version:
static const char* const table[] = {
"waldo",
"uxknurz",
"foobar",
"fubar"
};
static int main(int argc, char* argv[]) {
printf("%s\n", table[2]);
return 1;
}
For later use with strpool.c we change this like this:
#ifndef STRPOOL
#define _P(x) x
#endif
/* %STRINGPOOLSTART% */
static const char* const table[] = {
"waldo",
"uxknurz",
"foobar",
"fubar"
};
/* %STRINGPOOLSTOP% */
static int main(int argc, char* argv[]) {
printf("%s\n", _P(table[2]));
return 1;
}
When passed through strpool.c this will be rewritten as:
/* Saved 3 relocations, saved 0 strings (0 b) due to suffix compression. */
static const char _strpool_[] =
"waldo\0"
"uxknurz\0"
"foobar\0"
"fubar\0";
#ifndef STRPOOL
#define STRPOOL
#endif
#ifndef _P
#define _P(x) (_strpool_ + ((x) - (const char*) 1))
#endif
#ifndef STRPOOL
#define _P(x) x
#endif
/* %STRINGPOOLSTART% */
static const char* const table[] = {
((const char*) 1),
((const char*) 7),
((const char*) 15),
((const char*) 22)
};
/* %STRINGPOOLSTOP% */
static int main(int argc, char* argv[]) {
printf("%s\n", _P(table[2]));
return 1;
}
All three versions can be compiled directly with gcc. However, the version
that was passed through strpool.c compresses the number of
relocations for the table array from 4 to 1. Which isn't much of a
difference, but the larger your tables are the more relevant the difference in
the number of necessary relocations gets.
A more realistic example is atasmart.c which after being preprocessed with strpool.c looks like this. In this specific example the number of necessary startup relocations goes down from > 100 to 9.
I am note sure if the parser is 100% correct, but it works fine with all sources I tried. It even does suffix compression like gcc does for normal strings too.
Footnotes
[1] Or maybe I just suck in googling? Anyone has a suggestion for such a tool?