Posted on Fr 25 Juli 2008

String Pools

In part 2.4.3 of Ulrich Drepper's excellent How To Write Shared Libraries (which unfortunately is a bit out-of-date these days) Ulrich suggests replacing arrays of constant strings by a single concatenated string plus an index lookup table, to avoid unnecessary relocations during startup of ELF programs. Maintaining this string pool is however troublesome, it is hard to read and difficult to edit. In appendix B Ulrich lists an example C excerpt which contains some code for simplifying the maintaining of such strings pools, after an idea from Bruno Haible. In my opinion however that suggestion is not that much simpler, and requires splitting off the actual strings into a seperate source file. Ugly!

Some Free Software uses string pools to speed up relocation, e.g. GTK+. Some development tools like gperf contain support for string pools.

All solutions for string pool maintaining I could find on the Internet were not exactly beautiful. Either they were completely manual, manual plus a validity checking tool, or very very cumbersome. Googling around I was unable to find a satisfactory tool for this purpose[1].

After Diego Petteno complained about my heavy use of arrays of constant strings in libatasmart I sat down to change the situation, and wrote strpool.c, a simple parser for a very, very minimal subset of C, written in plain ANSI C. It looks for two special comment markers /* %STRINGPOOLSTART% */ and /* %STRINGPOOLSTOP% */, moves all immediate strings between those markers into a common string pool and rewrites the input with the strings replaced by indexes. Code accessing those strings must use the special _P() macro. With these minimal changes to a source file, passing it through strpool.c will automatically rewrite it to a string-poolized version. The nice thing about this is that the necessary changes in the source are minimal, and the code stays compilable with and without passing it through the strpool.c preprocessor.

Here's an example. First the original non-string-poolized version:

static const char* const table[] = {
	"waldo",
	"uxknurz",
	"foobar",
	"fubar"
};

static int main(int argc, char* argv[]) {
	printf("%s\n", table[2]);
	return 1;
}

For later use with strpool.c we change this like this:

#ifndef STRPOOL
#define _P(x) x
#endif

/* %STRINGPOOLSTART% */
static const char* const table[] = {
	"waldo",
	"uxknurz",
	"foobar",
	"fubar"
};
/* %STRINGPOOLSTOP% */

static int main(int argc, char* argv[]) {
	printf("%s\n", _P(table[2]));
	return 1;
}

When passed through strpool.c this will be rewritten as:

/* Saved 3 relocations, saved 0 strings (0 b) due to suffix compression. */
static const char _strpool_[] =
	"waldo\0"
	"uxknurz\0"
	"foobar\0"
	"fubar\0";
#ifndef STRPOOL
#define STRPOOL
#endif
#ifndef _P
#define _P(x) (_strpool_ + ((x) - (const char*) 1))
#endif

#ifndef STRPOOL
#define _P(x) x
#endif

/* %STRINGPOOLSTART% */
static const char* const table[] = {
	((const char*) 1),
	((const char*) 7),
	((const char*) 15),
	((const char*) 22)
};
/* %STRINGPOOLSTOP% */

static int main(int argc, char* argv[]) {
	printf("%s\n", _P(table[2]));
	return 1;
}

All three versions can be compiled directly with gcc. However, the version that was passed through strpool.c compresses the number of relocations for the table array from 4 to 1. Which isn't much of a difference, but the larger your tables are the more relevant the difference in the number of necessary relocations gets.

A more realistic example is atasmart.c which after being preprocessed with strpool.c looks like this. In this specific example the number of necessary startup relocations goes down from > 100 to 9.

I am note sure if the parser is 100% correct, but it works fine with all sources I tried. It even does suffix compression like gcc does for normal strings too.

Footnotes

[1] Or maybe I just suck in googling? Anyone has a suggestion for such a tool?

© Lennart Poettering. Built using Pelican. Theme by Giulio Fidente on github. .