[PATCH] module: Fix performance regression on modules with large symbol tables

From: Kevin Cernekee
Date: Fri Nov 04 2011 - 21:00:55 EST


Commit 554bdfe5acf3715e87c8d5e25a4f9a896ac9f014 (module: reduce string
table for loaded modules) introduced an optimization to shrink the size of
the resident string table. Part of this involves calling bitmap_weight()
on the strmap bitmap once for each core symbol. strmap contains one bit
for each byte of the module's strtab.

For kernel modules with a large number of symbols, the addition of the
bitmap_weight() operation to each iteration of the add_kallsyms() loop
resulted in a significant "insmod" performance regression from 2.6.31
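to 2.6.32. bitmap_weight() is expensive when the bitmap is large: it
counts every set bit below the given index, so the per-symbol cost grows
with the symbol's offset into strtab.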
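
As a rough illustration of that cost, here is a user-space sketch of the
old index computation. It is not kernel code: toy_bitmap_weight() and
old_new_index() are hypothetical stand-ins, the "bitmap" is a plain
byte-per-bit array, and it assumes bit 0 is marked so that index 0 stays
the empty string.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for bitmap_weight(): count nonzero entries in map[0..nbits). */
static size_t toy_bitmap_weight(const unsigned char *map, size_t nbits)
{
	size_t i, w = 0;

	for (i = 0; i < nbits; i++)
		w += map[i] != 0;
	return w;
}

/*
 * Old scheme (modelled): the new st_name is the number of retained strtab
 * bytes that precede the old st_name. Every call rescans the whole prefix.
 */
static size_t old_new_index(const unsigned char *strmap, size_t st_name)
{
	assert(st_name > 0);
	return toy_bitmap_weight(strmap, st_name);
}
```

With N core symbols and an S-byte strtab, this does on the order of N * S
bit tests in total, which is where the time goes for large modules.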

The proposed alternative optimizes the common case in this loop
(is_core_symbol() == true, and the symbol name is not a duplicate), while
penalizing the exceptional case of a duplicate symbol.
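
The idea can be modelled in user space roughly as follows. This is only a
sketch under simplifying assumptions: compact_names() and its parameters
are hypothetical, every symbol is treated as a core symbol, and "pending"
is a byte-per-offset array playing the role of info->strmap. Unlike the
patch, the duplicate search here stops at the first match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * pending[k] is nonzero while the name starting at strtab offset k has not
 * been copied yet. Writes the compacted table to 'core', the new offsets to
 * 'dst_name', and returns the number of bytes written to 'core'.
 */
static size_t compact_names(const char *strtab, unsigned char *pending,
			    const size_t *src_name, size_t *dst_name,
			    size_t nsyms, char *core)
{
	size_t i, j, k, stridx = 1;

	core[0] = '\0';		/* offset 0 is the empty string */
	for (i = 0; i < nsyms; i++) {
		k = src_name[i];
		if (pending[k]) {
			/* Common case: first use, append and record offset. */
			pending[k] = 0;
			dst_name[i] = stridx;
			strcpy(core + stridx, strtab + k);
			stridx += strlen(strtab + k) + 1;
		} else {
			/* Rare case: already copied, search earlier symbols. */
			dst_name[i] = 0;
			for (j = 0; j < i; j++) {
				if (!strcmp(strtab + k, core + dst_name[j])) {
					dst_name[i] = dst_name[j];
					break;
				}
			}
		}
	}
	return stridx;
}
```

The common path touches each name byte once and never rescans the bitmap;
only a repeated name pays for the linear search over earlier symbols.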

My test was run on a 600 MHz MIPS processor, using a kernel module with
15,000 "core" symbols and 10,000 symbols in .init.text. .strtab takes up
250,227 bytes.

Original code: insmod takes 3.39 seconds
Patched code: insmod takes 0.07 seconds

Signed-off-by: Kevin Cernekee <cernekee@xxxxxxxxx>
---

Since the new code performs an exhaustive string compare search when it
encounters duplicate symbols inside a module (i.e. multiple symtab entries
referring to the same strtab index), I did some extra checking on my
Linux PC to see how common this is:

For modules other than nvidia, there were 35 duplicate symbols out of
9,956 total LKM symbols (0.4%). This is with KALLSYMS and KALLSYMS_ALL
enabled. Many were ".LCx" literal constants, and others were assorted
duplications of trace_kmalloc(), cache_put(), do_vfs_lock(), etc.
These are probably caused by combining multiple *.o files into a single
*.ko file.

The nvidia module has 29,296 total entries, and 3,045 duplicates (10%).
There were 597 instances of each of: _nv009058rm, _nv009059rm,
_nv009060rm, and _nv009061rm.
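
For anyone who wants to repeat this kind of check, one possible way to
count such entries is sketched below. It is not necessarily how the numbers
above were obtained: count_duplicate_names() is a hypothetical helper that
takes the st_name values already extracted from a module's symtab, and
skipping st_name == 0 (unnamed entries) is my assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort() comparator for unsigned int values, ascending. */
static int cmp_uint(const void *a, const void *b)
{
	unsigned int x = *(const unsigned int *)a;
	unsigned int y = *(const unsigned int *)b;

	return (x > y) - (x < y);
}

/*
 * Count entries whose st_name repeats that of another entry. Each name
 * contributes (occurrences - 1). Sorts 'st_name' in place.
 */
static size_t count_duplicate_names(unsigned int *st_name, size_t n)
{
	size_t i, dups = 0;

	qsort(st_name, n, sizeof(*st_name), cmp_uint);
	for (i = 1; i < n; i++)
		if (st_name[i] != 0 && st_name[i] == st_name[i - 1])
			dups++;
	return dups;
}
```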

To make sure the degenerate case of nvidia.ko was still covered, I ran
additional tests with qemu-system-arm (ARM Versatile) on Linus' head of
tree:

Latest kernel (commit 15831714), 25,000 symbol test (as above): 4.5s

Latest kernel with 2,400 (16%) of my module's core symbols turned into
duplicates through hex editing: 4.4s

Patched kernel, 25,000 symbol test: 0.1s

Patched kernel, with 2,400 duplicate symbols: 0.8s

So even a module with a large number of duplicate symbols loads more
quickly with my patch than without it.


kernel/module.c | 26 ++++++++++++++++++--------
1 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 93342d9..7f5dcbf 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2221,7 +2221,7 @@ static void layout_symtab(struct module *mod, struct load_info *info)

 static void add_kallsyms(struct module *mod, const struct load_info *info)
 {
-	unsigned int i, ndst;
+	unsigned int i, j, stridx = 1, ndst;
 	const Elf_Sym *src;
 	Elf_Sym *dst;
 	char *s;
@@ -2237,22 +2237,32 @@ static void add_kallsyms(struct module *mod, const struct load_info *info)
 		mod->symtab[i].st_info = elf_type(&mod->symtab[i], info);

 	mod->core_symtab = dst = mod->module_core + info->symoffs;
+	mod->core_strtab = s = mod->module_core + info->stroffs;
 	src = mod->symtab;
 	*dst = *src;
+	*s++ = 0;
 	for (ndst = i = 1; i < mod->num_symtab; ++i, ++src) {
 		if (!is_core_symbol(src, info->sechdrs, info->hdr->e_shnum))
 			continue;
 		dst[ndst] = *src;
-		dst[ndst].st_name = bitmap_weight(info->strmap,
-						  dst[ndst].st_name);
+		if (unlikely(!test_bit(src->st_name, info->strmap))) {
+			dst[ndst].st_name = 0;
+			for (j = 1; j < ndst; j++)
+				if (!strcmp(&mod->strtab[src->st_name],
+					    &mod->core_strtab[dst[j].st_name]))
+					dst[ndst].st_name = dst[j].st_name;
+		} else {
+			dst[ndst].st_name = stridx;
+			j = src->st_name;
+			clear_bit(j, info->strmap);
+			do {
+				*s = mod->strtab[j++];
+				stridx++;
+			} while (*s++);
+		}
 		++ndst;
 	}
 	mod->core_num_syms = ndst;
-
-	mod->core_strtab = s = mod->module_core + info->stroffs;
-	for (*s = 0, i = 1; i < info->sechdrs[info->index.str].sh_size; ++i)
-		if (test_bit(i, info->strmap))
-			*++s = mod->strtab[i];
 }
 #else
 static inline void layout_symtab(struct module *mod, struct load_info *info)
--
1.7.6.3
