Re: [PATCH v2 2/2] SLUB: Mark merged slab caches in /proc/slabinfo

From: David Rientjes
Date: Wed Sep 15 2010 - 18:53:19 EST


On Wed, 15 Sep 2010, Ted Ts'o wrote:

> All I can say is I hope the merging code is intelligent. We recently
> had a problem where we were wasting huge amounts of memory because we
> were allocating large numbers of the ext4_group_info structure,
> which was 132 bytes, and for which kmalloc() used a size-256 slab ---
> and the wasted memory was enough to cause OOM's in a critical
> (unfortunately statically sized) container when the disks got large
> enough and numerous enough. The fix was to use a separate cache just
> for these 132-byte objects, and not to use kmalloc().
>

That's not cache merging and it wasn't with slub. kmalloc() allocates
from caches that are initialized at boot with the smallest power-of-two
size that allows the object with alignment to fit (and we have special
96-byte and 192-byte kmalloc caches because they tend to be popular). So
with slub, a kmalloc(132, ...) would allocate from kmalloc-192 instead.
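
As a rough userspace sketch of that size-class rule (not the kernel's
actual code): the function name and the 8-byte minimum are assumptions
made up for illustration, and alignment is ignored for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed smallest kmalloc cache for this sketch; the real minimum is
 * architecture-dependent. */
#define SKETCH_KMALLOC_MIN 8

/* Hypothetical helper: return the object size of the kmalloc cache that
 * a request of `size` bytes would land in, under the rule above. */
static size_t sketch_kmalloc_cache_size(size_t size)
{
    size_t pow2 = SKETCH_KMALLOC_MIN;

    assert(size > 0);

    /* The two special non-power-of-two caches mentioned above. */
    if (size > 64 && size <= 96)
        return 96;
    if (size > 128 && size <= 192)
        return 192;

    /* Otherwise round up to the next power of two. */
    while (pow2 < size)
        pow2 <<= 1;
    return pow2;
}
```

Under these assumptions, sketch_kmalloc_cache_size(132) returns 192,
which leaves 60 bytes of slack per object rather than the 124 bytes a
256-byte cache would.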

Cache merging merges caches created with kmem_cache_create() with already
existing caches, perhaps even those kmalloc caches, that have the same
basic properties. There are some pretty strict requirements for whether a
cache may be merged: its alignment must be compatible, and the size must
not waste more than 8 bytes on 64-bit. Caches with debugging flags or
things like SLAB_DESTROY_BY_RCU won't be merged, either.
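
The merge conditions listed above can be sketched roughly as follows.
This is a simplified userspace model, not slub's implementation: the
struct, the flag bits, the function name, and the exact 8-byte boundary
are all assumptions chosen to mirror the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch only; not the kernel's values. */
#define SK_DEBUG          0x1u
#define SK_DESTROY_BY_RCU 0x2u
#define SK_NEVER_MERGE    (SK_DEBUG | SK_DESTROY_BY_RCU)

/* Assumed slack limit: "not more than 8 bytes on 64-bit". */
#define SK_MAX_WASTE 8

/* Hypothetical stand-in for an existing cache. */
struct sk_cache {
    const char *name;
    size_t size;          /* object size in bytes */
    unsigned int flags;
};

/* Could a new cache with (size, align, flags) share `existing`? */
static bool sk_can_merge(const struct sk_cache *existing,
                         size_t size, size_t align, unsigned int flags)
{
    /* Alignment is assumed to be a nonzero power of two. */
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Debugging and RCU-destroyed caches are never merged. */
    if ((existing->flags | flags) & SK_NEVER_MERGE)
        return false;
    /* The remaining flags must match. */
    if (existing->flags != flags)
        return false;
    /* The new object must fit in the existing cache's objects. */
    if (existing->size < size)
        return false;
    /* The existing object size must respect the new alignment. */
    if ((existing->size & (align - 1)) != 0)
        return false;
    /* Do not waste more than the allowed slack per object. */
    if (existing->size - size > SK_MAX_WASTE)
        return false;
    return true;
}
```

With this model, a 132-byte cache with 8-byte alignment and no special
flags could share an existing 136-byte cache, but not a 192-byte or
256-byte one, since those would waste far more than 8 bytes per object.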

> I would be really annoyed if we switched to a slab allocator which did
> merging, and then found that the said slab allocator helpfully merged
> the 132-byte slab cache and the size-256 slab into a single slab
> cache, on the grounds that it thought it would save memory... (I
> guess I'm just really really nervous about merging happening behind my
> back, and I really like having the per-object type allocation
> statistics.)
>

Slub would allocate kmalloc(132, ...) from kmalloc-192, and it wouldn't
merge your new cache created for ext4_group_info with any other cache
unless it shared the same flags and had a size of 132-140 bytes with a
compatible alignment. On my system, it looks likely that such a cache
would get merged with the numa_policy cache.