Re: 2.6.39-rc4+: Kernel leaking memory during FS scanning, regression?

From: Linus Torvalds
Date: Mon Apr 25 2011 - 11:30:11 EST


On Mon, Apr 25, 2011 at 2:17 AM, Bruno Prémont
<bonbons@xxxxxxxxxxxxxxxxx> wrote:
>
> Here it seems to happen when I run 2 intensive tasks in parallel, e.g.
> (re)emerging gimp and running revdep-rebuild -pi in another terminal.
> This produces a fork rate of about 100-300 per second.
>
> Suddenly kmalloc-128 slabs stop being freed and things degrade.

So everything seems to imply some kind of filesystem/vfs thing, but
let's try to gather a bit more information about exactly what it is.

Some of it also points to RCU freeing, but that "kmalloc-128" doesn't
really match my expectations. According to your slabinfo, it's not the
dentries.

One thing I'd ask you to do is to boot with the "slub_nomerge" kernel
command line switch. The SLUB "merge slab caches" thing may save some
memory, but it has been a disaster from every other standpoint - every
time there's a memory leak, it ends up making it very confusing to try
to figure things out.

For example, your traces seem to imply that the kmalloc-128 allocation
is actually the "filp" cache, but it has gotten merged with the
kmalloc-128 cache, so slabinfo doesn't actually show the right user.
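
As a rough aid for this kind of hunt, here is a minimal sketch that ranks
caches by approximate size from /proc/slabinfo-style text. The function name
top_slab_caches and the SAMPLE figures are made up for illustration; they are
not taken from the report. It only relies on the usual column order (name,
active_objs, num_objs, objsize), and with merged caches it will still only
show the merged name, which is exactly the problem described above.

```python
def top_slab_caches(slabinfo_text, limit=5):
    """Return (cache_name, approx_bytes) pairs, largest first.

    approx_bytes is num_objs * objsize, which ignores per-slab overhead.
    """
    rows = []
    for line in slabinfo_text.splitlines():
        # Skip blank lines, the version banner and the column header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name = fields[0]
        num_objs, objsize = int(fields[2]), int(fields[3])
        rows.append((name, num_objs * objsize))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


# Hypothetical sample data, NOT the numbers from the original report.
SAMPLE = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
kmalloc-128 900000 900032 128 32 1 : tunables 0 0 0 : slabdata 28126 28126 0
dentry 40000 40026 192 21 1 : tunables 0 0 0 : slabdata 1906 1906 0
"""
```

To use it on a live system, pass in the contents of /proc/slabinfo (usually
readable only by root) instead of SAMPLE, and compare the output of two runs
taken a few minutes apart to see which cache keeps growing.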

(Pekka? This is a real _problem_. The whole "confused debugging" is
wasting a lot of people's time. Can we please try to get slabinfo
statistics to work right for the merged state? Or perhaps decide to just
not merge at all?)

As to why it has started to happen now: with the whole RCU lookup
thing, many more filesystem objects are RCU-free'd (dentries have been
for a long time, but now we have inodes and filp's too), and that may
end up delaying the actual freeing sufficiently that you end up seeing
something that used to be borderline become a major problem.

Also, what's your kernel config, in particular wrt RCU? The RCU
freeing _should_ be self-limiting (if I recall correctly) and not let
infinite amounts of RCU work (ie pending freeing) accumulate, but
maybe something is broken. Do you have a UP kernel with TINY_RCU, for
example? Or maybe I'm just confused, and there's never any RCU
throttling at all. Paul?
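
For answering the config question, a small sketch along these lines could
pull the relevant options out of a kernel .config. The function name
rcu_options and the SAMPLE text are hypothetical; the sketch assumes the
standard "CONFIG_FOO=value" / "# CONFIG_FOO is not set" file layout and
simply keeps every enabled option with "RCU" in its name, plus CONFIG_SMP.

```python
def rcu_options(config_text):
    """Return a dict of enabled RCU-related options (and CONFIG_SMP)."""
    opts = {}
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines start with '#', so they are skipped.
        if not line.startswith("CONFIG_") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if "RCU" in key or key == "CONFIG_SMP":
            opts[key] = value
    return opts


# Hypothetical sample of a UP configuration, for illustration only.
SAMPLE = """\
# CONFIG_SMP is not set
CONFIG_TINY_RCU=y
# CONFIG_TREE_RCU is not set
CONFIG_SLUB=y
"""
```

On a real machine the text would come from the build tree's .config, or from
/proc/config.gz if the kernel was built with that enabled. An output without
CONFIG_SMP and with CONFIG_TINY_RCU=y would indicate the UP/TINY_RCU case
asked about above.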

Linus