Re: OOM problems on 2.6.12-rc1 with many fsx tests

From: Andrew Morton
Date: Mon Apr 04 2005 - 15:43:06 EST


"Martin J. Bligh" <mbligh@xxxxxxxxxxx> wrote:
>
> >> > > I run into OOM problem again on 2.6.12-rc1. I run some(20) fsx tests on
> >> > > 2.6.12-rc1 kernel(and 2.6.11-mm4) on ext3 filesystem, after about 10
> >> > > hours the system hit OOM, and OOM keep killing processes one by one. I
> >> > > could reproduce this problem very constantly on a 2 way PIII 700MHZ with
> >> > > 512MB RAM. Also the problem could be reproduced on running the same test
> >> > > on reiser fs.
> >> > >
> >> > > The fsx command is:
> >> > >
> >> > > ./fsx -c 10 -n -r 4096 -w 4096 /mnt/test/foo1 &
> >> >
> >> >
> >> > This ext3 bug goes all the way back to 2.6.6.
> >>
> >> > I don't know yet why you saw problems with reiser3 and I'm pretty sure I
> >> > saw problems with ext2. More testing is needed there.
> >> >
> >>
> >> We (Janet and I) are chasing this bug as well. Janet is able to
> >> reproduce this bug on 2.6.9 but I can't. Glad to know you have nail down
> >> this issue on ext3. I am pretty sure I saw this on Reiser3 once, I will
> >> double check it. Will try your patch today.
> >
> > There's a second leak, with similar-looking symptoms. At ~50
> > commits/second it has leaked ~10MB in 24 hours, so it's very slow - less
> > than a hundredth the rate of the first one.
>
> What are you using to see these with, just kgdb, and a large cranial
> capacity? Or is there some more magic?
>

Nothing magical: run the test for a while, kill everything, cause a huge
swapstorm then look at the meminfo numbers. If active+inactive is
significantly larger than cahed+buffers+swapcached+mapped+minus-a-bit then
it's leaked.

Right now I have:

MemTotal: 246264 kB
MemFree: 196148 kB
Buffers: 4200 kB
Cached: 3308 kB
SwapCached: 8064 kB
Active: 21548 kB
Inactive: 12532 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 246264 kB
LowFree: 196148 kB
SwapTotal: 1020116 kB
SwapFree: 1001612 kB
Dirty: 60 kB
Writeback: 0 kB
Mapped: 2284 kB
Slab: 12200 kB
CommitLimit: 1143248 kB
Committed_AS: 34004 kB
PageTables: 1200 kB
VmallocTotal: 774136 kB
VmallocUsed: 82832 kB
VmallocChunk: 691188 kB
HugePages_Total: 0
HugePages_Free: 0

33 megs on the LRU, unaccounted for in other places.

Once the leak is nice and large I can start a new swapstorm, set a
breakpoint in try_to_free_buffers() (for example) and start looking at the
state of the page and its buffers.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/