Re: [patch] my latest oom stuff

MOLNAR Ingo (mingo@chiara.csoma.elte.hu)
Sun, 25 Oct 1998 14:05:27 +0100 (CET)


On Sat, 24 Oct 1998, Linus Torvalds wrote:

> But I suspect that the REAL bug is that there may be code-paths that busy
> loop forever if they get NULL from __get_free_pages(). That's bad. We
> found and fixed one in the TCP code earlier [...]

there is one more, not that i think it makes a difference in this case,
but i'd better put it into the public domain before someone gets bitten by
it: the RAID1 code has some critical places where it cannot accept a NULL
pointer, so it loops indefinitely until it gets a free page. The reason for
this is quite obscure: at those places we could exit only with an IO error
(which in turn makes some other code think that the disk failed), which i
didn't consider to be the right answer to an out-of-memory situation. It's
a band-aid and is marked by a big FIXME. I typically test all that stuff
under heavy memory load so that i'm sure it doesn't cause any trivial
lockup, but deadlocks are seldom trivial ...
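
For readers who want the pattern spelled out, here is a minimal user-space
sketch of the kind of retry loop described above. It is not the RAID1 code:
try_get_page(), get_page_or_spin() and failures_left are hypothetical names,
and malloc() merely stands in for a page allocator that can return NULL.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical knob: how many upcoming allocation attempts should fail. */
static int failures_left = 3;

/* Hypothetical stand-in for an allocator that may return NULL under load. */
static void *try_get_page(void)
{
    if (failures_left > 0) {
        failures_left--;
        return NULL;
    }
    return malloc(4096);
}

/*
 * The band-aid: never hand NULL back to the caller, retry instead.
 * If nothing else can free memory while we spin, this never returns,
 * which is exactly the deadlock risk the text warns about.
 */
static void *get_page_or_spin(unsigned long *retries)
{
    void *page;

    assert(retries != NULL);
    *retries = 0;
    while ((page = try_get_page()) == NULL) {
        (*retries)++;
        sched_yield();  /* let other tasks run, hoping one frees memory */
    }
    return page;
}
```

The loop only terminates if some other task eventually releases memory, so
the caller trades a spurious IO error for a possible hang.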

RAID5 is already fixed, it does preallocation so we can always atomically
allocate stuff for a whole IO transaction, but RAID1 isn't yet ...
[RAID0/LINEAR is not affected.]
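
The preallocation idea can be sketched roughly as follows. This is an
illustration in user-space C, not the RAID5 implementation: struct page_pool,
the pool_* functions, POOL_PAGES and PAGE_BYTES are all made-up names. The
point is that the only step that can fail (pool_init) runs before the
transaction starts, where reporting an error is harmless, while pool_get()
never calls the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_PAGES 4      /* assumed worst-case pages per IO transaction */
#define PAGE_BYTES 4096

struct page_pool {
    void *pages[POOL_PAGES];
    size_t free_count;
};

/* Fill the reserve up front. Returns 0 on success, -1 if memory is short. */
static int pool_init(struct page_pool *pool)
{
    pool->free_count = 0;
    for (size_t i = 0; i < POOL_PAGES; i++) {
        void *p = malloc(PAGE_BYTES);
        if (p == NULL) {
            /* Undo the partial reserve so the caller can fail cleanly. */
            while (pool->free_count > 0)
                free(pool->pages[--pool->free_count]);
            return -1;
        }
        pool->pages[pool->free_count++] = p;
    }
    return 0;
}

/* Take a page from the reserve; NULL only if the reserve is used up. */
static void *pool_get(struct page_pool *pool)
{
    if (pool->free_count == 0)
        return NULL;
    return pool->pages[--pool->free_count];
}

/* Return a page to the reserve once the transaction is done with it. */
static void pool_put(struct page_pool *pool, void *page)
{
    assert(pool->free_count < POOL_PAGES);
    pool->pages[pool->free_count++] = page;
}

/* Release whatever is currently held in the reserve. */
static void pool_destroy(struct page_pool *pool)
{
    while (pool->free_count > 0)
        free(pool->pages[--pool->free_count]);
}
```

As long as a transaction never needs more than POOL_PAGES pages at once,
pool_get() cannot fail in the middle of it, so there is no need to spin.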

-- mingo
