Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

From: Linus Torvalds (torvalds@transmeta.com)
Date: Mon Oct 23 2000 - 15:57:22 EST


On Mon, 23 Oct 2000, Petr Vandrovec wrote:
>
> Yes. Bad news. No problem was catched in filemap_nopage, but one
> (of 57000) pages was dirty and had page->mapping == NULL... (maybe
> only one was caused that this was just after bootup, with plenty of memory)
> Maybe I should look at readahead code?

Yes, you're right, read-ahead is in fact even more likely to catch the
race.

> In case of truncate it is going irrevocable away. Accesses after truncate
> should (and sometime give you) SIGBUS...

Yes. But I'd rather have a small race where we under certain strange
circumstances forget to raise a SIGBUS than have a kernel bug that causes
oops and possible filesystem corruption.

So I'm not just looking for a fix, I'm also looking for a SIMPLE fix, with
perhaps some major surgery later (things like making the inode semaphore a
read-write semaphore and just always synchronizing 100% on i_size - which
would fix the problem, but is not a 2.4.x thing, no way Jose).

> First page->mapping == NULL entry in syslog is dated 22:23:58, but
> couple of entries was lost before (probably I should print only '.' for
> each such page; this run there was more than 100 such pages)
> Another question is why SIGCHLD was delivered to parent AFTER ftruncate,
> but exit was invoked couple of seconds before - maybe it syncs
> child address space to disk?

exit() basically does do a msync(MSASYNC), so that could be it.

                Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Oct 23 2000 - 21:00:21 EST