Re: innd mmap bug in 2.4.0-test12

From: Linus Torvalds (torvalds@transmeta.com)
Date: Sun Dec 24 2000 - 12:57:56 EST


On Sun, 24 Dec 2000, Marco d'Itri wrote:
> On Dec 24, Alexander Viro <viro@math.psu.edu> wrote:
>
> >> I put "cp active active.ok" in the rc file before shutting down the
> >> daemon and at the next boot the files are different, every time.
> >
> >Could you send me both files? BTW, which filesystem it is?
> I use ext2. The files are not corrupted, they just are not updated.
> Another data point: at least in some cases, if I stop and start inn
> without rebooting the files are the same.

Ok, looks like we just drop the page cache page without writing it out in
some cases. Possibly/probably because we have dropped the dirty bit on the
floor.

Look slike this is a completely different case from the previous
corruptions, it looks more like a VM issue than a FS thing..

Hmm.. munmap() (and exit()) go through "zap_page_range()", which go
through "free_pte()", which definitely copies the dirty bit to the page
structure.

Hmm.. I wonder if such a dirty page might have been moved to the
"inactive_clean" list some way? It shouldn't really be there, as the page
had users, but if it gets on that list we'd not have tested the dirty bit.

Marco, would you mind changing the test in reclaim_page(), somewheer
around line mm/vmscan.c:487 that says:

        /* The page is dirty, or locked, move to inactive_dirty list. */
        if (page->buffers || TryLockPage(page)) {
                ...

and change the test to

        if (page->buffers || PageDirty(page) || TryLockPage(page)) {

instead? Ie ad the test for "PageDirty(page)" (and order _is_ important:
the TryLockPage() thing must come last, because it has side effects).

(You might add a "printk()" too that triggers when the new condition
happens, just to see if it does indeed happen).

If the page is on the inactive_clean() list, we'll have to find where it
is put there, because it really shouldn't have been there.

Uhhuh. Actually, reading "page_launder()", the buffer clearign case looks
suspiciously like i doesn't check for page accessed or dirty bits. That's
probably it. Maybe there are other cases. Anyway, I'd love to hear if the
above one-liner fixes the corruption for you..

        Thanks,
                Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Dec 31 2000 - 21:00:07 EST