Re: [PATCH] page->flags corruption fix

From: Andrea Arcangeli
Date: Sat Oct 11 2003 - 08:48:35 EST


On Wed, Oct 08, 2003 at 11:59:06AM -0400, Rik van Riel wrote:
> On Wed, 8 Oct 2003, Hugh Dickins wrote:
> > On Wed, 8 Oct 2003 Matt_Domsch@xxxxxxxx wrote:
>
> > > We've seen a similar failure with the RHEL2.1 kernel w/o RMAP patches
> > > too. So we fully believe it's possible in stock 2.4.x.
> >
> > A similar failure - but what exactly?
> > And what is the actual race which would account for it?
> >
> > I don't mind you and Rik fixing bugs!
> > I'd just like to understand the bug before it's fixed.
>
> 1) cpu A adds page P to the swap cache, loading page->flags
> and modifying it locally
>
> 2) a second thread scans a page table entry and sees that
> the page was accessed, so cpu B moves page P to the
> active list
>
> 3) cpu A undoes the PG_inactive -> PG_active bit change,
> corrupting the page->flags of P
>
> The -rmap VM doesn't do anything to cause this bug; it just
> makes it easier to trigger due to some side effects.
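
In other words, the lost update looks roughly like this (an
illustration of the pattern only, using the flag names from the
description above, not the literal 2.4 source):

	/* CPU A, __add_to_page_cache() style non-atomic RMW: */
	unsigned long flags = page->flags;	/* snapshot: PG_inactive set */
	flags &= ~((1 << PG_referenced) | (1 << PG_dirty));
	flags |= (1 << PG_locked);

	/* CPU B, meanwhile, under pagemap_lru_lock:
	 *	ClearPageInactive(page);
	 *	SetPageActive(page);
	 */

	page->flags = flags;	/* CPU A stores its stale snapshot:
				   PG_active is wiped, PG_inactive comes
				   back, and page->flags no longer matches
				   the lru list the page actually sits on */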

I believe the more correct fix is to hold pagemap_lru_lock during
__add_to_page_cache. The race exists for pages with PG_lru set (i.e.
already in the lru) that are being added to the pagecache/swapcache
at the same time. Holding both spinlocks really avoids the race; your
patch sounds less obviously safe, since the race still happens and is
merely more "controlled", and a single spinlock should be more
efficient than the flood of atomic bitops anyway. Comments? Hugh?
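
Roughly what I have in mind (a sketch only, assuming pagemap_lru_lock
nests outside pagecache_lock; the real patch would have to respect
whatever lock ordering the rest of the tree already uses):

	spin_lock(&pagemap_lru_lock);	/* keep lru scanners away */
	spin_lock(&pagecache_lock);
	__add_to_page_cache(page, mapping, offset, hash);
	spin_unlock(&pagecache_lock);
	spin_unlock(&pagemap_lru_lock);

With both locks held, nobody can move the page between the lru lists
while __add_to_page_cache is doing its non-atomic page->flags update.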

I also can't see why you care about page->flags in __free_pages_ok:
by that time, if the page is still visible anywhere, that's a _real_
huge bug, so that second part really looks wrong and should be backed
out IMHO. For sure at that point the page can't be in any lru list,
at least in my tree; see below for what the correct code looks like
there, which is the only way to handle that case right (the
in_interrupt() part is needed only if you have aio in your tree, like
in my tree). So your changes to free_pages in page_alloc.c are
definitely superfluous, and I guess they're not closing all the bugs
in your tree either (you also need the code below, if you allow anon
pages to be freed by asynchronous io while they are still in the
lru).

I agree the first part of your patch fixes the race, but I'd prefer
just taking one more spinlock to avoid the race entirely, instead of
trying to control it with a flood of bitops.

static void __free_pages_ok (struct page *page, unsigned int order)
{
	unsigned long index, page_idx, mask, flags;
	free_area_t *area;
	struct page *base;
	zone_t *zone;
#ifdef CONFIG_SMP
	per_cpu_pages_t *per_cpu_pages;
#endif

	/*
	 * Yes, think what happens when other parts of the kernel take
	 * a reference to a page in order to pin it for io. -ben
	 */
	if (PageLRU(page)) {
		if (unlikely(in_interrupt())) {
			/*
			 * lru_cache_del() takes pagemap_lru_lock, which
			 * must not be taken from irq context: queue the
			 * page and let keventd free it later instead.
			 */
			unsigned long flags;

			spin_lock_irqsave(&free_pages_ok_no_irq_lock, flags);

			page->next_hash = free_pages_ok_no_irq_head;
			free_pages_ok_no_irq_head = page;
			page->index = order;	/* stash the order for later */

			spin_unlock_irqrestore(&free_pages_ok_no_irq_lock, flags);

			schedule_task(&free_pages_ok_no_irq_task);
			return;
		}
		lru_cache_del(page);
	}
	[..]
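
The process-context side the snippet defers to is not shown above; it
presumably looks something like this (a sketch reconstructed from the
free_pages_ok_no_irq_* names in the snippet, not a verbatim copy of
the code in my tree):

	/* Runs from keventd via schedule_task(), i.e. in process
	 * context, so re-entering __free_pages_ok() takes the
	 * lru_cache_del() path instead of deferring again. */
	static void free_pages_ok_no_irq(void *arg)
	{
		struct page *page, *next;
		unsigned long flags;

		spin_lock_irqsave(&free_pages_ok_no_irq_lock, flags);
		page = free_pages_ok_no_irq_head;
		free_pages_ok_no_irq_head = NULL;
		spin_unlock_irqrestore(&free_pages_ok_no_irq_lock, flags);

		while (page) {
			next = page->next_hash;
			/* the order was stashed in page->index above */
			__free_pages_ok(page, page->index);
			page = next;
		}
	}

	static struct tq_struct free_pages_ok_no_irq_task = {
		.routine = free_pages_ok_no_irq,
	};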

Andrea - If you prefer relying on open source software, check these links:
rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
http://www.cobite.com/cvsps/