Re: Oops in 3.7-rc8 isolate_free_pages_block()

From: Henrik Rydberg
Date: Thu Dec 06 2012 - 11:56:12 EST


Hi Mel,

> Still travelling and am not in a position to test this properly :(.
> However, this bug feels very similar to a bug in the migration scanner where
> a pfn_valid check is missed because the start is not aligned. Henrik, when
> did this start happening? I would be a little surprised if it started between
> 3.6 and 3.7-rcX but maybe it's just easier to hit now for some reason.

I started using transparent hugepages when moving to 3.7-rc1, so it is
quite possible that the problem was there already in 3.6.

> How reproducible is this? Is there anything in particular you do to
> trigger the oops?

Unfortunately nothing special, and it is rare. IIRC, it has happened
after a long uptime, but I guess that only means the probability of
the oops is higher then.

> Does the following patch help any? It's only compile tested I'm afraid.
>
> ---8<---
> mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free
>
> Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new
> MAX_ORDER_NR_PAGES block during isolation for migration) added a check
> for pfn_valid() when isolating pages for migration as the scanner does
> not necessarily start pageblock-aligned. However, the free scanner has
> the same problem. If it encounters a hole, it can also trigger an oops
> when it calls PageBuddy(page) on a page that is within a hole.
>
> Reported-by: Henrik Rydberg <rydberg@xxxxxxxxxxx>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
> mm/compaction.c | 10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9eef558..7d85ad485 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -298,6 +298,16 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>  			continue;
>  		if (!valid_page)
>  			valid_page = page;
> +
> +		/*
> +		 * As blockpfn may not start aligned, blockpfn->end_pfn
> +		 * may cross a MAX_ORDER_NR_PAGES boundary and a pfn_valid
> +		 * check is necessary. If the pfn is not valid, stop
> +		 * isolation.
> +		 */
> +		if ((blockpfn & (MAX_ORDER_NR_PAGES - 1)) == 0 &&
> +		    !pfn_valid(blockpfn))
> +			break;
>  		if (!PageBuddy(page))
>  			continue;
>

I am running with it now, adding a printout to see if the case happens
at all. Might take a while, will try to stress the machine a bit.
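
For illustration only, here is a minimal user-space sketch of the boundary
check in the patch above. It is not kernel code: the value of
MAX_ORDER_NR_PAGES, the struct mem_hole type, and the demo_pfn_valid() and
demo_scan() helpers are all assumptions made up for this example, standing
in for the real pfn_valid() and the loop in isolate_freepages_block().

```c
#include <stdbool.h>

/* Assumed value for illustration; the real constant depends on the kernel config. */
#define MAX_ORDER_NR_PAGES 1024UL

/* Hypothetical description of a memory hole: pfns in [start, end) are invalid. */
struct mem_hole {
	unsigned long start;
	unsigned long end;
};

/* Hypothetical stand-in for pfn_valid(). */
static bool demo_pfn_valid(const struct mem_hole *hole, unsigned long pfn)
{
	return pfn < hole->start || pfn >= hole->end;
}

/*
 * Walk [blockpfn, end_pfn) and return how many pfns were visited before
 * the scan stopped. As in the patch, validity is only re-checked when
 * blockpfn sits on a MAX_ORDER_NR_PAGES boundary, and the scan stops
 * there if the first pfn of the new block is not valid.
 */
static unsigned long demo_scan(const struct mem_hole *hole,
			       unsigned long blockpfn, unsigned long end_pfn)
{
	unsigned long scanned = 0;

	for (; blockpfn < end_pfn; blockpfn++) {
		if ((blockpfn & (MAX_ORDER_NR_PAGES - 1)) == 0 &&
		    !demo_pfn_valid(hole, blockpfn))
			break;
		scanned++;
	}
	return scanned;
}
```

With a hole covering pfns 1024 to 2047, a scan that starts unaligned at
pfn 1000 and would run to 1100 stops when it reaches the boundary at 1024,
which is the case the patch is meant to catch.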

Thanks,
Henrik