Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)

From: Hugh Dickins
Date: Mon Nov 21 2005 - 10:46:22 EST


On Mon, 21 Nov 2005, Takashi Iwai wrote:
>
> Sorry, I still don't figure out how __GFP_COMP solved this problem.
> Could you enlighten me a bit?

The sequence of problems was this.

Nick's core PageReserved removal patch in 2.6.15-rc1 (and -rc2) made
two changes. VM_RESERVED vmas never free their pages on unmapping
(e.g. on exit): fine for remap_pfn_range areas, but a leak where
others set VM_RESERVED. And PageReserved no longer inhibits
decrementing the page count.

In -rc1-mm2 I tried to fix that leak by restoring VM_RESERVED to its
previous behaviour, and using a different flag VM_UNPAGED, set in
remap_pfn_range, for the don't-free-when-unmapping behaviour.

But there's then a problem when the underlying page was allocated
as a high-order page, but its individual 0-order constituent pages
are mapped into userspace by nopage: the page count of the first
0-order page is raised by allocation, but the following 0-order pages
are left with page count 0. nopage's get_page raises that to 1,
zap_pte_range (or whatever it uses to actually do the freeing) lowers
it back to 0, and hence frees the page, even though it's a constituent
of the not-yet-freed high-order page. (This had not been a problem
while PageReserved was inhibiting decrementing the page count.)
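
The count sequence just described can be modelled in a few lines of
user-space C. This is only a toy sketch, not kernel code: struct
toy_page and the toy_* helpers are invented for illustration, and
merely stand in for struct page, nopage's get_page, and the put done
at unmap time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ORDER  2
#define TOY_NPAGES (1 << TOY_ORDER)

/* Hypothetical stand-in for struct page: a count and a "was freed" mark. */
struct toy_page {
    int  count;
    bool freed;   /* set when the count drops to zero */
};

/* Model a non-compound high-order allocation: only the first
 * 0-order page gets a reference, the rest stay at count 0. */
static void toy_alloc_high_order(struct toy_page *pages)
{
    for (size_t i = 0; i < TOY_NPAGES; i++) {
        pages[i].count = 0;
        pages[i].freed = false;
    }
    pages[0].count = 1;
}

static void toy_get_page(struct toy_page *page)
{
    page->count++;
}

static void toy_put_page(struct toy_page *page)
{
    assert(page->count > 0);
    if (--page->count == 0)
        page->freed = true;
}

/* Map constituent idx (the nopage step), then unmap it (the zap step).
 * Returns true if that constituent was freed while the first page of
 * the high-order allocation is still held. */
static bool toy_map_then_unmap(struct toy_page *pages, size_t idx)
{
    toy_get_page(&pages[idx]);
    toy_put_page(&pages[idx]);
    return pages[idx].freed && !pages[0].freed;
}
```

Calling toy_map_then_unmap(pages, 1) after toy_alloc_high_order(pages)
reports a premature free of the second constituent, while index 0
does not, since the first page still holds its allocation reference.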

So another of my patches in -rc1-mm2 made the PageCompound technique
available always, no longer under #ifdef CONFIG_HUGETLB_PAGE: so that
get_page and put_page on the later constituents of the high-order
page get redirected to the first one, and it should work okay again.

Except that I'd missed that you actually have to choose to have your
high-order pages supplied as compound pages, by passing __GFP_COMP.
Since I wasn't passing that, they still weren't allocated as compound
pages, so were still being freed too soon - and the PG_reserved flag
found while freeing gave rise to the "Bad page state" messages seen.
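
To illustrate why the flag matters, here is a toy user-space model of
the redirection, again a sketch rather than kernel code. TOY_GFP_COMP,
struct cpage and the cpage_* helpers are hypothetical names; the only
thing taken from the description above is that get and put on a later
constituent are redirected to the first page, and only when the
allocation asked for a compound page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_GFP_COMP  0x1u   /* hypothetical stand-in for __GFP_COMP */
#define CPAGE_NPAGES  4

struct cpage {
    int           count;
    bool          freed;
    struct cpage *head;   /* first page if compound, otherwise NULL */
};

/* Allocate a toy high-order page; constituents point back at the
 * first page only if the caller passed TOY_GFP_COMP. */
static void cpage_alloc(struct cpage *pages, unsigned int flags)
{
    for (size_t i = 0; i < CPAGE_NPAGES; i++) {
        pages[i].count = 0;
        pages[i].freed = false;
        pages[i].head  = (flags & TOY_GFP_COMP) ? &pages[0] : NULL;
    }
    pages[0].count = 1;
}

/* Pick the page whose count a get or put should really touch. */
static struct cpage *cpage_target(struct cpage *page)
{
    return page->head ? page->head : page;
}

static void cpage_get(struct cpage *page)
{
    cpage_target(page)->count++;
}

static void cpage_put(struct cpage *page)
{
    struct cpage *target = cpage_target(page);

    assert(target->count > 0);
    if (--target->count == 0)
        target->freed = true;
}

/* Map and unmap the second constituent; report whether it was freed
 * while the high-order allocation is still held. */
static bool cpage_tail_freed_early(unsigned int flags)
{
    struct cpage pages[CPAGE_NPAGES];

    cpage_alloc(pages, flags);
    cpage_get(&pages[1]);
    cpage_put(&pages[1]);
    return pages[1].freed;
}
```

In this model cpage_tail_freed_early(0) is true and
cpage_tail_freed_early(TOY_GFP_COMP) is false: having the redirection
code available is not enough unless the allocation asks for it.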

> Isn't it needed for dma_alloc_coherent() (for i386, particularly),
> too? dma_alloc_coherent() also gets pages with __get_free_pages().

Didn't I deal with that by adding __GFP_COMP in snd_malloc_dev_pages?
And (in a separate patch run past davem and wli first, to be aggregated
with the sound/core/memalloc patch when I sign off and send to Andrew)
in the sparc and sparc64 sbus_alloc_consistent.

It's only an issue when the high-order page might be mapped into
userspace, then its constituents freed up by zap_pte_range; or
locked down with get_user_pages and later released: when constituents
of a high-order page pass through common code designed for 0-order pages.

> Also, I think we can remove all Set/ClearPageReserved() in memalloc.c
> now. It was there just for mmap...

That is so, but we'd prefer you to hold off for now. The way it's
proceeding is, for 2.6.15 we're not actually removing the Set/Clear
PageReserved from any of the drivers or from any of the architecture
initialization; but PageReserved is no longer serving any functional
purpose, except where PG_reserved acts as a double-check when the page
is freed, as to whether it's all working right. Which was useful in
this case, to identify where I'd forgotten all about __GFP_COMP; and
I fear may prove useful in some other cases too. Retaining this use
of PG_reserved for now gives greater confidence in our safety when we
later advance to removing all the Set/ClearPageReserved hocus-pocus.

Hugh