Re: [PATCHv3 0/5] Fix compound_head() race

From: Kirill A. Shutemov
Date: Thu Aug 20 2015 - 08:31:17 EST


On Wed, Aug 19, 2015 at 12:21:41PM +0300, Kirill A. Shutemov wrote:
> Here's my attempt at fixing the recently discovered race in compound_head().
> It should make compound_head() reliable in all contexts.
>
> The patchset is against Linus' tree. Let me know if it needs to be rebased
> onto a different baseline.
>
> It's expected to have conflicts with my page-flags patchset and probably
> should be applied before it.
>
> v3:
> - Fix build without hugetlb;
> - Drop page->first_page;
> - Update comment for free_compound_page();
> - Use 'unsigned int' for page order;
>
> v2: Per Hugh's suggestion, page->compound_head is moved into the third double
> word of struct page. This way we avoid the memory overhead that v1 had in
> some cases.
>
> This place in struct page is rather overloaded. More testing is
> required to make sure we don't collide with any other user of that word.
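
For readers following along, here is a minimal user-space sketch of the encoding described above, under stated assumptions: tail pages store the head page's address with bit 0 set in a word that overlays the third double word, so a single load tells whether the page is a tail and where its head is. The struct layout, prep_compound() and set_compound_head() are hypothetical simplifications for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified, hypothetical model of struct page for illustration only; the
 * real struct has many more overlapping unions and different offsets.
 */
struct page {
	unsigned long flags;		/* first double word */
	void *mapping;
	unsigned long index;		/* second double word */
	unsigned long counters;
	union {				/* third double word */
		struct {
			void *next, *prev;
		} lru;			/* list linkage when not a tail page */
		uintptr_t compound_head;	/* tail pages: head address | 1 */
	};
};

/* Mark @tail as belonging to @head: head pointer with bit 0 as the tail flag. */
static void set_compound_head(struct page *tail, struct page *head)
{
	uintptr_t h = (uintptr_t)head;

	assert((h & 1) == 0);	/* struct page is word aligned, so bit 0 is free */
	tail->compound_head = h | 1;
}

/* Tail-ness and the head pointer live in one word: one load answers both. */
static bool page_tail(const struct page *page)
{
	return page->compound_head & 1;
}

/* Hypothetical helper: set up an order-@order compound page in @pages. */
static void prep_compound(struct page *pages, unsigned int order)
{
	unsigned long nr = 1UL << order;

	pages[0].compound_head = 0;	/* head page: bit 0 must stay clear */
	for (unsigned long i = 1; i < nr; i++)
		set_compound_head(&pages[i], &pages[0]);
}
```

Keeping the tail flag and the head pointer in one word is what removes the window between "is this a tail?" and "read the head pointer" that two separate fields would leave open.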

Andrew, can we have the patchset applied, if nobody has objections?

It applies cleanly onto your patch stack just before my page-flags
patchset.

As expected, it causes a few conflicts with these patches:

page-flags-introduce-page-flags-policies-wrt-compound-pages.patch
mm-sanitize-page-mapping-for-tail-pages.patch
include-linux-page-flagsh-rename-macros-to-avoid-collisions.patch

Updated patches with the conflicts resolved are attached.

Let me know if I need to do anything else about this.

Hugh, does it address your worry wrt page-flags?

Earlier you mentioned races on whether the head page still agrees with
the tail. I don't think that's an issue: you can hit this kind of race only
in very special environments, such as a pfn scanner, where you need to
re-validate the page after stabilizing it anyway.
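
A rough sketch of the "stabilize, then re-validate" pattern mentioned above, assuming stabilizing means taking a reference on the head page: compute the head, pin it, then re-read the tail's head pointer and back off if it changed in between. This is a user-space model with C11 atomics; try_get_ref(), put_ref() and scanner_pin_head() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal page: tagged head pointer plus a refcount. */
struct page {
	_Atomic uintptr_t compound_head;	/* head address | 1 on tail pages */
	atomic_int refcount;
};

/* Single load of the tagged word, then decode it. */
static struct page *compound_head(struct page *page)
{
	uintptr_t head = atomic_load_explicit(&page->compound_head,
					      memory_order_relaxed);

	return (head & 1) ? (struct page *)(head - 1) : page;
}

/* Take a reference only if the count is not already zero (page being freed). */
static bool try_get_ref(struct page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0)
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	return false;
}

static void put_ref(struct page *page)
{
	atomic_fetch_sub(&page->refcount, 1);
}

/*
 * Hypothetical pfn-scanner step: the head computed first may already be
 * stale, so pin it and only then check that @page still points at it.
 */
static struct page *scanner_pin_head(struct page *page)
{
	struct page *head = compound_head(page);

	if (!try_get_ref(head))
		return NULL;		/* head is being freed: skip this pfn */
	if (compound_head(page) != head) {
		put_ref(head);		/* compound page changed under us */
		return NULL;
	}
	return head;			/* caller now holds a reference on head */
}
```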

Bloat from my page-flags patchset is also reduced substantially. The size of
your page_is_locked() example in the allnoconfig case went down from 32 to
17 bytes. With this patchset it looks like this:

00003070 <page_is_locked>:
    3070:  8b 50 14    mov    0x14(%eax),%edx
    3073:  f6 c2 01    test   $0x1,%dl
    3076:  8d 4a ff    lea    -0x1(%edx),%ecx
    3079:  0f 45 c1    cmovne %ecx,%eax
    307c:  8b 00       mov    (%eax),%eax
    307e:  24 01       and    $0x1,%al
    3080:  c3          ret
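
Roughly, the disassembly corresponds to C along the lines of the sketch below: one load of the tagged word (mov 0x14), a bit-0 test (test $0x1), head = word - 1 (lea -0x1), a conditional select (cmovne), then a load of head->flags and a test of the lock bit (and $0x1). The struct layout is a hypothetical simplification, so offsets differ from the 0x14 above, and PG_locked == 0 is an assumption read off the "and $0x1".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the lock bit is bit 0 of flags, as "and $0x1,%al" suggests. */
#define PG_locked 0

/* Hypothetical, simplified struct page; real offsets differ. */
struct page {
	unsigned long flags;
	uintptr_t compound_head;	/* head address | 1 on tail pages */
};

static inline struct page *compound_head(struct page *page)
{
	/* Read the word exactly once, in the spirit of READ_ONCE(). */
	uintptr_t head = *(volatile uintptr_t *)&page->compound_head;

	if (head & 1)				/* test $0x1,%dl */
		return (struct page *)(head - 1);	/* lea -0x1(%edx),%ecx */
	return page;				/* cmovne keeps the original page */
}

static bool page_is_locked(struct page *page)
{
	/* mov (%eax),%eax; and $0x1,%al */
	return compound_head(page)->flags & (1UL << PG_locked);
}
```

Because the decision is a single branch-free select on one loaded word, the compiler can emit the short cmov sequence shown above.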

--
Kirill A. Shutemov