Re: [tip:numa/core] x86, mm: Prevent gcc to re-read the pagetables

From: Peter Zijlstra
Date: Mon Oct 22 2012 - 07:03:07 EST


On Sun, 2012-10-21 at 05:56 -0700, tip-bot for Andrea Arcangeli wrote:

> In get_user_pages_fast() the TLB shootdown code can clear the pagetables
> before firing any TLB flush (the page can't be freed until the TLB
> flushing IPI has been delivered but the pagetables will be cleared well
> before sending any TLB flushing IPI).

I think we want to do this for all gup_fast() implementations. When I
reported this issue I also proposed adding something like
page_table_deref() which we could use through-out. Not sure we want to,
but at least all archs need an audit for this.


> ---
> arch/x86/mm/gup.c | 23 ++++++++++++++++++++---
> mm/memory.c | 2 +-
> 2 files changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
> index dd74e46..6dc9921 100644
> --- a/arch/x86/mm/gup.c
> +++ b/arch/x86/mm/gup.c
> @@ -150,7 +150,13 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
>
> pmdp = pmd_offset(&pud, addr);
> do {
> - pmd_t pmd = *pmdp;
> + /*
> + * With THP and hugetlbfs the pmd can change from
> + * under us and it can be cleared as well by the TLB
> + * shootdown, so read it with ACCESS_ONCE to do all
> + * computations on the same sampling.
> + */
> + pmd_t pmd = ACCESS_ONCE(*pmdp);
>
> next = pmd_addr_end(addr, end);
> /*
> @@ -220,7 +226,13 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
>
> pudp = pud_offset(&pgd, addr);
> do {
> - pud_t pud = *pudp;
> + /*
> + * With hugetlbfs giga pages the pud can change from
> + * under us and it can be cleared as well by the TLB
> + * shootdown, so read it with ACCESS_ONCE to do all
> + * computations on the same sampling.
> + */
> + pud_t pud = ACCESS_ONCE(*pudp);
>
> next = pud_addr_end(addr, end);
> if (pud_none(pud))
> @@ -280,7 +292,12 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
> local_irq_save(flags);
> pgdp = pgd_offset(mm, addr);
> do {
> - pgd_t pgd = *pgdp;
> + /*
> + * The pgd could be cleared by the TLB shootdown from
> + * under us so read it with ACCESS_ONCE to do all
> + * computations on the same sampling.
> + */
> + pgd_t pgd = ACCESS_ONCE(*pgdp);
>
> next = pgd_addr_end(addr, end);
> if (pgd_none(pgd))
> diff --git a/mm/memory.c b/mm/memory.c
> index cc8e280..c0de477 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3555,7 +3555,7 @@ int handle_pte_fault(struct mm_struct *mm,
> pte_t entry;
> spinlock_t *ptl;
>
> - entry = *pte;
> + entry = ACCESS_ONCE(*pte);
> if (!pte_present(entry)) {
> if (pte_none(entry)) {
> if (vma->vm_ops) {

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/