[PATCH] mm: add pte_present() check on existing hugetlb_entry callbacks

From: Naoya Horiguchi
Date: Mon Mar 03 2014 - 00:02:50 EST


Hi Sasha,

> I can confirm that with this patch the lockdep issue is gone. However, the NULL deref in
> walk_pte_range() and the BUG at mm/hugemem.c:3580 still appear.

I spotted the cause of this problem.
Could you try testing if this patch fixes it?

Thanks,
Naoya
---
Page table walker doesn't check non-present hugetlb entry in common path,
so hugetlb_entry() callbacks must check it. The reason for this behavior
is that some callers want to handle it in its own way.

However, some callers don't check it now, which causes unpredictable result,
for example when we have a race between migrating hugepage and reading
/proc/pid/numa_maps. This patch fixes it by adding pte_present checks on
buggy callbacks.

This bug exists for long and got visible by introducing hugepage migration.

Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx>
Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx # 3.12+
---
fs/proc/task_mmu.c | 3 +++
mm/mempolicy.c | 6 +++++-
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git next-20140228.orig/fs/proc/task_mmu.c next-20140228/fs/proc/task_mmu.c
index 3746d89c768b..a4cecadce867 100644
--- next-20140228.orig/fs/proc/task_mmu.c
+++ next-20140228/fs/proc/task_mmu.c
@@ -1299,6 +1299,9 @@ static int gather_hugetlb_stats(pte_t *pte, unsigned long addr,
if (pte_none(*pte))
return 0;

+ if (pte_present(*pte))
+ return 0;
+
page = pte_page(*pte);
if (!page)
return 0;
diff --git next-20140228.orig/mm/mempolicy.c next-20140228/mm/mempolicy.c
index c0d1cbd68790..1e171186ee6d 100644
--- next-20140228.orig/mm/mempolicy.c
+++ next-20140228/mm/mempolicy.c
@@ -524,8 +524,12 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long addr,
unsigned long flags = qp->flags;
int nid;
struct page *page;
+ pte_t entry;

- page = pte_page(huge_ptep_get(pte));
+ entry = huge_ptep_get(pte);
+ if (pte_present(entry))
+ return 0;
+ page = pte_page(entry);
nid = page_to_nid(page);
if (node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT))
return 0;
--
1.8.5.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/