Re: [PATCH v3 2/3] powerpc: get hugetlbpage handling more generic

From: Scott Wood
Date: Tue Dec 06 2016 - 20:06:57 EST


On Tue, 2016-12-06 at 07:34 +0100, Christophe LEROY wrote:
>
> On 06/12/2016 at 02:18, Scott Wood wrote:
> >
> > On Wed, 2016-09-21 at 10:11 +0200, Christophe Leroy wrote:
> > >
> > > Today there are two implementations of hugetlbpages which are managed
> > > by exclusive #ifdefs:
> > > * FSL_BOOKE: several directory entries point to the same single
> > > hugepage
> > > * BOOK3S: one upper level directory entry points to a table of hugepages
> > >
> > > In preparation for implementing hugepage support on the 8xx, we
> > > need a mix of the two solutions above, because the 8xx needs both cases
> > > depending on the page size:
> > > * In 4k page size mode, each PGD entry covers a 4-Mbyte area. This means
> > > that two PGD entries are needed to cover an 8M hugepage, while a
> > > single PGD entry covers 8x 512k hugepages.
> > > * In 16k page size mode, each PGD entry covers a 64-Mbyte area. This means
> > > that one PGD entry covers 8x 8M hugepages, and one PGD entry also
> > > covers 128x 512k hugepages.
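The coverage figures above follow from the PGD span (2^PGDIR_SHIFT) and the hugepage size (2^shift). A minimal sketch, assuming PGDIR_SHIFT is 22 (4M) in 4k mode and 26 (64M) in 16k mode as the text implies; the macros and helper names are hypothetical and not kernel APIs:

```c
/*
 * Hypothetical helpers: relate a 2^shift hugepage to a 2^pgdir_shift PGD
 * entry. All sizes are log2 shifts.
 */
#define EX_SHIFT_512K      19 /* 512k hugepage */
#define EX_SHIFT_8M        23 /* 8M hugepage */
#define EX_PGDIR_SHIFT_4K  22 /* assumed: 4M per PGD entry in 4k mode */
#define EX_PGDIR_SHIFT_16K 26 /* assumed: 64M per PGD entry in 16k mode */

/* PGD entries needed to map one hugepage (1 if it fits in one entry). */
static unsigned long pgd_entries_per_hugepage(unsigned int pgdir_shift,
					      unsigned int shift)
{
	return shift > pgdir_shift ? 1UL << (shift - pgdir_shift) : 1UL;
}

/* Hugepages covered by one PGD entry (1 if the page spans several entries). */
static unsigned long hugepages_per_pgd_entry(unsigned int pgdir_shift,
					     unsigned int shift)
{
	return pgdir_shift > shift ? 1UL << (pgdir_shift - shift) : 1UL;
}
```

For example, pgd_entries_per_hugepage(EX_PGDIR_SHIFT_4K, EX_SHIFT_8M) is 2, and hugepages_per_pgd_entry(EX_PGDIR_SHIFT_16K, EX_SHIFT_512K) is 128.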
> > >
> > > This patch:
> > > * removes #ifdefs in favor of if/else based on the range sizes
> > > * merges the two huge_pte_alloc() functions as they are pretty similar
> > > * merges the two hugetlbpage_init() functions as they are pretty similar
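The if/else that replaces the #ifdefs comes down to comparing pdshift (log2 of the area mapped by the directory entry holding the hugepd) with shift (log2 of the hugepage size). A hypothetical sketch of that dispatch; the enum and function names are invented for illustration, and only the "pdshift > shift" test is taken from the patch:

```c
/*
 * Hypothetical sketch of the size-based dispatch.
 * pdshift: log2 of the area mapped by the directory entry holding the hugepd.
 * shift:   log2 of the hugepage size.
 */
enum hugepd_layout {
	LAYOUT_TABLE,  /* BOOK3S-like: one entry -> table of hugeptes */
	LAYOUT_SHARED, /* FSL_BOOKE-like: several entries -> one hugepte */
};

static enum hugepd_layout layout_for(unsigned int pdshift, unsigned int shift)
{
	/* Same test as the patched hugetlbpage_init(): pdshift > shift. */
	return pdshift > shift ? LAYOUT_TABLE : LAYOUT_SHARED;
}

/*
 * LAYOUT_TABLE:  number of hugepte slots in the table.
 * LAYOUT_SHARED: number of directory entries pointing at the same hugepte.
 */
static unsigned long layout_count(unsigned int pdshift, unsigned int shift)
{
	if (pdshift > shift)
		return 1UL << (pdshift - shift);
	return 1UL << (shift - pdshift);
}
```

Note that equal shifts fall into the shared case with a count of 1, matching the patch comment that no pgt cache is used when pdshift and shift are the same.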
> > [snip]
> > >
> > > @@ -860,16 +803,34 @@ static int __init hugetlbpage_init(void)
> > >  		 * if we have pdshift and shift value same, we don't
> > >  		 * use pgt cache for hugepd.
> > >  		 */
> > > -		if (pdshift != shift) {
> > > +		if (pdshift > shift) {
> > >  			pgtable_cache_add(pdshift - shift, NULL);
> > >  			if (!PGT_CACHE(pdshift - shift))
> > >  				panic("hugetlbpage_init(): could not create "
> > >  				      "pgtable cache for %d bit pagesize\n", shift);
> > >  		}
> > > +#ifdef CONFIG_PPC_FSL_BOOK3E
> > > +		else if (!hugepte_cache) {
> > This else never triggers on book3e, because the way this function
> > calculates pdshift is wrong for book3e (it uses PyD_SHIFT instead of
> > HUGEPD_PxD_SHIFT). We later get OOMs because huge_pte_alloc() calculates
> > pdshift correctly, tries to use hugepte_cache, and fails.
> OK, I'll check it again. I expected it to still work properly on
> book3e, because after applying patch 3 it works properly on the 8xx.

On 8xx you probably happen to have a page size that yields "pdshift <= shift"
even with the incorrect pdshift calculation, causing hugepte_cache to be
allocated. The smallest hugepage size on 8xx is 512k, compared to 4M on
fsl-book3e.
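A rough model of why 8xx can hide the bug, assuming the init-side pdshift selection picks PMD_SHIFT or PUD_SHIFT if either is strictly greater than shift and otherwise falls back to PGDIR_SHIFT, and assuming 8xx's two-level table folds all three shifts onto 22 in 4k mode. The function names are hypothetical, and the level shifts are parameters rather than real kernel constants:

```c
/*
 * Hypothetical model of the init-side pdshift choice: the first directory
 * level strictly larger than the page, else the PGD level.
 */
static unsigned int init_pdshift(unsigned int shift, unsigned int pmd_shift,
				 unsigned int pud_shift, unsigned int pgdir_shift)
{
	if (shift < pmd_shift)
		return pmd_shift;
	if (shift < pud_shift)
		return pud_shift;
	return pgdir_shift;
}

/* Nonzero when init would skip the pgtable cache and create hugepte_cache. */
static int init_allocates_hugepte_cache(unsigned int shift,
					unsigned int pmd_shift,
					unsigned int pud_shift,
					unsigned int pgdir_shift)
{
	return !(init_pdshift(shift, pmd_shift, pud_shift, pgdir_shift) > shift);
}
```

Under these assumptions, 512k pages (shift 19) get pdshift 22 and take the pgtable-cache branch, but the 8M size (shift 23) lands on the PGDIR fallback with pdshift 22 <= shift, which is what creates hugepte_cache on 8xx.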

-Scott