Re: Re: [PATCH v6 02/14] mm/damon: Implement region based sampling

From: SeongJae Park
Date: Thu Mar 12 2020 - 05:21:17 EST


On Tue, 10 Mar 2020 17:39:38 +0000 Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:

> On Tue, 10 Mar 2020 17:22:40 +0100
> SeongJae Park <sjpark@xxxxxxxxxx> wrote:
>
> > On Tue, 10 Mar 2020 15:55:10 +0000 Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
> >
> > > On Tue, 10 Mar 2020 12:52:33 +0100
> > > SeongJae Park <sjpark@xxxxxxxxxx> wrote:
> > >
> > > > Added replies to your every comment in line below. I agree to your whole
> > > > opinions, will apply those in next spin! :)
> > > >
> > >
> > > One additional question inline that came to mind. Using a single statistic
> > > to monitor huge page and normal page hits is going to give us problems
> > > I think.
> >
> > Ah, you're right!!! This is indeed a critical bug!
> >
> > >
> > > Perhaps I'm missing something?
> > >
> > > > > > +/*
> > > > > > + * Check whether the given region has accessed since the last check
> > > > >
> > > > > Should also make clear that this sets us up for the next access check at
> > > > > a different memory address it the region.
> > > > >
> > > > > Given the lack of connection between activities perhaps just split this into
> > > > > two functions that are always called next to each other.
> > > >
> > > > Will make the description more clearer as suggested.
> > > >
> > > > Also, I found that I'm not clearing *pte and *pmd before going 'mkold', thanks
> > > > to this comment. Will fix it, either.
> > > >
> > > > >
> > > > > > + *
> > > > > > + * mm 'mm_struct' for the given virtual address space
> > > > > > + * r the region to be checked
> > > > > > + */
> > > > > > +static void kdamond_check_access(struct damon_ctx *ctx,
> > > > > > + struct mm_struct *mm, struct damon_region *r)
> > > > > > +{
> > > > > > + pte_t *pte = NULL;
> > > > > > + pmd_t *pmd = NULL;
> > > > > > + spinlock_t *ptl;
> > > > > > +
> > > > > > + if (follow_pte_pmd(mm, r->sampling_addr, NULL, &pte, &pmd, &ptl))
> > > > > > + goto mkold;
> > > > > > +
> > > > > > + /* Read the page table access bit of the page */
> > > > > > + if (pte && pte_young(*pte))
> > > > > > + r->nr_accesses++;
> > > > > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > > > >
> > > > > Is it worth having this protection? Seems likely to have only a very small
> > > > > influence on performance and makes it a little harder to reason about the code.
> > > >
> > > > It was necessary for addressing 'implicit declaration' problem of 'pmd_young()'
> > > > and 'pmd_mkold()' for build of DAMON on several architectures including User
> > > > Mode Linux.
> > > >
> > > > Will modularize the code for better readability.
> > > >
> > > > >
> > > > > > + else if (pmd && pmd_young(*pmd))
> > > > > > + r->nr_accesses++;
> > >
> > > So we increment a region count by one if we have an access in a huge page, or
> > > in a normal page.
> > >
> > > If we get a region that has a mixture of the two, this seems likely to give a
> > > bad approximation.
> > >
> > > Assume the region is accessed 'evenly' but each " 4k page" is only hit 10% of the time
> > > (where a hit is in one check period)
> > >
> > > If our address in a page, then we'll hit 10% of the time, but if it is in a 2M
> > > huge page then we'll hit a much higher percentage of the time.
> > > 1 - (0.9^512) ~= 1
> > >
> > > Should we look to somehow account for this?
> >
> > Yes, this is really critical bug and we should fix this! Thank you so much for
> > finding this!
> >
> > >
> > > > > > +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> > > > > > +
> > > > > > + spin_unlock(ptl);
> > > > > > +
> > > > > > +mkold:
> > > > > > + /* mkold next target */
> > > > > > + r->sampling_addr = damon_rand(ctx, r->vm_start, r->vm_end);
> > > > > > +
> > > > > > + if (follow_pte_pmd(mm, r->sampling_addr, NULL, &pte, &pmd, &ptl))
> > > > > > + return;
> > > > > > +
> > > > > > + if (pte) {
> > > > > > + if (pte_young(*pte)) {
> > > > > > + clear_page_idle(pte_page(*pte));
> > > > > > + set_page_young(pte_page(*pte));
> > > > > > + }
> > > > > > + *pte = pte_mkold(*pte);
> > > > > > + }
> > > > > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > > > > > + else if (pmd) {
> > > > > > + if (pmd_young(*pmd)) {
> > > > > > + clear_page_idle(pmd_page(*pmd));
> > > > > > + set_page_young(pmd_page(*pmd));
> > > > > > + }
> > > > > > + *pmd = pmd_mkold(*pmd);
> > > > > > + }
> >
> > This is also very problematic if several regions are backed by a single huge
> > page, as only one region in the huge page will be checked as accessed.
> >
> > Will address these problems in next spin!
>
> Good point. There is little point in ever having multiple regions including
> a single huge page. Would it be possible to tweak the region splitting algorithm
> to not do this?

Yes, it would be a good solution. However, I believe this is a problem of the
access checking mechanism, as the definition of the region is only 'memory
area having similar access frequency'. Adding more rules such as 'it should
be aligned by HUGE PAGE size' might make things more complex. Also, we're
currently using page table Accessed bits as the primitive for the access check,
but it could be extended to other primitives in future. Therefore, I would
like to modify the access checking mechanism to aware the huge pages
existance.

For regions containing both regular pages and huge pages, the huge pages will
make some errorneous high access frequency as you noted before, but the
adaptive regions adjustment will eventually split them.

If you have other concerns or opinions, please let me know.


Thanks,
SeongJae Park

>
> Jonathan
>
> >
> >
> > Thanks,
> > SeongJae Park
> >
> > > > > > +#endif
> > > > > > +
> > > > > > + spin_unlock(ptl);
> > > > > > +}
> > > > > > +
> > >
> > >
>
>