Re: [PATCH 06/36] HMM: add HMM page table v2.

From: Jerome Glisse
Date: Mon Jun 29 2015 - 10:43:27 EST


On Fri, Jun 26, 2015 at 06:34:16PM -0700, Mark Hairgrove wrote:
>
>
> On Fri, 26 Jun 2015, Jerome Glisse wrote:
>
> > On Thu, Jun 25, 2015 at 03:57:29PM -0700, Mark Hairgrove wrote:
> > > On Thu, 21 May 2015, j.glisse@xxxxxxxxx wrote:
> > > > From: Jérôme Glisse <jglisse@xxxxxxxxxx>
> > > > [...]
> > > > +
> > > > +void hmm_pt_iter_init(struct hmm_pt_iter *iter);
> > > > +void hmm_pt_iter_fini(struct hmm_pt_iter *iter, struct hmm_pt *pt);
> > > > +unsigned long hmm_pt_iter_next(struct hmm_pt_iter *iter,
> > > > + struct hmm_pt *pt,
> > > > + unsigned long addr,
> > > > + unsigned long end);
> > > > +dma_addr_t *hmm_pt_iter_update(struct hmm_pt_iter *iter,
> > > > + struct hmm_pt *pt,
> > > > + unsigned long addr);
> > > > +dma_addr_t *hmm_pt_iter_fault(struct hmm_pt_iter *iter,
> > > > + struct hmm_pt *pt,
> > > > + unsigned long addr);
> > >
> > > I've got a few more thoughts on hmm_pt_iter after looking at some of the
> > > later patches. I think I've convinced myself that this patch functionally
> > > works as-is, but I've got some suggestions and questions about the design.
> > >
> > > Right now there are these three major functions:
> > >
> > > 1) hmm_pt_iter_update(addr)
> > > - Returns the hmm_pte * for addr, or NULL if none exists.
> > >
> > > 2) hmm_pt_iter_fault(addr)
> > > - Returns the hmm_pte * for addr, allocating a new one if none exists.
> > >
> > > 3) hmm_pt_iter_next(addr, end)
> > > - Returns the next possibly-valid address. The caller must use
> > > hmm_pt_iter_update to check if there really is an hmm_pte there.
> > >
> > > In my view, there are two sources of confusion here:
> > > - Naming. "update" shares a name with the HMM mirror callback, and it also
> > > implies that the page tables are "updated" as a result of the call.
> > > "fault" likewise implies that the function handles a fault in some way.
> > > Neither of these implications are true.
> >
> > Maybe hmm_pt_iter_walk & hmm_pt_iter_populate are better name ?
>
> hmm_pt_iter_populate sounds good. See below for _walk.
>
>
> >
> >
> > > - hmm_pt_iter_next and hmm_pt_iter_update have some overlapping
> > > functionality when compared to traditional iterators, requiring the
> > > callers to all do this sort of thing:
> > >
> > > hmm_pte = hmm_pt_iter_update(&iter, &mirror->pt, addr);
> > > if (!hmm_pte) {
> > > addr = hmm_pt_iter_next(&iter, &mirror->pt,
> > > addr, event->end);
> > > continue;
> > > }
> > >
> > > Wouldn't it be more efficient and simpler to have _next do all the
> > > iteration internally so it always returns the next valid entry? Then you
> > > could combine _update and _next into a single function, something along
> > > these lines (which also addresses the naming concern):
> > >
> > > void hmm_pt_iter_init(iter, pt, start, end);
> > > unsigned long hmm_pt_iter_next(iter, hmm_pte *);
> > > unsigned long hmm_pt_iter_next_alloc(iter, hmm_pte *);
> > >
> > > hmm_pt_iter_next would return the address and ptep of the next valid
> > > entry, taking the place of the existing _update and _next functions.
> > > hmm_pt_iter_next_alloc takes the place of _fault.
> > >
> > > Also, since the _next functions don't take in an address, the iterator
> > > doesn't have to handle the input addr being different from iter->cur.
> >
> > It would still need to do the same kind of test, this test is really to
> > know when you switch from one directory to the next and to drop and take
> > reference accordingly.
>
> But all of the directory references are already hidden entirely in the
> iterator _update function. The caller only has to worry about taking
> references on the bottom level, so I don't understand why the iterator
> needs to return to the caller when it hits the end of a directory. Or for
> that matter, why it returns every possible index within a directory to the
> caller whether that index is valid or not.

Iterator is what protect against concurrent freeing of the directory so it
has to return to caller on directory boundary (for 64bits arch with 64bits
pte it has return every 512 entries). Otherwise pt_iter_fini() would have
to walk over the whole directory range again just to drop reference and this
doesn't sound like a good idea.

So really with what you are asking it whould be:

hmm_pt_iter_init(&iter, start, end);
for(next=pt_iter_next(&iter,&ptep); next<end; next=pt_iter_next(&iter,&ptep))
{
// Here ptep is valid until next address. Above you have to call
// pt_iter_next() to switch to next directory.
addr = max(start, next - (~HMM_PMD_MASK + 1));
for (; addr < next; addr += PAGE_SIZE, ptep++) {
// access ptep
}
}

My point is that internally pt_iter_next() will do the exact same test it is
doing now btw cur and addr. Just that the addr is no longer explicit but iter
infer it.

> If _next only returned to the caller when it hit a valid hmm_pte (or end),
> then only one function would be needed (_next) instead of two
> (_update/_walk and _next).

On the valid entry side, this is because when you are walking the page table
you have no garanty that the entry will not be clear below you (in case of
concurrent invalidation). The only garanty you have is that if you are able
to read a valid entry from the update() callback then this entry is valid
until you get a new update() callback telling you otherwise.

Cheers,
Jérôme
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/