Re: [Intel-gfx] [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

From: Jerome Glisse
Date: Fri Jun 22 2018 - 12:25:11 EST


On Fri, Jun 22, 2018 at 04:36:49PM +0100, Chris Wilson wrote:
> Quoting Michal Hocko (2018-06-22 16:02:42)
> > Hi,
> > this is an RFC and not tested at all. I am not very familiar with the
> > mmu notifier semantics, so this is a crude attempt to achieve
> > what I need. It might be completely wrong, but I would like
> > to discuss what would be a better way if that is the case.
> >
> > get_maintainers gave me quite a large list of people to CC so I had to trim
> > it down. If you think I have forgotten somebody, please let me know
>
> > diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > index 854bd51b9478..5285df9331fa 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > @@ -112,10 +112,11 @@ static void del_object(struct i915_mmu_object *mo)
> > mo->attached = false;
> > }
> >
> > -static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > +static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > struct mm_struct *mm,
> > unsigned long start,
> > - unsigned long end)
> > + unsigned long end,
> > + bool blockable)
> > {
> > struct i915_mmu_notifier *mn =
> > container_of(_mn, struct i915_mmu_notifier, mn);
> > @@ -124,7 +125,7 @@ static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > LIST_HEAD(cancelled);
> >
> > if (RB_EMPTY_ROOT(&mn->objects.rb_root))
> > - return;
> > + return 0;
>
> The principal wait here is for the HW (even after fixing all the locks
> to be not so coarse, we still have to wait for the HW to finish its
> access). The first pass would then be to not do anything here if
> !blockable.
>
> Jerome keeps on shaking his head and telling us we're doing it all
> wrong, so maybe it'll all fall out of HMM before we have to figure out
> how to differentiate between objects that can be invalidated immediately
> and those that need to acquire locks and/or wait.
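
For illustration only, the "first pass" described above (do nothing when
!blockable) could look roughly like the userspace sketch below. This is
not the posted patch and not i915 code: struct mirror_object,
sketch_invalidate_range_start() and the hw_busy flag are hypothetical
stand-ins, and returning -EAGAIN for the non-blockable case is an
assumption about what the new int return value would carry.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver object mirroring a user address range. */
struct mirror_object {
	unsigned long start;	/* first address covered (inclusive) */
	unsigned long end;	/* end of the covered range (exclusive) */
	bool hw_busy;		/* true while the pretend HW may still access it */
};

/*
 * Sketch of an invalidate_range_start-style callback with a blockable flag.
 * Returns 0 when there is nothing to do or the range was invalidated, and
 * -EAGAIN (an assumed convention) when the caller must not block.
 */
static int sketch_invalidate_range_start(struct mirror_object *objs, size_t n,
					 unsigned long start,
					 unsigned long end,
					 bool blockable)
{
	size_t i;

	/* Nothing tracked: same early exit as the RB_EMPTY_ROOT() check. */
	if (n == 0)
		return 0;

	/* First pass: refuse to do any work when we are not allowed to wait. */
	if (!blockable)
		return -EAGAIN;

	for (i = 0; i < n; i++) {
		struct mirror_object *o = &objs[i];

		/* Skip objects that do not overlap [start, end). */
		if (o->end <= start || o->start >= end)
			continue;

		/* Stand-in for taking locks and waiting for the HW to finish. */
		o->hw_busy = false;
	}

	return 0;
}
```

A finer-grained version would only bail out for objects that really need
a lock or a HW wait, which is the distinction mentioned in the quoted
text as still to be figured out.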

Intel and AMD are doing it right nowadays (IIRC AMD had a bug a while
back). What I want is to replace GUP and the notifier with HMM, with
the intention that we can handle things like OOM or other mm aspects
more cleverly inside HMM, and thus isolate mm folks from ever having
to decipher GPU or other weird drivers :)

I also want to do that for optimization purposes, to allow sharing
more things across multiple GPUs that mirror the same range of addresses.

Finally, another motivation is to avoid the pin that GUP implies and
rely only on mmu notification. This would stop some memory migration
paths from backing off early when they see the pin.

I intend to post patches sometime before XDC this year and discuss
them at XDC to see how people on the driver side feel about that. I
also want to use that as an excuse to gather feature requests and
other Santa wishlist items for HMM ;)

Cheers,
Jérôme