Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

From: Fenghua Yu
Date: Tue Sep 28 2021 - 12:44:36 EST


Hi, Thomas,

On Sun, Sep 26, 2021 at 01:13:50AM +0200, Thomas Gleixner wrote:
> Fenghua,
>
> On Fri, Sep 24 2021 at 16:12, Fenghua Yu wrote:
> > On Fri, Sep 24, 2021 at 03:18:12PM +0200, Thomas Gleixner wrote:
> >> But OTOH why do you need a per task reference count on the PASID at all?
> >>
> >> The PASID is fundamentaly tied to the mm and the mm can't go away before
> >> the threads have gone away unless this magically changed after I checked
> >> that ~20 years ago.
> >
> > There are up to 1M PASIDs because PASID is 20-bit. I think there are a few ways
> > to allocate and free PASID:
> >
> > 1. Statically allocate a PASID once a mm is created and free it in mm
> > exit. No PASID allocation/free during the mm's lifetime. Then
> > up to 1M processes can be created due to 1M PASIDs limitation.
> > We don't want this method because the 1M processes limitation.
>
> I'm not so worried about the 1M limitation, but it obviously makes sense
> to avoid that because allocating stuff which is not used is pointless in
> general.
>
> > 2. A PASID is allocated to the mm in open(dev)->bind(dev, mm). There
> > are three ways to free it:
> > (a) Actively free it in close(fd)->unbind(dev, mm) by sending
> > IPIs to tell all tasks using the PASID to clear the IA32_PASID
> > MSR. This has locking issues similar to the actively loading
> > IA32_PASID MSR which was force disabled in upstream. So won't work.
>
> Exactly.
>
> > (b) Passively free the PASID in destroy_context(mm) in mm exit. Once
> > the PASID is allocated, it stays with the process for the lifetime. It's
> > better than #1 because the PASID is allocated only on demand.
>
> Which is simple and makes a lot of sense. See below.
>
> > (c) Passively free the PASID in deactive_mm(mm) or unbind() whenever there
> > is no usage as implemented in this series. Tracking the PASID usage
> > per task provides a chance to free the PASID on task exit. The
> > PASID has a better chance to be freed earlier than mm exit in #(b).
> >
> > This series uses #2 and #(c) to allocate and free the PASID for a better
> > chance to ease the 1M PASIDs limitation pressure. For example, a thread
> > doing open(dev)->ENQCMD->close(fd)->exit(2) will not occupy a PASID while
> > its sibling threads are still running.
>
> I'm not seeing that as a realistic problem. Applications which use this
> kind of devices are unlikely to behave exactly that way.
>
> 2^20 PASIDs are really plenty and just adding code for the theoretical
> case of PASID pressure is a pointless exercise IMO. It just adds
> complexity for no reason.
>
> IMO reality will be that either you have long lived processes with tons
> of threads which use such devices over and over or short lived forked
> processes which open the device, do the job, close and exit. Both
> scenarios are fine with allocate on first use and drop on process exit.
>
> I think with your approach you create overhead for applications which
> use thread pools where the threads get work thrown at them and do open()
> -> do_stuff() -> close() and then go back to wait for the next job which
> will do exactly the same thing. So you add the overhead of refcounts in
> general and in the worst case if the refcount drops to zero then the
> next worker has to allocate a new PASID instead of just moving on.
>
> So unless you have a really compelling real world usecase argument, I'm
> arguing that the PASID pressure problem is a purely academic exercise.
>
> I think you are conflating two things here:
>
> 1) PASID lifetime
> 2) PASID MSR overhead
>
> Which is not correct: You still can and have to optimize the per thread
> behaviour vs. the PASID MSR: Track per thread whether it ever needed the
> PASID and act upon that.
>
> If the thread just does EMQCMD once in it's lifetime, then so be
> it. That's not a realistic use case, really.
>
> And if someone does this then this does not mean we have to optimize for
> that. Optimizing for possible stupid implementations is the wrong
> approach. There is no technial measure against stupidity. If that would
> exist the world would be a much better place.
>
> You really have to think about the problem space you are working
> on. There are problems which need a 'get it right at the first shot'
> solution because they create user space ABI or otheer hard to fix
> dependencies.
>
> That's absolutely not the case here.
>
> Get the basic simple support correct and work from there. Trying to
> solve all possible theoretical problems upfront is simply not possible
> and a guarantee for not making progress.
>
> "Keep it simple" and "correctness first" are still the best working
> engineering principles.
>
> They do not prevent us from revisiting this _if_ there is a real world
> problem which makes enough sense to implement a finer grained solution.

Sure. Will free the PASID in destroy_context() on mm exit and won't track
the PASID usage per task. The code will be simpler and clearer.

Thank you very much for your insight!

-Fenghua