Re: [PATCHv2 00/12] TDX: Enable Dynamic PAMT
From: Kiryl Shutsemau
Date: Thu Aug 14 2025 - 07:00:21 EST
On Thu, Aug 14, 2025 at 12:14:40AM +0000, Edgecombe, Rick P wrote:
> On Wed, 2025-08-13 at 16:31 -0700, Dave Hansen wrote:
> > On 8/13/25 15:43, Edgecombe, Rick P wrote:
> > > I redid the test. Boot 10 TDs with 16GB of ram, run userspace to fault in memory
> > > from 4 threads until OOM, then shutdown. TDs were split between two sockets. It
> > > ended up with 1136 contentions of the global lock, 4ms waiting.
> >
> > 4ms out of how much CPU time?
>
> The whole test took about 60s wall time (minus the time of some manual steps).
> I'll have to automate it a bit more. But 4ms seemed safely in the "small"
> category.
>
> >
> > Also, contention is *NOT* necessarily bad here. Only _false_ contention.
> >
> > The whole point of the lock is to ensure that there aren't two different
> > CPUs trying to do two different things to the same PAMT range at the
> > same time.
> >
> > If there are, one of them *HAS* to wait. It can wait lots of different
> > ways, but it has to wait. That wait will show up as spinlock contention.
> >
> > Even if the global lock went away, that 4ms of spinning might still be
> > there.
>
> I assumed it was mostly real contention because the the refcount check outside
> the lock should prevent the majority of "two threads operating on the same 2MB
> region" collisions. The code is roughly:
>
> 1:
> if (atomic_inc_not_zero(2mb_pamt_refcount))
> return <it's mapped>;
> 2:
> <global lock>
> if (atomic_read(2mb_pamt_refcount) != 0) {
> 3:
> atomic_inc(2mb_pamt_refcount);
> <global unlock>
> return <it's mapped>;
> }
> <seamcall>
> <global unlock>
> 4:
>
> (similar pattern on the unmapping)
>
> So it will only be valid contention if two threads try to fault in the *same* 2MB
> DPAMT region *and* lose that race around 1-3, but invalid contention if threads try
> to execute 2-4 at the same time for any different 2MB regions.
>
> Let me go verify.
Note that in absence of the global lock here, concurrent PAMT.ADD would
also trigger some cache bouncing during pamt_walk() on taking shared
lock on 1G PAMT entry and exclusive lock on 2M entries in the same
cache (4 PAMT_2M entries per cache line). This is hidden by the global
lock.
You would not recover full contention time by removing the global lock.
--
Kiryl Shutsemau / Kirill A. Shutemov