Re: [PATCH 05/31] x86/mm: Reduce tlb flushes from ptep_set_access_flags()

From: Ingo Molnar
Date: Fri Oct 26 2012 - 02:42:07 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, Oct 25, 2012 at 8:57 PM, Rik van Riel <riel@xxxxxxxxxx> wrote:
> >
> > That may not even be needed. Apparently Intel chips
> > automatically flush an entry from the TLB when it causes a
> > page fault. I assume AMD chips do the same, because
> > flush_tlb_fix_spurious_fault evaluates to nothing on x86.
>
> Yes. It's not architected as far as I know, though. But I
> agree, it's possible - even likely - we could avoid TLB
> flushing entirely on x86.
>
> If you want to try it, I would seriously suggest you do it as
> a separate commit though, just in case.

Ok, will do it like that. The INVLPG overhead is a small effect,
but it's nevertheless worth trying.
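
For reference, here is a minimal sketch (reconstructed from
memory, not a verbatim copy of the tree) of the two pieces being
discussed: the spurious-fault fixup that is already a no-op on
x86, and the ptep_set_access_flags() flush that can turn into a
cross-CPU shootdown:

  /* arch/x86/include/asm/pgtable.h - sketch: the CPU re-walks the
   * page tables after a fault, so no extra flush is done here. */
  #define flush_tlb_fix_spurious_fault(vma, address) do { } while (0)

  /* arch/x86/mm/pgtable.c - sketch: the flush_tlb_page() below is
   * what can become an SMP-wide flush on multi-CPU systems. */
  int ptep_set_access_flags(struct vm_area_struct *vma,
                            unsigned long address, pte_t *ptep,
                            pte_t entry, int dirty)
  {
          int changed = !pte_same(*ptep, entry);

          if (changed && dirty) {
                  *ptep = entry;
                  pte_update_defer(vma->vm_mm, address, ptep);
                  flush_tlb_page(vma, address);
          }

          return changed;
  }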

What *has* shown up in my profiles though, and what drove some
of these changes, is that for heavily threaded, VM-intensive
workloads such as a single SPECjbb JVM instance running on all
CPUs and all nodes, TLB flushes with any sort of serialization
aspect are absolutely deadly.

So just to be *able* to verify the performance benefit and
impact of some of the later NUMA-directed changes, we had to
eliminate a number of scalability bottlenecks and put these
optimization patches in front of the main changes.

That is why you have to go 20+ patches into the queue to see the
real point :-/

> > Are there architectures where we do need to flush remote
> > TLBs on upgrading the permissions on a PTE?
>
> I *suspect* that whole TLB flush just magically became an SMP
> one without anybody ever really thinking about it.

Yeah, and I think part of the problem is that it's also not a
particularly straightforward performance bottleneck to analyze:
SMP TLB flushing does not mainly show up as visible high
overhead in profiles, it mostly shows up as extra idle time.

If the nature of the workload is that it has extra available
parallelism that can fill in the idle time, that will mask much
of the effect and there's only a slight shift in the profile.

It needs a borderline loaded system and sleep profiling to
pinpoint these sources of overhead.

[...]
> > From reading the code again, it looks like things should
> > indeed work ok.
>
> I would be open to it, but just in case it causes bisectable
> problems I'd really want to see it in two patches ("make it
> always do the local flush" followed by "remove even the local
> flush"), and then it would pinpoint any need.

Yeah, 100% agreed.
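
To make the split concrete, the two commits would look roughly
like this inside ptep_set_access_flags() (a hedged sketch,
assuming the __flush_tlb_one() local-flush helper; not the
actual patches):

  /* Commit 1: only invalidate the local TLB entry, no cross-CPU IPIs. */
  if (changed && dirty) {
          *ptep = entry;
          pte_update_defer(vma->vm_mm, address, ptep);
          __flush_tlb_one(address);       /* local INVLPG only */
  }

  /* Commit 2: drop even the local flush and rely on the CPU
   * refetching the PTE after the spurious fault, as discussed above. */
  if (changed && dirty) {
          *ptep = entry;
          pte_update_defer(vma->vm_mm, address, ptep);
  }

That way a bisect can tell us whether it's the remote flushing or
the local INVLPG that matters.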

Thanks,

Ingo