Re: 4.13..4.14 scheduling overhead regression (bisected - b956575bed91)

From: Andy Lutomirski
Date: Fri Jun 01 2018 - 11:09:43 EST


On Fri, Jun 1, 2018 at 6:21 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, Jun 01, 2018 at 02:57:53PM +0200, Mike Galbraith wrote:
> > b956575bed91ecfb136a8300742ecbbf451471ab is the first bad commit
> > commit b956575bed91ecfb136a8300742ecbbf451471ab
> > Author: Andy Lutomirski <luto@xxxxxxxxxx>
> > Date: Mon Oct 9 09:50:49 2017 -0700
> >
> > x86/mm: Flush more aggressively in lazy TLB mode
>
> Oh boy... Maybe we should start looking at that optimization Andy
> mentioned.

Jolly.

>
> IIRC all page freeing does indeed go through tlb_remove_page(), it
> shouldn't be too hard to make that work.

Before we go too far down this rabbit hole, let's figure out what's
actually going on. Mike, does your system have PCID? If it does,
then my proposed optimization wouldn't do anything.
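
(For anyone following along: one way to answer the PCID question from
userspace is to look for the "pcid" flag on the "flags" line of
/proc/cpuinfo. Below is a minimal sketch in C, assuming an x86 Linux
system. The helper names has_cpu_flag() and cpuinfo_has_pcid() are made
up for illustration and are not kernel or libc APIs. The match is on
whole words, so "invpcid" alone does not count as "pcid".)

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `flag` appears as a whole
 * whitespace-delimited word in `flags`, 0 otherwise.
 */
int has_cpu_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    if (n == 0)
        return 0;

    while ((p = strstr(p, flag)) != NULL) {
        /* The match must begin at the start of the string or after a separator. */
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* The match must end at the end of the string or before a separator. */
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';

        if (starts && ends)
            return 1;
        p++; /* keep scanning past this partial match, e.g. "invpcid" */
    }
    return 0;
}

/*
 * Hypothetical helper: return 1 if the first "flags" line in the file at
 * `path` lists pcid, 0 if it does not, and -1 if the file cannot be opened.
 */
int cpuinfo_has_pcid(const char *path)
{
    char line[8192];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "flags", 5) == 0) {
            found = has_cpu_flag(line, "pcid");
            break; /* assumes all CPUs report the same flags */
        }
    }
    fclose(f);
    return found;
}
```

(Calling cpuinfo_has_pcid("/proc/cpuinfo") reports whether the flag is
listed. Note this only shows that the CPU advertises PCID; it does not
prove the running kernel is actually using it, for example if it was
booted with "nopcid".)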

Can you try inverting the return value of
tlb_defer_switch_to_init_mm() and seeing if the result changes?