RE: [PATCH] x86: Prefer MWAIT over HALT on AMD processors

From: Carroll, Lewis
Date: Tue Apr 05 2022 - 22:47:49 EST


[Public]

> -----Original Message-----
> From: Borislav Petkov <bp@xxxxxxxxx>
> Sent: Tuesday, April 5, 2022 4:39 PM
> To: Carroll, Lewis <Lewis.Carroll@xxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; peterz@xxxxxxxxxxxxx;
> dave.hansen@xxxxxxxxxxxxxxx; Karny, Wyes <Wyes.Karny@xxxxxxx>; Limonciello,
> Mario <Mario.Limonciello@xxxxxxx>; Shenoy, Gautham Ranjal
> <gautham.shenoy@xxxxxxx>; Narayan, Ananth <Ananth.Narayan@xxxxxxx>; Rao,
> Bharata Bhasker <bharata@xxxxxxx>; len.brown@xxxxxxxxx; x86@xxxxxxxxxx;
> tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; hpa@xxxxxxxxx;
> chang.seok.bae@xxxxxxxxx; keescook@xxxxxxxxxxxx; metze@xxxxxxxxx;
> zhengqi.arch@xxxxxxxxxxxxx; mark.rutland@xxxxxxx
> Subject: Re: [PATCH] x86: Prefer MWAIT over HALT on AMD processors
>
> [CAUTION: External Email]
>
> On Tue, Apr 05, 2022 at 08:26:53PM +0000, Carroll, Lewis wrote:
> > This happens when:
> > * User disables global C-states in BIOS
> > * User disables cpuidle (e.g. idle=off or processor.max_cstate=0)
> > * Kernel does not have CONFIG_ACPI_PROCESSOR_IDLE
>
> All three or any one of those?

Just when I thought I was being thorough. Any of the above will block the
cpuidle driver from loading. As will absence of _CST ACPI methods (add that
as a fourth cause). Brings back memories of code comments about a certain
idle driver being created to work around broken BIOSes...

>
> > Genesis of this patch is customer performance observations.
>
> Please add that explanation to the changelog - it is important when looking
> back, trying to figure out why this was done.

We will have to see what we can sanitize. The original performance observation
(packet loss in a networking application) led to discovery of lots of cycles
in the various go-to-sleep-via-halt and wake-from-halt-via-IPI functions. Wyes
collected the raw data on the relative idle+wake-up latency and included that
in the commit msg. Think of that delta as the root cause of the performance
regression in this case.

>
> > Yes. We felt the code more readable with the prefer_mwait_c1_over_halt fn.
> > Hygon CPU init indeed sets X86_FEATURE ZEN.
> > AMD CPU init sets X86_FEATURE_ZEN for family >= 17h (not only 17h).
>
> Yes, but this new logic you're adding, basically says, use MWAIT on all Zen
> uarch CPUs, right?

Yes we are saying use MWAIT instead of HLT on all known (as of today) Zen
uarch CPUs (AMD >= 17h and Hygon).

>
> So why not write exactly that?
>
> The simpler the logic and the clearer the code, the better.
>
> > Cleanest solution might be a new CPU feature (e.g.
> > X86_PREFER_MWAIT_IDLE) that gets set appropriately, but that would
> > require touching more files.
>
> Yes, I thought about it too.
>
> Not really necessary if what I wrote above fits.
>
> And while you're touching files, pls add that change too:
>
> ---
> diff --git a/arch/x86/include/asm/cpufeatures.h
> b/arch/x86/include/asm/cpufeatures.h
> index 73e643ae94b6..c1091f78f104 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -219,7 +219,7 @@
> #define X86_FEATURE_IBRS ( 7*32+25) /* Indirect Branch
> Restricted Speculation */
> #define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch
> Prediction Barrier */
> #define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect
> Branch Predictors */
> -#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD
> family 0x17 or above (Zen) */
> +#define X86_FEATURE_ZEN ( 7*32+28) /* "" Set on CPUs
> of the Zen uarch */
> #define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF
> workaround PTE inversion */
> #define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */
> #define X86_FEATURE_MSR_IA32_FEAT_CTL ( 7*32+31) /* "" MSR IA32_FEAT_CTL
> configured */
>
> so that dhansen and peterz are not confused anymore. :-)

Ack.

>
> Thx.
>
> --
> Regards/Gruss,
> Boris.