[PATCH 0/2] x86: Remove ideal_nops[]

From: Peter Zijlstra
Date: Fri Mar 12 2021 - 07:00:46 EST


Hi!

A while ago Steve complained about x86 being weird for having different NOPs [1].

Having cursed the same thing before, I figured it was time to look at the NOP
situation.

32-bit simply isn't a performance target anymore, so all we need there is a
set of NOPs that works on all CPUs.

x86_64 has two main NOP variants, NOPL and prefix NOP. NOPL was introduced by
P6 and is architecturally mandated for x86_64. However, some uarchs made the
choice to limit NOPL decoding to a single port, which obviously limits NOPL
throughput. Other uarchs have (severe) decoding penalties for excessive (>~3)
prefixes, hobbling prefix NOP throughput.
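
For illustration, here is a minimal sketch of what padding with a single fixed
NOPL set could look like. The byte sequences are the multi-byte NOP encodings
recommended in the Intel SDM; the names nopl_table[] and fill_nops() are
hypothetical and are not taken from the patches in this series. Note that the
1- and 2-byte entries are plain NOP and 0x66-prefixed NOP, since NOPL proper
starts at 3 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NOP_LEN 8

/* Multi-byte NOP encodings indexed by length (index 0 is unused). */
static const unsigned char nopl_table[MAX_NOP_LEN + 1][MAX_NOP_LEN] = {
	[1] = { 0x90 },						/* nop */
	[2] = { 0x66, 0x90 },					/* osp nop */
	[3] = { 0x0f, 0x1f, 0x00 },				/* nopl (%rax) */
	[4] = { 0x0f, 0x1f, 0x40, 0x00 },			/* nopl 0x00(%rax) */
	[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 },			/* nopl 0x00(%rax,%rax,1) */
	[6] = { 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00 },		/* nopw 0x00(%rax,%rax,1) */
	[7] = { 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 },	/* nopl 0x00000000(%rax) */
	[8] = { 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 }, /* nopl 0x00000000(%rax,%rax,1) */
};

/*
 * Fill buf[0..len) with NOPs, always emitting the longest encoding that
 * still fits. Because the table is fixed, the output depends only on len,
 * not on the CPU the code happens to run on.
 */
void fill_nops(unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	while (len > 0) {
		size_t n = len < MAX_NOP_LEN ? len : MAX_NOP_LEN;

		memcpy(buf, nopl_table[n], n);
		buf += n;
		len -= n;
	}
}
```

With a table like this, an 11-byte hole is always padded as one 8-byte NOPL
followed by one 3-byte NOPL, regardless of vendor or model.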

But the thing is, all the modern uarchs can handle both without issue; that
is, AMD K10 (2007) and later, and Intel Ivy Bridge (2012) and later. The only
exception is Atom, which has the prefix penalty.

Since ultimate performance of a 10-year-old chip (Intel Sandy Bridge, 2011) is
simply irrelevant today, remove variable NOPs and use NOPL.

This gives us deterministic NOPs and restores sanity.

[1] https://lkml.kernel.org/r/20210302105827.3403656c@xxxxxxxxxxxxxxxxxx