Re: [PATCH] x86/TSC: Use RDTSCP

From: hpa
Date: Fri Nov 23 2018 - 15:23:25 EST


On November 23, 2018 12:03:07 PM PST, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>Hi,
>
>On Mon, Nov 19, 2018 at 07:45:56PM +0100, Borislav Petkov wrote:
>> From: Borislav Petkov <bp@xxxxxxx>
>>
>> Currently, the kernel uses
>>
>> [LM]FENCE; RDTSC
>>
>> in the timekeeping code, to guarantee monotonicity of time where the
>> *FENCE is selected based on vendor.
>>
>> Replace that sequence with RDTSCP which is faster or on-par and gives
>> the same guarantees.
>>
>> A microbenchmark on Intel shows that the change is on-par.
>>
>> On AMD, the change is either on-par with the current LFENCE-prefixed
>> RDTSC and some are slightly better with RDTSCP.
>>
>> The comparison is done with the LFENCE-prefixed RDTSC (and not with
>the
>> MFENCE-prefixed one, as one would normally expect) because all modern
>> AMD families make LFENCE serializing and thus avoid the heavy MFENCE
>by
>> effectively enabling X86_FEATURE_LFENCE_RDTSC.
>>
>> Co-developed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Signed-off-by: Borislav Petkov <bp@xxxxxxx>
>> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
>> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
>> Cc: John Stultz <john.stultz@xxxxxxxxxx>
>> Cc: Thomas Lendacky <Thomas.Lendacky@xxxxxxx>
>> Cc: x86@xxxxxxxxxx
>> ---
>> arch/x86/include/asm/msr.h | 15 +++++++++++++--
>> 1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
>> index 91e4cf189914..f00f2b61d326 100644
>> --- a/arch/x86/include/asm/msr.h
>> +++ b/arch/x86/include/asm/msr.h
>> @@ -217,6 +217,8 @@ static __always_inline unsigned long long
>rdtsc(void)
>> */
>> static __always_inline unsigned long long rdtsc_ordered(void)
>> {
>> + DECLARE_ARGS(val, low, high);
>> +
>> /*
>> * The RDTSC instruction is not ordered relative to memory
>> * access. The Intel SDM and the AMD APM are both vague on this
>> @@ -227,9 +229,18 @@ static __always_inline unsigned long long
>rdtsc_ordered(void)
>> * ordering guarantees as reading from a global memory location
>> * that some other imaginary CPU is updating continuously with a
>> * time stamp.
>> + *
>> + * Thus, use the preferred barrier on the respective CPU, aiming
>for
>> + * RDTSCP as the default.
>> */
>> - barrier_nospec();
>> - return rdtsc();
>> + asm volatile(ALTERNATIVE_2("mfence; rdtsc",
>> + "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
>> + "rdtscp", X86_FEATURE_RDTSCP)
>> + : EAX_EDX_RET(val, low, high)
>> + /* RDTSCP clobbers ECX with MSR_TSC_AUX. */
>> + :: "ecx");
>> +
>> + return EAX_EDX_VAL(val, low, high);
>> }
>>
>
>This patch results in a crash with certain qemu emulations.
>
>[ 0.756869] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
>[ 0.762233] invalid opcode: 0000 [#1] PTI
>[ 0.762435] CPU: 0 PID: 1 Comm: swapper Not tainted
>4.20.0-rc3-next-20181123 #2
>[ 0.762644] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>Ubuntu-1.8.2-1ubuntu1 04/01/2014
>[ 0.762832] EIP: read_tsc+0x4/0x10
>
>To reproduce:
>
>make ARCH=i386 defconfig
>echo "CONFIG_HIGHMEM64G=y" >> .config
>make ARCH=i386 olddefconfig
>make -j30 ARCH=i386
>
>qemu-system-i386 \
> -kernel arch/x86/boot/bzImage \
> -M q35 -cpu pentium3 -no-reboot -m 256 \
> -initrd rootfs.cpio \
> --append 'earlycon=uart8250,io,0x3f8,9600n8 rdinit=/sbin/init panic=-1
>console=ttyS0 console=tty' \
> -nographic -monitor none
>
>initrd or root file system doesn't really matter since the code never
>gets there, but it is available from here:
> https://github.com/groeck/linux-build-test/blob/master/rootfs/x86/rootfs.cpio.gz
>The qemu version does not matter either (I tried with 2.5 and 3.0).
>
>Reverting the patch fixes the problem.
>
>Guenter
>
>---
>crash log:
>
>...
>[ 0.756366] HPET: 3 timers in total, 0 timers will be used for
>per-cpu timer
>[ 0.756718] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
>[ 0.756869] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
>[ 0.762233] invalid opcode: 0000 [#1] PTI
>[ 0.762435] CPU: 0 PID: 1 Comm: swapper Not tainted
>4.20.0-rc3-next-20181123
>#2
>[ 0.762644] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>Ubuntu-1.8.2-1ubuntu1 04/01/2014
>[ 0.762832] EIP: read_tsc+0x4/0x10
>[ 0.762832] Code: 00 01 00 eb 89 90 55 89 e5 5d c3 90 90 90 90 90 90
>90 90 90 90 90 55 a1 44 5a 8b c5 89 e5 5d c3 8d b6 00 00 00 00 55 89 e5
>57 <0f> ae f0b
>[ 0.762832] EAX: c58062c0 EBX: c58062c0 ECX: 00000008 EDX: c4c1f0a0
>[ 0.762832] ESI: c58168c0 EDI: 00200086 EBP: cb495ed4 ESP: cb495ed0
>[ 0.762832] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS:
>00200002
>[ 0.762832] CR0: 80050033 CR2: 00000000 CR3: 0597a000 CR4: 000006f0
>[ 0.762832] Call Trace:
>[ 0.762832] tk_setup_internals.constprop.10+0x3d/0x260
>[ 0.762832] timekeeping_notify+0x56/0xc0
>[ 0.762832] __clocksource_select+0xf3/0x150
>[ 0.762832] ? boot_override_clock+0x42/0x42
>[ 0.762832] clocksource_done_booting+0x2d/0x3b
>[ 0.762832] do_one_initcall+0x68/0x15e
>[ 0.762832] ? set_debug_rodata+0xf/0xf
>[ 0.762832] kernel_init_freeable+0xec/0x18b
>[ 0.762832] ? rest_init+0x80/0x80
>[ 0.762832] kernel_init+0x8/0xf0
>[ 0.762832] ret_from_fork+0x2e/0x38
>[ 0.762832] Modules linked in:
>[ 0.762832] ---[ end trace ba4ba4587aec46c9 ]---
>[ 0.762832] EIP: read_tsc+0x4/0x10
>[ 0.762832] Code: 00 01 00 eb 89 90 55 89 e5 5d c3 90 90 90 90 90 90
>90 90 90 90 90 55 a1 44 5a 8b c5 89 e5 5d c3 8d b6 00 00 00 00 55 89 e5
>57 <0f> ae f0b
>[ 0.762832] EAX: c58062c0 EBX: c58062c0 ECX: 00000008 EDX: c4c1f0a0
>[ 0.762832] ESI: c58168c0 EDI: 00200086 EBP: cb495ed4 ESP: c597f47c
>[ 0.762832] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS:
>00200002
>[ 0.762832] CR0: 80050033 CR2: 00000000 CR3: 0597a000 CR4: 000006f0
>
>---
>bisect log:
>
># bad: [8c9733fd9806c71e7f2313a280f98cb3051f93df] Add linux-next
>specific files for 20181123
># good: [9ff01193a20d391e8dbce4403dd5ef87c7eaaca6] Linux 4.20-rc3
>git bisect start 'HEAD' 'v4.20-rc3'
># good: [34c2101b4f765edf1b91c2837da9c60fbf9f6912] Merge
>remote-tracking branch 'spi-nor/spi-nor/next'
>git bisect good 34c2101b4f765edf1b91c2837da9c60fbf9f6912
># good: [15367a0657fc8027ff3466bf0202bde9f270259b] Merge
>remote-tracking branch 'kgdb/kgdb-next'
>git bisect good 15367a0657fc8027ff3466bf0202bde9f270259b
># bad: [d29686ab179c34c5dbaac067a9effbeeb6a8073e] Merge remote-tracking
>branch 'soundwire/next'
>git bisect bad d29686ab179c34c5dbaac067a9effbeeb6a8073e
># bad: [7cd63670817c236ccaf21ffe9d7b4921afeab130] Merge remote-tracking
>branch 'tip/auto-latest'
>git bisect bad 7cd63670817c236ccaf21ffe9d7b4921afeab130
># good: [f9c8ebdcb052214954136426990e13a88e45e906] Merge
>remote-tracking branch 'audit/next'
>git bisect good f9c8ebdcb052214954136426990e13a88e45e906
># good: [4142a8a10e7673fa71ff4cb5b4693c4dda299f7c] Merge
>remote-tracking branch 'spi/for-next'
>git bisect good 4142a8a10e7673fa71ff4cb5b4693c4dda299f7c
># good: [fc922a4c7c72131f6c65e4af4e36ff25f3af1485] Merge branch
>'x86/cpu'
>git bisect good fc922a4c7c72131f6c65e4af4e36ff25f3af1485
># good: [906d11c1ff007ad18e2f3477cca04c437ce19c64] Merge branch
>'x86/microcode'
>git bisect good 906d11c1ff007ad18e2f3477cca04c437ce19c64
># good: [d836e1d42739a86738dd92a1b7ab6729b2485cba] Merge branch
>'x86/platform'
>git bisect good d836e1d42739a86738dd92a1b7ab6729b2485cba
># bad: [191bba684b7459842e72b52718ec61f8bdbd8499] Merge branch
>'x86/timers'
>git bisect bad 191bba684b7459842e72b52718ec61f8bdbd8499
># good: [f9cc0921aa4edd9bd4dd6943d5e38f69a301e8ee] Merge branch
>'x86/pti'
>git bisect good f9cc0921aa4edd9bd4dd6943d5e38f69a301e8ee
># bad: [2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07] x86/TSC: Use RDTSCP
>git bisect bad 2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07
># good: [a786ef152cdcfebc923a67f63c7815806eefcf81] x86/tsc: Make
>calibration refinement more robust
>git bisect good a786ef152cdcfebc923a67f63c7815806eefcf81
># first bad commit: [2e94061096c5c3aa6c3fe3ec2bec176c1f9c1b07] x86/TSC:
>Use RDTSCP

Right, because that cpu predates RDTSCP, so it needs to use a fallback.

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.