Re: [PATCH kernel v2 1/3] x86/amd: Cache values in percpu variables

From: Alexey Kardashevskiy
Date: Thu Jan 12 2023 - 00:21:54 EST




On 11/1/23 03:05, Borislav Petkov wrote:
On Fri, Dec 09, 2022 at 03:38:02PM +1100, Alexey Kardashevskiy wrote:

Make that Subject:

"x86/amd: Cache debug register values in percpu variables"

to make it less generic and more specific as to what you're doing.

Reading DR[0-3]_ADDR_MASK MSRs takes about 250 cycles which is going to
be noticeable with the AMD KVM SEV-ES DebugSwap feature enabled.
KVM is going to store host's DR[0-3] and DR[0-3]_ADDR_MASK before
switching to a guest; the hardware is going to swap these on VMRUN
and VMEXIT.

Store MSR values passsed to set_dr_addr_mask() in percpu values

s/values/variables/

Unknown word [passsed] in commit message.

Use a spellchecker pls.

(when changed) and return them via new amd_get_dr_addr_mask().
The gain here is about 10x.

10x when reading percpu vars vs MSR reads?

Oh well.
>
As amd_set_dr_addr_mask() uses the array too, change the @dr type to
unsigned to avoid checking for <0.

I feel ya but that function will warn once, return 0 when the @dr number is
outta bounds and that 0 will still get used as an address mask.

"that function" is set_dr_addr_mask() (btw should I rename it to start with amd_? the commit log uses the wrong&longer name) which does not return a mask, amd_get_dr_addr_mask() does.

I think you really wanna return negative on error and the caller should not
continue in that case.

If it is out of bounds, it won't neither set or get. And these 2 helpers are used only by the kernel and the callers pass 0..3 and nothing else. BUG_ON() would do too, for example.


diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index c75d75b9f11a..9ac5a19f89b9 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -1158,24 +1158,41 @@ static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
return false;
}
-void set_dr_addr_mask(unsigned long mask, int dr)
+DEFINE_PER_CPU_READ_MOSTLY(unsigned long[4], amd_dr_addr_mask);

static

+
+static unsigned int amd_msr_dr_addr_masks[] = {
+ MSR_F16H_DR0_ADDR_MASK,
+ MSR_F16H_DR1_ADDR_MASK - 1 + 1,

- 1 + 1 ?

Why?

Because of the DR0 and then DR1 being in a different MSR range?

Yup.


Who cares, make it simple:

MSR_F16H_DR0_ADDR_MASK,
MSR_F16H_DR1_ADDR_MASK,
MSR_F16H_DR1_ADDR_MASK + 1,
MSR_F16H_DR1_ADDR_MASK + 2

and add a comment if you want to denote the non-contiguous range but meh.

imho having 1,2,3 in the code eliminates the need in a comment and produces the exact same end result. But since nobody cares, I'll do it the shorter way with just +1 and +2.


>> + MSR_F16H_DR1_ADDR_MASK - 1 + 2,
+ MSR_F16H_DR1_ADDR_MASK - 1 + 3
+};
+
+void set_dr_addr_mask(unsigned long mask, unsigned int dr)
{
- if (!boot_cpu_has(X86_FEATURE_BPEXT))
+ if (!cpu_feature_enabled(X86_FEATURE_BPEXT))
return;
- switch (dr) {
- case 0:
- wrmsr(MSR_F16H_DR0_ADDR_MASK, mask, 0);
- break;
- case 1:
- case 2:
- case 3:
- wrmsr(MSR_F16H_DR1_ADDR_MASK - 1 + dr, mask, 0);
- break;
- default:
- break;
- }
+ if (WARN_ON_ONCE(dr >= ARRAY_SIZE(amd_msr_dr_addr_masks)))
+ return;
+
+ if (per_cpu(amd_dr_addr_mask, smp_processor_id())[dr] == mask)

Do that at function entry:

int cpu = smp_processor_id();

and use cpu here.

Sure. Out of curiosity - why?^w^w^w^w^ it reduced the vmlinux size by 48 bytes, nice.

Thanks for the review!



+ return;
+
+ wrmsr(amd_msr_dr_addr_masks[dr], mask, 0);
+ per_cpu(amd_dr_addr_mask, smp_processor_id())[dr] = mask;
+}

Thx.


--
Alexey