Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock

From: Fenghua Yu
Date: Wed Jun 26 2019 - 16:46:25 EST


On Wed, Jun 26, 2019 at 10:20:05PM +0200, Thomas Gleixner wrote:
> On Tue, 18 Jun 2019, Fenghua Yu wrote:
> > +
> > +static atomic_t split_lock_debug;
> > +
> > +void split_lock_disable(void)
> > +{
> > + /* Disable split lock detection on this CPU */
> > + this_cpu_and(msr_test_ctl_cached, ~MSR_TEST_CTL_SPLIT_LOCK_DETECT);
> > + wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_cached));
> > +
> > + /*
> > + * Use the atomic variable split_lock_debug to ensure only the
> > + * first CPU hitting split lock issue prints one single complete
> > + * warning. This also solves the race if the split-lock #AC fault
> > + * is re-triggered by NMI of perf context interrupting one
> > + * split-lock warning execution while the original WARN_ONCE() is
> > + * executing.
> > + */
> > + if (atomic_cmpxchg(&split_lock_debug, 0, 1) == 0) {
> > + WARN_ONCE(1, "split lock operation detected\n");
> > + atomic_set(&split_lock_debug, 0);
>
> What's the purpose of this atomic_set()?

atomic_set() releases the split_lock_debug flag after WARN_ONCE() is done.
The same split_lock_debug flag will be used in sysfs write for atomic
operation as well, as proposed by Ingo in https://lkml.org/lkml/2019/4/25/48
So that's why the flag needs to be cleared, right?

>
> > +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> > +{
> > + unsigned int trapnr = X86_TRAP_AC;
> > + char str[] = "alignment check";
> > + int signr = SIGBUS;
> > +
> > + RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> > +
> > + if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
> > + return;
> > +
> > + cond_local_irq_enable(regs);
> > + if (!user_mode(regs) && static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
> > + /*
> > + * Only split locks can generate #AC from kernel mode.
> > + *
> > + * The split-lock detection feature is a one-shot
> > + * debugging facility, so we disable it immediately and
> > + * print a warning.
> > + *
> > + * This also solves the instruction restart problem: we
> > + * return the faulting instruction right after this it
>
> we return the faulting instruction ... to the store so we get our deposit
> back :)
>
> the fault handler returns to the faulting instruction which will be then
> executed without ....
>
> Don't try to impersonate code, cpus or whatever. It doesn't make sense and
> confuses people.
>
> > + * will be executed without generating another #AC fault
> > + * and getting into an infinite loop, instead it will
> > + * continue without side effects to the interrupted
> > + * execution context.
>
> That last part 'instead .....' is redundant. It's entirely clear from the
> above that the faulting instruction is reexecuted ....
>
> Please write concise comments and do try to repeat the same information
> with a different painting.

I copied the comment completely from Ingo's comment on v8:
https://lkml.org/lkml/2019/4/25/40

Thanks.

-Fenghua