Re: [PATCH v2 4/4] x86/static_call: Add inline static call implementation for x86-64

From: Steven Rostedt
Date: Thu Nov 29 2018 - 12:13:15 EST


On Thu, 29 Nov 2018 09:02:23 -0800
Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:


> > Instead, I'd suggest:
> >
> > - just restart the instruction (with the suggested "ptregs->rip --")
> >
> > - to avoid any "oh, we're not making progress" issues, just fix the
> > instruction yourself to be the right call, by looking it up in the
> > "what needs to be fixed" tables.
> >
> > No?
>
> I thought that too. I think it deadlocks. CPU A does
> text_poke_bp(). CPU B is waiting for a spinlock with IRQs off. CPU
> C holds the spinlock and hits the int3. The int3 never goes away
> because CPU A is waiting for CPU B to handle the sync_core IPI.

I agree that this can happen.

>
> Or do you think we can avoid the IPI while the int3 is there?

No, we really do need to sync after we change the second part of the
command with the int3 on it. Unless there's another way to guarantee
that the full instruction gets seen when we replace the int3 with the
finished command.

To refresh everyone's memory for why we have an IPI (as IPIs have an
implicit memory barrier for the CPU).

We start with:

e8 01 02 03 04

and we want to convert it to: e8 ab cd ef 01

And let's say the instruction crosses a cache line that breaks it into
e8 01 and 02 03 04.

We add the breakpoint:

cc 01 02 03 04

We do a sync (so now everyone should see the break point), because we
don't want to update the second part and another CPU happens to update
the second part of the cache, and might see:

e8 01 cd ef 01

Which would not be good.

And we need another sync after we change the code so all CPUs see

cc ab cd ef 01

Because when we remove the break point, we don't want other CPUs to see

e8 ab 02 03 04

Which would also be bad.

-- Steve