Re: [PATCH v3 0/6] Static calls

From: Andy Lutomirski
Date: Mon Jan 14 2019 - 15:11:45 EST


On Sun, Jan 13, 2019 at 6:41 PM H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>
> On 1/13/19 6:31 PM, H. Peter Anvin wrote:
> >
> > static cpumask_t text_poke_cpumask;
> >
> > static void text_poke_sync(void)
> > {
> > smp_wmb();
> > text_poke_cpumask = cpu_online_mask;
> > smp_wmb(); /* Should be optional on x86 */
> > cpumask_clear_cpu(&text_poke_cpumask, smp_processor_id());
> > on_each_cpu_mask(&text_poke_cpumask, text_poke_sync_cpu, NULL, false);
> > while (!cpumask_empty(&text_poke_cpumask)) {
> > cpu_relax();
> > smp_rmb();
> > }
> > }
> >
> > static void text_poke_sync_cpu(void *dummy)
> > {
> > (void)dummy;
> >
> > smp_rmb();
> > cpumask_clear_cpu(&poke_bitmask, smp_processor_id());
> > /*
> > * We are guaranteed to return with an IRET, either from the
> > * IPI or the #BP handler; this provides serialization.
> > */
> > }
> >
>
> The invariants here are:
>
> 1. The patching routine must set each bit in the cpumask after each event
> that requires synchronization is complete.
> 2. The bit can be (atomically) cleared on the target CPU only, and only in a
> place that guarantees a synchronizing event (e.g. IRET) before it may
> reaching the poked instruction.
> 3. At a minimum the IPI handler and #BP handler needs to clear the bit. It
> *is* also possible to clear it in other places, e.g. the NMI handler, if
> necessary as long as condition 2 is satisfied.
>

I don't even think this is sufficient. I think we also need everyone
who clears the bit to check if all bits are clear and, if so, remove
the breakpoint. Otherwise we have a situation where, if you are in
text_poke_bp() and you take an NMI (or interrupt or MCE or whatever)
and that interrupt then hits the breakpoint, then you deadlock because
no one removes the breakpoint.

If we do this, and if we can guarantee that all CPUs make forward
progress, then maybe the problem is solved. Can we guarantee something
like all NMI handlers that might wait in a spinlock or for any other
reason will periodically check if a sync is needed while they're
spinning?

--Andy