Re: [PATCHv2 1/2] rcu/tree: handle VM stoppage in stall detection

From: Sergey Senozhatsky
Date: Thu Jul 15 2021 - 05:09:56 EST


On (21/05/22 00:56), Sergey Senozhatsky wrote:
> The soft watchdog timer function checks whether a virtual
> machine was suspended, in which case what looks like a lockup
> is in fact a false positive.
>
> This is what kvm_check_and_clear_guest_paused() does: it
> tests guest PVCLOCK_GUEST_STOPPED (which is set by the host)
> and if it's set then we need to touch all watchdogs and bail
> out.
>
> Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED
> check works fine.
>
> There is, however, one more watchdog that runs from IRQ, so
> the watchdog timer fn races with it, and that watchdog, the
> RCU stall detector, is not aware of PVCLOCK_GUEST_STOPPED.
>
> apic_timer_interrupt()
>  smp_apic_timer_interrupt()
>   hrtimer_interrupt()
>    __hrtimer_run_queues()
>     tick_sched_timer()
>      tick_sched_handle()
>       update_process_times()
>        rcu_sched_clock_irq()
>
> This triggers RCU stalls on our devices during VM resume.
>
> If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
> before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
> then there is nothing on this VCPU that touches watchdogs and
> RCU reads stale gp stall timestamp and new jiffies value, which
> makes it think that RCU has stalled.
>
> Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
> don't report RCU stalls when we resume the VM.
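
For anyone following along, here is a rough user-space model of the race
described above. It is only a sketch, not the patch itself: struct
vcpu_model, its fields, and all three functions are hypothetical names
made up for illustration, and the guest_stopped flag merely stands in
for PVCLOCK_GUEST_STOPPED.

```c
#include <stdbool.h>

/* Hypothetical model of one VCPU; none of these names are kernel APIs. */
struct vcpu_model {
	unsigned long jiffies;           /* jumps forward when the VM resumes */
	unsigned long gp_stall_deadline; /* stale after a VM pause */
	bool guest_stopped;              /* stands in for PVCLOCK_GUEST_STOPPED */
};

/* Test-and-clear, in the spirit of kvm_check_and_clear_guest_paused(). */
static bool check_and_clear_guest_paused(struct vcpu_model *v)
{
	bool was_stopped = v->guest_stopped;

	v->guest_stopped = false;
	return was_stopped;
}

/* Stall check that knows nothing about VM pauses. */
static bool stall_check_unaware(const struct vcpu_model *v)
{
	/* Signed difference keeps the comparison safe across wraparound. */
	return (long)(v->jiffies - v->gp_stall_deadline) >= 0;
}

/* Stall check that treats a VM pause as a false positive. */
static bool stall_check_aware(struct vcpu_model *v, unsigned long timeout)
{
	if ((long)(v->jiffies - v->gp_stall_deadline) < 0)
		return false;

	if (check_and_clear_guest_paused(v)) {
		/* The deadline was stale; restart the stall window. */
		v->gp_stall_deadline = v->jiffies + timeout;
		return false;
	}

	return true;
}
```

In this model, a large jump in jiffies with guest_stopped set makes
stall_check_unaware() report a stall, while stall_check_aware() clears
the flag, pushes the deadline forward, and stays quiet until a full
timeout elapses again.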

Hello Paul,

I've noticed that this patch set didn't make it to Linus's tree.
Was it intentional?