Re: [PATCH v10 2/2] misc: Add a mechanism to detect stalls on guest vCPUs

From: Sebastian Ene
Date: Fri Jul 08 2022 - 04:19:10 EST


On Thu, Jul 07, 2022 at 07:27:38PM +0100, Will Deacon wrote:
> Hi Sebastian,
>
> On Thu, Jul 07, 2022 at 03:42:27PM +0000, Sebastian Ene wrote:

Hi Will,

> > This driver creates per-cpu hrtimers which are required to do the
> > periodic 'pet' operation. On a conventional watchdog-core driver, the
> > userspace is responsible for delivering the 'pet' events by writing to
> > the particular /dev/watchdogN node. In this case we require a strong
> > thread affinity to be able to account for lost time on a per-vCPU basis.
> >
> > This part of the driver is the 'frontend' which is responsible for
> > delivering the periodic 'pet' events, configuring the virtual peripheral
> > and listening for cpu hotplug events. The other part of the driver is
> > an emulated MMIO device which is part of the KVM virtual machine
> > monitor and this part accounts for lost time by looking at the
> > /proc/{}/task/{}/stat entries.
> >
> > Signed-off-by: Sebastian Ene <sebastianene@xxxxxxxxxx>
> > ---
> > drivers/misc/Kconfig | 14 ++
> > drivers/misc/Makefile | 1 +
> > drivers/misc/vcpu_stall_detector.c | 209 +++++++++++++++++++++++++++++
> > 3 files changed, 224 insertions(+)
> > create mode 100644 drivers/misc/vcpu_stall_detector.c
>
> Thanks for addressing all of my feedback on v9 so promptly:
>
> Reviewed-by: Will Deacon <will@xxxxxxxxxx>
>
> Just one question on this part:
>
> > +static enum hrtimer_restart
> > +vcpu_stall_detect_timer_fn(struct hrtimer *hrtimer)
> > +{
> > + u32 ticks, ping_timeout_ms;
> > +
> > + /* Reload the stall detector counter register every
> > + * `ping_timeout_ms` to prevent the virtual device
> > + * from decrementing it to 0. The virtual device decrements this
> > + * register at 'clock_freq_hz' frequency.
> > + */
> > + ticks = vcpu_stall_config.clock_freq_hz *
> > + vcpu_stall_config.stall_timeout_sec;
>
> It would be quite easy for this to overflow 32 bits, so perhaps it would
> be best to check the values from the DT during probe and fall back to the
> defaults (with a warning) if the result of the multiplication is out
> of range for the 32-bit register.
>
> What do you think? My review stands in any case, as this shouldn't happen
> in practice with sensible values.
>

Good point! Falling back to the defaults when the values from the DT
would overflow the 32-bit register is a good approach. I will do that in
the next version.
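
Roughly, the check could look like the sketch below. This is a standalone
userspace illustration, not the driver code: the struct, the function name
and the EXAMPLE_DEFAULT_* values are hypothetical placeholders, and in the
driver the warning would go through the kernel's logging helpers rather
than fprintf(). The idea is to do the multiplication in 64 bits so the
product cannot wrap before it is compared against the 32-bit limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical defaults, chosen only for this example. */
#define EXAMPLE_DEFAULT_CLOCK_HZ	10u
#define EXAMPLE_DEFAULT_TIMEOUT_SEC	8u

/* Hypothetical stand-in for the values read from the DT. */
struct example_stall_config {
	uint32_t clock_freq_hz;
	uint32_t stall_timeout_sec;
};

/*
 * Check that clock_freq_hz * stall_timeout_sec fits the 32-bit counter
 * register. The product is computed in 64 bits so it cannot wrap.
 * Returns true if the values were kept, false if they were replaced
 * by the defaults.
 */
static bool example_validate_config(struct example_stall_config *cfg)
{
	uint64_t ticks;

	assert(cfg != NULL);

	ticks = (uint64_t)cfg->clock_freq_hz * cfg->stall_timeout_sec;
	if (ticks > UINT32_MAX) {
		fprintf(stderr,
			"stall detector: %u Hz * %u s does not fit in 32 bits, using defaults\n",
			(unsigned int)cfg->clock_freq_hz,
			(unsigned int)cfg->stall_timeout_sec);
		cfg->clock_freq_hz = EXAMPLE_DEFAULT_CLOCK_HZ;
		cfg->stall_timeout_sec = EXAMPLE_DEFAULT_TIMEOUT_SEC;
		return false;
	}

	return true;
}
```

Doing this once at probe time means the timer callback can keep the plain
32-bit multiplication, since the stored values are already known to be in
range.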

> Will

Thanks,
Seb