Re: [PATCH] sched: watchdog: Touch kernel watchdog in sched code

From: Paul Turner
Date: Thu Mar 05 2020 - 17:08:17 EST


On Thu, Mar 5, 2020 at 10:07 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
>
> > On Wed, Mar 04, 2020 at 01:39:41PM -0800, Xi Wang wrote:
> >> The main purpose of the kernel watchdog is to test whether the scheduler
> >> can still schedule tasks on a cpu. In order to reduce latency from
> >> periodically invoking watchdog reset in thread context, we can simply
> >> touch the watchdog from pick_next_task in the scheduler. Compared to
> >> actually resetting the watchdog from cpu stop / migration threads, we
> >> lose coverage on two steps: a migration thread actually getting picked,
> >> and the actual context switch to that migration thread. Both steps are
> >> heavily protected by kernel locks and unlikely to silently fail. Thus
> >> the change would provide the same level of protection with less overhead.
> >>
> >> The new way vs the old way to touch the watchdogs is configurable
> >> from:
> >>
> >> /proc/sys/kernel/watchdog_touch_in_thread_interval
> >>
> >> The value means:
> >> 0: Always touch watchdog from pick_next_task
> >> 1: Always touch watchdog from migration thread
> >> N (N>0): Touch watchdog from migration thread once in every N
> >> invocations, and touch watchdog from pick_next_task for
> >> other invocations.
> >>
> >
> > This is configurable madness. What are we really trying to do here?
>
> Create yet another knob which will be advertised in random web blogs to
> solve all problems of the world and some more. Like the one which got
> silently turned into a NOOP ~10 years ago :)
>

The knob can obviously be removed; it's vestigial and reflects caution
from when we were implementing this and rolling things over to it. We
have default values that we know work at scale. I don't think this
actually needs or wants to be tunable beyond on or off (and even that
could be strictly compile-time or boot-time only).
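
To make the knob semantics quoted above concrete, here is a rough
user-space sketch of the interval decision. It is not code from the
patch: the names wd_touch_state and wd_touch_in_thread are made up for
illustration, and which of the N invocations goes to the migration
thread (here, the last one of each group) is an assumption.

```c
#include <stdbool.h>

/*
 * Hypothetical state, for illustration only; none of these names
 * come from the patch under discussion.
 */
struct wd_touch_state {
	unsigned int interval;	/* value of watchdog_touch_in_thread_interval */
	unsigned int count;	/* invocations since the last in-thread touch */
};

/*
 * Returns true if this invocation should touch the watchdog from the
 * migration thread, false if it should be touched from pick_next_task.
 *
 *   interval == 0: never use the thread (always pick_next_task)
 *   interval == 1: always use the thread
 *   interval == N: use the thread once in every N invocations
 *                  (assumed here to be the Nth of each group)
 */
static bool wd_touch_in_thread(struct wd_touch_state *s)
{
	if (s->interval == 0)
		return false;

	if (++s->count >= s->interval) {
		s->count = 0;
		return true;
	}

	return false;
}
```

With interval set to 3, successive calls return false, false, true and
then repeat. Collapsing the knob to on/off would leave only the 0 and 1
cases, and the counter goes away.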