Re: [PATCH] watchdog: nohz: don't run watchdog on nohz_full cores

From: Chris Metcalf
Date: Tue Mar 31 2015 - 14:39:36 EST


On 03/31/2015 06:17 AM, Christoph Lameter wrote:
> On Mon, 30 Mar 2015, cmetcalf@xxxxxxxxxx wrote:
>
>> Running watchdog can be a helpful debugging feature on regular
>> cores, but it's incompatible with nohz_full, since it forces
>> regular scheduling events. Accordingly, just exit out immediately
>> from any nohz_full core.
>
> At this point we still have a timer tick every second. So just change the
> way the checking occurs that it can be done during the once per second
> tick for now? If the tick idle period is expanded later maybe only run the
> watchdog activity during those inevitable ticks?

Someone recently suggested disabling the forced once-per-second
tick :)

https://lkml.org/lkml/2014/10/31/364

I am hopeful that we can continue to drive toward that goal, and
reluctant to suggest that we pile anything else onto the existing
scheduler_tick_max_deferment() assumptions...

> It may be best if the watchdog could be configured as to which processors
> it should run on?

I mentioned this in my reply to Ingo. My naive code was simply to
force the cpuset of watchdog-enabled cores to be the complement
of the nohz_full cpuset. However, you could also imagine coding
up support for a generic cpuset (defaulting in the obvious ways)
that could still be overridden.
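
To make the two options concrete, here is a minimal userspace sketch of
the mask arithmetic. It is not the patch and not kernel code: it assumes
at most 64 CPUs held in a plain bitmask, and every name in it
(cpu_mask_t, watchdog_default_mask, watchdog_effective_mask,
watchdog_should_run) is hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, so at most 64 CPUs. */
typedef uint64_t cpu_mask_t;

/* Naive policy: watchdog runs on every online CPU that is not nohz_full. */
static cpu_mask_t watchdog_default_mask(cpu_mask_t online, cpu_mask_t nohz_full)
{
    return online & ~nohz_full;
}

/*
 * Generic policy: if an explicit mask was supplied, use it (clamped to the
 * online CPUs); otherwise fall back to the complement of nohz_full.
 */
static cpu_mask_t watchdog_effective_mask(cpu_mask_t online, cpu_mask_t nohz_full,
                                          bool has_override, cpu_mask_t override_mask)
{
    if (has_override)
        return override_mask & online;
    return watchdog_default_mask(online, nohz_full);
}

/* Per-CPU check: true if the watchdog should run on the given CPU. */
static bool watchdog_should_run(cpu_mask_t mask, unsigned int cpu)
{
    return cpu < 64 && ((mask >> cpu) & 1u) != 0;
}
```

With eight online CPUs (0xFF) and CPUs 4-7 marked nohz_full (0xF0), the
default mask is 0x0F; an explicit override can still opt some of the
nohz_full CPUs back in, which is the "overridden" case above.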

This may come back to a question of just why one believes that
nohz_full is a good thing in the first place. For folks that are doing
it just to improve performance, power, etc., in general, it may not
matter much whether the watchdog ticks occasionally. But for folks
who are doing it to establish cores that are run completely tick-free
for days on end so they can help process 100 Gb packet streams
and never drop a packet, the calculus is a little different. My bias
is to say that once you've tagged a core as nohz_full, you never want
to run the watchdog on it. But it's worth supporting multiple uses
of nohz_full, certainly.

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com
