Re: [PATCH 1/1] x86: tsc: avoid system instability in hibernation

From: Peter Zijlstra
Date: Fri Jul 27 2018 - 03:51:14 EST


On Wed, Jul 25, 2018 at 11:18:46AM -0700, Eduardo Valentin wrote:
> System instability is seen during resume from hibernation when the
> system is under heavy CPU load. This happens because the sched clock
> data is not updated, so the scheduler thinks that CPU-hogging tasks
> need more CPU time, which freezes the system during the unfreezing
> of tasks. For example, threaded IRQs and kernel processes servicing
> network interfaces may be delayed for several tens of seconds,
> making the system unreachable.

> +static int tsc_pm_notifier(struct notifier_block *notifier,
> +			    unsigned long pm_event, void *unused)
> +{
> +	switch (pm_event) {
> +	case PM_HIBERNATION_PREPARE:
> +		clear_sched_clock_stable();
> +		break;
> +	case PM_POST_HIBERNATION:
> +		/* Set back to the default */
> +		if (!check_tsc_unstable())
> +			set_sched_clock_stable();
> +		break;
> +	}

I've not looked at this in detail yet, but this is an absolute no go,
not going to happen, full stop.

If we _ever_ mark the thing unstable, that's it, the end. Allowing it to
go back to stable is a source of utter fail.
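
To illustrate the rule above, here is a minimal user-space sketch of a
one-way latch: once the clock has been marked unstable, any later request
to mark it stable is refused. This is not the kernel's implementation;
mark_unstable(), try_mark_stable(), is_stable() and
demo_hibernate_cycle() are hypothetical names chosen only for this
example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the clock-stability state (not kernel code). */
static bool clock_stable = true;	/* current verdict */
static bool clock_latched;		/* set once, never cleared */

/* Mark the clock unstable. The decision is permanent. */
void mark_unstable(void)
{
	clock_stable = false;
	clock_latched = true;
}

/* Ask to mark the clock stable. Returns false once the latch is set. */
bool try_mark_stable(void)
{
	if (clock_latched)
		return false;
	clock_stable = true;
	return true;
}

bool is_stable(void)
{
	return clock_stable;
}

/* Mimics the sequence in the quoted notifier: clear, then try to restore. */
void demo_hibernate_cycle(void)
{
	mark_unstable();		/* "prepare" step */
	assert(!try_mark_stable());	/* "post" step is refused */
	assert(!is_stable());
}
```

With a latch like this, the "set back to the default" step in the quoted
notifier could never take effect, which is the behaviour being asked for.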