Re: [RFC][PATCH 2/2] time: alarmtimer: Use TASK_FREEZABLE to cleanup freezer handling

From: Thomas Gleixner
Date: Mon Feb 20 2023 - 16:19:06 EST


Michael!

On Mon, Feb 20 2023 at 19:11, Michael Nazzareno Trimarchi wrote:
> On Mon, Feb 20, 2023 at 6:48 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> [ 27.349352] alarmtimer_enqueue()
>>
>> U: Before SUSPEND
>>
>> [ 31.353228] PM: suspend entry (s2idle)
>> [ 31.388857] Filesystems sync: 0.033 seconds
>> [ 31.418427] Freezing user space processes
>> [ 31.422406] Freezing user space processes completed (elapsed 0.002 seconds)
>> [ 31.425435] OOM killer disabled.
>> [ 31.426833] Freezing remaining freezable tasks
>> [ 31.429838] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
>> [ 31.432922] printk: Suspending console(s) (use no_console_suspend to debug)
>> [ 31.435912] alarmtimer alarmtimer.0.auto: PM: dpm_run_callback(): platform_pm_suspend+0x0/0x50 returns -16
>> [ 31.435954] alarmtimer alarmtimer.0.auto: PM: failed to suspend: error -16
>>
>> That means the RTC interrupt was raised before the system was able to suspend.
>
> if (ktime_to_ns(min) < 2 * NSEC_PER_SEC) {
> pm_wakeup_event(dev, 2 * MSEC_PER_SEC);
> return -EBUSY;
> }
>
> I think that above happens to you. So it means that you are too close
> to this limit, can be?

Maybe.

> Yes but the alarm for me was set to be fired just before freezing. Is
> this a valid scenario?

Sure why not?

> Let's say that I set an alarm to be fired just before the userspace
> freeze happens. If I'm close to it then then process will not process
> the signal to enquene again the alarm in the list and then during
> alarm suspend the list will be empty for the above.

Halfways, but slowly your explanations start to make sense. Here is the
dmesg output you provided:

> [ 89.674127] PM: suspend entry (deep)
> [ 89.714916] Filesystems sync: 0.037 seconds
> [ 89.733594] Freezing user space processes
> [ 89.740680] Freezing user space processes completed (elapsed 0.002 seconds)

User space tasks are frozen now.

> [ 89.748593] OOM killer disabled.
> [ 89.752257] Freezing remaining freezable tasks
> [ 89.756807] alarmtimer_fired: called
> [ 89.756831] alarmtimer_dequeue: called <---- HERE

Here fires the underlying hrtimer before devices are suspended, so the
sig_sendqueue() cannot wake up the task because task->state ==
TASK_FROZEN, which means the signal wont be handled and the timer wont
be rearmed until the task is thawed.

And as you correctly observed the alarmtimer_suspend() path won't see a
pending timer anymore because it is dequeued.

So precisely the time between freeze(alarmtask) and alarmtimer_suspend()
is a gaping hole which guarantees lost wakeups.

That's completely unrelated to Johns patches. That hole has been there
forever.

I assume that this horrible freezer hackery was supposed to plug that
hole, but that gem is not solving anything as far as I understand what
it is doing. I'm still failing to understand what it is supposed to
solve, but that's a different story.

Aside of that I can clearly reproduce the issue, now that I understand
what you are trying to tell, on plain 6.2 without and with John's
cleanup.

Something like the below plugs the hole reliably.

Thanks,

tglx
---
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -26,6 +26,7 @@
#include <linux/freezer.h>
#include <linux/compat.h>
#include <linux/module.h>
+#include <linux/suspend.h>
#include <linux/time_namespace.h>

#include "posix-timers.h"
@@ -176,6 +177,7 @@ static void alarmtimer_dequeue(struct al
alarm->state &= ~ALARMTIMER_STATE_ENQUEUED;
}

+static atomic_t alarmtimer_wakeup;

/**
* alarmtimer_fired - Handles alarm hrtimer being fired.
@@ -194,6 +196,8 @@ static enum hrtimer_restart alarmtimer_f
int ret = HRTIMER_NORESTART;
int restart = ALARMTIMER_NORESTART;

+ atomic_inc(&alarmtimer_wakeup);
+
spin_lock_irqsave(&base->lock, flags);
alarmtimer_dequeue(base, alarm);
spin_unlock_irqrestore(&base->lock, flags);
@@ -244,6 +248,16 @@ static int alarmtimer_suspend(struct dev
if (!rtc)
return 0;

+ /*
+ * Handle wakeups which happened between the start of suspend and
+ * now as those wakeups might have tried to wake up a frozen task
+ * which means they are not longer in the alarm timer list.
+ */
+ if (atomic_read(&alarmtimer_wakeup)) {
+ pm_wakeup_event(dev, 0);
+ return -EBUSY;
+ }
+
/* Find the soonest timer to expire*/
for (i = 0; i < ALARM_NUMTYPE; i++) {
struct alarm_base *base = &alarm_bases[i];
@@ -296,6 +310,31 @@ static int alarmtimer_resume(struct devi
return 0;
}

+static int alarmtimer_pm_notifier_fn(struct notifier_block *bl, unsigned long state,
+ void *unused)
+{
+ switch (state) {
+ case PM_HIBERNATION_PREPARE:
+ case PM_POST_HIBERNATION:
+ atomic_set(&alarmtimer_wakeup, 0);
+ break;
+ }
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block alarmtimer_pm_notifier = {
+ .notifier_call = alarmtimer_pm_notifier_fn,
+};
+
+static inline int alarmtimer_register_pm_notifier(void)
+{
+ return register_pm_notifier(&alarmtimer_pm_notifier);
+}
+
+static inline void alarmtimer_unregister_pm_notifier(void)
+{
+ unregister_pm_notifier(&alarmtimer_pm_notifier);
+}
#else
static int alarmtimer_suspend(struct device *dev)
{
@@ -306,6 +345,15 @@ static int alarmtimer_resume(struct devi
{
return 0;
}
+
+static inline int alarmtimer_register_pm_notifier(void)
+{
+ return 0;
+}
+
+static inline void alarmtimer_unregister_pm_notifier(void)
+{
+}
#endif

static void
@@ -904,11 +952,17 @@ static int __init alarmtimer_init(void)
if (error)
return error;

- error = platform_driver_register(&alarmtimer_driver);
+ error = alarmtimer_register_pm_notifier();
if (error)
goto out_if;

+ error = platform_driver_register(&alarmtimer_driver);
+ if (error)
+ goto out_pm;
+
return 0;
+out_pm:
+ alarmtimer_unregister_pm_notifier();
out_if:
alarmtimer_rtc_interface_remove();
return error;