Re: [patch 0/2] suspend/resume regression fixes

From: Linus Torvalds
Date: Sat Sep 22 2007 - 19:01:31 EST




On Sat, 22 Sep 2007, Thomas Gleixner wrote:
>
> My final enlightment was, when I removed the ACPI processor module,
> which controls the lower idle C-states, right before resume; this
> worked fine all the time even without all the workaround hacks.
>
> I really hope that this two patches finally set an end to the "jinxed
> VAIO heisenbug series", which started when we removed the periodic
> tick with the clockevents/dyntick patches.

Ok, so the patches look fine, but I somehow have this slight feeling that
you gave up a bit too soon on the "*why* does this happen?" question.

I realize that the answer is easily "because ACPI screwed up", but I'm
wondering if there's something we do to trigger that screw-up.

In particular, I also suspect that this may not really fix the problem -
maybe it just makes the window sufficiently small that it no longer
triggers. Because we don't necessarily understand what the real background
for the problem is, I'm not sure we can say that it is solved.

The reason I say this is that I have a suspicion on what triggers it.

I suspect that the problem is that we do

pm_ops->prepare();
disable_nonboot_cpus()
suspend_enter();
enable_nonboot_cpus()
pm_finish()

and here the big thing to notice is that "pm_ops->prepare()" call, which
sets the wakup vector etc etc.

So maybe the real problem here is that once we've done the "->prepare()"
call and ACPI has set up various stuff, we MUST NOT do any calls to any
ACPI routines to set low-power states, because the stupid firmware isn't
expecting it.

Now, if this is the cause, then I think your patch should indeed fix it,
since you get called by the early-suspend code (which happens *before* the
"->prepare()" call), but at the same time, I wonder if maybe it would be
slightly "more correct" to instead of using the suspend/resume callbacks,
simply do this in the "acpi_pm_prepare()" stage, since that is likely the
thing that triggers it?

But hey, I think I'll apply the patches as-is. I'd just feel even better
if we actually understood *why* doing the CPU Cx states is not something
we can do around the suspend code!

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/