Re: 2.6.35.5: hibernation broken... AGAIN

From: Rafael J. Wysocki
Date: Fri Nov 26 2010 - 15:25:43 EST


On Thursday, November 18, 2010, Hugh Dickins wrote:
> On Wed, 17 Nov 2010, Ondrej Zary wrote:
> > On Wednesday 17 November 2010 22:12:01 Rafael J. Wysocki wrote:
> > > On Wednesday, November 17, 2010, Andrew Morton wrote:
> > > > On Wed, 17 Nov 2010 21:53:52 +0100
> > > > "Rafael J. Wysocki" <rjw@xxxxxxx> wrote:
> > > > > On Wednesday, November 17, 2010, Ondrej Zary wrote:
> > > > > > Hello,
> > > > > > the nasty memory-corrupting hibernation bug
> > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=15753 is back since
> > > > > > 2.6.35.5. 2.6.35.4 works fine, 2.6.35.5 crashes after two days.
>
> That's distressing, for both and all of us: I'm sorry.
>
> > > > > >
> > > > > > It seems to be caused by b77c254d8d66e5e9aa81239fedba9f3d568097d9.
> > >
> > > > commit b77c254d8d66e5e9aa81239fedba9f3d568097d9
> > > > Author: Hugh Dickins <hughd@xxxxxxxxxx>
> > > > Date: Thu Sep 9 16:38:09 2010 -0700
> > > >
> > > > swap: prevent reuse during hibernation
>
> Embarrassing: I suspect that I've been confused, not for the first
> time, by the fork-like nature of hibernation and its images.
> I wonder if this patch below fixes it, Ondrej?
>
> (And is it kernel swsusp or user swsusp that you're using? May not
> matter at all, but will help us to think more clearly about it,
> if the corruption remains after this patch.)
>
> Rafael, do you agree that this patch was actually required even for
> your original commit 452aa6999e6703ffbddd7f6ea124d3968915f3e3
> mm/pm: force GFP_NOIO during suspend/hibernation and resume?

(Sorry for the late response.)

Well, not exactly.

First, IMO set_gfp_allowed_mask(saved_mask) should not be called before
dpm_resume_end(), because allocations that are allowed to do I/O might, in
theory, try to use devices that are still suspended (although that's not likely,
given the swsusp_free() before it). So the right thing to do appears to be:

	if (error || !in_suspend)
		swsusp_free();

	dpm_resume_end(in_suspend ?
		(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);

	if (error || !in_suspend)
		set_gfp_allowed_mask(saved_mask);

	resume_console();
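
To make the ordering argument concrete, here is a small userspace model, not kernel code. Everything prefixed with model_ or MODEL_ is hypothetical and made up for illustration. The mask bit values are arbitrary, and "devices suspended" is reduced to a single flag that the stand-in for dpm_resume_end() clears. The model only counts how often the saved mask is put back while that flag is still set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for gfp bits; the values are arbitrary. */
#define MODEL_GFP_WAIT 0x1u
#define MODEL_GFP_IO   0x2u
#define MODEL_GFP_FS   0x4u
#define MODEL_GFP_ALL  (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)

struct model_result {
	unsigned int final_mask;	/* mask in effect on return */
	int unsafe_windows;		/* times I/O was allowed with devices suspended */
};

/* Put the saved mask back and record whether devices were still suspended. */
static void model_restore(struct model_result *r, unsigned int *mask,
			  unsigned int saved_mask, bool devices_suspended)
{
	*mask = saved_mask;
	if (devices_suspended && (*mask & MODEL_GFP_IO))
		r->unsafe_windows++;
}

/*
 * Models only the tail of the snapshot path discussed above.
 * restore_before_resume selects the ordering being compared.
 */
struct model_result model_snapshot_tail(bool error, bool in_suspend,
					bool restore_before_resume)
{
	const unsigned int saved_mask = MODEL_GFP_ALL;
	unsigned int mask = saved_mask & ~(MODEL_GFP_IO | MODEL_GFP_FS);
	bool devices_suspended = true;
	struct model_result r = { 0, 0 };

	/* The model starts with I/O allocations already forbidden. */
	assert(!(mask & MODEL_GFP_IO));

	if (restore_before_resume && (error || !in_suspend))
		model_restore(&r, &mask, saved_mask, devices_suspended);

	devices_suspended = false;	/* stands in for dpm_resume_end() */

	if (!restore_before_resume && (error || !in_suspend))
		model_restore(&r, &mask, saved_mask, devices_suspended);

	r.final_mask = mask;
	return r;
}
```

Calling model_snapshot_tail(false, false, true) reports one unsafe window, while the same case with the restore placed after the resume step reports none. In the (in_suspend && !error) case the mask is left restricted under either ordering, which is what the second point below is about.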

Second, since we don't call set_gfp_allowed_mask(saved_mask) in the
(in_suspend && !error) case, hibernation_platform_enter() also should be
updated (I think we don't need to use set_gfp_allowed_mask() in it at all).

There's one more subtlety. Namely, the saving of an image by s2disk
may be aborted and in that case we need to restore the original
gfp_allowed_mask too, but I need to look deeper into the code to see how to do
it cleanly.
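
One possible shape for that, purely as a userspace sketch and not a proposal for the actual code: remember in one place whether the mask has been saved, and make the restore helper safe to call from any exit path, including the abort one. All names below (model_pm_state, model_restrict_mask(), model_restore_mask(), model_save_image()) are hypothetical, and the mask values are arbitrary.

```c
#include <stdbool.h>

/* Hypothetical mask values, chosen only for the illustration. */
#define MODEL_GFP_IOFS 0x6u
#define MODEL_GFP_ALL  0x7u

struct model_pm_state {
	unsigned int allowed_mask;	/* stands in for gfp_allowed_mask */
	unsigned int saved_mask;	/* valid only while mask_saved is true */
	bool mask_saved;
};

/* Forbid I/O allocations, remembering the original mask exactly once. */
void model_restrict_mask(struct model_pm_state *s)
{
	if (s->mask_saved)
		return;
	s->saved_mask = s->allowed_mask;
	s->allowed_mask &= ~MODEL_GFP_IOFS;
	s->mask_saved = true;
}

/* Put the original mask back; a no-op if nothing was saved. */
void model_restore_mask(struct model_pm_state *s)
{
	if (!s->mask_saved)
		return;
	s->allowed_mask = s->saved_mask;
	s->mask_saved = false;
}

/*
 * Hypothetical image-saving path. abort == true models the image saving
 * being given up part-way, in which case the system keeps running and
 * needs its original mask back.
 */
int model_save_image(struct model_pm_state *s, bool abort)
{
	model_restrict_mask(s);
	if (abort) {
		model_restore_mask(s);
		return -1;
	}
	return 0;	/* mask stays restricted until power-off */
}
```

Because the restore helper checks the flag, calling it a second time on the abort path does no harm, which is the property that would let it be called from several places without tracking who restored first.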

Thanks,
Rafael