Re: Memory corruption during hibernation since 2.6.31

From: Hugh Dickins
Date: Thu Jul 29 2010 - 14:45:22 EST

On Thu, 29 Jul 2010, KAMEZAWA Hiroyuki wrote:
> On Thu, 29 Jul 2010 14:23:33 +0900 (JST)
> KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> wrote:
> > Can you please add explicit commenting in the code?
> >
> How about this ?
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> At hibernation, all pages-should-be-saved are written into a image (here, swap).
> Then, swap_map[], memmap etcs are also saved into disks.
> But, swap allocation happens one by one. So, the final image of swap_map[] is
> different from saved one and the commit c9e444103b5e7a5a3519f9913f59767f92e33baf
> changes page's state while assiging swap. Because memory can be modified in
> hibernation is only not-to-be-save memory. it's a breakage.
> This patch fixes it by disabling swap entry reuse at hibernation.
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> ---
> mm/swapfile.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
> Index:
> ===================================================================
> ---
> +++
> @@ -315,8 +315,15 @@ checks:
> if (offset > si->highest_bit)
> scan_base = offset = si->lowest_bit;
> - /* reuse swap entry of cache-only swap if not busy. */
> - if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
> + /*
> + * reuse swap entry of cache-only swap if not busy &&
> + * when we're called via pageout(). At hibernation, swap-reuse
> + * is harmful because it changes memory status...which may
> + * be saved already.
> + */
> + if (vm_swap_full()
> + && usage == SWAP_HAS_CACHE
> + && si->swap_map[offset] == SWAP_HAS_CACHE) {
> int swap_was_freed;
> spin_unlock(&swap_lock);
> swap_was_freed = __try_to_reclaim_swap(si, offset);
> --

KAMEZAWA-San, that is a brilliant realization, I salute you.

So, in between snapshotting the image and actually hibernating, we have
two parallel universes, one frozen in the image, the other writing that
out to swap: with the danger that the latter (which is about to die)
will introduce fatal inconsistencies in the former by placing pages in
swap locations innocently reallocated from it. (Excuse me while I go
write the movie script.)

I'd never got to think about that before.

Your fix is neat though hacky (it's somewhat of a coincidence that the
usage arg happens to distinguish the hibernation case), and should be
enough to fix "your" regression, but is it really enough?

I'm worrying about the try_to_free_swap() calls in shrink_page_list():
can't those get called from direct reclaim (even if GFP_NOIO), and can't
direct reclaim get invoked from somewhere in the I/O path below
swap_writepage(), used for writing out the hibernation image?

Direct reclaim because kswapd does set_freezable(), so should itself
be out of the picture. But we cannot freeze writing the hibernation
image, and its occasional need for memory, so maybe a different approach
is required.

I've CC'ed Andrea because we were having an offline conversation about
whether ksmd (and his khugepaged) need to set_freezable(); and I wonder
if this swap bug underlies his interest, though he was mainly worrying
about I/O in progress.

Despite reading Documentation/power/freezing-of-tasks.txt, I have no
clear idea of what really needs freezing, and whether freezing can
fully handle the issues. Rafael, please can you advise?

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at