S4 resume broken since 2.6.39 (3.1, too)

From: Takashi Iwai
Date: Tue Sep 20 2011 - 12:12:10 EST


Hi,

While testing 3.0.4 kernels, I found that S4 resume has been broken
since 2.6.39. The symptom is that the machine suddenly reboots after
the S4 resume image is read. This happens only occasionally, usually
within 10 to 20 S4 cycles. The problem is still present in 3.1-rc6.

After a bisection, the likely culprit is:
commit 4b239f458c229de044d6905c2b0f9fe16ed9e01e
Author: Yinghai Lu <yinghai@xxxxxxxxxx>
Date: Fri Dec 17 16:58:28 2010 -0800

x86-64, mm: Put early page table high

The essential revert that fixes the problem is below. It restores the
old placement of the early page tables, and the resulting diff of
dmesg looks like this:

@@ -49,10 +49,10 @@
Base memory trampoline at [ffff880000098000] 98000 size 20480
init_memory_mapping: 0000000000000000-000000007a000000
0000000000 - 007a000000 page 2M
-kernel direct mapping tables up to 7a000000 @ 7913f000-79142000
+kernel direct mapping tables up to 7a000000 @ 1fffd000-20000000
init_memory_mapping: 0000000100000000-0000000100600000
0100000000 - 0100600000 page 2M
-kernel direct mapping tables up to 100600000 @ 1005fa000-100600000
+kernel direct mapping tables up to 100600000 @ 7913c000-79142000
RAMDISK: 36d36000 - 37ff0000
ACPI: RSDP 00000000000f2f10 00024 (v02 HPQOEM)
ACPI: XSDT 0000000079ffe120 00094 (v01 HPQOEM SLIC-MPC 00000004 01000013)
@@ -76,7 +76,7 @@
No NUMA configuration found
Faking a node at 0000000000000000-0000000100600000
Initmem setup node 0 0000000000000000-0000000100600000
- NODE_DATA [00000001005d3000 - 00000001005f9fff]
+ NODE_DATA [00000001005d9000 - 00000001005fffff]
[ffffea0000000000-ffffea00039fffff] PMD -> [ffff880076a00000-ffff8800787fffff] on node 0
Zone PFN ranges:
DMA 0x00000010 -> 0x00001000

With this revert, S4 seems to work more stably.

I still have no idea why the commit above introduced the buggy
behavior. From a quick look at the output above, the assigned
areas look OK...

Can anyone offer deeper insight?


thanks,

Takashi

---
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 3032644..87488b9 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -63,9 +63,8 @@ static void __init find_early_table_space(unsigned long end, int use_pse,
 #ifdef CONFIG_X86_32
 	/* for fixmap */
 	tables += roundup(__end_of_fixed_addresses * sizeof(pte_t), PAGE_SIZE);
-
-	good_end = max_pfn_mapped << PAGE_SHIFT;
 #endif
+	good_end = max_pfn_mapped << PAGE_SHIFT;
 
 	base = memblock_find_in_range(start, good_end, tables, PAGE_SIZE);
 	if (base == MEMBLOCK_ERROR)
