Re: kexec/kdump kernel fails to start

From: Dave Young
Date: Thu Oct 18 2012 - 02:37:24 EST


On 10/18/2012 10:16 AM, Dave Young wrote:

> On Sat, Sep 29, 2012 at 3:13 PM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>>
>> * Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
>>
>>> On Sun, Sep 23, 2012 at 1:27 PM, Dan Carpenter <dan.carpenter@xxxxxxxxxx> wrote:
>>>> On Wed, Sep 05, 2012 at 11:34:25PM +0800, Cong Wang wrote:
>>>>> On Wed, Sep 5, 2012 at 1:32 AM, Flavio Leitner <fbl@xxxxxxxxxx> wrote:
>>>>>> Hi folks,
>>>>>>
>>>>>> I have system that no longer boots kdump kernel. Basically,
>>>>>>
>>>>>> # echo c > /proc/sysrq-trigger
>>>>>>
>>>>>> to dump a vmcore doesn't work. It just hangs after showing the usual
>>>>>> panic messages. I've bisected the problem and the commit introducing
>>>>>> the issue is the one below.
>>>>>>
>>>>>> Any idea?
>>>>>>
>>>>>> commit 722bc6b16771ed80871e1fd81c86d3627dda2ac8
>>>>>> Author: WANG Cong <xiyou.wangcong@xxxxxxxxx> 2012-03-05 20:05:13
>>>>>> Committer: Ingo Molnar <mingo@xxxxxxx> 2012-03-06 05:38:26
>>>>>> Parent: 550cf00dbc8ee402bef71628cb71246493dd4500 (Merge tag 'mmc-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc)
>>>>>> Child: a6fca40f1d7f3e232c9de27c1cebbb9f787fbc4f (x86, tlb: Switch cr3 in leave_mm() only when needed)
>>>>>> Branches: master, remotes/origin/master
>>>>>> Follows: v3.3-rc6
>>>>>> Precedes: v3.5-rc1
>>>>>>
>>>>>> x86/mm: Fix the size calculation of mapping tables
>>>>>
>>>>> There was some attempt to fix this:
>>>>> https://patchwork.kernel.org/patch/1195751/
>>>>>
>>>>> but for some reason it is not accepted.
>>>>
>>>> I filed a bug for this:
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=47881
>>>>
>>>> Is it fixed now?
>>>
>>> that offending patch should be reverted...
>>>
>>> 722bc6b16771ed80871e1fd81c86d3627dda2ac8
>>
>> It does not revert cleanly - could someone send a (kexec
>> tested!) patch with a proper description?
>
> Hi, ingo
>
> Besides of commit 722bc6b16771ed80871e1fd81c86d3627dda2ac8,
> below commit also need revert.
>
> commit bd2753b2dda7bb43c7468826de75f49c6a7e8965
> Author: Yinghai Lu <yinghai@xxxxxxxxxx>
> Date: Wed Jun 6 10:55:40 2012 -0700
>
> x86/mm: Only add extra pages count for the first memory range
> during pre-allocation early page table space
>
> Robin found this regression:
>
> | I just tried to boot an 8TB system. It fails very early in boot with:
> | Kernel panic - not syncing: Cannot find space for the kernel page tables
>
> git bisect commit 722bc6b16771ed80871e1fd81c86d3627dda2ac8.
>
> A git revert of that commit does boot past that point on the 8TB
> configuration.
>
> That commit will add up extra pages for all memory range even
> above 4g.
>
> Try to limit that extra page count adding to first entry only.
>
> Bisected-by: Robin Holt <holt@xxxxxxx>
> Tested-by: Robin Holt <holt@xxxxxxx>
> Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>
> Cc: WANG Cong <xiyou.wangcong@xxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Link: http://lkml.kernel.org/r/CAE9FiQUj3wyzQxtq9yzBNc9u220p8JZ1FYHG7t%3DMOzJ%3D9BZMYA@xxxxxxxxxxxxxx
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>
>
>
> OTOH, Jacob and Yinghai has better init_memory_mapping
> cleanup patches which are in tip:x86/mm2 already. Their patches fixes
> this issue
> as well.
>
> Since kdump does not work for long time since
> 722bc6b16771ed80871e1fd81c86d3627dda2ac8
> Can you or someone else help to get the init_memory_mapping patches merged?
>
> Or do you still prefer to revert 722bc6b and bd2753b2d? I think
> stable kernel also need a fix.
>




Just see Yinghai's coments, later init_memory_mapping cleanup
will also address the 4k pages in first 2/4M, so revert them should be better.
https://lkml.org/lkml/2012/9/4/533

Here is a patch for the reverting:
---
x86 mm: Revert find_early_table_space fix

722bc6b16771ed80871e1fd81c86d3627dda2ac8 Try to address the issue that the
first 2/4M should use 4k pages if PSE enabled. but extra counts should only
valid for x86_32. This commit cause kdump regression, kdump kernel hangs happens
with it.

As Yinghai Lu said they should be reverted. see below post:
https://lkml.org/lkml/2012/9/4/533

As there's a later fix to above fix which is bd2753b2dda7bb43c7468826de75f49c6a7e8965
So we need revert both of these two commits.

Tested kdump on physical and virutual machines.

Reverted commits:
commit 722bc6b16771ed80871e1fd81c86d3627dda2ac8
Author: WANG Cong <xiyou.wangcong@xxxxxxxxx>
Date: Mon Mar 5 15:05:13 2012 -0800

x86/mm: Fix the size calculation of mapping tables

For machines that enable PSE, the first 2/4M memory region still uses
4K pages, so needs more PTEs in this case, but
find_early_table_space() doesn't count this.

This patch fixes it.

The bug was found via code review, no misbehavior of the kernel
was observed.

Signed-off-by: WANG Cong <xiyou.wangcong@xxxxxxxxx>
Cc: Yinghai Lu <yinghai@xxxxxxxxxx>
Cc: Tejun Heo <tj@xxxxxxxxxx>
Cc: <ianfang.cn@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Link: http://lkml.kernel.org/n/tip-kq6a00qe33h7c7ais2xsywnh@xxxxxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxx>

commit bd2753b2dda7bb43c7468826de75f49c6a7e8965
Author: Yinghai Lu <yinghai@xxxxxxxxxx>
Date: Wed Jun 6 10:55:40 2012 -0700

x86/mm: Only add extra pages count for the first memory range during pre-allocatio

Robin found this regression:

| I just tried to boot an 8TB system. It fails very early in boot with:
| Kernel panic - not syncing: Cannot find space for the kernel page tables

git bisect commit 722bc6b16771ed80871e1fd81c86d3627dda2ac8.

A git revert of that commit does boot past that point on the 8TB
configuration.

That commit will add up extra pages for all memory range even
above 4g.

Try to limit that extra page count adding to first entry only.

Bisected-by: Robin Holt <holt@xxxxxxx>
Tested-by: Robin Holt <holt@xxxxxxx>
Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>
Cc: WANG Cong <xiyou.wangcong@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Link: http://lkml.kernel.org/r/CAE9FiQUj3wyzQxtq9yzBNc9u220p8JZ1FYHG7t%3DMOzJ%3D9B
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>


Signed-off-by: Dave Young <dyoung@xxxxxxxxxx>
---
arch/x86/mm/init.c | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)

--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -29,14 +29,8 @@ int direct_gbpages
#endif
;

-struct map_range {
- unsigned long start;
- unsigned long end;
- unsigned page_size_mask;
-};
-
-static void __init find_early_table_space(struct map_range *mr, unsigned long end,
- int use_pse, int use_gbpages)
+static void __init find_early_table_space(unsigned long end, int use_pse,
+ int use_gbpages)
{
unsigned long puds, pmds, ptes, tables, start = 0, good_end = end;
phys_addr_t base;
@@ -61,10 +55,6 @@ static void __init find_early_table_spac
#ifdef CONFIG_X86_32
extra += PMD_SIZE;
#endif
- /* The first 2/4M doesn't use large pages. */
- if (mr->start < PMD_SIZE)
- extra += mr->end - mr->start;
-
ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
} else
ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -95,6 +85,12 @@ void __init native_pagetable_reserve(u64
memblock_reserve(start, end - start);
}

+struct map_range {
+ unsigned long start;
+ unsigned long end;
+ unsigned page_size_mask;
+};
+
#ifdef CONFIG_X86_32
#define NR_RANGE_MR 3
#else /* CONFIG_X86_64 */
@@ -267,7 +263,7 @@ unsigned long __init_refok init_memory_m
* nodes are discovered.
*/
if (!after_bootmem)
- find_early_table_space(&mr[0], end, use_pse, use_gbpages);
+ find_early_table_space(end, use_pse, use_gbpages);

for (i = 0; i < nr_range; i++)
ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/