Re: [PATCH 2/5] x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments

From: Jaswinder Singh Rajput
Date: Sat May 15 2010 - 15:30:00 EST


Hello Frank and Borislav,

On Thu, Apr 22, 2010 at 7:36 PM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> From: Frank Arnold <frank.arnold@xxxxxxx>
>
> When running a quest kernel on xen we get:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
> PGD 0
> Oops: 0000 [#1] SMP
> last sysfs file:
> CPU 0
> Modules linked in:
>
> Pid: 0, comm: swapper Tainted: G        W  2.6.34-rc3 #1 /HVM domU
> RIP: 0010:[<ffffffff8142f2fb>]  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x
> 2ca/0x3df
> RSP: 0018:ffff880002203e08  EFLAGS: 00010046
> RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
> RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
> RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
> R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
> R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
> FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
> Stack:
>  ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
> <0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
> <0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
> Call Trace:
>  <IRQ>
>  [<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c
>  [<ffffffff81059140>] ? mod_timer+0x23/0x25
>  [<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36
>  [<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3
>  [<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d
>  [<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232
>  [<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82
>  [<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a
>  [<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0
>  [<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27
>  [<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20
>  <EOI>
>  [<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63
>  [<ffffffff810295c6>] ? native_safe_halt+0xc/0xd
>  [<ffffffff810114eb>] ? default_idle+0x36/0x53
>  [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
>  [<ffffffff81423a9a>] rest_init+0x7e/0x80
>  [<ffffffff81b10dd2>] start_kernel+0x40e/0x419
>  [<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7
>  [<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107
> Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b
>  00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff
>  ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
> RIP  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
>  RSP <ffff880002203e08>
> CR2: 0000000000000038
> ---[ end trace a7919e7f17c0a726 ]---
>
> The L3 cache index disable feature of AMD CPUs has to be disabled if the
> kernel is running as guest on top of a hypervisor because northbridge
> devices are not available to the guest. Currently, this fixes a boot
> crash on top of Xen. In the future this will become an issue on KVM as
> well.
>
> Check if northbridge devices are present and do not enable the feature
> if there are none.
>
> Signed-off-by: Frank Arnold <frank.arnold@xxxxxxx>
> Acked-by: Borislav Petkov <borislav.petkov@xxxxxxx>
> ---
>  arch/x86/kernel/cpu/intel_cacheinfo.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
> index acfb083..5ab14c8 100644
> --- a/arch/x86/kernel/cpu/intel_cacheinfo.c
> +++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
> @@ -344,6 +344,10 @@ amd_check_l3_disable(int index, struct _cpuid4_info_regs *this_leaf)
>             boot_cpu_data.x86_mask < 0x1)
>                        return;
>
> +       /* not in virtualized environments */
> +       if (num_k8_northbridges == 0)
> +               return;
> +
>        this_leaf->can_disable = true;
>        this_leaf->l3_indices  = amd_calc_l3_indices();
>  }

I think :

1. checking num_k8_northbridges is not sufficient as
node_to_k8_nb_misc() of amd_calc_l3_indices() is almost doing the same
thing.

2. num_k8_northbridges is CONFIG_K8_NB specific

Can you try this :

If you return from amd_calc_l3_indices() if dev == NULL.

Thanks,
--
Jaswinder Singh.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/