Re: [Fastboot] Re: kexec "problem" [and patch updates]

From: Eric W. Biederman
Date: Sat Feb 28 2004 - 06:39:27 EST


"Randy.Dunlap" <rddunlap@xxxxxxxx> writes:

> On 27 Feb 2004 01:00:04 -0700 Eric W. Biederman wrote:
>
> | > It works fine on 2.6.2. It works for me on 2.6.3 if not SMP.
> | > If the kernel is built for SMP, when running kexec, I get a
> | > BUG in arch/i386/kernel/smp.c at line 359.
> | > I'm testing various workarounds for that BUG now.
> |
> | I will eyeball it...
> |
> | Is it the kernel that is shutting down, or the kernel that is being
> | brought up that has problems?
>
> the kernel that is shutting down.
>
> | The back trace from the BUG would be interesting.
>
> see below. my bad. i should have included it.
>
> | As I see it flush_tlb_others is being called when we have shutdown
> | cpus and the kernel still thinks we have the mm present on foreign
> | cpus.
>
> Martin Bligh thinks that there is a tlb race here.
> I printed the 2 cpu masks on my dual-proc macine and saw
> 0 in one of them and 0xc in the other one.

Ouch we have both cpus running when this happens, and we have not
started any shutdown whatsoever. This is the bit that sets up
the page tables for later use...

I think identity_map_pages will have problems with a kernel that does
the 4G/4G split, and it has known issues on some other architectures,
because they treat init_mm specially. So the proper solution may be
to simply rewrite identity_map_pages.

Before we do that in the short term we need to see if
identity_map_pages is actually doing anything bad. You are
not using the 4G/4G split so that is not the cause. So either
init_mm is now special in some way, or we have hit a generic kernel
bug.

So this may indeed be a tlb race. But it is init_mm->cpu_vm_mask and
cpu_online map that are different. With the implication being
that init_mm->cpu_vm_mask has cpus set that are not in cpu_online_map?
Very weird especially on SMP.

Without attribution I have a hard time making sense of which cpumask
is which so I can't draw any conclusions. But I find it very
interesting that it is bits 2 and 3 that are set. I wonder if
there is any mixup between logical cpu identities and apic ids.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/