Re: early fixmap causes kmap breakage

From: Eric W. Biederman
Date: Tue Dec 30 2008 - 17:47:55 EST


Nick Piggin <npiggin@xxxxxxx> writes:

> On Tue, Dec 30, 2008 at 07:13:44AM +0100, Ingo Molnar wrote:
>>
>> * Nick Piggin <npiggin@xxxxxxx> wrote:
>>
>> > On Mon, Dec 29, 2008 at 03:17:31PM -0800, Andrew Morton wrote:
>> > > On Thu, 18 Dec 2008 22:15:43 +0100
>> > > Nick Piggin <npiggin@xxxxxxx> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I've debugged a problem where i386+pae systems with more than a few CPUs
>> > > > blow up at boot in the kmap_atomic code.
>> > >
>> > > ping?
>> >
>> > No further progress here, I'm waiting on input for how to fix this
>> > "nicely". Meantime, clearing the early fixmap pte I guess works, but you
>> > lose a page... is it possible to put it into .initdata or is there some
>> > issue with that? (I guess on a PAE kernel, 4K isn't a big deal).
>>
>> yeah, 4K shouldnt be a big deal. Mind sending a patch for this?
>
> How's this?
> --
>
> The early fixmap pmd entry inserted at the very top of the KVA is causing the
> subsequent fixmap mapping code to not provide physically linear pte pages over
> the kmap atomic portion of the fixmap (which relies on said property to
> calculate the pte address).
>
> This has caused weird boot failures in kmap_atomic much later in the boot
> process (initial userspace faults) on a 32-bit PAE system with a larger number
> of CPUs (smaller CPU counts tend not to run over into the next page, so they
> don't expose the problem).
>
> Solve this by attempting to clear out the page table, and copy any of its
> entries to the new one. Also, add a BUG() if a nonlinear condition is
> encountered and can't be resolved, which might save some hours of debugging if
> this fragile scheme ever breaks again...
>
> Putting swapper_pg_fixmap into initdata is an exercise left for the reviewer...

Ok. I see what is going on now. We have exceeded 512 fixmap entries, causing
the fixmap entries to consume more than 2MB of the address space, which broke
the assumption that the fixmap entries are all contiguous.
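
To make the broken assumption concrete, the kmap_atomic_prot() fast path in
arch/x86/mm/highmem_32.c looks roughly like this (reconstructed from memory,
so treat the exact names as approximate):

	idx = type + KM_TYPE_NR * smp_processor_id();
	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
	/*
	 * kmap_pte was resolved once at boot, for FIX_KMAP_BEGIN only.
	 * Stepping backwards by idx only lands on the right pte if the
	 * pte pages backing the whole kmap area are physically
	 * contiguous.  With PAE a pte page holds 512 entries covering
	 * 2MB, so once NR_CPUS * KM_TYPE_NR slots spill past that
	 * boundary the arithmetic walks off the first pte page.
	 */
	set_pte(kmap_pte - idx, mk_pte(page, prot));
	return (void *)vaddr;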

Ditching the swapper_pg_fixmap has some problems.

This appears to break early_printk to a usb debug port, which calls
set_fixmap_nocache and expects the mapping to last.

This looks like it will have problems with Xen and other environments
where we come in with a pre-populated page table, possibly unmapping
something important.

one_page_table_init relies on alloc_bootmem_low_pages for its memory
allocation, so we do not have a guarantee that we will get contiguous memory
even without this change.

I see three ways we can address this:
- Grow swapper_pg_fixmap to cover the entire fixmap range.
  This trivially and without problems gives kmap_atomic its contiguity
  guarantee, and should allow removal of the code that sets up the fixmaps
  later in C, except in weird cases like Xen.

- Decide it is worth optimizing kmap_atomic_prot some more.
  Have a kmap_pte per cpu (see the sketch after this list).
  Cache line align the kmap pte entries so we don't get conflicts
  per cpu, at which point we should be guaranteed that all 13 of
  them will be physically contiguous.

- Not support more than 32 cpus on x86_32.
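
For the second option, a rough and purely hypothetical sketch (the per-cpu
variable and the init helper below are invented for illustration;
kmap_get_fixmap_pte() is the existing helper in arch/x86/mm/init_32.c):

	/* One cached pte pointer per cpu, so kmap_atomic never does
	 * cross-page pointer arithmetic from a single global kmap_pte. */
	static DEFINE_PER_CPU(pte_t *, kmap_pte_cpu);

	static void __init kmap_init_one_cpu(int cpu)
	{
		/* First kmap slot belonging to this cpu. */
		unsigned long vaddr =
			__fix_to_virt(FIX_KMAP_BEGIN + KM_TYPE_NR * cpu);

		per_cpu(kmap_pte_cpu, cpu) = kmap_get_fixmap_pte(vaddr);
	}

	/* kmap_atomic_prot() would then only index within this cpu's
	 * own KM_TYPE_NR slots: */
	set_pte(per_cpu(kmap_pte_cpu, smp_processor_id()) - type,
		mk_pte(page, prot));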

I suspect it might even be worth writing a version of one_page_table_init
that would guarantee discontiguous pages, so we can flush out these kinds
of fragile assumptions.
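
A purely hypothetical debug variant (modelled from memory on the existing
one_page_table_init(); in practice it relies on bootmem handing out
increasing addresses) could simply burn a page between allocations so that
successive pte pages are never physically adjacent:

	static pte_t * __init one_page_table_init_discontig(pmd_t *pmd)
	{
		if (!(pmd_val(*pmd) & _PAGE_PRESENT)) {
			pte_t *page_table;

			/* Deliberately waste a page so this page table
			 * cannot be adjacent to the previous one. */
			alloc_bootmem_low_pages(PAGE_SIZE);
			page_table =
				(pte_t *)alloc_bootmem_low_pages(PAGE_SIZE);

			set_pmd(pmd, __pmd(__pa(page_table) | _PAGE_TABLE));
			BUG_ON(page_table != pte_offset_kernel(pmd, 0));
		}
		return pte_offset_kernel(pmd, 0);
	}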

Eric
