Re: early fixmap causes kmap breakage

From: Nick Piggin
Date: Tue Dec 30 2008 - 20:55:08 EST


On Tue, Dec 30, 2008 at 02:41:53PM -0800, Eric W. Biederman wrote:
> Nick Piggin <npiggin@xxxxxxx> writes:
>
> > On Tue, Dec 30, 2008 at 07:13:44AM +0100, Ingo Molnar wrote:
> >>
> >> * Nick Piggin <npiggin@xxxxxxx> wrote:
> >>
> >> > On Mon, Dec 29, 2008 at 03:17:31PM -0800, Andrew Morton wrote:
> >> > > On Thu, 18 Dec 2008 22:15:43 +0100
> >> > > Nick Piggin <npiggin@xxxxxxx> wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > I've debugged a problem where i386+pae systems with more than a few CPUs
> >> > > > blow up at boot in the kmap_atomic code.
> >> > >
> >> > > ping?
> >> >
> >> > No further progress here, I'm waiting on input for how to fix this
> >> > "nicely". Meantime, clearing the early fixmap pte I guess works, but you
> >> > lose a page... is it possible to put it into .initdata or is there some
> >> > issue with that? (I guess on a PAE kernel, 4K isn't a big deal).
> >>
> >> yeah, 4K shouldn't be a big deal. Mind sending a patch for this?
> >
> > How's this?
> > --
> >
> > The early fixmap pmd entry inserted at the very top of the KVA is causing the
> > subsequent fixmap mapping code to not provide physically linear pte pages over
> > the kmap atomic portion of the fixmap (which relies on said property to
> > calculate pte address).
> >
> > This has caused weird boot failures in kmap_atomic much later in the boot
> > process (initial userspace faults) on a 32-bit PAE system with a larger number
> > of CPUs (smaller CPU counts tend not to run over into the next page so don't
> > show up the problem).
> >
> > Solve this by attempting to clear out the page table, and copy any of its
> > entries to the new one. Also, add a bug if a nonlinear condition is encountered
> > and can't be resolved, which might save some hours of debugging if this fragile
> > scheme ever breaks again...
> >
> > Putting swapper_pg_fixmap into initdata is an exercise left for the reviewer...
>
> Ok. I see what is going on now. We have exceeded 512 fixmap entries, causing
> the fixmap entries to consume more than 2MB of the address space. Which broke
> the assumption that the fixmap entries are all contiguous.

Yes. That wasn't obvious from my problem description?
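
To spell out the arithmetic behind that 512-entry limit, here is a minimal
userspace sketch (not kernel code). It assumes PAE's 4 KiB pages and 8-byte
ptes; the function names and any slot counts you feed it are hypothetical,
chosen only for illustration.

```c
#include <assert.h>

/* Assumed PAE geometry: 4 KiB pages and 8-byte ptes. */
#define PAGE_SIZE_BYTES 4096UL
#define PTE_SIZE_BYTES  8UL
#define PTES_PER_PAGE   (PAGE_SIZE_BYTES / PTE_SIZE_BYTES)

static_assert(PTES_PER_PAGE == 512, "a PAE pte page holds 512 entries");

/* Hypothetical helper: kmap atomic slots are one set of types per CPU. */
static unsigned long kmap_slots(unsigned long km_types, unsigned long nr_cpus)
{
    return km_types * nr_cpus;
}

/* Virtual address space covered by nr_slots fixmap slots. */
static unsigned long fixmap_span_bytes(unsigned long nr_slots)
{
    return nr_slots * PAGE_SIZE_BYTES;
}

/* Number of pte pages needed to map nr_slots consecutive slots,
 * assuming the first slot starts at a pte page boundary. */
static unsigned long pte_pages_needed(unsigned long nr_slots)
{
    return (nr_slots + PTES_PER_PAGE - 1) / PTES_PER_PAGE;
}
```

With these helpers, pte_pages_needed(512) is 1 and pte_pages_needed(513) is 2:
one slot past 512 (2MB of address space) and the fixmap needs a second pte
page, which is where the linearity assumption starts to matter.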


> Ditching the swapper_pg_fixmap has some problems.
>
> This appears to break early_printk to a usb debug port, which calls
> set_fixmap_nocache and expects the mapping to last.
>
> This looks like it will have problems with Xen and other environments
> where we come in with a pre-populated page table, possibly unmapping
> something important.

My patch copies the early fixmap mappings to the new page table. Isn't
this enough?

> one_page_table_init relies on alloc_bootmem_low_pages for its memory allocation
> so we do not have a guarantee that we will have contiguous memory even without
> this.

It's OK, if fragile, in early boot where it uses alloc_low_page.


> I see three ways we can address this.
> - Grow swapper_pg_fixmap to cover the entire fixmap range.
> This trivially and without problems gives an atomic guarantee,
> and should allow removal of code that sets up the fixmaps later
> in C, except in weird cases like Xen.

Would be fine by me, although I want to get a minimal patch working
in the meantime if that is going to be complex.


> - Decide it is worth optimizing kmap_atomic_prot some more.
> Have a kmap_pte per cpu.
> Cache line align the kmap pte entries so we don't get conflicts
> per cpu, at which point we should be guaranteed that all 13 of
> them will be physically contiguous.

Go wild if you'd like to spend time optimising x86 PAE ;)


> - Not support more than 32 cpus on x86_32.

This is not an option really. Anyway, it's not more than 32 CPUs, but
can be a problem with as few as about 8 depending on config.


> I suspect it might even be worth writing a version of one_page_table_init
> that would guarantee discontiguous pages. So we can flush out these
> kinds of fragile assumptions.

So did you actually see anything wrong with my last patch which catches
discontiguous page tables and copies over the early fixmap translations?
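
For reference, the kind of check I mean can be sketched in plain userspace C
as below. This is only an illustration of the idea, not the patch itself: the
function names are hypothetical, assert() stands in for the BUG, and the pte
page addresses are assumed to be listed in ascending virtual address order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u

/* True if every pte page starts exactly one page after the previous one,
 * which is what pointer arithmetic across pte pages silently relies on. */
static bool pte_pages_linear(const uint64_t *pte_page_phys, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (pte_page_phys[i] != pte_page_phys[i - 1] + PAGE_SIZE_BYTES)
            return false;
    }
    return true;
}

/* Stand-in for the BUG: fail loudly at setup time instead of
 * faulting much later in an unrelated-looking place. */
static void require_linear(const uint64_t *pte_page_phys, size_t n)
{
    assert(pte_pages_linear(pte_page_phys, n));
}
```

Calling require_linear() on pages at 0x1000, 0x2000, 0x3000 passes; pages at
0x1000 and 0x5000 would trip the assertion immediately.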
