Re: kexec boot regression

From: Jens Axboe
Date: Tue Dec 15 2009 - 16:40:26 EST


On Tue, Dec 15 2009, Jens Axboe wrote:
> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > Jens Axboe wrote:
> > > On Tue, Dec 15 2009, Jens Axboe wrote:
> > >> On Tue, Dec 15 2009, Jens Axboe wrote:
> > >>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > >>>> Jens Axboe wrote:
> > >>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > >>>>>> Jens Axboe wrote:
> > >>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > >>>>>>>> Jens Axboe wrote:
> > >>>>>>>>> On Tue, Dec 15 2009, Yinghai Lu wrote:
> > >>>>>>>>>> [PATCH] x86/pci: intel ioh bus num reg accessing fix
> > >>>>>>>>>>
> > >>>>>>>>>> it is above 0x100, so if mmconf is not enabled, we need to skip it
> > >>>>>>>>> This works, it kexecs kernels fine. But since 2.6.32 doesn't have the
> > >>>>>>>>> mmconf problem to begin with, are we now just working around the issue?
> > >>>>>>>>> SRAT still reports issues, numa doesn't work.
> > >>>>>>>> that patch will be bullet proof... we need it.
> > >>>>>>>>
> > >>>>>>>> also still need to figure out why memmap range is not passed properly.
> > >>>>>>>>
> > >>>>>>>> do you mean that with 2.6.32 kexec'ing 2.6.32, mmconf and numa
> > >>>>>>>> work in the second kernel?
> > >>>>>>> Yes, 2.6.32 booted and 2.6.32 kexec'ed works just fine, no SRAT
> > >>>>>>> complaints and NUMA works fine.
> > >>>>>> do you need
> > >>>>>> memmap=62G@4G
> > >>>>>> in this case?
> > >>>>> Yes, I've needed that always.
> > >>>> good,
> > >>>>
> > >>>> can you enable the debug option in kexec to see why kexec cannot pass
> > >>>> the whole 38? ranges to the second kernel?
> > >>> Not getting any output so far, -d doesn't do much. Poking around in the
> > >>> source...
> > >> OK, cold boot and kexec 2.0.1 gets all 39 ranges passed properly to
> > >> kexec'ed kernels. Since the older kexec stopped at range 30 (31 ranges
> > >> total), that smells like just a kexec bug. Retesting -git...
> > >
> > > Current -git works fine when all the ranges are passed correctly. So, I
> > > think, the only existing regression is the SRAT issue.
> >
> > did you change node_shift?
>
> Yes:
>
> CONFIG_NODES_SHIFT=6
>
> What I don't get is that 2.6.32 and -git print the same PXM map, and in
> both cases it's totalling exactly 64G. Yet it says:
>
> SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.

Clue:

[ 0.000000] SRAT: Node 0 PXM 0 0-80000000
[ 0.000000] SRAT: Node 0 PXM 0 100000000-480000000
[ 0.000000] SRAT: Node 2 PXM 1 480000000-880000000
[ 0.000000] SRAT: Node 1 PXM 2 880000000-c80000000
[ 0.000000] SRAT: Node 3 PXM 3 c80000000-1080000000
[ 0.000000] NUMA: Using 31 for the hash shift.
[ 0.000000] pxm0: 0-480000 (4718592), absent 553990
[ 0.000000] pxm1: 880000-c80000 (4194304), absent 0
[ 0.000000] pxm2: 480000-880000 (4194304), absent 4194304
[ 0.000000] pxm3: c80000-1080000 (4194304), absent 0
[ 0.000000] SRAT: PXMs only cover 49035MB of your 65419MB e820 RAM. Not used.
[ 0.000000] SRAT: SRAT not used.

It's essentially disregarding pxm2 (480000-880000), claiming all 4194304
of its pages are absent.
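
The numbers in the log can be cross-checked by hand. Below is a rough
sketch of that arithmetic, not the kernel's code: it assumes 4 KiB pages
and that the "cover" figure is the sum of (pages - absent) over the pxm
lines, truncated to MB. The names PXM_LINES and covered_mb are made up
for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# (label, start_pfn, end_pfn, absent_pages), copied from the pxm lines in the log
PXM_LINES = [
    ("pxm0", 0x0,      0x480000,  553990),
    ("pxm1", 0x880000, 0xC80000,  0),
    ("pxm2", 0x480000, 0x880000,  4194304),
    ("pxm3", 0xC80000, 0x1080000, 0),
]


def covered_mb(lines):
    """Sum the pages counted as present per range and convert to whole MB."""
    pages = sum((end - start) - absent for _, start, end, absent in lines)
    return pages * PAGE_SIZE // (1 << 20)


# As logged: pxm2 contributes nothing.
as_logged = covered_mb(PXM_LINES)

# Hypothetical: same ranges, but pxm2 reported with no absent pages.
pxm2_present = [
    (label, start, end, 0 if label == "pxm2" else absent)
    for label, start, end, absent in PXM_LINES
]
if_pxm2_counted = covered_mb(pxm2_present)

print(as_logged)        # 49035
print(if_pxm2_counted)  # 65419
```

Under those assumptions the shortfall is exactly the 16384MB of pxm2:
with it counted, the total matches the 65419MB of e820 RAM.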

--
Jens Axboe
