Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMDGeode LX800

From: H. Peter Anvin
Date: Wed Sep 26 2007 - 17:04:50 EST


Jordan Crouse wrote:
> On 26/09/07 12:14 -0700, H. Peter Anvin wrote:
>> Please try the following debug patch to let us know what is going on.
>>
>> -hpa
>
>> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
>> index 1a2e62d..a0ccf29 100644
>> --- a/arch/i386/boot/memory.c
>> +++ b/arch/i386/boot/memory.c
>> @@ -33,6 +33,12 @@ static int detect_memory_e820(void)
>> "=m" (*desc)
>> : "D" (desc), "a" (0xe820));
>>
>> + printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
>> + err, id, next,
>> + (unsigned int)desc->addr,
>> + (unsigned int)desc->size,
>> + desc->type);
>> +
>> if (err || id != SMAP)
>> break;
>
> Okay, we have clarity. Here is the output
>
> e820: err 0 id 0x534d4150 next 15476 00000000:0009fc00 1
> e820: err 0 id 0x534d4150 next 15496 0009fc00:00000400 2
> e820: err 0 id 0x534d4150 next 15516 000e0000:00020000 2
> e820: err 0 id 0x0e7b0000 next 11536 00100000:0e6b0000 1
>
> In the last entry, id is obviously wrong (it should be 'SMAP' or
> 0x534d4150). This is the BIOS bug.
>
> Here's the reason why this bothers us now. In the old assembly code,
> if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801
> code. In the new code in memory.c, if id != SMAP, we break out of the
> int15 loop, and return boot_params.e820_entries, which in our case is
> 3. detect_memory() considers this to be successful, and no attempt to
> parse e801 is made.
>
> So thats where the problem is - in the old code with the buggy BIOS, we
> punted to reading the e801 information, and that was enough to keep us
> going. In the new code, we allow a partial table to be used, and we
> blow up.
>
> Attached is a patch to fix this - it returns -1 on error, and only sets
> boot_params.e820_entries to be non-zero if we have something useful
> in it. This punts the detection to the e801 code, which then is
> then successful.
>
> This fixes the problem with the DB800, and so it probably should
> with the other Geode platforms affected by this.
>
> Many thanks to hpa for the guiding hand.
>

This patch is obviously wrong. There are a lot of e820 BIOSen out there
that terminate with CF=1, and that is a legitimate termination condition
for e820. Now, as far as what to do when id != SMAP, it probably is
still the right thing to do; since the BOS vendor couldn't get something
that elementary correct, we shouldn't trust the data.

I'll write up a corrected patch.

-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/