Re: 2.6.21-rc4-mm1

From: Andy Whitcroft
Date: Thu Mar 22 2007 - 14:18:29 EST


Andy Whitcroft wrote:
> Con Kolivas wrote:
>> On Thursday 22 March 2007 20:48, Andy Whitcroft wrote:
>>> Andy Whitcroft wrote:
>>>> Andy Whitcroft wrote:
>>>>> Andrew Morton wrote:
>>>>>> Temporarily at
>>>>>>
>>>>>> http://userweb.kernel.org/~akpm/2.6.21-rc4-mm1/
>>>>>>
>>>>>> Will appear later at
>>>>>>
>>>>>>
>>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc4/2.6.21-rc4-mm1/
>>>>> [All of the below is from the pre hot-fix runs. The very few results
>>>>> which are in for the hot-fix runs seem worse if anything. :( All
>>>>> results should be out on TKO.]
>>>>>
>>>>>> - Restored the RSDL CPU scheduler (a new version thereof)
>>>>> Unsure if the above is the culprit, but there seems to be a smattering of
>>>>> BUGs in kernbench from the scheduler on several systems, and panics
>>>>> which do not fully dump out.
>>>>>
>>>>> On elm3b239 we are about 2/4 of the way through kernbench, the test in
>>>>> progress, when we go blammo in both failed tests; elm3b234 doesn't boot at all.
>>>> Well I have one result through for backing RSDL out on elm3b239 and that
>>>> does indeed seem to give us a successful boot and test. peterz has
>>>> pointed me to an incremental patch from Con which I'll push through
>>>> testing and see if that sorts it out.
>>> Ok, tested the patch below on top of 2.6.21-rc4-mm1 and this seems to
>>> fix the problem:
>>>
>>> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc4-mm1-rsdl-0.32.patch
>>>
>>> Hard to tell from that patch whether it will be fixed in the changes
>>> already committed to the next -mm.
>>>
>>> It's possible that it may be fixed by the following patch:
>>>
>>> sched-rsdl-improvements.patch
>>>
>>> Which has the following slipped in at the end of the changelog:
>>>
>>> A tiny change checking for MAX_PRIO in normal_prio()
>>> may prevent oopses on bootup on large SMP due to
>>> forking off the idle task.
>>>
>>> Con, are all the changes in the 0.32 patch above with akpm?
>> Yes, he's queued everything in that patch you tested for the next -mm. Thanks
>> very much for testing it.
>
> No worries. I've just got through the results on the other machine in
> the mix. That machine seems to be fixed by backing out RSDL and not by
> the fixup 0.32 patch ...
>
> This second machine seems to hang hard very soon after user space starts
> executing, but without a panic. I can't say that the symptoms are very
> definitive, but I do have a good result from that machine without RSDL
> and not with rsdl-0.32.
>
> The machine is a dual-core x86_64 machine: Dual Core AMD Opteron(tm)
> Processor 275.
>
> I'll let you know if I find out anything else. Shout if you want any
> information or have anything you want poked or tested.

Ok, I have yet a third x86_64 machine that is blowing up with the latest
2.6.21-rc4-mm1+hotfixes+rsdl-0.32 but working with
2.6.21-rc4-mm1+hotfixes-RSDL. I have results on various hotfix levels,
so I have just fired off a set of tests across the affected machines on
that latest hotfix stack plus the RSDL backout, and the results should be
in within the next hour or two.

I think there is a strong correlation between RSDL and these hangs. Any
suggestions as to the next step?

-apw