Re: [git pull] drm merge for 3.9-rc1

From: Josh Boyer
Date: Thu Feb 28 2013 - 10:15:43 EST


On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer <jwboyer@xxxxxxxxx> wrote:
>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>>>>>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>>>>>
>>>>>> So I don't think that's actually the cause of the problem. Or at least
>>>>>> not that alone. I reverted it on top of Linus' latest tree and I still
>>>>>> get the lockups.
>>>>>
>>>>> Actually, git bisect does seem to have gotten it correct. Once I
>>>>> actually tested the revert of just that on top of Linus' tree (commit
>>>>> d895cb1af1), things seem to be working much better. I've rebooted a
>>>>> dozen times without a lockup. The most I've seen it take on a kernel
>>>>> with that commit included is 3 reboots, so that's definitely at least an
>>>>> improvement.
>>>>
>>>> I give up. GPU issues are not my thing. 2 reboots after I sent that it
>>>> gave me pretty rainbow static again. So it might have been an
>>>> improvement, but reverting it is not a solution.
>>>>
>>>> Looking at the rest of the commits, the whole GPU rework might be
>>>> suspect, but I clearly have no clue.
>>>
>>> GPUs are tricky beasts :)
>>
>> Understatement ;).
>>
>>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>>> problem anyway since it only affects 6xx/7xx and your card is handled
>>> by the evergreen code. I'll put together some patches to help narrow
>>> down the problem.
>>
>> Yeah, that's the biggest problem I have, not knowing which functions are
>> actually being executed for this card. It looks like a combination of
>> stuff in evergreen.c and ni.c, but I have no idea.
>>
>> Patches would be great. If nothing else, I'm really good at building
>> kernels and rebooting by now.
>
> Two possible fixes attached. The first attempts a full reset of all
> blocks if the MC (memory controller) is hung. That may work better
> than just resetting the MC. The second just disables MC reset. I'm
> not sure we can reliably tell whether the MC is hung, because display
> requests hit it periodically and can make it look busy. That could
> lead to needlessly resetting it, possibly causing failures like the
> ones you are seeing.

OK. I'll test them individually. It will probably take a bit because
I'll want to do numerous reboots if things seem "fixed" with one or the
other.

I'll let you know how things go.

josh