Re: PROBLEM: Fatal Machine Check >= 3.13.5-101.fc19.x86_64

From: Borislav Petkov
Date: Fri Apr 18 2014 - 05:45:43 EST


On Fri, Apr 18, 2014 at 11:17:34AM +0200, Matthias Graf wrote:
> Fine-grained bisection result:
>
> ab70b1dde73ff4525c3cd51090c233482c50f217 is the first bad commit
> commit ab70b1dde73ff4525c3cd51090c233482c50f217
> Author: Alex Deucher <alexander.deucher@xxxxxxx>
> Date: Fri Nov 1 15:16:02 2013 -0400
>
> drm/radeon: enable DPM by default on r7xx asics
>
> Seems to be stable on them.
>
> Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
>
> :040000 040000 f3262029b868df4d882f64b4deba6b9230e307ea
> 1f1dfca42763703a56e3cc82bb103608a24be94e M drivers
>
>
> Result is reasonable: I have a RV770 chip.

Yes it is.

> (Additional) Bug Report for Reference:
> https://bugzilla.redhat.com/show_bug.cgi?id=1085785
>
> Thanks for the instructions Borislav! At first, I was not completely
> sure what you expected me to do (this is my first kernel bug report :)).

And you're doing good so far! :-)

> If there is anything more I can help you with, let me know.

Ok, now we want to confirm that this patch is *actually* the culprit by
reverting it. Simply pull Linus' master branch to have the latest tree,
and then do:

$ git checkout -b radeon-revert master

so that you land on a throwaway branch where we can play. Then normally you
would do

$ git revert ab70b1dde73ff4525c3cd51090c233482c50f217

but that causes conflicts, so I did it for you; see below. Simply apply
this patch on top *without* doing the revert with git. Then build, boot
and test. We want to see whether it still generates those ROB timeout
machine checks. If all looks ok, then we're pretty sure we need to talk
about DPM with your GPU on your platform with Alex. :-)
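
For reference, the branch-and-apply steps above can be sketched as a
small shell helper. This is only a rough sketch under a few assumptions:
the function name apply_revert_patch is made up for illustration, the
patch from this mail has been saved to a file of your choosing, and your
git is new enough to support -C (1.8.5 or later).

```shell
#!/bin/sh
# Hypothetical helper (not part of git or the kernel tree): create the
# throwaway branch and apply a hand-made revert patch on top of it.
#   $1 = path to the kernel tree
#   $2 = absolute path to the saved patch file
#   $3 = branch to start from (defaults to master)
apply_revert_patch() {
    tree=$1
    patch=$2
    base=${3:-master}

    # Land on a throwaway branch so the base branch stays untouched.
    git -C "$tree" checkout -q -b radeon-revert "$base" || return 1

    # Dry run: bail out early if the patch does not apply cleanly.
    git -C "$tree" apply --check "$patch" || return 1

    # Apply the patch to the working tree; no commit is created.
    git -C "$tree" apply "$patch"
}
```

It would be called as something like

$ apply_revert_patch ~/linux /tmp/radeon-revert.patch

where both paths are placeholders. Use an absolute path for the patch,
since git -C changes directory before the path is read. Building and
booting the result is left out on purpose because that depends on your
setup.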

Feel free to ask any questions should something not be clear.

Thanks.

---