Re: help with git bisecting a bug 16376: random - possibly Radeon DRM KMS related - freezes

From: Martin Steigerwald
Date: Thu Sep 09 2010 - 10:19:03 EST


On Tuesday 07 September 2010, Ted Ts'o wrote:
> On Sun, Sep 05, 2010 at 09:53:41AM +0200, Martin Steigerwald wrote:
> > Quite some kernels were unbootable with an ext4 and readahead related
> > backtrace[1].
>
> Unfortunately, you don't have a full backtrace in the picture which
> was submitted as an attachment to the bugzilla. It shows part of the
> backtrace which has an ext4 and readahead stack, yes. But we didn't
> get to see the beginning of the stack trace with the IP and the reason
> for the oops. If keyboard interrupts still work, you might try seeing
> if you can scroll upwards and see more of the backtrace. Or you might
> try configuring your console to use a higher resolution display so
> more lines can be displayed. Or you might try getting a serial
> console.

Thanks for your detailed analysis. I missed posting an update to the
thread. I did not have to go back to those kernels again and had bisected
the issue down to about 10 revisions when Alex suggested my bug might be a
duplicate of

[Bug 28402] random radeon/kms/drm related freezes with kernel 2.6.34
https://bugs.freedesktop.org/show_bug.cgi?id=28402

So I tried some patches in there and the vmembase at zero patch seems to
do the trick. I am not sure, though, whether it's a solution or a
workaround.

> I don't recognize the display, but the problem could just as easily be
> in the block layer or in the device driver for your hard drive.
> (i.e., the readahead stack calls ext4, which in turn will submit a
> read request to the block device layer which then submits the request
> to a device driver).

Yes, I am aware that it may not be an Ext4 problem at all. That is why I
said Ext4 / readahead related (!) backtrace (! not bug), because that was
all I could see on the screen. How else should I have described that
backtrace when I cannot speculate on what I cannot see?

> But because you keep referring to it as an ext4/readahead related
> backtrace, you may have disguised the symptom enough that people who
> might recognize it as, "Oh, yeah, there was this regression in the
> SATA layer", wouldn't recognize it as such from your description.
> That's why it's important to be careful how you describe issues; if
> you had said, I don't have a complete stack trace, and I don't have
> the IP and function where the fault occurred, that might have caused
> people to think a bit harder about what might be the problem, instead
> of thinking to themselves, "ah, well, the ext4 and readahead parts of
> the kernel aren't my problem, so I'll ignore this report".

I thought that's what the provided backtrace is for. And I think that any
developer can see that it isn't complete.

Nevertheless, next time I will include a note that the backtrace is
incomplete.

It would be good to have a backtrace viewer and saver that still works in
those conditions ;-). Even if it just wrote the backtrace somewhere on the
swap partition where a tool could grab it after booting again. But when the
kernel is completely messed up, exactly that can be very dangerous.

> > I am also seeking help with selecting more suitable commits to test:
> > If it's a Radeon KMS related freeze and everything points at it, I
> > think the offending commit is in the first quarter of what git
> > commit shows to me[2].
>
> You do know that you can restrict a git bisect to commits that modify
> a particular part of the tree, right? e.g.,
>
> git bisect start v2.6.34 v2.6.33 -- drivers/gpu/drm/radeon

Yes, I have seen that in the git manpage, but since I wasn't absolutely
sure that the freeze is radeon kms/drm related, I skipped that step. From
what I learned, I should have looked at git bisect visualize earlier and
picked commits before and after the drm kms related merges. That would
have spared me quite some time if my suspicion was right, as it turned out
to be, and wouldn't have cost many more iterations if it had been wrong.
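
A path-limited bisect of the kind suggested above can be rehearsed on a
throwaway repository before spending reboots on the real tree. The sketch
below is only an illustration under made-up assumptions: the directory
layout, the tags good-base and bad-tip, and the grep used as the test are
all hypothetical and have nothing to do with the actual radeon history.

```shell
#!/bin/sh
# Toy rehearsal of a path-limited bisect. Everything here (paths, tags,
# commit messages, the "bug") is invented for illustration.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email demo@example.invalid
git config user.name demo

mkdir -p drivers/gpu fs
echo "ok 0" > drivers/gpu/state
echo 0 > fs/counter
git add .
git commit -q -m "base"
git tag good-base

# Ten commits: even ones touch drivers/gpu, odd ones touch fs.
# From commit 6 onwards the drivers/gpu file is "broken".
i=1
while [ "$i" -le 10 ]; do
    if [ $((i % 2)) -eq 0 ]; then
        if [ "$i" -lt 6 ]; then
            echo "ok $i" > drivers/gpu/state
        else
            echo "broken $i" > drivers/gpu/state
        fi
    else
        echo "$i" > fs/counter
    fi
    git add .
    git commit -q -m "commit $i"
    i=$((i + 1))
done
git tag bad-tip

# Only commits that touch drivers/gpu are considered as bisection points.
git bisect start bad-tip good-base -- drivers/gpu >/dev/null

# "git bisect run" treats exit status 0 as good and 1-127 (except 125) as bad.
git bisect run sh -c 'grep -q "^ok" drivers/gpu/state' >/dev/null

# After a finished run, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```

The point of the path limit is that the odd-numbered commits, which never
touch drivers/gpu, are not offered as test candidates at all. The price is
the one mentioned above: if the real culprit lies outside the given path,
the result is misleading.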

Next time I will know this.

Thanks for your help, I appreciate it.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
