Re: Debugging Thinkpad T430s occasional suspend failure.

From: Linus Torvalds
Date: Sat Feb 16 2013 - 18:02:36 EST


On Sat, Feb 16, 2013 at 1:45 PM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>
> I hacked around on your PM_TRACE set_magic_time() / read_magic_time()
> yesterday, to save an oopsing core kernel ip there, instead of hashed
> pm trace info (it makes sense in this case to invert your sequence,
> putting the high order into years and the low order into minutes).

That sounds like a good idea in general. The PM_TRACE() thing was done
to figure out things that locked up the PCI bus etc, but encoding the
oopses during suspend sounds like a really good idea too.

Is your patch clean enough to just be made part of the standard
PM_TRACE infrastructure, or was it something really hacky and one-off?

> Rewarded last night by reboot to Feb 21 14:45:53 2006. Which is
> ffffffff812d60ed intel_choose_pipe_bpp_dither.isra.13+0x216/0x2d6
>
> /home/hugh/3087X/drivers/gpu/drm/i915/intel_display.c:4159
> * enable dithering as needed, but that costs bandwidth. So choose
> * the minimum value that expresses the full color range of the fb but
> * also stays within the max display bpc discovered above.
> */
>
> switch (fb->depth) {
> ffffffff812d60e9: 48 8b 55 c0 mov -0x40(%rbp),%rdx
> ffffffff812d60ed: 8b 02 mov (%rdx),%eax
>
> (gcc chose to pass a pointer to fb->depth down to the function,
> instead of fb itself, since that is the only use of it there.)
>
> I expect that fb is NULL; but with an average of one failure to resume
> per day, and ~26 bits of info per crash, this is not a fast procedure!
>
> I notice that intel_pipe_set_base() allows for NULL fb,
> so I'm currently running with the oops-in-rtc hackery, plus
> - switch (fb->depth) {
> + if (WARN_ON(!fb))
> + bpc = 8;
> + else switch (fb->depth) {
>
> There's been a fair bit of change to intel_display.c since 3.7 (if
> my 3.7 was indeed good), mainly splitting intel_ into haswell_ versus
> ironlake_, but I've not yet spotted anything obvious; nor yet looked
> to see where fb would originate from anyway.
>
> Once I've got just a little more info out of it, I'll start another
> thread addressed principally to the drm/gpu/i915 guys.

I think it's worth it to give them a heads-up already. So I've cc'd
the main suspects here..

Daniel, Dave - any comments about a NULL fb in
intel_choose_pipe_bpp_dither() during either suspend or resume? Some
googling shows this:

https://bugzilla.redhat.com/show_bug.cgi?id=895123

which sounds remarkably similar, and is also during a suspend attempt
(but apparently Satish got a full oops out).. Some timing race with a
worker entry?

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/