Re: Debugging Thinkpad T430s occasional suspend failure.

From: Linus Torvalds
Date: Sat Feb 16 2013 - 14:47:37 EST


On Sat, Feb 16, 2013 at 11:25 AM, Paul E. McKenney
<paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>
> Sorry for the delay in testing this, but there was a need to upgrade
> my laptop, and bozo here figured "why not go to 64 bits while I am at
> it?" -- and then proceeded to learn the hard way that it is necessary
> to do "make mrproper" before doing a build in 64-bit mode. :-/

Hmm. Our object file dependency check includes checking that the
compiler options are the same, but that's only true for normal C
files. Some of the other rules do *not* test the full range of config
options, so in general, if you change the architecture model etc., you do
indeed want to make sure that you do a "make distclean" (aka "make
mrproper") or something like "git clean -dqfx".

For a number of other files, we just depend on the normal make
timestamp logic, which means that "if the object file is newer than
the sources", we'll trust it. Which obviously doesn't work for cases
where the object file may have been generated under totally different
architecture rules.
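
A minimal sketch of the difference between the two checks described
above, written in Python rather than make. The function names,
timestamps, and command lines are hypothetical illustrations, not taken
from the kernel build system:

```python
def stale_by_timestamp(obj_mtime, src_mtimes):
    """Plain make-style check: rebuild only if a source is newer than the object."""
    return any(mtime > obj_mtime for mtime in src_mtimes)


def stale_by_command(obj_mtime, src_mtimes, saved_cmd, current_cmd):
    """Stricter check: also rebuild when the recorded build command has changed."""
    return saved_cmd != current_cmd or stale_by_timestamp(obj_mtime, src_mtimes)


# Hypothetical object file that is newer than its only source file,
# but was built with 32-bit options while the tree is now configured for 64-bit.
obj_mtime = 2000.0
src_mtimes = [1000.0]
saved_cmd = "gcc -m32 -c stub.S -o stub.o"    # illustrative command line
current_cmd = "gcc -m64 -c stub.S -o stub.o"  # illustrative command line

# Timestamp-only logic trusts the stale object.
print(stale_by_timestamp(obj_mtime, src_mtimes))
# Comparing the build command catches the architecture switch.
print(stale_by_command(obj_mtime, src_mtimes, saved_cmd, current_cmd))
```

With these made-up values the timestamp-only check reports the object as
up to date, while the command comparison flags it for a rebuild, which is
the gap a full clean works around.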

(That said, what kind of old environment did you do this in?
stub32_sigaltstack was removed during the merge window, so I'm
assuming you applied my patch on top of plain 3.7 or something?)

> The kernel build system's way of telling you this at the moment is:
>
> arch/x86/built-in.o:(.rodata+0x4990): undefined reference to `stub32_sigaltstack'

Adding Peter Anvin to the cc list, just in case he sees what's wrong
with the system call stub generation that keeps excessively old object
files around. If it's easy to fix, it might be worth trying to make it
ok to switch from i386 to x86-64 and back in the same tree.

Peter? Not a big deal, but if you see something obvious, let's just
try to fix it, ok?

> Anyway, with this patch, I see CPU stall warnings when running rcutorture
> as shown below. This is not a hard failure:

Yeah, there's something wrong with the patch; I didn't bother trying
to figure it out for now. It also causes a hard failure with lockdep
(or lock proving/debugging, I'm not sure which one triggered it) - and
it happens too early to even see anything on the screen.

So I'd like to make that "downgrade from hardirq to softirq" atomic,
and I think it would clean up the crazy code too (currently it does a
*lot* of back-and-forth on the preempt flags), but I clearly missed
some case where we used a wrapper or two to add some tracepoint or an
RCU scheduling point. And I'm not going to worry about it right now,
since I'm preparing to make v3.8 soon.

But if somebody spots the bug, holler.

Linus