Re: Mysterious CFQ crash and RCU

From: Paul Bolle
Date: Sat Jun 04 2011 - 08:26:32 EST


On Fri, 2011-06-03 at 09:45 -0400, Vivek Goyal wrote:
> PaulB mentioned that crash happened at May 26 10:47:07. I am wondering
> how are we able to sample the data after the crash. I am assuming
> that above data gives information only before crash and does not
> tell us anything about what happened just before crash. What am I missing.

Well, what you called a "CFQ crash" is an Oops (apparently generated by
arch/x86/mm/fault.c:show_fault_oops()). But the traces I posted at the
bugzilla.redhat.com issue for this always end with: "Fixing recursive
fault but reboot is needed" (see kernel/exit.c:do_exit()). At that point
the system is still running.

Perhaps you run with panic_on_oops enabled by default (rumor has it
that's the RHEL default), in which case an Oops that leaves the system
running might be surprising. Anyhow, it turns out that my system is
suspiciously happy after the process(es) causing this Oops has (have)
finished. See the big friendly warning I put on top of the message in
which I pasted the output of Paul's script:

> 1) Big friendly warning: the "CFQ crash" that occurred while running
> your script didn't happen in a clean session. Not at all! It actually
> happened after (summarized a bit):
> - two "CFQ crashes" with the patch for Jens' first idea;
> - switching to deadline
> - removing cfq_iosched
> - recompiling cfq-iosched.ko (to revert Jens' patch)
> - installing cfq-iosched.ko
> - inserting cfq_iosched
> - switching back to cfq again

(Yes, putting "CFQ crash" in quotes there was a bit of legalese on my
part.)
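
For anyone wanting to check the panic_on_oops setting mentioned above:
on a running system it is exposed as /proc/sys/kernel/panic_on_oops
(0 means the kernel tries to continue after an Oops, non-zero means it
panics). A minimal Python sketch follows. It is not something used in
this thread, and the function names are made up for illustration:

```python
from pathlib import Path

# Standard procfs location of the kernel.panic_on_oops sysctl.
PANIC_ON_OOPS = Path("/proc/sys/kernel/panic_on_oops")


def parse_panic_on_oops(text):
    """Return True if the sysctl text is non-zero (panic on Oops)."""
    return int(text.strip()) != 0


def panic_on_oops_enabled(path=PANIC_ON_OOPS):
    """Return True/False for the given sysctl file, or None if unreadable."""
    try:
        return parse_panic_on_oops(Path(path).read_text())
    except (OSError, ValueError):
        # File missing (not Linux, no procfs) or unexpected content.
        return None


if __name__ == "__main__":
    state = panic_on_oops_enabled()
    if state is None:
        print("could not read", PANIC_ON_OOPS)
    elif state:
        print("panic_on_oops is set: an Oops will panic the machine")
    else:
        print("panic_on_oops is not set: the kernel tries to continue")
```

With the setting at 0, an Oops like the one discussed here kills the
offending task but leaves the rest of the system running.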


Paul Bolle
