Re: [PATCH] panic: fix incomplete panic log in panic()

From: Qing Z
Date: Tue Oct 16 2012 - 10:25:14 EST


2012/10/16 Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>:
> On Mon, 15 Oct 2012 19:38:46 +0800
> Qing Z <njumical@xxxxxxxxx> wrote:
>
>> >> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>> >>
>> >> + /*
>> >> + * Unlock the console anyway here, in case it's occupied by another
>> >> + * one which has no chance to unlock the console thus prevents the
>> >> + * panic log prints on the console.
>> >> + */
>> >> + console_unlock();
>> >> +
>> >> bust_spinlocks(0);
>> >>
>> >> if (!panic_blink)
>> >
>> > hm. console_unlock() does a large amount of work, and it seems risky
>> > to do all of that when the system is in a bad-and-getting-worse state.
>> >
>> > Is there some more modest thing we could do here, for example,
>> >
>> > 	if (console_locked) {
>> > 		up(&console_sem);
>> > 		console_locked = 0;
>> > 	}
>> >
>> > or something along those lines?
>> >
>> > Also, perhaps this operation should be moved into bust_spinlocks().
>> > What would have happened if your code had triggered an oops, rather
>> > than called panic()?
>> >
>> >
>> Hi Andrew,
>> Thanks for your reply!
>> As for your question, "What would have happened if your code had
>> triggered an oops, rather than called panic()?": we actually found the
>> issue when triggering an oops. The FBIOPAN_DISPLAY ioctl in
>> ./drivers/video/fbmem.c takes the console lock first; if an oops
>> occurs before the console is unlocked, the panic log comes out
>> incomplete. The same happens when panic() is called directly in that
>> situation, so this is a general problem in the panic path.
>> I see two possible solutions:
>> 1. As you suggested, add something modest to bust_spinlocks().
>> bust_spinlocks() is supposed to clear the spinlocks that would
>> prevent oops information from reaching the user, but it does not
>> release console_sem. We can add code to release console_sem:
>> 1) Add up(&console_sem) in bust_spinlocks(0).
>> This is risky if there is no printk after bust_spinlocks(0) in
>> panic(), because then nothing calls console_unlock() to print the
>> log out.
>> 2) Call console_unlock() in bust_spinlocks(0).
>> In bust_spinlocks(0), console_unblank() is already used to flush the
>> oops to the mtdoops console (commit
>> b61312d353da1871778711040464b10f5cd904df). Logically, in a panic
>> without this issue, console_unlock() is called after several
>> console_lock()/console_unlock() pairs; in a panic with this issue,
>> would it be risky to call console_unlock() from console_unblank()
>> after console_lock()?
>> 2. Alternatively, we can add protection in vprintk(). Currently
>> vprintk() only handles two cases: two cores' logs interleaving during
>> a panic, and printk recursing into itself. We can cover all cases
>> there. The original vprintk() doesn't have this issue, but after
>> commit fe21773d655c2c64641ec2cef499289ea175c817, which fixed the
>> two-core log interleaving, this case is no longer covered.
>> I added a flag that is set after panic_smp_self_stop() in panic();
>> vprintk() checks it and calls zap_locks() if it is set. I have tested
>> this option, and the issue disappears with it as well.
>> What do you think?
>
> The #1 priority is to get the oops message reliably delivered.
>
> That means we should avoid console_unlock() on the oops path: it's far
> too complicated and risks deadlocks, re-oopses, recursion, etc.
>
> If there was text queued in the console layer and that text fails to be
> emitted, well, that's sad, but it's more important that the oops
> message be displayed.
>
> If the oops trace is occasionally interleaved with other text then
> that's sad too, but at least the info we need is readable. Oopses
> inside console_lock() are rare.
>
>
> So I'd suggest that the code in bust_spinlocks(1) should simply do
> whatever needs to be done to make the forthcoming oops trace be
> visible, and leave it at that - don't bother trying to flush out any
> old text.
>
>
> Also, we should be careful with things like up() on a semaphore which
> hasn't been down()ed. Because under some Kconfig combinations, such an
> operation might trigger debugging traces and we could get into a big
> mess. (An up() on non-down()ed semaphore is actually an OK operation,
> so this was a bad example. But you see the problem).
>
Hi Andrew,
It is indeed risky to call console_unlock() directly in the panic
path; for example, it can deadlock if core0 stops core1 while core1 is
holding logbuf_lock.
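The logbuf_lock deadlock just described, and option 2 from the quoted message (a flag set after panic_smp_self_stop() that makes vprintk() call zap_locks()), can be sketched as a single-threaded userspace model. This is a sketch under stated assumptions, not kernel code: an int owner id stands in for logbuf_lock, and model_vprintk(), model_zap_locks() and the model_panic_cpus_stopped flag are hypothetical names. model_zap_locks() only mimics the idea of forcibly reinitializing a lock whose holder will never run again.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_LOG_SIZE 256

/* Userspace model only. The owner id stands in for logbuf_lock:
 * -1 means free, otherwise it is the CPU holding the lock. */
static int model_logbuf_owner = -1;
/* Hypothetical flag, set once panic() has stopped the other CPUs. */
static bool model_panic_cpus_stopped;
static char model_logbuf[MODEL_LOG_SIZE];
static size_t model_logbuf_len;

static bool model_logbuf_trylock(int cpu)
{
	if (model_logbuf_owner != -1)
		return false;
	model_logbuf_owner = cpu;
	return true;
}

static void model_logbuf_unlock(void)
{
	assert(model_logbuf_owner != -1);	/* unlocking a free lock is a bug */
	model_logbuf_owner = -1;
}

/* Mimics only the idea behind zap_locks(): forcibly reinitialize the
 * lock so that a holder which will never run again cannot block us. */
static void model_zap_locks(void)
{
	model_logbuf_owner = -1;
}

/* Appends msg to the log buffer. Returns false where a real spin_lock()
 * on logbuf_lock would wait; if the holder was stopped, that wait
 * never ends. */
static bool model_vprintk(int cpu, const char *msg)
{
	size_t n;

	if (model_panic_cpus_stopped)
		model_zap_locks();
	if (!model_logbuf_trylock(cpu))
		return false;

	n = strlen(msg);
	if (model_logbuf_len + n < MODEL_LOG_SIZE) {	/* keep room for '\0' */
		memcpy(model_logbuf + model_logbuf_len, msg, n);
		model_logbuf_len += n;
		model_logbuf[model_logbuf_len] = '\0';
	}
	model_logbuf_unlock();
	return true;
}
```

With the lock owned by a stopped CPU 1, model_vprintk(0, ...) fails while model_panic_cpus_stopped is false and succeeds once it is set. Resetting a lock like this is only defensible because, in the scenario described, no other CPU is still running to use it.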
As for old text, if we hold logbuf_lock during the panic, we can keep
the panic log continuous and complete, and the old text will appear
before the panic log. Frankly, I don't see any risk of old text
affecting or interleaving with the panic log.
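The "modest" alternative quoted above (dropping a stale console lock in bust_spinlocks(1) instead of calling console_unlock()) can be illustrated with a single-threaded userspace model. Assumptions: a plain counter stands in for console_sem, a bool stands in for console_locked, and every model_-prefixed name is hypothetical, not a kernel API. It shows the stale lock being released without running any flush path, and why the console_locked guard matters: an unconditional up() on an unheld semaphore would push the count to 2 and admit two holders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: a counter stands in for console_sem and a bool
 * for console_locked. Single-threaded, so nothing ever really blocks. */
struct model_sem {
	int count;
};

static struct model_sem console_sem_model = { .count = 1 };
static bool console_locked_model;
static int lines_emitted;	/* lines that reached the "console" */

/* Returns true if the semaphore was taken (the kernel's down_trylock()
 * uses the opposite convention: 0 on success). */
static bool model_try_down(struct model_sem *s)
{
	if (s->count > 0) {
		s->count--;
		return true;
	}
	return false;
}

static void model_up(struct model_sem *s)
{
	s->count++;	/* like up(): allowed even if nobody called down() */
}

/* A driver path (e.g. the FBIOPAN_DISPLAY case) takes the console lock
 * and then oopses before releasing it. */
static void model_driver_takes_console_and_dies(void)
{
	if (model_try_down(&console_sem_model))
		console_locked_model = true;
	/* no matching release: the holder never runs again */
}

/* printk-like: emit only if the console lock is free, else the text
 * stays queued in the log buffer. */
static bool model_printk(const char *msg)
{
	if (!model_try_down(&console_sem_model))
		return false;
	console_locked_model = true;
	printf("%s\n", msg);
	lines_emitted++;
	console_locked_model = false;
	model_up(&console_sem_model);
	return true;
}

/* The modest step suggested for bust_spinlocks(1): drop a stale console
 * lock without flushing old text through console_unlock(). */
static void model_bust_console_lock(void)
{
	if (console_locked_model) {	/* guard: never up() an unheld sem */
		model_up(&console_sem_model);
		console_locked_model = false;
	}
	assert(console_sem_model.count <= 1);
}
```

After model_driver_takes_console_and_dies(), model_printk() returns false until model_bust_console_lock() runs; calling model_bust_console_lock() again while the lock is free is a no-op. The model deliberately skips flushing queued text, matching the suggestion to make only the forthcoming oops trace visible.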