Re: [patch] log fatal signals like SIGSEGV

From: Thomas Jarosch
Date: Mon Oct 06 2008 - 04:54:10 EST


Hello Mikael,

> > Log signals like SIGSEGV, SIGILL, SIGBUS or SIGFPE to aid tracing
> > of obscure problems. Also logs the sender of the signal.
>
> I believe the approach taken in this patch is broken:
>
> 1. The signal logging decision is taken before signal delivery,
> which causes *handled* signals in the above list to be logged.
> So your printk_ratelimit() can be swamped by handled signals
> causing it to not log unhandled fatal signals.
>
> Applications that handle SEGV/BUS/ILL/FPE aren't that uncommon.
>
> 2. Fatal signals are only interesting if they are self-generated.
> Signals sent from other processes or threads are uninteresting,
> if the purpose is to detect program errors or faulty hardware.

Thanks for your review. We are already running the new patch on 500+ boxes
and haven't received any complaints about noisy messages in the logs (yet?).
Some of those boxes run "logcheck" and generate a daily report,
so if the messages were noisy, -something- should have shown up there.

I'm not sure whether distinguishing between kernel-generated and
process-generated signals makes much of a difference, as there should be
no log output anyway. If you can show me that this will in fact generate
noisy output, I'll happily change the code.
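
For reference, here is a minimal user-space sketch of how the two origins
can be told apart. It is not code from the patch. The helper names
(sent_by_process, record_origin, demo_self_signal) are made up for
illustration, and the "si_code <= 0" test is assumed to mirror the
kernel's SI_FROMUSER() convention:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Details of the last signal seen, written by the handler below. */
static volatile sig_atomic_t last_si_code = 1;
static volatile sig_atomic_t last_sender_pid = 0;

/*
 * Hypothetical helper: si_code values <= 0 (SI_USER, SI_QUEUE, SI_TKILL)
 * mean the signal was sent by a process or thread; positive values such
 * as SEGV_MAPERR are generated by the kernel on a fault.
 */
static int sent_by_process(int si_code)
{
    return si_code <= 0;
}

/* SA_SIGINFO handler: only records where the signal came from. */
static void record_origin(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)ctx;
    last_si_code = info->si_code;
    last_sender_pid = (sig_atomic_t)info->si_pid;
}

/*
 * Hypothetical demo: install the handler for SIGUSR1 and send the signal
 * to ourselves. Returns 0 on success, non-zero on failure.
 */
static int demo_self_signal(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_origin;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* raise() does not return until the handler has run. */
    return raise(SIGUSR1);
}
```

After demo_self_signal() returns, last_si_code holds a non-positive value
and last_sender_pid holds our own PID, whereas a real fault would report a
positive code such as SEGV_MAPERR.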

> 3. Similar functionality already exists in the kernel, except
> it correctly runs much later in the signal delivery path.
> Grep for print_fatal_signals and show_unhandled_signals.

print_fatal_signals is debug-only, see the mails
from the first review phase about that.

show_unhandled_signals seems to be implemented on x86 and PPC only.

Concerning x86: Both pieces of code are integrated in
arch/x86/kernel/traps_32.c: do_general_protection().

Does this code path also get called for SIGABRT or SIGFPE?

> There's also some trace hooks in the signal delivery path
> that look like they could log actual fatal signals.

Do you have a particular one in mind?

[Jiri Kosina wrote]
> BTW be aware that for example x86 arch-specific code does this on its own,
> and therefore with your patch, the information will be duplicated. See
> page fault handler for x86.

Yes, I like that. The new code is architecture-independent;
perhaps the architecture-dependent code could even be obsoleted,
and all platforms would benefit from the new logging.

Thomas
