Re: [PATCH 3/3] printk: Avoid softlockups in console_unlock()

From: Jan Kara
Date: Mon Feb 18 2013 - 11:31:46 EST


On Fri 15-02-13 14:22:19, Andrew Morton wrote:
> On Fri, 15 Feb 2013 17:57:10 +0100
> Jan Kara <jack@xxxxxxx> wrote:
>
> > A CPU can be caught in console_unlock() for a long time (tens of seconds are
> > reported by our customers) when other CPUs are using printk heavily and a serial
> > console makes printing slow. Although serial console drivers call
> > touch_nmi_watchdog(), this still triggers softlockup warnings because
> > interrupts are disabled for the whole time console_unlock() runs (e.g.
> > vprintk() calls console_unlock() with interrupts disabled). Thus IPIs
> > cannot be processed and other CPUs get stuck spinning in calls like
> > smp_call_function_many(). Also RCU eventually starts reporting lockups.
> >
> > In my artificial testing I also managed to trigger a situation where a disk
> > disappeared from the system, apparently because commands to / from it
> > could not be delivered for long enough. This is why just silencing
> > watchdogs isn't a reliable solution to the problem and we simply have to
> > avoid spending too long in console_unlock().
> >
> > We fix the issue by limiting the time we spend in console_unlock() to
> > watchdog_thresh() / 4 (unless we are in an early boot stage or an oops is
> > happening). The rest of the buffer will be printed either by further
> > callers to printk() or during the next timer tick.
> >
>
> It still gives me tummy ache :(
But it's better than it used to be, isn't it? At least I like this
version more than the one that postponed printing to a worker thread, since
here we only depend on timer ticks occurring...

> The patch adds additional tests of oops_in_progress. Some description
> of your thinking on that matter would be appropriate?
Good point, I'll add that. My thinking was that when we are oopsing, all
bets are off: we want to get the messages to the console as reliably as
possible, and we don't care about softlockups anymore as we have bigger
trouble anyway.

> > --- a/kernel/printk.c
> > +++ b/kernel/printk.c
> > @@ -1990,17 +1990,31 @@ int is_console_locked(void)
> > #define PRINTK_PENDING_OUTPUT 2
> >
> > static unsigned long printk_pending;
> > +static int last_printing_cpu = -1;
> > +
> > +static bool __console_unlock(void);
> >
> > void printk_tick(void)
>
> printk_tick() no longer exists in linux-next.
Thanks for noticing, I'll rebase and fix this up.

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR