Re: frequent softlockups with 3.10rc6.

From: Oleg Nesterov
Date: Mon Jun 24 2013 - 13:40:42 EST


On 06/24, Dave Jones wrote:
>
> On Sun, Jun 23, 2013 at 06:04:52PM +0200, Oleg Nesterov wrote:
> >
> > Could you please do the following:
> >
> > 1. # cd /sys/kernel/debug/tracing
> > # echo 0 >> options/function-trace
> > # echo preemptirqsoff >> current_tracer
> >
> > 2. reproduce the lockup again
> >
> > 3. show the result of
> > # cat trace
>
> Not sure this is helpful, but..

This makes me think that something is seriously broken.

Or I do not understand this stuff at all. Quite possible too.
Steven, could you please help?

> # preemptirqsoff latency trace v1.1.5 on 3.10.0-rc7+
> # --------------------------------------------------------------------
> # latency: 165015310 us, #4/4, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)

OK, 165015310/1000000 = 165, nice.

> # -----------------
> # | task: trinity-child1-3173 (uid:1000 nice:19 policy:0 rt_prio:0)
> # -----------------
> # => started at: vprintk_emit
> # => ended at: vprintk_emit
> #
> #
> # _------=> CPU#
> # / _-----=> irqs-off
> # | / _----=> need-resched
> # || / _---=> hardirq/softirq
> # ||| / _--=> preempt-depth
> # |||| / delay
> # cmd pid ||||| time | caller
> # \ / ||||| \ | /
> trinity--3173 1dNh1 0us!: console_unlock <-vprintk_emit
> trinity--3173 1dNh1 165015310us : console_unlock <-vprintk_emit
> trinity--3173 1dNh1 165015311us+: stop_critical_timings <-vprintk_emit
> trinity--3173 1dNh1 165015315us : <stack trace>
> => console_unlock
> => vprintk_emit
> => printk
> => watchdog_timer_fn

But this is already called in the non-preemtible context, how can
'started at' blame vprintk_emit?

> => __run_hrtimer
> => hrtimer_interrupt
> => smp_apic_timer_interrupt
> => apic_timer_interrupt
> => _raw_spin_lock

This is where start_critical_timing() should be called?

Or by TRACE_IRQS_OFF in apic_timer_interrupt...

> => sync_inodes_sb
> => sync_inodes_one_sb
> => iterate_supers
> => sys_sync
> => tracesys

Also. watchdog_timer_fn() calls printk() only if it detects the
lockup, so I assume you hit another one?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/