rcu_sched_state detected stall (serial 8250 wait_for_xmitr)

From: joe shmoe
Date: Thu May 10 2012 - 22:21:28 EST


Hi,

I am relatively a newbie in kernel matters, so please be gentle. I am trying to learn things.

I am using non-tained 2.6.35.14 kernel and hitting a CPU stall that is reproducible whenever I send a lot of data through the serial console (klogd -2 -c 7).

I've tried looking through the changelog and don't find any patch that directly addresses this (it is my understanding

that RCU has changed quite a bit since 2.6.35.14 though).

Could someone please suggest how to go about debugging this further, or what more information I can provide for the community to understand what the potential issue here might be?

I referenced stallwarn.txt and turned on CONFIG_PROVE_RCU, CONFIG_TRACE_RCU and related config flags, recompiled and reproduced the issue. The stack trace seems to indicate the trouble is in this loop which loops for only 10ms:

http://lxr.linux.no/linux+*/drivers/serial/8250.c#L1860

[  481.553694]
[  481.553697] Pid: 142, comm: rcu_torture_rea Not tainted 2.6.35-14EIsmp g6os X8DTH/X8DTH-i/6/iF/6F
[  481.553699] RIP: 0010:[<ffffffff812146a7>]  [<ffffffff812146a7>] delay_tsc+0x2f/0x4f
[  481.553704] RSP: 0018:ffff880002c43a88  EFLAGS: 00000087
[  481.553706] RAX: 0000000000000128 RBX: ffffffff82a5c8c0 RCX: 0000000000000002
[  481.553708] RDX: 0000000000000184 RSI: 0000000000000002 RDI: 00000000000009e6
[  481.553709] RBP: ffff880002c43a90 R08: 00000000ad22a044 R09: ffffffff81293821
[  481.553711] R10: ffffffff81db0754 R11: 0000000000000000 R12: 0000000000000000
[  481.553713] R13: 0000000000002709 R14: 0000000000000020 R15: 000000000000002e
[  481.553715] FS:  0000000000000000(0000) GS:ffff880002c40000(0000) knlGS:0000000000000000
[  481.553717] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  481.553718] CR2: 00007f9473337830 CR3: 000000022a734000 CR4: 00000000000006e0
[  481.553720] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  481.553722] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  481.553724] Process rcu_torture_rea (pid: 142, threadinfo ffff88013eae6000, task ffff88013eabf000)
[  481.553725] Stack:
[  481.553726]  0000000000000000 ffff880002c43aa0 ffffffff81214635 ffff880002c43ab0
[  481.553728] <0> ffffffff81214676 ffff880002c43ae0 ffffffff81291b7b ffffffff82a5c8c0
[  481.553731] <0> 000000000000000a ffffffff81291bc6 ffffffff82a5c8c0 ffff880002c43b00
[  481.553734] Call Trace:
[  481.553735]  <IRQ>
[  481.553737]  [<ffffffff81214635>] __delay+0xa/0xc
[  481.553739]  [<ffffffff81214676>] __const_udelay+0x3f/0x41
[  481.553743]  [<ffffffff81291b7b>] wait_for_xmitr+0x56/0xa1
[  481.553745]  [<ffffffff81291bc6>] ? serial8250_console_putchar+0x0/0x34
[  481.553748]  [<ffffffff81291be5>] serial8250_console_putchar+0x1f/0x34
[  481.553750]  [<ffffffff8128e78b>] uart_console_write+0x41/0x52
[  481.553753]  [<ffffffff8129387c>] serial8250_console_write+0xb2/0x116
[  481.553755]  [<ffffffff8103e4dd>] __call_console_drivers+0x67/0x79
[  481.553758]  [<ffffffff8103e54a>] _call_console_drivers+0x5b/0x5f
[  481.553760]  [<ffffffff8103e937>] release_console_sem+0x12a/0x1c5
[  481.553762]  [<ffffffff8103efaf>] vprintk+0x373/0x3a3
[  481.553765]  [<ffffffff8103f046>] ? printk+0x67/0x69
[  481.553767]  [<ffffffff8103f046>] printk+0x67/0x69
[  481.553771]  [<ffffffff8106597e>] ? trace_hardirqs_on+0xd/0xf
[  481.553790]  [<ffffffffa012e2d2>] ? debug_print_prefix+0x135/0x146 [scst]
[  481.553802]  [<ffffffffa010b6a3>] scst_cmd_done_local+0xc3/0x280 [scst]
[  481.553807]  [<ffffffffa0199d61>] blockio_endio+0xcf/0xee [scst_vdisk]

This gdb is from an earlier run (but same loops is responsible all the time).

(gdb) list *(wait_for_xmitr+0x56)
0x5ae is in wait_for_xmitr (drivers/serial/8250.c:1863).
1858    {
1859            unsigned int status, tmout = 10000;
1860
1861            /* Wait up to 10ms for the character(s) to be sent. */
1862            do {
1863                    status = serial_in(up, UART_LSR);
1864
1865                    up->lsr_saved_flags |= status & LSR_SAVE_FLAGS;
1866
1867                    if (--tmout == 0)
(gdb)Thank you for any help.
Regards.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/