Re: [ANNOUNCE] 2.6.31-rc4-rt1

From: Will Schmidt
Date: Thu Aug 13 2009 - 11:04:54 EST


On Thu, 2009-08-13 at 09:30 +0200, Thomas Gleixner wrote:
> Will,
>
> On Wed, 12 Aug 2009, Will Schmidt wrote:
> > The .find_legacy_serial_ports function in the backtrace is very very
> > early during boot. I'll do a bit more looking around to see if it's
> > anything obvious,..
>
> I almost expected some wreckage from converting a rwlock to a
> spinlock. Grr. That code really takes the lock recursive. None of my
> powerpc systems triggered that code path. :(
>
> So the reason for having a rwlock was not the concurrent access, it
> was just the lazy^W reuse of the existing functions which lead to
> recursive locking.
>
> As it's the only code in tree which would need the real rwlock version
> I fixed up the of code instead of creating all the extra rwlock
> functions for rt.
>
> Does the patch below fix it ?

Yes, and thanks for spinning that up for me.

With this patch, the reported problem is cleared up.. :-) Now I'm
getting as far as finding the other processors in the system before it
stops. :-(

Last output on the console is
"
...
clockevent: decrementer mult[83126e97] shift[32] cpu[3]
Processor 3 found.
clockevent: decrementer mult[83126e97] shift[32] cpu[4]
Processor 4 found.
"

Some other info is in the log_buf, I got this via xmon dump, and parsed
it by hand, try to ignore the extra spaces.. :-)

<7>xics: map virq 16, hwi rq 0x2.
<7>irq: i rq 2 on host nul l mapped to virt ual irq 16.
<7>xi cs: unmask virq 16.
<7> -> map to hwirq 0x2.
<6>Te sting tracer nop : PASSED.
<7>cloc kevent: decremen ter mult[83126e9 7] shift[32] cpu [1].
<4>Processor 1 found..
<7>clo ckevent: decreme nter mult[83126e 97] shift[32] cp u[2].
<4>Processo r 2 found..
<7>cl ockevent: decrem enter mult[83126 e97] shift[32] c pu[3].
<4>Process or 3 found..
<7>c lockevent: decre menter mult[8312 6e97] shift[32] cpu[4].
<4>Proces sor 4 found..
<4> .
<4>============ ================ =====.
<4>[ INFO: inconsistent lo ck state ].
<4>2. 6.31-rc4-rt1-wms .aug12 #3.
<4>--- ---------------- --------------.
< 4>inconsistent { HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage..
<4>irq/16 -IPI/3 [HC1[1]:S C0[0]:HE0:SE1] t akes:.
<4> (devtr ee_lock){?.+...} , at: [
<c0000000 006684f8>] .of_f ind_property+0x4 c/0x9c.
<4>{HARDI RQ-ON-W} state w as registered at :.
<4> [
<c000000 000119090>] .__l ock_acquire+0x69 8/0x900.
<4> [
<c 00000000011a3ec> ] .lock_acquire+ 0x144/0x194.
<4> [
<c0000000007ac 7d8>] ._atomic_s pin_lock+0x58/0x 84.
<4> [
<c00000 0000668dfc>] .of _find_node_by_na me+0x40/0x110.
<4 > [
<c0000000005 0d320>] .hvc_fin d_vtys+0x128/0x1 68.
<4> [
<c00000 0000b78c68>] .co nsole_init+0x50/ 0x7c.
<4> [
<c000 000000b34d70>] . start_kernel+0x3 40/0x59c.
<4> [
< c0000000000083d8 >] .start_here_c ommon+0x1c/0x44.
<4>irq event sta mp: 9532942.
<4>h ardirqs last en abled at (953294 1): [
<c000000000 008a1c>] restore +0x1c/0xec.
<4>ha rdirqs last disa bled at (9532942 ): [
<c0000000000 04808>] hardware _interrupt_entry +0x10/0x20.
<4>so ftirqs last ena bled at (0): [
<c 0000000000bb6e4> ] .copy_process+ 0x8dc/0x20c8.
<4> softirqs last di sabled at (0): [
<(null)>] (null) .
<4>.
<4>other in fo that might he lp us debug this :.
<4>1 lock held by irq/16-IPI/3 :.
<4> #0: (&rq- >lock){-.....}, at: [
<c000000000 09bc20>] .task_r q_lock+0x80/0x12 8.
<4>.
<4>stack b acktrace:.
<4>Cal l Trace:.
<4>[c00 000000fffaf40] [ c0000000000181ac ] .show_stack+0x 124/0x320
(unrel iable).
<4>[c0000 0000fffb010] [c0 0000000001a410] .dump_stack+0x28 /0x3c.
<4>[c00000 000fffb090] [c00 0000000114dc0] . print_usage_bug+
0x1dc/0x214.
<4>[ c00000000fffb150 ] [c000000000115 18c] .mark_lock+ 0x394/0x700.
<4>[ c00000000fffb210 ] [c000000000119 008] .__lock_acq uire
+0x610/0x900 .
<4>[c00000000ff fb310] [c0000000 0011a3ec] .lock_ acquire+0x144/0x 194.
<4>[c0000000 0fffb3e0] [c0000 000007ac7d8] ._a tomic_spin_lock+
0x58/0x84.
<4>[c0 0000000fffb470] [c0000000006684f 8] .of_find_prop erty+0x4c/0x9c.
< 4>[c00000000fffb 510] [c000000000 66858c] .of_get_ property+0x44/0x
78.
<4>[c00000000 fffb5b0] [c00000 000002f244] .rta s_token+0x6c/0xa 0.
<4>[c00000000f ffb640] [c000000 00007ffe8] .fwnm i_release_errinf o
+0x24/0x98.
<4>[ c00000000fffb6c0 ] [c000000000080 3f4] .pSeries_sy stem_reset_excep
tion+0x94/0xb0.
< 4>[c00000000fffb 750] [c000000000 03dae4] .system_ reset_exception+
0x7c/0x174.
<4>[c 00000000fffb7e0] [c0000000000034 04] system_reset _common+0x104/0x
180.
<4>--- Excep tion: 100 at ._r aw_spin_lock+0x8 0/0x2e8.
<4> L R = ._raw_spin_l ock+0x64/0x2e8.
< 4>[c00000000fffb b70] [c000000000 7ac7e4] ._atomic _spin_lock+0x64/
0x84.
<4>[c000000 00fffbc00] [c000 00000009bc20] .t ask_rq_lock+0x80 /0x128.
<4>[c0000 0000fffbcb0] [c0 000000000b33f4] .try_to_wake_up+ 0x114/0x6e0.
<4>[ c00000000fffbd90 ] [c0000000000b3 bac] .wake_up_pr ocess
+0x38/0x50.
<4>[c00000000fff be20] [c00000000 0155a70] .handle _IRQ_event
+0x3f4 /0x460.
<4>[c0000 0000fffbef0] [c0 0000000015a06c] .handle_percpu_i rq
+0xbc/0x184.
<4 >[c00000000fffbf 90] [c0000000000 40da8] .call_han dle_irq+0x1c/0x2
c.
<4>[c00000006a 3e7940] [c000000 000011cb4] .do_I RQ+0x20c/0x30c.
< 4>[c00000006a3e7 9f0] [c000000000 004814] hardware _interrupt_entry
+0x1c/0x20.
<4>-- - Exception: 501 at .raw_local_i rq_restore+0x10c /0x124.
<4> LR = .schedule+0x5 8/0xbc.
<4>[c0000 0006a3e7d70] [c0 000000001588dc] .irq_thread+0x41 c/0x4a4.
<4>[c000 00006a3e7e20] [c 0000000000f7844] .kthread+0xdc/0 xe8.
<4>[c0000000 6a3e7f90] [c0000 00000041010] .ke rnel_thread
+0x54 /0x70....

No additional spinlock error messages, I'll see what else I can figure
out and follow up with more details.


>
> Thanks,
>
> tglx
> -----
> commit 925df9b36dc9abe0bab32cbd2ac544d773da71ff
> Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Date: Thu Aug 13 09:04:10 2009 +0200


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/