Re: Fwd: IGMP and rwlock: Dead ocurred again on TILEPro

From: Cypher Wu
Date: Thu Feb 17 2011 - 00:04:22 EST


On Thu, Feb 17, 2011 at 12:49 PM, Américo Wang <xiyou.wangcong@xxxxxxxxx> wrote:
> On Thu, Feb 17, 2011 at 11:39:22AM +0800, Cypher Wu wrote:
>>---------- Forwarded message ----------
>>From: Cypher Wu <cypher.w@xxxxxxxxx>
>>Date: Wed, Feb 16, 2011 at 5:58 PM
>>Subject: GMP and rwlock: Dead ocurred again on TILEPro
>>To: linux-kernel@xxxxxxxxxxxxxxx
>>
>>
>>The rwlock and spinlock of TILEPro platform use TNS instruction to
>>test the value of lock, but if interrupt is not masked, read_lock()
>>have another chance to deadlock while read_lock() called in bh of
>>interrupt.
>>
>
>
> In this case, you should call read_lock_bh() instead of read_lock().
>
>
>> frame 0: 0xfd3bfbe0 dump_stack+0x0/0x20 (sp 0xe4b5f9d8)
>> frame 1: 0xfd3c0b50 __raw_read_lock_slow.cold+0x50/0x90 (sp 0xe4b5f9d8)
>> frame 2: 0xfd184a58 igmpv3_send_cr+0x60/0x440 (sp 0xe4b5f9f0)
>> frame 3: 0xfd3bd928 igmp_ifc_timer_expire+0x30/0x90 (sp 0xe4b5fa20)
>> frame 4: 0xfd047698 run_timer_softirq+0x258/0x3c8 (sp 0xe4b5fa30)
>> frame 5: 0xfd0563f8 __do_softirq+0x138/0x220 (sp 0xe4b5fa70)
>> frame 6: 0xfd097d48 do_softirq+0x88/0x110 (sp 0xe4b5fa98)
>> frame 7: 0xfd1871f8 irq_exit+0xf8/0x120 (sp 0xe4b5faa8)
>> frame 8: 0xfd1afda0 do_timer_interrupt+0xa0/0xf8 (sp 0xe4b5fab0)
>> frame 9: 0xfd187b98 handle_interrupt+0x2d8/0x2e0 (sp 0xe4b5fac0)
>> <interrupt 25 while in kernel mode>
>> frame 10: 0xfd0241c8 _read_lock+0x8/0x40 (sp 0xe4b5fc38)
>> frame 11: 0xfd1bb008 ip_mc_del_src+0xc8/0x378 (sp 0xe4b5fc40)
>> frame 12: 0xfd2681e8 ip_mc_leave_group+0xf8/0x1e0 (sp 0xe4b5fc70)
>> frame 13: 0xfd0a4d70 do_ip_setsockopt+0xe48/0x1560 (sp 0xe4b5fc90)
>> frame 14: 0xfd2b4168 sys_setsockopt+0x150/0x170 (sp 0xe4b5fe98)
>> frame 15: 0xfd14e550 handle_syscall+0x2d0/0x320 (sp 0xe4b5fec0)
>> <syscall while in user mode>
>> frame 16: 0x3342a0 (sp 0xbfddfb00)
>> frame 17: 0x16130 (sp 0xbfddfb08)
>> frame 18: 0x16640 (sp 0xbfddfb38)
>> frame 19: 0x16ee8 (sp 0xbfddfc58)
>> frame 20: 0x345a08 (sp 0xbfddfc90)
>> frame 21: 0x10218 (sp 0xbfddfe48)
>>Stack dump complete
>>
>>I don't know the clear definition of rwlock & spinlock in Linux, but
>>the implementation of other platforms
>>like x86, PowerPC, ARM don't have that issue. The use of TNS cause a
>>race condition between system
>>call and interrupt.
>>
>
> Have you turned CONFIG_LOCKDEP on?
>
> I think Eric already converted that rwlock into RCU lock, thus
> this problem should disappear. Could you try a new kernel?
>
> Thanks.
>

I haven't turned CONFIG_LOCKDEP on for test since I didn't get too
much information when we tried to figured out the former deadlock.

IGMP used read_lock() instead of read_lock_bh() since usually
read_lock() can be called recursively, and today I've read the
implementation of MIPS, it's should also works fine in that situation.
The implementation of TILEPro cause problem since after it use TNS set
the lock-val to 1 and hold the original value and before it re-set
lock-val a new value, it a race condition window.

It's not practical to upgrade the kernel.

Thanks.

--
Cyberman Wu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/