Re: netpoll trapped question

From: Jeff Moyer
Date: Tue Sep 07 2004 - 12:05:06 EST


==> Regarding Re: netpoll trapped question; Matt Mackall <mpm@xxxxxxxxxxx> adds:

mpm> On Tue, Sep 07, 2004 at 12:33:49PM -0400, Jeff Moyer wrote:
>> ==> Regarding Re: netpoll trapped question; Matt Mackall
>> <mpm@xxxxxxxxxxx> adds:
>>
mpm> On Tue, Aug 31, 2004 at 01:10:43PM -0400, Jeff Moyer wrote:
>> >> Hi, Matt,
>> >>
>> >> This part of the netpoll trapped logic seems suspect to me, from >>
>> include/linux/netdevice.h:
>> >>
>> >> static inline void netif_wake_queue(struct net_device *dev) { #ifdef
>> >> CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif if >>
>> (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state)) >>
>> __netif_schedule(dev); }
>> >>
>> >> static inline void netif_stop_queue(struct net_device *dev) { #ifdef
>> >> CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif >>
>> set_bit(__LINK_STATE_XOFF, &dev->state); }
>> >>
>> >> This looks buggy. Network drivers are now not able to stop the queue
>> >> when they run out of Tx descriptors. I think the __netif_schedule is
>> >> okay to do in the context of netpoll, and certainly a set_bit is
>> okay. >> Why are these hooks in place? I've tested alt-sysrq-t over
>> netconsole >> and also netdump with these #ifdef's removed, and things
>> work correctly. >> Compare this with alt-sysrq-t hanging the system
>> with these tests in >> place. If I run netdump with this logic still in
>> place, I get the >> following messages from the tg3
driver>
>> >>
eth0> BUG! Tx Ring full when queue awake!
>> >> Shall I send a patch, or have I missed something?
>>
mpm> I don't remember the origin or motivation of this bit, so I'm not sure
mpm> at the moment. Shoot me a patch and I'll poke at it.
>> What tree would you like me to base the patch on? In absence of
>> response, it will be the latest -mm.

mpm> -mm is fine.

mpm> Btw, did I send you my thoughts on your recursion patch?
>> Yes, from your mail:
>>
mpm> But I still don't like this. dev->poll() is liable to attempt to
mpm> recursively take its own driver lock again internally and we still
mpm> deadlock. Have we already seen recursion here? If we do, I think we
mpm> need to fix that in drivers. Meanwhile we should just bail here and
mpm> maybe set a "something bad happened" flag.
>> I'm not sure I follow. Which lock are you concerned about deadlocking
>> on?

mpm> A random lock private to a given driver, for instance one taken on
mpm> entry to the IRQ handler. If said driver tries to do a printk inside
mpm> that lock and we recursively call the handler in netconsole, we're in
mpm> trouble. These are the issues that will have to be cleaned up in
mpm> individual drivers. So far, I haven't seen any reports, but I'm pretty
mpm> sure such cases exist. I suppose it's also possible for us to disable
mpm> recursion in netconsole instead of at the netpoll level.

Recursion in netconsole is protected by the console semaphore.

-Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/