Re: [RFC][PATCH] netconsole: avoid deadlock on printk from driver code

From: Alexey Dobriyan
Date: Wed Aug 13 2008 - 06:00:04 EST


On Wed, Aug 13, 2008 at 11:53:24AM +0200, Vegard Nossum wrote:
> I encountered a hard-to-debug deadlock when I pulled out the plug of my
> RealTek 8139 which was also running netconsole: The driver wants to print
> a "link down" message. However, this triggers netconsole, which wants to
> print the message using the same device. Here is a backtrace:
>
> [<c05916b6>] _spin_lock_irqsave+0x76/0x90
> [<c035b255>] rtl8139_start_xmit+0x65/0x130 <-- spin_lock(&tp->lock)
> [<c04c5e28>] netpoll_send_skb+0x158/0x1a0
> [<c04c62fb>] netpoll_send_udp+0x1db/0x1f0
> [<c037c70c>] write_msg+0x8c/0xc0
> [<c0135883>] __call_console_drivers+0x53/0x60
> [<c01358db>] _call_console_drivers+0x4b/0x90
> [<c0135a25>] release_console_sem+0xc5/0x1f0
> [<c0135f0b>] vprintk+0x1ab/0x3e0
> [<c013615b>] printk+0x1b/0x20
> [<c0349736>] mii_check_media+0x196/0x1e0
> [<c03597f4>] rtl_check_media+0x24/0x30
> [<c035a0ea>] rtl8139_interrupt+0x42a/0x4a0 <-- spin_lock(&tp->lock)
> [<c01716d8>] handle_IRQ_event+0x28/0x70
> [<c0172d9b>] handle_fasteoi_irq+0x6b/0xe0
> [<c0107128>] do_IRQ+0x48/0xa0
>
> The least invasive fix is to detect that we're trying to re-enter the
> driver code. We provide a netdev_busy() function which can be used to
> determine whether a deadlock can occur if we try to transmit another
> packet.
>
> Note that this may lead to lost messages if the driver is active on
> another CPU while we try to use the same device for netconsole.

This sucks.

> It would probably be best to set a "lost messages" flag in this case and
> add it to the stream when the device becomes ready again.
>
> The only extra overhead in non-netconsole code paths is the fact that we
> need another callback in struct net_device. However, all drivers must be
> checked for the possibility of a deadlock and implement the ->busy()
> callback as necessary.
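
To make the proposal concrete, here is a minimal userspace sketch of the busy
check combined with the "lost messages" flag suggested above. It is not kernel
code and not the patch itself: struct fake_dev and every fake_* function are
hypothetical stand-ins, and a plain bool replaces the spinlock, so it only
models re-entry on one CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a net_device; not kernel code. */
struct fake_dev {
	bool locked;		/* stands in for tp->lock being held */
	unsigned int sent;	/* messages actually transmitted */
	unsigned int lost;	/* messages dropped while the device was busy */
};

/* Analogue of the proposed ->busy() callback. */
static bool fake_busy(const struct fake_dev *dev)
{
	assert(dev != NULL);
	return dev->locked;
}

/* Analogue of the transmit path, which takes the driver lock. */
static void fake_xmit(struct fake_dev *dev, const char *msg)
{
	/* Re-entry with the lock held is the deadlock from the backtrace. */
	assert(!dev->locked);
	dev->locked = true;
	printf("xmit: %s\n", msg);
	dev->sent++;
	dev->locked = false;
}

/* Analogue of the console write path: drop and count when busy. */
static void fake_console_write(struct fake_dev *dev, const char *msg)
{
	if (fake_busy(dev)) {
		dev->lost++;
		return;
	}
	if (dev->lost) {
		char note[64];

		/* Report earlier drops once the device is usable again. */
		snprintf(note, sizeof(note), "[%u messages lost]", dev->lost);
		dev->lost = 0;
		fake_xmit(dev, note);
	}
	fake_xmit(dev, msg);
}

/* Analogue of an interrupt handler that logs while holding the lock. */
static void fake_interrupt(struct fake_dev *dev)
{
	dev->locked = true;
	fake_console_write(dev, "link down");	/* dropped, not deadlocked */
	dev->locked = false;
}
```

Calling fake_interrupt() on a zeroed fake_dev drops the "link down" message
and bumps the lost counter; the next fake_console_write() from outside the
handler sends the lost-messages note first and then the new message. The
message printed under the lock is still gone, which is the cost the patch
description admits.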

> --- a/drivers/net/8139too.c
> +++ b/drivers/net/8139too.c
> @@ -979,6 +980,7 @@ static int __devinit rtl8139_init_one (struct pci_dev *pdev,
>  	/* The Rtl8139-specific entries in the device structure. */
>  	dev->open = rtl8139_open;
>  	dev->hard_start_xmit = rtl8139_start_xmit;
> +	dev->busy = rtl8139_busy;
>  	netif_napi_add(dev, &tp->napi, rtl8139_poll, 64);
>  	dev->stop = rtl8139_close;
>  	dev->get_stats = rtl8139_get_stats;
> @@ -1741,6 +1743,11 @@ static int rtl8139_start_xmit (struct sk_buff *skb, struct net_device *dev)
>  	return 0;
>  }
>
> +static bool rtl8139_busy (struct net_device *dev)
> +{
> +	struct rtl8139_private *tp = netdev_priv(dev);
> +	return spin_is_locked(&tp->lock);
> +}

How do I know if my driver is susceptible to this sort of deadlock?
