Re: [patch] revert: [NET]: Fix races in net_rx_action vs netpoll

From: Jarek Poplawski
Date: Tue Jul 17 2007 - 01:37:55 EST


On 16-07-2007 11:12, Ingo Molnar wrote:
> current -git broke my main testbox. No TCP/IP networking to/from the box
> and e1000 would time out in xmit:
>
> NETDEV WATCHDOG: eth0: transmit timed out
> e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
...

Olaf, I think this error can trigger in this place:

> static void net_rx_action(struct softirq_action *h)
> {
> struct softnet_data *queue = &__get_cpu_var(softnet_data);
> unsigned long start_time = jiffies;
> int budget = netdev_budget;
> void *have;
>
> local_irq_disable();
>
> while (!list_empty(&queue->poll_list)) {
> struct net_device *dev;
>
> if (budget <= 0 || jiffies - start_time > 1)
> goto softnet_break;
>
> local_irq_enable();
>
> dev = list_entry(queue->poll_list.next,
> struct net_device, poll_list);
> have = netpoll_poll_lock(dev);
>
> if (dev->quota <= 0 || dev->poll(dev, &budget)) {

If after poll_napi dev->quota <= 0 dev->poll is not run and
__LINK_STATE_RX_SCHED bit (plus dev->poll_list) stays uncleared.

Regards,
Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/