Re: [patch] revert: [NET]: Fix races in net_rx_action vs netpoll

From: Olaf Kirch
Date: Tue Jul 17 2007 - 10:20:59 EST


On Tuesday 17 July 2007 10:57, Ingo Molnar wrote:
> i've got a new observation: changing CONFIG_HZ from 250 to 1000 makes
> the problem go away. So it's somehow also related to jiffies.

There are several "Tx Hang detected" messages in the log, which looks
a lot as if net_rx_action never runs, or at least never calls
dev->poll on the e1000 nic.

Can you check whether/how often it bails out of net_rx_action taking
the softnet_break path?

My suspicion right now is that dev->quota goes way negative when
pushing out netconsole output. Normally, we bump the quota in
__net_rx_schedule. But the whole point of the patch is that
netpoll has no business removing the device from the poll_list,
so it stays there, and we don't end up calling __net_rx_schedule
as often as we would otherwise.

Can you try what happens if you change netif_rx_complete to
something like this:

if (test_bit(__LINK_STATE_POLL_LIST_FROZEN, &dev->state)) {
dev->quota = dev->weight;
return;
}

This is just a hack to make sure that we don't go to insanely
negative quotas while sending packets through netpoll.

Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@xxxxxx | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/