Re: [Patch] bonding: fix netpoll in active-backup mode

From: Neil Horman
Date: Tue Mar 08 2011 - 08:26:51 EST


On Tue, Mar 08, 2011 at 12:15:12PM +0800, Cong Wang wrote:
> On 2011-03-08 02:50, Neil Horman wrote:
> >On Mon, Mar 07, 2011 at 10:11:50PM +0800, Amerigo Wang wrote:
> >>netconsole doesn't work in active-backup mode, because we don't do anything
> >>for nic failover in active-backup mode. This patch fixes the problem by:
> >>
> >>1) make slave_enable_netpoll() and slave_disable_netpoll() callable in softirq
> >> context, that is, moving code after synchronize_rcu_bh() into call_rcu_bh()
> >> callback function, teaching kzalloc() to use GFP_ATOMIC.
> >>
> >>2) disable netpoll on old slave and enable netpoll on the new slave.
> >>
> >>Tested by running ifdown on the current active slave and then ifup again,
> >>several times; netconsole works well.
> >>
> >>Signed-off-by: WANG Cong<amwang@xxxxxxxxxx>
> >>
> >I may be missing something, but this seems way over-complicated to me. I presume
> >the problem is that in active-backup mode a failover results in the new active
> >slave not having netpoll set up on it? If that's the case, why not just set up
> >netpoll on all slaves when ndo_netpoll_setup is called on the bonding interface?
> >I don't see anything immediately catastrophic that would happen as a result.
>
>
> But we still need to clean up the netpoll on the failing slave, which still
> means calling slave_disable_netpoll() in the monitor code, so I see no big
> difference from the solution I took.
>
Why? I understand you want to free up that memory, but I don't see any special
state codified in that structure that can't wait until you disable netpoll on
the bond as a whole. Save yourself the time and trouble: enable netpoll on both
slaves when it's enabled on the bond, and tear it down when it's torn down on
the bond. Don't worry about doing anything during a failover.
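
To make that concrete, here is a rough userspace sketch of the lifetime I have
in mind. It is only a model under stated assumptions: every model_* name is
made up for illustration, calloc()/free() stand in for the real per-slave
netpoll setup and teardown, and none of this is the actual bonding code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_MAX_SLAVES 2

/* Hypothetical stand-in for per-device netpoll state; not the kernel struct. */
struct model_npinfo {
    int unused;
};

struct model_slave {
    const char *name;
    struct model_npinfo *np;    /* NULL while netpoll is not set up */
};

struct model_bond {
    struct model_slave slaves[MODEL_MAX_SLAVES];
    int active;                 /* index of the current active slave */
};

/* Bond-level setup: enable netpoll on every slave, not only the active one.
 * Returns 0 on success, or -1 after undoing any partial work. */
int model_bond_netpoll_setup(struct model_bond *bond)
{
    for (int i = 0; i < MODEL_MAX_SLAVES; i++) {
        bond->slaves[i].np = calloc(1, sizeof(*bond->slaves[i].np));
        if (!bond->slaves[i].np) {
            while (--i >= 0) {
                free(bond->slaves[i].np);
                bond->slaves[i].np = NULL;
            }
            return -1;
        }
    }
    return 0;
}

/* Bond-level teardown: the only place per-slave state is released. */
void model_bond_netpoll_cleanup(struct model_bond *bond)
{
    for (int i = 0; i < MODEL_MAX_SLAVES; i++) {
        free(bond->slaves[i].np);
        bond->slaves[i].np = NULL;
    }
}

/* Failover only switches the active index: no allocation, nothing to wait on. */
void model_bond_failover(struct model_bond *bond)
{
    assert(bond->active >= 0 && bond->active < MODEL_MAX_SLAVES);
    bond->active = (bond->active + 1) % MODEL_MAX_SLAVES;
}

/* A frame can go out as long as the active slave has netpoll state. */
bool model_bond_can_send(const struct model_bond *bond)
{
    return bond->slaves[bond->active].np != NULL;
}
```

In this model, calling model_bond_failover() between setup and cleanup leaves
model_bond_can_send() true, which is the whole point: the failover path never
has to allocate or free anything.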

Neil
>
> >And then you wouldn't have to worry about disabling/enabling anything on a
> >failover (or during a panic for that matter). As for the rcu bits? Why are
> >they needed? One would presume that we wouldn't (or at least shouldn't) be able
> >to tear down our netpoll setup until such time as all the pending frames for
> >that netpoll client have been transmitted. If we're not blocking on that, RCU
> >isn't really going to help. Seems like the proper fix is to take a reference to
> >the appropriate npinfo struct in netpoll_send_skb, and drop it from the skb's
> >destructor or some such.
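
The reference-counting idea in the paragraph above could look roughly like the
following userspace sketch. It is only a model: all model_* names are
hypothetical, a plain int stands in for what would have to be an atomic counter
in the kernel, and setting a flag stands in for actually freeing the struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the npinfo struct; not the kernel definition. */
struct model_npinfo {
    int refs;        /* owner's reference plus one per in-flight frame */
    bool released;   /* set when the last reference is dropped */
};

/* Hypothetical stand-in for an skb that carries a destructor callback. */
struct model_skb {
    struct model_npinfo *np;
    void (*destructor)(struct model_skb *skb);
};

void model_npinfo_get(struct model_npinfo *np)
{
    np->refs++;
}

void model_npinfo_put(struct model_npinfo *np)
{
    assert(np->refs > 0);
    if (--np->refs == 0)
        np->released = true;    /* real code would free the struct here */
}

/* Runs when the frame is freed: drops the reference taken at send time. */
void model_skb_destructor(struct model_skb *skb)
{
    model_npinfo_put(skb->np);
    skb->np = NULL;
}

/* Send path: pin the npinfo for as long as the frame is pending. */
void model_netpoll_send_skb(struct model_npinfo *np, struct model_skb *skb)
{
    model_npinfo_get(np);
    skb->np = np;
    skb->destructor = model_skb_destructor;
}

/* Freeing a frame invokes its destructor exactly once. */
void model_skb_free(struct model_skb *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
    skb->destructor = NULL;
}

/* Teardown only drops the owner's reference; pending frames keep it alive. */
void model_netpoll_teardown(struct model_npinfo *np)
{
    model_npinfo_put(np);
}
```

With one frame pending, model_netpoll_teardown() leaves the struct alive, and
it is released only once model_skb_free() runs the destructor. Teardown itself
never has to block waiting for the frame.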
>
> I saw a "scheduling while in atomic" warning without touching the rcu bits.
>
> Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/