Re: Noticeable slow-down in 2.6.35-rc3

From: François Valenduc
Date: Sun Jun 13 2010 - 16:41:37 EST


On 13/06/10 22:15, Chris Clayton wrote:
> Hi,
>
> Please cc me on any reply because I'm not subscribed to linux-kernel
> or linux-net
>
> I've noticed a slowdown in 2.6.35-rc3. It shows up in a few places:
>
> 1. When my desktop (KDE 3.5.10) is starting up, the "Initialising
> system services" phase takes about 45 seconds as opposed to the normal
> 4 or 5 seconds. Similarly, whilst the basic KDE panel draws as
> normal, the icons and other gadgets that it normally contains take
> about 15 seconds to appear.
>
> 2. In firefox (3.6.3), there is a short (a second or two), but
> noticeable, delay when a menu or sub-menu label is clicked on before
> the {sub-,}menu appears. Normally the response is almost instant.
>
> There are some similarities with Gene Heskett's report at
> http://marc.info/?l=linux-kernel&m=127635846208957
>
> I've bisected it and arrived at:
>
> 597a264b1a9c7e36d1728f677c66c5c1f7e3b837 is the first bad commit
> commit 597a264b1a9c7e36d1728f677c66c5c1f7e3b837
> Author: John Fastabend <john.r.fastabend@xxxxxxxxx>
> Date: Thu Jun 3 09:30:11 2010 +0000
>
> net: deliver skbs on inactive slaves to exact matches
>
> Currently, the accelerated receive path for VLAN's will
> drop packets if the real device is an inactive slave and
> is not one of the special pkts tested for in
> skb_bond_should_drop(). This behavior is different then
> the non-accelerated path and for pkts over a bonded vlan.
>
> For example,
>
> vlanx -> bond0 -> ethx
>
> will be dropped in the vlan path and not delivered to any
> packet handlers at all. However,
>
> bond0 -> vlanx -> ethx
>
> and
>
> bond0 -> ethx
>
> will be delivered to handlers that match the exact dev,
> because the VLAN path checks the real_dev which is not a
> slave and netif_recv_skb() doesn't drop frames but only
> delivers them to exact matches.
>
> This patch adds a sk_buff flag which is used for tagging
> skbs that would previously been dropped and allows the
> skb to continue to skb_netif_recv(). Here we add
> logic to check for the deliver_no_wcard flag and if it
> is set only deliver to handlers that match exactly. This
> makes both paths above consistent and gives pkt handlers
> a way to identify skbs that come from inactive slaves.
> Without this patch in some configurations skbs will be
> delivered to handlers with exact matches and in others
> be dropped out right in the vlan path.
>
> I have tested the following 4 configurations in failover modes
> and load balancing modes.
>
> # bond0 -> ethx
>
> # vlanx -> bond0 -> ethx
>
> # bond0 -> vlanx -> ethx
>
> # bond0 -> ethx
>             |
>   vlanx -> --
>
> Signed-off-by: John Fastabend <john.r.fastabend@xxxxxxxxx>
> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
>
> :040000 040000 f272ab5b895c46b3166d321a2da759c2a6e08ae0 467d28aad962f3506bc8820241d7417fb93e507f M include
> :040000 040000 b4c5eb03a781b5ca016459ae19ebe2175d119eda 9c0ce9f12b43aecd9fee9ed816e11841b7b81fd8 M net
>
> The bisect log:
>
> # bad: [7e27d6e778cd87b6f2415515d7127eba53fe5d02] Linux 2.6.35-rc3
> # good: [e44a21b7268a022c7749f521c06214145bd161e4] Linux 2.6.35-rc2
> git bisect start 'v2.6.35-rc3' 'v2.6.35-rc2'
> # good: [6db40cf047a8723095caf79f5569d21b388d7b31] pipe: fix check in "set size" fcntl
> git bisect good 6db40cf047a8723095caf79f5569d21b388d7b31
> # good: [63c70a0d7b59bac08bd14cd24c36f76aafc25de6] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
> git bisect good 63c70a0d7b59bac08bd14cd24c36f76aafc25de6
> # good: [6f902af400b2499c80865c62a06fbbd15cf804fd] Btrfs: The file argument for fsync() is never null
> git bisect good 6f902af400b2499c80865c62a06fbbd15cf804fd
> # good: [7ae1277a5202109a31d8f81ac99d4a53278dab84] Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
> git bisect good 7ae1277a5202109a31d8f81ac99d4a53278dab84
> # good: [00d9d6a185de89edc0649ca4ead58f0283dfcbac] ipv6: fix ICMP6_MIB_OUTERRORS
> git bisect good 00d9d6a185de89edc0649ca4ead58f0283dfcbac
> # bad: [349124a00754129a5f1e43efa84733e364bf3749] net8139: fix a race at the end of NAPI
> git bisect bad 349124a00754129a5f1e43efa84733e364bf3749
> # bad: [ae638c47dc040b8def16d05dc6acdd527628f231] pkt_sched: gen_estimator: add a new lock
> git bisect bad ae638c47dc040b8def16d05dc6acdd527628f231
> # bad: [597a264b1a9c7e36d1728f677c66c5c1f7e3b837] net: deliver skbs on inactive slaves to exact matches
> git bisect bad 597a264b1a9c7e36d1728f677c66c5c1f7e3b837
>
> Reversing the identified patch gives a kernel without the slowdowns.
>
> bzip'd .config is attached.
>
> Happy to test fixes or provide additional diagnostics, but for the
> latter I'll need clear instructions - I'm not that familiar with the
> net tools.
>
> Chris

This commit also makes nfsd hang at startup on my computer (see
https://bugzilla.kernel.org/show_bug.cgi?id=16195). The problem doesn't
occur when the commit is reverted.
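
For anyone who hasn't read the patch, the delivery rule described in the
quoted commit message can be modelled roughly as below. This is only a
sketch in Python, not the kernel code: the Skb and Handler classes, the
deliver() function, and the use of dev=None to stand for a wildcard handler
are all hypothetical names chosen for illustration. Only the
deliver_no_wcard flag name comes from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Skb:
    dev: str                        # device the packet arrived on
    deliver_no_wcard: bool = False  # flag name taken from the commit message


@dataclass
class Handler:
    name: str
    dev: Optional[str] = None       # None models a wildcard handler (assumption)


def deliver(skb: Skb, handlers: List[Handler]) -> List[str]:
    """Return the names of the handlers that would receive skb."""
    receivers = []
    for h in handlers:
        exact = h.dev == skb.dev
        wildcard = h.dev is None
        # Exact matches always receive the packet; wildcard handlers
        # receive it only when the skb has not been tagged.
        if exact or (wildcard and not skb.deliver_no_wcard):
            receivers.append(h.name)
    return receivers


handlers = [Handler("any-device"), Handler("eth0-only", dev="eth0")]
normal = deliver(Skb(dev="eth0"), handlers)
tagged = deliver(Skb(dev="eth0", deliver_no_wcard=True), handlers)
```

In this model an untagged skb reaches both handlers, while a tagged one
reaches only the handler bound to eth0.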

François Valenduc