Re: [2.6.33-rc5 regression] NULL pointer dereference invlan_skb_recv - probably introduced by commit9793241fe92f7d9303fb221e43fc598eb065f267

From: Bruno PrÃmont
Date: Sun Jan 24 2010 - 10:11:56 EST


On Sun, 24 January 2010 Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> Le 23/01/2010 22:31, Bruno PrÃmont a Ãcrit :
> >> Above part of code did change between 2.6.32 and 2.6.33-rc5 with
> >> commit 9793241fe92f7d9303fb221e43fc598eb065f267 (vlan: Precise RX
> >> stats accounting)
> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=9793241fe92f7d9303fb221e43fc598eb065f267
> >
> > Reverting just that commit gets the system running correctly.
> >
> > Bruno
>
> I have no idea how this patch can break vlan networking.
>
> Your disassembly and .config seems to show your machine is not SMP
>
> Maybe something is broken on UP and alloc_percpu() ?
>
> Could you add a debug in vlan_dev_init()
>
>
> vlan_dev_info(dev)->vlan_rx_stats = alloc_percpu(struct
> vlan_rx_stats); if (!vlan_dev_info(dev)->vlan_rx_stats)
> return -ENOMEM;
> + pr_err("vlan_dev_init() rx_stats=%p\n",
> vlan_dev_info(dev)->vlan_rx_stats);
>
>
> This make sure vlan_dev_init() is called and percpu allocation is
> properly done.
>
> Thanks

Readding your "precise RX stats" commit with following additional changes I get
(first hunk to avoid crash):

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index b788978..9216a98 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -165,8 +165,11 @@ int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,

rx_stats = per_cpu_ptr(vlan_dev_info(dev)->vlan_rx_stats,
smp_processor_id());
- rx_stats->rx_packets++;
- rx_stats->rx_bytes += skb->len;
+ if (rx_stats) {
+ rx_stats->rx_packets++;
+ rx_stats->rx_bytes += skb->len;
+ } else
+ pr_err("vlan_skb_recv() rx_stats=%p -> %p\n",
vlan_dev_info(dev)->vlan_rx_stats, rx_stats);
skb_pull_rcsum(skb, VLAN_HLEN);

@@ -738,6 +741,7 @@ static int vlan_dev_init(struct net_device *dev)
vlan_dev_info(dev)->vlan_rx_stats = alloc_percpu(struct
vlan_rx_stats); if (!vlan_dev_info(dev)->vlan_rx_stats)
return -ENOMEM;
+ pr_err("vlan_dev_init() rx_stats=%p\n",
vlan_dev_info(dev)->vlan_rx_stats);
return 0;
}


...
[ 13.877610] Ending clean XFS mount for filesystem: loop0
[ 15.795601] Adding 1004020k swap on /dev/sda5. Priority:-1 extents:1 across:1004020k
[ 16.612143] ADDRCONF(NETDEV_UP): lan: link is not ready
[ 16.851846] 802.1Q VLAN Support v1.8 Ben Greear <greearb@xxxxxxxxxxxxxxx>
[ 16.851856] All bugs added by David S. Miller <davem@xxxxxxxxxx>
[ 16.878049] vlan_dev_init() rx_stats=dceae290
[ 20.040395] b44: lan: Link is up at 100 Mbps, full duplex.
[ 20.040404] b44: lan: Flow control is off for TX and off for RX.
[ 20.040535] ADDRCONF(NETDEV_CHANGE): lan: link becomes ready
[ 25.019941] RPC: Registered udp transport module.
[ 25.019950] RPC: Registered tcp transport module.
[ 25.019956] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 25.382400] svc: failed to register lockdv1 RPC service (errno 97).
[ 25.385278] mount.nfs used greatest stack depth: 1012 bytes left
[ 25.872973] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 26.740561] vlan_skb_recv() rx_stats=(null) -> (null)
[ 26.775071] vlan_skb_recv() rx_stats=(null) -> (null)
[ 27.050223] lan.658: no IPv6 routers present
[ 58.357052] vlan_skb_recv() rx_stats=(null) -> (null)
[ 63.350334] vlan_skb_recv() rx_stats=(null) -> (null)
[ 91.767589] vlan_skb_recv() rx_stats=(null) -> (null)
[ 91.932344] vlan_skb_recv() rx_stats=(null) -> (null)
[ 91.933053] vlan_skb_recv() rx_stats=(null) -> (null)
[ 91.998628] vlan_skb_recv() rx_stats=(null) -> (null)
[ 92.002752] vlan_skb_recv() rx_stats=(null) -> (null)
[ 92.007918] vlan_skb_recv() rx_stats=(null) -> (null)
[ 92.009644] vlan_skb_recv() rx_stats=(null) -> (null)
[ 92.016789] vlan_skb_recv() rx_stats=(null) -> (null)
[ 92.018520] vlan_skb_recv() rx_stats=(null) -> (null)
[ 92.041737] vlan_skb_recv() rx_stats=(null) -> (null)
...

So during vlan_dev_init() rx_stats is properly initialized, but when
packets reach the vlan dev later on exactly that same pointer is NULL...

So question is, who did reset vlan_dev_info(dev)->vlan_rx_stats to NULL?

Thanks,
Bruno
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/