Re: [PATCH] tcp: Modify the condition for the first skb to collapse

From: Jun Chen
Date: Mon Jun 17 2013 - 21:53:05 EST


On Mon, 2013-06-17 at 06:21 -0700, Eric Dumazet wrote:
> On Mon, 2013-06-17 at 14:52 -0400, Jun Chen wrote:
> > On Mon, 2013-06-17 at 03:29 -0700, Eric Dumazet wrote:
> > > On Mon, 2013-06-17 at 13:29 -0400, Jun Chen wrote:
> > > > >
> > > > Hi,
> > > > When the condition tcp_win_from_space(skb->truesize) > skb->len is
> > > > true and before(start, TCP_SKB_CB(skb)->seq) is also true, the final
> > > > condition will be true. The following lines:
> > > > int offset = start - TCP_SKB_CB(skb)->seq;
> > > > BUG_ON(offset < 0);
> > > > will then trigger the BUG_ON.
> > > >
> > >
> > > Really this should never happen, we must track what's happening here.
> > It's very rare, but the code logic does have this small hole.
> > >
> > > Are you using a pristine kernel, without any patches ?
> > The base kernel version is 3.4.
> > >
> > > Are you able to reproduce this bug in a short amount of time ?
> > I can't reproduce it in a short time; this log has been seen only once
> > during long-running tests on many devices.
> > >
> > > What kind of driver is in use ? (your stack trace was truncated)
> >
> > I am attaching the full stack traces for you.
> >

> Any other suspect messages before this, a memory allocation error for
> example ?
>
> I believe we have a bug in tcp_collapse() if one alloc_skb() returns
> NULL while we were in the middle of collapsing a big GRO packet.
>
> gro_skb needed 3 skb to be rebuilt, and only two skbs could be allocated
>
> skb1: seq=X end_seq=X+4000
> skb2: seq=X+4000 end_seq=X+8000
> <missing>
> gro_skb: seq=X end_seq=X+16000
>
> Next time we call tcp_collapse(), we might split the GRO packet again
> and get the following incorrect queue:
>
> skb1: seq=X end_seq=X+4000
> skb2: seq=X+4000 end_seq=X+8000
> skb3: seq=X end_seq=X+4000
> skb4: seq=X+4000 end_seq=X+8000
> skb5: seq=X+8000 end_seq=X+12000
> skb6: seq=X+12000 end_seq=X+16000
>
>
> I would use the following patch instead, to narrow the problem
>
> If we really find in the ofo queue a skb with a lower seq than the
> previous one, we should complain instead of lowering @start, since
> this is going to crash later.
>
> receive_queue / ofo_queue should contain monotonically increasing
> skb->seq.
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 46271cdc..5507a09 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4513,8 +4513,10 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
>  			start = TCP_SKB_CB(skb)->seq;
>  			end = TCP_SKB_CB(skb)->end_seq;
>  		} else {
> -			if (before(TCP_SKB_CB(skb)->seq, start))
> -				start = TCP_SKB_CB(skb)->seq;
> +			if (before(TCP_SKB_CB(skb)->seq, start)) {
> +				pr_err_once("tcp_collapse_ofo_queue() : seq %08x before start %08X\n",
> +					    TCP_SKB_CB(skb)->seq, start);
> +			}
>  			if (after(TCP_SKB_CB(skb)->end_seq, end))
>  				end = TCP_SKB_CB(skb)->end_seq;
>  		}
>
>
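To make the failure mode in the example queue concrete, here is a minimal
user-space sketch (not kernel code). It assumes before() is the usual
wraparound-safe signed comparison of 32-bit sequence numbers; struct seg,
seq_before(), first_out_of_order() and collapse_offset() are hypothetical
names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the kernel's before(): seq1 precedes seq2 when the
 * signed 32-bit difference is negative, which also handles wraparound. */
static int seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Hypothetical stand-in for the two TCP_SKB_CB() fields used here. */
struct seg {
	uint32_t seq;
	uint32_t end_seq;
};

/* Return the index of the first segment whose seq is lower than the seq
 * of the segment before it, or -1 if seq never decreases in the queue. */
static int first_out_of_order(const struct seg *q, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (seq_before(q[i].seq, q[i - 1].seq))
			return (int)i;
	return -1;
}

/* The offset computed as "start - TCP_SKB_CB(skb)->seq"; a negative
 * result is the condition that BUG_ON(offset < 0) catches. */
static int32_t collapse_offset(uint32_t start, uint32_t skb_seq)
{
	return (int32_t)(start - skb_seq);
}
```

With the incorrect six-skb queue from the example, first_out_of_order()
reports index 2 (skb3), and collapse_offset(X, X + 4000) is -4000, which is
the kind of negative offset that would trip the BUG_ON.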
There are many tcp_recvmsg warnings before this crash. I can't find any
other memory warnings in the logs, but I can't rule out memory issues,
because the saved logs are limited in length. I think
these logs will give you more information.

<4>[ 7736.343742] ------------[ cut here ]------------
<4>[ 7736.343759] WARNING: at /data/buildbot/workdir/jb/kernel/net/ipv4/tcp.c:1496 tcp_recvmsg+0x3bf/0x910()
<4>[ 7736.343775] recvmsg bug: copied AB57C870 seq AB57CD95 rcvnxt AB57F19F fl 0
<4>[ 7736.343845] Call Trace:
<4>[ 7736.343865] [<c1237032>] warn_slowpath_common+0x72/0xa0
<4>[ 7736.343888] [<c18a955f>] ? tcp_recvmsg+0x3bf/0x910
<4>[ 7736.343902] [<c18a955f>] ? tcp_recvmsg+0x3bf/0x910
<4>[ 7736.343922] [<c1237103>] warn_slowpath_fmt+0x33/0x40
<4>[ 7736.343944] [<c18a955f>] tcp_recvmsg+0x3bf/0x910
<4>[ 7736.343968] [<c18c9bb5>] inet_recvmsg+0x85/0xa0
<4>[ 7736.343992] [<c1852030>] sock_aio_read+0x140/0x160
<4>[ 7736.344016] [<c126b221>] ? set_next_entity+0xc1/0xf0
<4>[ 7736.344039] [<c130d627>] do_sync_read+0xb7/0xf0
<4>[ 7736.344064] [<c130dc6c>] ? rw_verify_area+0x6c/0x120
<4>[ 7736.344077] [<c1349aa8>] ? sys_epoll_wait+0x68/0x360
<4>[ 7736.344098] [<c130e1e9>] vfs_read+0x149/0x160
<4>[ 7736.344120] [<c130f518>] ? fget_light+0x58/0xd0
<4>[ 7736.344142] [<c130e23d>] sys_read+0x3d/0x70
<4>[ 7736.344164] [<c198c361>] syscall_call+0x7/0xb
<4>[ 7736.344187] [<c1980000>] ? perf_cpu_notify+0x45/0x89
<4>[ 7736.344205] ---[ end trace b3c5b245ce7ff5b5 ]---
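
For reference, a small user-space sketch that decodes the numbers in the
"recvmsg bug" line above. It assumes the warning fires when copied_seq is
before the seq of the first skb in the receive queue, and that the three hex
values are copied_seq, that skb's seq, and rcv_nxt. The LOG_* macros,
seq_before() and missing_bytes() are hypothetical names for illustration
only.

```c
#include <stdint.h>

/* Values copied from the "recvmsg bug" log line. */
#define LOG_COPIED 0xAB57C870u
#define LOG_SEQ    0xAB57CD95u
#define LOG_RCVNXT 0xAB57F19Fu

/* Wraparound-safe comparison, assumed to match the kernel's before(). */
static int seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Bytes missing between what the reader has already consumed and the
 * first queued skb; 0 when there is no hole in front of that skb. */
static uint32_t missing_bytes(uint32_t copied, uint32_t skb_seq)
{
	return seq_before(copied, skb_seq) ? skb_seq - copied : 0;
}
```

Under those assumptions, missing_bytes(LOG_COPIED, LOG_SEQ) is 0x525 (1317
bytes), so the reader was 1317 bytes short of the first skb in the queue,
while LOG_RCVNXT - LOG_SEQ is 0x240A (9226 bytes).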


