Re: TCP stalls after 128k writes

Savochkin Andrey Vladimirovich (saw@msu.ru)
Wed, 26 Aug 1998 11:17:06 +0400


On Sun, Aug 23, 1998 at 11:08:54PM +0200, Bernd Paysan wrote:
> Hi!
>
> I could figure out why TCP stalls under some circumstances after writing
> 128k - both the read and write buffer are full, and the next packet is
> discarded (see function tcp_data in linux/net/ipv4/tcp_input.c). This
> requires a retransmission. It seems that sk->data_ready doesn't wake up
> the reader in time.

I've done some investigation.
The reader is woken up as it should be.
As Andi Kleen has already said, the throughput loss is a buffer issue.
Packets with the PSH flag produced by your program miss the fast path,
so the sender does its job faster than the receiver.
Small packets have a bad ratio of allocated buffer space to data size.
The buffer space is exhausted before the advertised window
is filled, and the receiver starts to drop incoming packets.
I saw a few prune_queue debug messages for the test suite
with the low throughput. So you see exactly what you should.
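
To make the buffer-space argument concrete, here is a minimal sketch in
Python (a model, not kernel code). It assumes that every queued packet is
charged a fixed amount of buffer space regardless of how much data it
carries. The names payload_capacity, buffer_efficiency and truesize_bytes,
and the figure of 2048 bytes charged per packet, are illustrative
assumptions and are not taken from tcp_input.c.

```python
def payload_capacity(rcvbuf_bytes, payload_bytes, truesize_bytes):
    """Payload bytes that fit before the buffer budget is exhausted.

    Each queued packet is charged truesize_bytes (payload plus
    per-packet bookkeeping and allocation slack), not payload_bytes.
    truesize_bytes is a hypothetical figure chosen for illustration.
    """
    if truesize_bytes < payload_bytes:
        raise ValueError("truesize cannot be smaller than the payload")
    packets = rcvbuf_bytes // truesize_bytes
    return packets * payload_bytes


def buffer_efficiency(payload_bytes, truesize_bytes):
    """Fraction of the charged buffer space that is real data."""
    return payload_bytes / truesize_bytes
```

With a 65536-byte buffer and the assumed 2048-byte charge, 512-byte packets
fill the buffer after 16384 bytes of data, while 1460-byte packets carry
46720 bytes in the same space. If the advertised window is larger than
16384 bytes, the small-packet stream runs out of buffer space before it
runs out of window, which is the situation described above.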

>
> There is no code that forces wake up a reading process if there is "too
> much" data in the receive buffer (i.e. half full or whatever).
>
> I tried this theory by lowering the switch time (from 200ms to 20ms), and
> indeed the results during the critical segment of my test went up from
> 700k/s to 1M/s. But this is just curing the symptom. Someone should
> carefully check where rescheduling and waking input tasks would be a good
> idea. This synthetic benchmark shows this misbehaviour very good, since in
> real world other factors often hide it (i.e. once your receiving process
> is alone on the machine, there won't be an overflow).

It's not a misbehavior.
Your test shows that:
1. small packets cause bad throughput;
2. sufficient socket buffer space may increase the throughput
for a stream of small packets.
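
One way an application could act on point 2 is to ask for a larger
receive buffer on its socket. The sketch below uses the standard
SO_RCVBUF socket option from Python. The helper name
enlarge_receive_buffer and the size 262144 in the usage note are
assumptions chosen for illustration, and the kernel is free to clamp or
adjust the request.

```python
import socket


def enlarge_receive_buffer(sock, requested_bytes):
    """Request a larger receive buffer and return the size the kernel reports.

    The reported size may differ from the request: the kernel can cap it
    at a system-wide limit, and Linux reports a larger value than was
    requested to account for bookkeeping overhead.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

For example, calling enlarge_receive_buffer(sock, 262144) on a freshly
created TCP socket, before connecting, returns whatever size the kernel
actually granted. Whether this helps in a given test still depends on the
race between the sender and the receiver.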

[snip]
> I added a small patch to tcp_input.c, which forces rescheduling when the
> input buffer is half full (see below). It's not a perfect cure (I think
> there are other sk->data_ready places which should be fixed), but it
> helps. The normal data_ready path results finally in reschedule_idle(),
> which forces rescheduling only for RR and FIFO scheduled tasks.

I think that any artificial rescheduling is a very bad thing.

>
> The patch does much better than the original kernel, thus 4096 times 512
> bytes is somewhere between 4 and 10 MB/s (wide variations between runs
> indicate that I seem to have missed something) instead of 0.7MB/s
> (original kernel).

Well, when I patched the kernel to produce more debug output,
the throughput suddenly increased to 18MB/s :-)
It's just a race between the sender, the receiver
and other processes (system loggers in my case).

Best regards,
Andrey V. Savochkin
