Re: TCP connection issues against Amazon S3

From: Yuchung Cheng
Date: Wed Jan 07 2015 - 16:34:41 EST


On Wed, Jan 7, 2015 at 12:37 PM, Erik Grinaker <erik@xxxxxxxxxx> wrote:
> On 07 Jan 2015, at 15:58, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
>> On Wed, 2015-01-07 at 13:31 +0000, Erik Grinaker wrote:
>>> On 06 Jan 2015, at 22:00, Yuchung Cheng <ycheng@xxxxxxxxxx> wrote:
>>>> On Tue, Jan 6, 2015 at 1:04 PM, Erik Grinaker <erik@xxxxxxxxxx> wrote:
>>>>>
>>>>>> On 06 Jan 2015, at 20:26, Erik Grinaker <erik@xxxxxxxxxx> wrote:
>>>>> This still doesn't explain why it works with older kernels, but not newer ones. I'm thinking it's
>>>>> probably some minor change, which gets amplified by the lack of SACKs
>>>>> on the loadbalancer. Anyway, I'll bring it up with Amazon.
>>>> can you post traces with the older kernels?
>>>
>>> Here is a dump using 3.11.10 against a non-SACK-enabled loadbalancer:
>>>
>>> http://abstrakt.bengler.no/tcp-issues-s3-nosack-3.11.10.pcap.bz2
>>>
>>> The transfer shows lots of DUPACKs and retransmits, but this does not
>>> seem to have as bad an effect as it did with the failing transfer we
>>> saw on newer kernels:
>>>
>>> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
>>>
>>> One big difference, which Rick touched on earlier, is that the newer
>>> kernels keep sending TCP window updates as it's going through the
>>> retransmits. The older kernel does not do this.
>>
>> The new kernel is the receiver : It does no retransmits.
>>
>> Increasing window in ACK packets should not prevent the sender from
>> retransmitting missing packets.
>>
>> Sender is not a linux host and is very buggy IMO : If receiver
>> advertises a too big window, sender decides to not retransmit in some
>> cases.
>
> I agree. I have contacted Amazon about this, but am not too hopeful for a quick fix; they have been promising SACK-support on their loadbalancers since 2006, for example.
>
> That said, since this change breaks a service as popular as S3, it might be worth reconsidering.
With the newer kernel and bigger receive window, the sender skips (the
already slow NewReno) fast recovery and falls back to (exp backoff)
timeout recovery. Reducing rwin to accommodate the sender's bug seems
backward to me.
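
To put rough numbers on why timeout recovery hurts, here is a minimal
sketch. It assumes an initial RTO of 1 second that doubles on every
consecutive timeout and is capped at 60 seconds. Those values are
illustrative assumptions, not measurements of the S3 loadbalancer, and
backoff_schedule is a hypothetical helper.

```python
def backoff_schedule(initial_rto=1.0, retries=6, max_rto=60.0):
    """Return (rto, elapsed) pairs for consecutive retransmission timeouts.

    All defaults are assumptions chosen for illustration only.
    """
    rto = initial_rto
    elapsed = 0.0
    schedule = []
    for _ in range(retries):
        elapsed += rto                    # wait one full RTO before retransmitting
        schedule.append((rto, elapsed))
        rto = min(rto * 2, max_rto)       # exponential backoff, capped
    return schedule


for attempt, (rto, elapsed) in enumerate(backoff_schedule(), start=1):
    print(f"timeout {attempt}: waited {rto:.0f}s, {elapsed:.0f}s since first loss")
```

Under these assumptions, six back-to-back timeouts already stall the
transfer for about a minute, whereas fast recovery would repair a loss
in roughly one round trip.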


>
>> You can play with /proc/sys/net/ipv4/tcp_rmem and adopt very low values
>> to work around the sender bug.
>>
>> ( Or use SO_RCVBUF in receiver application)
>
> Thanks, setting SO_RCVBUF seems like a reasonable workaround.
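
For reference, a minimal sketch of that workaround on the receiver
side. The 64 KB value and the helper names are assumptions for
illustration, not tuned recommendations. On Linux the option has to be
set before connect() to affect the window scale negotiated in the
handshake, the kernel doubles the requested value and caps it at
net.core.rmem_max, and setting it disables receive buffer autotuning
for that socket.

```python
import socket


def read_tcp_rmem(path="/proc/sys/net/ipv4/tcp_rmem"):
    """Return the system-wide (min, default, max) receive buffer sizes,
    or None where the file is not available (e.g. non-Linux hosts)."""
    try:
        with open(path) as f:
            return tuple(int(x) for x in f.read().split())
    except OSError:
        return None


def make_small_rcvbuf_socket(rcvbuf_bytes=65536):
    """Create a TCP socket with a fixed receive buffer.

    rcvbuf_bytes is a hypothetical starting point; pick it per workload.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set before connect() so the smaller buffer bounds the advertised window.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    return sock


sock = make_small_rcvbuf_socket()
effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("system tcp_rmem:", read_tcp_rmem())
print("effective SO_RCVBUF:", effective)
sock.close()
```

The application would then call connect() on the returned socket as
usual. This only limits the window for that one socket, so other
connections on the host keep the autotuned defaults.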