Re: [RFC PATCH RESEND] tcp: avoid F-RTO if SACK and timestamps are disabled

From: Yuchung Cheng
Date: Wed Jun 27 2018 - 19:57:46 EST


On Fri, Jun 15, 2018 at 3:35 AM, Ilpo JÃrvinen
<ilpo.jarvinen@xxxxxxxxxxx> wrote:
> On Fri, 15 Jun 2018, Michal Kubecek wrote:
>
>> On Fri, Jun 15, 2018 at 11:05:03AM +0300, Ilpo JÃrvinen wrote:
>> > On Thu, 14 Jun 2018, Michal Kubecek wrote:
>> >
>> > My point was that the new data segment bursts that occur if the sender
>> > isn't application limited indicate that there's something going wrong
>> > with FRTO. And that wrong is also what is causing that RTO loop because
>> > the sender doesn't see the previous FRTO recovery on second RTO. With
>> > my FRTO undo fix, (new_recovery || icsk->icsk_retransmits) will be false
>> > and that will prevent the RTO loop.
>>
>> Yes, it would prevent the loop in this case (except it would be a bit
>> later, after second RTO rather than after first).
>
> Hmm, I'm actually wrong about the new data missing bit I think. After
> reading more code I'm quite sure conventional RTO recovery is triggered
> right away (as long as that bogus undo that ends the recovery wouldn't
> occur first like it does without my fix). So no second RTO would be
> needed.
>
>> But I'm not convinced
>> the logic of the patch is correct. If I understand it correctly, it
>> essentially changes "presumption of innocence" (if we get an ack past
>> what we retransmitted, we assume it was received earlier - i.e. would
>> have been sacked before if SACK was in use) to "presumption of guilt"
>> (whenever a retransmitted segment is acked, we assume nothing else acked
>> with it was received earlier). Or that you trade false negatives for
>> false positives.
>
> FRTO depends on knowing for sure what packet (original pre-RTO one or
> something that was transmitted post-RTO) triggered the ACK. If FRTO
> isn't sure that the ACK was generated by a post-RTO packet, it must
> not assume innocence! This change in practice affects just the time while
> the segment rexmitted by RTO is still there, that is, processing in step 2
> (if we get a cumulative ACK beyond it because the next loss is not for the
> subsequent segment but for some later segment, FLAG_ORIG_SACK_ACKED is set
> and we'll incorrectly do step 3b while still in FRTO has only reached step
> 2 for real; this is fixed by my patch). ...The decision about
> positive/negative only occurs _after_ that in step 3.
>
>> Maybe I understand it wrong but it seems that you de facto prevent
>> Step (3b) from ever happening in non-SACK case.
>
> Only if any of skb that was ACKed had been retransmitted. There shouldn't
> be any in step 3 because the RTO rexmit was ACKed (and also because
> of how new_recovery variable works wrt. earlier recoveries). Thus, in
> step 3 the fix won't clear FLAG_ORIG_SACK_ACKED flag allowing FRTO to
> detect spurious RTOs using step 3b.
>
>> > > > No! The window should not update window on ACKs the receiver intends to
>> > > > designate as "duplicate ACKs". That is not without some potential cost
>> > > > though as it requires delaying window updates up to the next cumulative
>> > > > ACK. In the non-SACK series one of the changes is fixing this for
>> > > > non-SACK Linux TCP flows.
>> > >
>> > > That sounds like a reasonable change (at least at the first glance,
>> > > I didn't think about it too deeply) but even if we fix Linux stack to
>> > > behave like this, we cannot force everyone else to do the same.
>> >
>> > Unfortunately I don't know what the other stacks besides Linux do. But
>> > for Linux, the cause for the changing receiver window is the receiver
>> > window auto-tuning and I'm not sure if other stacks have a similar
>> > feature (or if that affects (almost) all ACKs like in Linux).
>>
>> The capture from my previous e-mail and some others I have seen indicate
>> that at least some implementations do not take care to never change
>> window size when responding to an out-of-order segment. That means that
>> even if we change linux TCP this way (we might still need to send
>> a separate window update in some cases), we still cannot rely on others
>> doing the same.
>
> Those implementations ignore what is a duplicate ACK (RFC5681, which
> is also pointed into by RFC5682 for its defination):
> DUPLICATE ACKNOWLEDGMENT: An acknowledgment is considered a
> "duplicate" ... (e)
> the advertised window in the incoming acknowledgment equals the
> advertised window in the last incoming acknowledgment.
>
> Not sending duplicate ACKs also means that fast recovery will not work
> for those flows but that may not show up performance wise as long as you
> have enough capacity for any unnecessary rexmits the forced RTO recovery
> is going to do. RTO recovery may actually improve completion times for
> non-SACK flows as NewReno recovers only one lost pkt/RTT where as RTO
> recovery needs log(outstanding packets) RTTs at worst. For a large number
> of losses in a window, the log is going to win.
>
>> I checked the capture attached to my previous e-mail again and there is
>> one thing where our F-RTO implementation (in 4.4, at least) is wrong,
>> IMHO. While the first ACK after "new data" (sent in (2b)) was a window
>> update (and therefore not dupack by definition) so that we could take
>> neither (3a) nor (3b), in some iterations there were further acks which
>> did not change window size. The text just before Step 1 says
>>
>> The F-RTO algorithm does not specify actions for receiving
>> a segment that neither acknowledges new data nor is a duplicate
>> acknowledgment. The TCP sender SHOULD ignore such segments and
>> wait for a segment that either acknowledges new data or is
>> a duplicate acknowledgment.
>>
>> My understanding is that this means that while the first ack after new
>> data is correctly ignored, the following ack which preserves window size
>> should be recognized as a dupack and (3a) should be taken.
>
> Linux FRTO never gets that far (without my fix) if the ACK in step 2
> covers beyond the RTO rexmit because 3b is prematurely invoked, that's
> why you never see what would occur if 3a is taken. TCP thinks it's not
> recovering anymore and therefore can send only new data (if there's some
> available).
>
> This is what I tried to tell earlier, with new data there you see there's
> something else wrong too with FRTO besides the RTO loop.
agreed. Ilpo do you mind re-submitting your fix
https://patchwork.ozlabs.org/patch/883654/ (IIRC I already acked-by)

tcptest suite may have to wait due to some internal workload Neal is juggling.

>
>
> --
> i.