Re: [PATCH 00/31] net/tcp: Add TCP-AO support

From: Leonard Crestez
Date: Tue Aug 23 2022 - 13:41:44 EST


On 8/22/22 23:35, Dmitry Safonov wrote:
Hi Leonard, David,

On 8/22/22 00:51, David Ahern wrote:
On 8/21/22 2:34 PM, Leonard Crestez wrote:
On 8/18/22 19:59, Dmitry Safonov wrote:
This patchset implements the TCP-AO option as described in RFC5925. There
is a request from industry to move away from TCP-MD5SIG and it seems
the time
is right to have a TCP-AO upstreamed. This TCP option is meant to replace
the TCP MD5 option and address its shortcomings. Specifically, it
provides
more secure hashing, key rotation and support for long-lived connections
(see the summary of TCP-AO advantages over TCP-MD5 in (1.3) of RFC5925).
The patch series starts with six patches that are not specific to TCP-AO
but implement a general crypto facility that we thought is useful
to eliminate code duplication between TCP-MD5SIG and TCP-AO as well as
other
crypto users. These six patches are being submitted separately in
a different patchset [1]. Including them here will show better the gain
in code sharing. Next are 18 patches that implement the actual TCP-AO
option,
followed by patches implementing selftests.

The patch set was written as a collaboration of three authors (in
alphabetical
order): Dmitry Safonov, Francesco Ruggeri and Salam Noureddine.
Additional
credits should be given to Prasad Koya, who was involved in early
prototyping
a few years back. There is also a separate submission done by Leonard
Crestez
whom we thank for his efforts getting an implementation of RFC5925
submitted
for review upstream [2]. This is an independent implementation that makes
different design decisions.

Is this based on something that Arista has had running for a while now
or is a recent new development?


...

Seeing an entirely distinct unrelated implementation is very unexpected.
What made you do this?


I am curious as well. You are well aware of Leonard's efforts which go
back a long time, why go off and do a separate implementation?

When I started working on this, there was a prototype that was neither
good for upstream, nor for customers. At the moment Leonard submitted
his RFC, I was already giving feedback/reviews to local code and
patches. So, as I was aware of the details of TCP-AO, I started giving
Leonard feedback and reviews, based on what I've learned from RFC/code.
I thought whatever code will make it upstream, it can benefit from my
reviews. Some of my comments were based on a better code I saw locally,
or a way of improving it that I've suggested to both sides.

I'm quite happy that Leonard addressed some of my comments (i.e.
extendable syscalls), I see that it improved his patches.
On the other hand, some of the comments I've left moved to "known
limitations" with no foreseeable plan to fix them, while they were
addressed in local/Arista code.

And now a little bit more than a year later, it seems that the quality
of local patches has reached a point where they can be submitted for
an upstream review. So, please don't misunderstand me, it's not that
"drop your implementation, take ours" and it's not that we've
intentionally hidden that we're working on TCP-AO. It's that it is the
first moment we can make upstream aware of an alternative implementation.

Personally, I think it's best for opensource community:
- Arista's implementation is now public
- there are now at least 4 people that understand RFC5925 and the
code/details
- in a discussion, it will be possible to find what will be the best
from both implementations for Linux and come up with better code

At this particular moment, it seems neither of patch sets is ready to be
merged "as-is". But it seems that there's enough interest from both
sides and likely it guarantees that there will be enough effort to make
something merge-able, that will work for all interested parties.

As for my part, I'm interested in the best code upstream, regardless who
is the author. This includes:
- reusing the existing TCP-MD5 code, rather than copying'n'pasting for
TCP-AO with intent to refactor it some day later

I had a requirement to deploy on linux 5.4 so I very deliberately avoided touching MD5. I'm not sure there very much duplication anyway.

- making setsockopt()s and other syscalls extendable
- cover functionality with selftests

My implementation is tested with a standalone python package, this is a design choice which doesn't particularly matter.

- following RFC5925 in implementation, especially "required" and "must"
parts

I'm not convinced that "don't delete current key" needs to be literally interpreted as a hard requirement for the linux ABI. Most TCP RFCs don't specify any sort of API at all and it would be entirely valid to implement BGP-TCP-AO as a single executable with no internally documented boundaries.

I hope that clarifies how and why now there are two patch sets that
implement the same RFC/functionality.

As far as I can tell the biggest problem is that is quite difficult to implement the userspace side of TCP-AO complete with key rollover. Our two implementation both claim to support this but through different ABI and both require active management from userspace.

I think it would make sense to push key validity times and the key selection policy entirely in the kernel so that it can handle key rotation/expiration by itself. This way userspace only has to configure the keys and doesn't have to touch established connections at all.

My series has a "flags" field on the key struct where it can filter by IP, prefix, ifindex and so on. It would be possible to add additional flags for making the key only valid between certain times (by wall time).

The kernel could then make key selections itself:
- send rnextkeyid based on the key with the longest recv lifetime
- send keyid based on remote rnextkeyid.
- If not applicable (rnextkeyid not found locally, or for SYN) pick based on longest send lifetime.
- If all keys expire then return an error on write()
- Solve other ambiguities in a predictable way: if valid times are equal then pick the lowest numeric send_id or recv_id.

Explicit key selection from userspace could still be supported but it would be optional and most apps wouldn't bother implementing their own policy. The biggest advantage is that it would be much easier for applications to adopt TCP-AO.

--
Regards,
Leonard