Re: [ovs-dev] [RFC net-next 1/6] openvswitch: exclude kernel flow key from upcalls

From: Aaron Conole
Date: Tue Nov 29 2022 - 09:27:45 EST


Adrian Moreno <amorenoz@xxxxxxxxxx> writes:

> On 11/25/22 16:51, Ilya Maximets wrote:
>> On 11/25/22 16:29, Adrian Moreno wrote:
>>>
>>>
>>> On 11/23/22 22:22, Ilya Maximets wrote:
>>>> On 11/22/22 15:03, Aaron Conole wrote:
>>>>> When processing upcall commands, two groups of data are available to
>>>>> userspace for processing: the actual packet data and the kernel
>>>>> sw flow key data.  The inclusion of the flow key allows userspace to
>>>>> avoid running through the dissection again.
>>>>>
>>>>> However, the userspace can choose to ignore the flow key data, as is
>>>>> the case in some ovs-vswitchd upcall processing.  For these messages,
>>>>> having the flow key data merely adds additional data to the upcall
>>>>> pipeline without any actual gain.  Userspace simply throws the data
>>>>> away anyway.
>>>>
>>>> Hi, Aaron.  While it's true that OVS in userspace is re-parsing the
>>>> packet from scratch and using the newly parsed key for the OpenFlow
>>>> translation, the kernel-provided key is still used in a few important
>>>> places, mainly for compatibility checking.  The use is described
>>>> here in more detail:
>>>>    https://docs.kernel.org/networking/openvswitch.html#flow-key-compatibility
>>>>
>>>> We need to compare the key generated in userspace with the key
>>>> generated by the kernel to know if it's safe to install the new flow
>>>> to the kernel, i.e. if the kernel and OVS userspace are parsing the
>>>> packet in the same way.
>>>>
>>>
>>> Hi Ilya,
>>>
>>> Do we need to do that for every packet?
>>> Could we send a bitmask of supported fields to userspace at feature
>>> negotiation and let OVS slowpath flows that it knows the kernel won't
>>> be able to handle properly?
>> It's not that simple, because supported fields in a packet depend
>> on previous fields in that same packet. For example, parsing TCP
>> header is generally supported, but it won't be parsed for IPv6
>> fragments (even the first one), number of vlan headers will affect
>> the parsing as we do not parse deeper than 2 vlan headers, etc.
>> So, I'm afraid we have to have a per-packet information, unless we
>> can somehow probe all the possible valid combinations of packet
>> headers.
>>
>
> Surely. I understand that we'd need more than just a bit per
> field. Things like L4 on IPv6 frags would need another bit and the
> number of VLAN headers would need some more. But, are these a handful
> of exceptions or do we really need all the possible combinations of
> headers? If it's a matter of naming a handful of corner cases I think
> we could consider expressing them at initialization time and save some
> buffer space plus computation time both in kernel and userspace.

I will take a bit more of a look here - there must surely be a way to
express this when pulling information via the DP_GET command so that we
don't need to wait for a packet to come in to figure out whether we can
parse it.
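
To make the idea above concrete, here is a minimal sketch of what an
init-time capability report and the matching userspace check might look
like.  Everything in it is hypothetical: the struct names, the CAP_* bits
and kernel_can_parse() are made up for illustration and are not part of
the existing openvswitch uAPI or of ovs-vswitchd.  It only covers the two
corner cases named in this thread (L4 on IPv6 fragments and VLAN depth);
whether a handful of such bits is actually sufficient is exactly the open
question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits the datapath could report once, at
 * feature-negotiation time.  These are illustrative only. */
enum parse_cap {
    CAP_L4            = 1u << 0,  /* kernel parses L4 headers at all */
    CAP_L4_ON_V6_FRAG = 1u << 1,  /* kernel parses L4 on IPv6 fragments */
};

/* Hypothetical per-datapath report of parser limits. */
struct dp_parse_caps {
    uint32_t bits;           /* OR of enum parse_cap values */
    uint8_t  max_vlan_depth; /* deepest VLAN stack the kernel parses */
};

/* What userspace learned about one packet from its own parse. */
struct pkt_summary {
    uint8_t vlan_depth;      /* number of VLAN headers seen */
    bool    is_ipv6_frag;    /* packet is an IPv6 fragment */
    bool    user_parsed_l4;  /* userspace key contains L4 fields */
};

/* Returns true if, under the reported limits, the kernel would be
 * expected to produce the same key as userspace.  A false result
 * means the flow should stay in the slow path. */
static bool kernel_can_parse(const struct dp_parse_caps *caps,
                             const struct pkt_summary *p)
{
    assert(caps && p);

    if (p->vlan_depth > caps->max_vlan_depth)
        return false;

    if (p->user_parsed_l4) {
        if (!(caps->bits & CAP_L4))
            return false;
        if (p->is_ipv6_frag && !(caps->bits & CAP_L4_ON_V6_FRAG))
            return false;
    }
    return true;
}
```

With a report of { .bits = CAP_L4, .max_vlan_depth = 2 }, a packet with
three VLAN headers, or an IPv6 fragment for which userspace parsed L4,
would be kept in the slow path, while a plain single-tagged TCP packet
would be considered safe to install.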