Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns

From: Liran Alon
Date: Thu Mar 15 2018 - 11:06:06 EST



----- daniel@xxxxxxxxxxxxx wrote:

> On 03/15/2018 03:35 PM, Roman Mashak wrote:
> > Liran Alon <liran.alon@xxxxxxxxxx> writes:
> > [...]
> >>> Overall I think it might be nice to not need scrubbing skb in such cases,
> >>> although my concern would be that this has potential to break existing
> >>> setups when they would expect mark being zero on other veth peer in any
> >>> case since it's the behavior for a long time already. The safer option
> >>> would be to have some sort of explicit opt-in e.g. on link creation to let
> >>> the skb->mark pass through unscrubbed. This would definitely be a useful
> >>> option e.g. when mark is set in the netns facing veth via clsact/egress
> >>> on xmit and when the container is unprivileged anyway.
> >>>
> >>> Thanks,
> >>> Daniel
> >>
> >> I see your point in regards to backwards comparability.
> >> However, not scrubbing skb when it cross netns via some kernel functions compared to
> >> others is basically a bug which could easily break with a little bit of more refactoring.
> >> Therefore, it seems a bit weird to me to from now on, we will force
> >> every user on link creation to consider that once there was a bug leading
> >> to this weird behavior on specific netdevs.
>
> Why bug specifically? It could well be that for some unpriv containers
> it would be fine to do e.g. in cases where orchestrator sets up clsact/egress
> on veth/ipvlan/etc in the container to set the mark and where app
> cannot mess with this while for others you need to act out of host facing
> veth; thus, explicit opt-in per dev could provide such more fine grained
> control.
>
> > One valid use case could be preserving a source namespace nsid in
> > skb->mark when a packet crosses netns.
>
> Right, was thinking about something similar.

I agree with all the above, but this behavior was not supported either before
or after this commit: skb->mark is always zeroed when crossing netns.
This commit only changes the behavior for an skb crossing netdevs in the *same* netns
via dev_forward_skb().
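
To make the distinction concrete, here is a rough userspace model of the two
behaviors. It is only a sketch and not the actual patch: the structs are
cut-down stand-ins for the kernel ones, scrub_packet() only mimics the xnet
argument of skb_scrub_packet(), and forward_skb_old()/forward_skb_new() are
hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cut-down userspace stand-ins; NOT the kernel's real structures. */
struct net { int id; };
struct net_device { const struct net *nd_net; };
struct sk_buff {
	const struct net_device *dev;	/* device the skb currently belongs to */
	uint32_t mark;
	int skb_iif;
};

bool net_eq(const struct net *a, const struct net *b)
{
	return a == b;
}

/* Mimics the xnet argument: per-netns state is cleared only if xnet is set. */
void scrub_packet(struct sk_buff *skb, bool xnet)
{
	skb->skb_iif = 0;	/* cleared in every case */
	if (!xnet)
		return;
	skb->mark = 0;		/* cleared only when crossing netns */
}

/* Before the commit (modelled): always scrub as if crossing netns. */
void forward_skb_old(const struct net_device *dst, struct sk_buff *skb)
{
	scrub_packet(skb, true);
	skb->dev = dst;
}

/* After the commit (modelled): scrub fully only if the netns really differs. */
void forward_skb_new(const struct net_device *dst, struct sk_buff *skb)
{
	bool xnet = !net_eq(skb->dev->nd_net, dst->nd_net);

	scrub_packet(skb, xnet);
	skb->dev = dst;
}

/* Same-netns forwarding: the mark is lost in the old model, kept in the new. */
void model_self_check(void)
{
	struct net ns = { 1 };
	struct net_device a = { &ns }, b = { &ns };
	struct sk_buff s1 = { &a, 42, 3 }, s2 = { &a, 42, 3 };

	forward_skb_old(&b, &s1);
	forward_skb_new(&b, &s2);
	assert(s1.mark == 0);
	assert(s2.mark == 42);
}
```

In this model the mark is still zeroed in both variants when the two devices
are in different netns; only the same-netns case differs.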

Therefore, I believe we should first discuss here what we want the default behavior to be
and how it should be controlled for backwards compatibility.
Only after that should we discuss adding an extra feature to control skb scrubbing
per netdev or something similar.