Re: [PATCH v5 0/9] net: fix sysfs permssions when device changes network

From: Christian Brauner
Date: Tue Feb 25 2020 - 08:10:10 EST


On Tue, Feb 25, 2020 at 01:49:39PM +0100, Christian Brauner wrote:
> Hey everyone,
>
> /* v5 */
> This is v5 with a small fixup requested by Rafael.
>
> /* v4 */
> This is v4 with more documentation and other fixes that Greg requested.
>
> /* v3 */
> This is v3 with explicit uid and gid parameters added to functions that
> change sysfs object ownership as Greg requested.
>
> (I've tagged this with net-next since it's triggered by a bug for
> network device files but it also touches driver core aspects so it's
> not clear-cut. I can of course split this series into separate
> patchsets.)
> We have been struggling with a bug surrounding the ownership of network
> device sysfs files when moving network devices between network
> namespaces owned by different user namespaces reported by multiple
> users.
>
> Currently, when moving network devices between network namespaces the
> ownership of the corresponding sysfs entries is not changed. This leads
> to problems when tools try to operate on the corresponding sysfs files.
>
> I also causes a bug when creating a network device in a network
> namespaces owned by a user namespace and moving that network device back
> to the host network namespaces. Because when a network device is created
> in a network namespaces it will be owned by the root user of the user
> namespace and all its associated sysfs files will also be owned by the
> root user of the corresponding user namespace.
> If such a network device has to be moved back to the host network
> namespace the permissions will still be set to the root user of the
> owning user namespaces of the originating network namespace. This means
> unprivileged users can e.g. re-trigger uevents for such incorrectly
> owned devices on the host or in other network namespaces. They can also
> modify the settings of the device itself through sysfs when they
> wouldn't be able to do the same through netlink. Both of these things
> are unwanted.
>
> For example, quite a few workloads will create network devices in the
> host network namespace. Other tools will then proceed to move such
> devices between network namespaces owner by other user namespaces. While
> the ownership of the device itself is updated in
> net/core/net-sysfs.c:dev_change_net_namespace() the corresponding sysfs
> entry for the device is not. Below you'll find that moving a network
> device (here a veth device) from a network namespace into another
> network namespaces owned by a different user namespace with a different
> id mapping. As you can see the permissions are wrong even though it is
> owned by the userns root user after it has been moved and can be
> interacted with through netlink:
>
> drwxr-xr-x 5 nobody nobody 0 Jan 25 18:08 .
> drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 ..
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_assign_type
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_len
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 address
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 broadcast
> -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_changes
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_down_count
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_up_count
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_id
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_port
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dormant
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 duplex
> -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 flags
> -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 gro_flush_timeout
> -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifalias
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifindex
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 iflink
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 link_mode
> -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 mtu
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 name_assign_type
> -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 netdev_group
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 operstate
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_id
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_name
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_switch_id
> drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 power
> -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 proto_down
> drwxr-xr-x 4 nobody nobody 0 Jan 25 18:09 queues
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 speed
> drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 statistics
> lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:08 subsystem -> ../../../../class/net
> -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 tx_queue_len
> -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 type
> -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:08 uevent
>
> Constrast this with creating a device of the same type in the network
> namespace directly. In this case the device's sysfs permissions will be
> correctly updated.
> (Please also note, that in a lot of workloads this strategy of creating
> the network device directly in the network device to workaround this
> issue can not be used. Either because the network device is dedicated
> after it has been created or because it used by a process that is
> heavily sandboxed and couldn't create network devices itself.):
>
> drwxr-xr-x 5 root root 0 Jan 25 18:12 .
> drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 ..
> -r--r--r-- 1 root root 4096 Jan 25 18:12 addr_assign_type
> -r--r--r-- 1 root root 4096 Jan 25 18:12 addr_len
> -r--r--r-- 1 root root 4096 Jan 25 18:12 address
> -r--r--r-- 1 root root 4096 Jan 25 18:12 broadcast
> -rw-r--r-- 1 root root 4096 Jan 25 18:12 carrier
> -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_changes
> -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_down_count
> -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_up_count
> -r--r--r-- 1 root root 4096 Jan 25 18:12 dev_id
> -r--r--r-- 1 root root 4096 Jan 25 18:12 dev_port
> -r--r--r-- 1 root root 4096 Jan 25 18:12 dormant
> -r--r--r-- 1 root root 4096 Jan 25 18:12 duplex
> -rw-r--r-- 1 root root 4096 Jan 25 18:12 flags
> -rw-r--r-- 1 root root 4096 Jan 25 18:12 gro_flush_timeout
> -rw-r--r-- 1 root root 4096 Jan 25 18:12 ifalias
> -r--r--r-- 1 root root 4096 Jan 25 18:12 ifindex
> -r--r--r-- 1 root root 4096 Jan 25 18:12 iflink
> -r--r--r-- 1 root root 4096 Jan 25 18:12 link_mode
> -rw-r--r-- 1 root root 4096 Jan 25 18:12 mtu
> -r--r--r-- 1 root root 4096 Jan 25 18:12 name_assign_type
> -rw-r--r-- 1 root root 4096 Jan 25 18:12 netdev_group
> -r--r--r-- 1 root root 4096 Jan 25 18:12 operstate
> -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_id
> -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_name
> -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_switch_id
> drwxr-xr-x 2 root root 0 Jan 25 18:12 power
> -rw-r--r-- 1 root root 4096 Jan 25 18:12 proto_down
> drwxr-xr-x 4 root root 0 Jan 25 18:12 queues
> -r--r--r-- 1 root root 4096 Jan 25 18:12 speed
> drwxr-xr-x 2 root root 0 Jan 25 18:12 statistics
> lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:12 subsystem -> ../../../../class/net
> -rw-r--r-- 1 root root 4096 Jan 25 18:12 tx_queue_len
> -r--r--r-- 1 root root 4096 Jan 25 18:12 type
> -rw-r--r-- 1 root root 4096 Jan 25 18:12 uevent
>
> Now, when creating a network device in a network namespace owned by a
> user namespace and moving it to the host the permissions will be set to
> the id that the user namespace root user has been mapped to on the host
> leading to all sorts of permission issues mentioned above:
>
> 458752
> drwxr-xr-x 5 458752 458752 0 Jan 25 18:12 .
> drwxr-xr-x 9 root root 0 Jan 25 18:08 ..
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_assign_type
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_len
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 address
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 broadcast
> -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_changes
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_down_count
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_up_count
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_id
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_port
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dormant
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 duplex
> -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 flags
> -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 gro_flush_timeout
> -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 ifalias
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 ifindex
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 iflink
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 link_mode
> -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 mtu
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 name_assign_type
> -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 netdev_group
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 operstate
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_id
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_name
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_switch_id
> drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 power
> -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 proto_down
> drwxr-xr-x 4 458752 458752 0 Jan 25 18:12 queues
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 speed
> drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 statistics
> lrwxrwxrwx 1 root root 0 Jan 25 18:12 subsystem -> ../../../../class/net
> -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 tx_queue_len
> -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 type
> -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 uevent
>
> Fix this by changing the basic sysfs files associated with network
> devices when moving them between network namespaces. To this end we add
> some infrastructure to sysfs.
>
> The patchset takes care to only do this when the owning user namespaces
> changes and the kids differ. So there's only a performance overhead,
> when the owning user namespace of the network namespace is different
> __and__ the kid mappings for the root user are different for the two
> user namespaces:
> Assume we have a netdev eth0 which we create in netns1 owned by userns1.
> userns1 has an id mapping of 0 100000 100000. Now we move eth0 into
> netns2 which is owned by userns2 which also defines an id mapping of 0
> 100000 100000. In this case sysfs doesn't need updating. The patch will
> handle this case and not do any needless work. Now assume eth0 is moved
> into netns3 which is owned by userns3 which defines an id mapping of 0
> 123456 65536. In this case the root user in each namespace corresponds
> to different kid and sysfs needs updating.
>
> Thanks!
> Christian
>
> Christian Brauner (9):
> sysfs: add sysfs_file_change_owner_by_name()

I just realized that the commit message here somehow slipped through
the cracks and still references the old name of the function.

> sysfs: add sysfs_link_change_owner()
> sysfs: add sysfs_group{s}_change_owner()
> sysfs: add sysfs_change_owner()
> device: add device_change_owner()
> drivers/base/power: add dpm_sysfs_change_owner()

Also gives me a reason to make the dpm helper inline.

Sorry, will resend in a bit.

Christian