Re: [PATCH 03/11] fs: add new read_uptr and write_uptr file operations

From: Christoph Hellwig
Date: Mon Jun 29 2020 - 15:07:02 EST


On Sat, Jun 27, 2020 at 09:33:03AM -0700, Linus Torvalds wrote:
> I thought there was just one very specific case of "oh, in certain
> cases of setsockopt we don't know what size this address is and optlen
> is ignored", so we have to just pass the pointer down to the protocol,
> which is the point that knows how much of an address it wants..

The setsockopt issue is a little more complicated. Let me try to
summarize it:

- setsockopt takes a (user) pointer and len
- unfortunately while the designers of the BSD socket API intended the
len to be correct, some protocol implementations have been sloppy
and just use a hardcoded len for the value, plus some other funnies
- unfortunately there is some BPF magic that can attach to a socket
and be run, and that (and only that in the latest kernel) can cause
a setsockopt to take a kernel buffer. One that was copied from
userspace earlier and had the BPF program run on it.
- unfortunately we have about 90 ->setsockopt instances, and the BPF
hook is not specific to any particular one of them. In fact the
BPF program can run for options that don't even exist, and based on
my previous discussion Facebook has setups that rely on that.
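
To make the "sloppy len" point concrete, here is a minimal userspace
sketch, not actual kernel code. Both handler names are hypothetical.
The first handler honours optlen, the second ignores it and copies a
hardcoded size, which is why generic code cannot know how much to copy
on the protocol's behalf and has to hand the pointer down:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler that honours optlen, as the API intended. */
static int strict_set_int(int *dst, const void *optval, size_t optlen)
{
	if (optlen < sizeof(int))
		return -EINVAL;	/* caller passed too small a buffer */
	memcpy(dst, optval, sizeof(int));
	return 0;
}

/*
 * Hypothetical sloppy handler: optlen is ignored and a hardcoded
 * sizeof(int) is copied regardless of what the caller claimed.
 */
static int sloppy_set_int(int *dst, const void *optval, size_t optlen)
{
	(void)optlen;
	memcpy(dst, optval, sizeof(int));
	return 0;
}
```

With handlers like the second one around, the len passed to setsockopt
says nothing reliable about how many bytes the protocol will read.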

> Was that a misunderstanding on my part?
>
> Because if there are tons and tons of places that want this "either
> kernel or user" then we could still have a helper function for it, but
> it means that the whole "limit the cases" advantage to some degree
> goes away.

But except for setsockopt we don't really have anything like that left.
There is some alpha arch code that would need to be duplicated for
user vs kernel pointers, but I suspect it will get cleaner by that,
and the messy s390 crypto driver which will be a bit of work, but all
internal to that driver.

So based on that I'd rather get away without our flag and tag the
kernel pointer case in setsockopt explicitly.
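
A rough userspace sketch of what tagging the kernel pointer case
explicitly could look like. All names here (struct opt_ptr,
opt_from_user, opt_from_kernel, copy_from_opt, fake_copy_from_user)
are made up for illustration, and fake_copy_from_user is only a
stand-in for copy_from_user() so the example builds outside the
kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tagged pointer: carries where the buffer lives. */
struct opt_ptr {
	const void *ptr;
	bool is_kernel;
};

static struct opt_ptr opt_from_user(const void *p)
{
	struct opt_ptr o = { .ptr = p, .is_kernel = false };
	return o;
}

static struct opt_ptr opt_from_kernel(const void *p)
{
	struct opt_ptr o = { .ptr = p, .is_kernel = true };
	return o;
}

/* Counts how often the "user" path ran, for demonstration only. */
static int user_copies;

/* Stand-in for copy_from_user(); a plain memcpy in this sketch. */
static int fake_copy_from_user(void *dst, const void *src, size_t len)
{
	user_copies++;
	memcpy(dst, src, len);
	return 0;
}

/* The one helper that looks at the tag and picks the copy routine. */
static int copy_from_opt(void *dst, struct opt_ptr src, size_t len)
{
	assert(dst != NULL && src.ptr != NULL);
	if (src.is_kernel) {
		memcpy(dst, src.ptr, len);
		return 0;
	}
	return fake_copy_from_user(dst, src.ptr, len);
}
```

The idea is that the ->setsockopt instances only ever go through the
helper, so the user vs kernel decision is made in one place and is
visible in the type instead of in a global flag.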