Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]

From: David Howells
Date: Thu Jul 12 2018 - 16:23:16 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> Don't play games with override_creds. It's wrong.
>
> You have to use file->f_creds - no games, no garbage.

You missed the point.

It's all very well to say "use file->f_creds". The problem is that those creds
then have to be handed down all the way through the filesystem and into the
block layer, to anywhere there's an LSM call, a CAP_* check or a pathwalk -
but there's not currently any way to do that.

mount_bdev() and blkdev_get_by_path() are examples of this. At the moment
neither takes a cred parameter. We'd also have to pass the creds down into
path_init() to store in struct nameidata and make sure that every permission
check that might be invoked during pathwalk in every filesystem uses those,
not current_cred().
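
To illustrate the shape of the problem, here's a userspace toy model (not
kernel code; every toy_* name is made up for illustration and none of these
are real kernel interfaces). It contrasts a leaf check that consults the
ambient creds with one that has the creds threaded down to it, which is what
every intermediate layer would have to grow a parameter for:

```c
/* Userspace toy model, NOT kernel code.  All toy_* names are hypothetical
 * stand-ins used only to show where a cred parameter would have to go.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_cred {
	unsigned int fsuid;
	bool cap_sys_admin;
};

/* Stand-in for the ambient creds of the calling task. */
static const struct toy_cred toy_task_cred = { 1000, false };

static const struct toy_cred *toy_current_cred(void)
{
	return &toy_task_cred;
}

/* Leaf check in the current style: it looks at the ambient creds. */
static bool toy_may_open_ambient(void)
{
	return toy_current_cred()->cap_sys_admin;
}

/* Leaf check with the creds handed down explicitly. */
static bool toy_may_open(const struct toy_cred *cred)
{
	assert(cred != NULL);
	return cred->cap_sys_admin;
}

/* Intermediate layer: needs a cred parameter just to pass it along. */
static bool toy_get_bdev(const struct toy_cred *cred, const char *path)
{
	(void)path;	/* the toy model doesn't look anything up */
	return toy_may_open(cred);
}

/* Top layer: likewise has to accept and forward the creds. */
static bool toy_mount(const struct toy_cred *cred, const char *path)
{
	return toy_get_bdev(cred, path);
}
```

Note that every function between the top and the leaf changes signature,
even though only the leaf actually looks at the creds.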

I made an attempt to do this a while ago and the patch got rather large before
I gave up. In many ways, it's what we *should* do, but so many things need an
extra parameter... If you really want, I can try that again. It's possible I
can automate it with some Perl scripting to parse the error messages from the
compiler.

My suggestion was to use override_creds() to impose the appropriate creds at
the top, be that file->f_creds or fs_context->creds (they would be the same in
any case).

If we want to go down the pass-the-creds-down route, then we can temporarily
do override_creds() until we've made the changes and then remove it later.
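
For comparison, the override-at-the-top approach looks roughly like this,
again as a userspace toy model rather than kernel code. The toy_* names are
hypothetical; they only mimic the save/replace/restore shape of
override_creds() and revert_creds(), and the toy context struct is an
assumption, not the real fs_context layout:

```c
/* Userspace toy model, NOT kernel code.  The toy_* helpers are hypothetical
 * and only imitate the save/replace/restore pattern.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_cred {
	unsigned int fsuid;
	bool cap_sys_admin;
};

/* Ambient creds of the calling task, and the pointer callees consult. */
static const struct toy_cred toy_task_cred = { 1000, false };
static const struct toy_cred *toy_cur = &toy_task_cred;

static const struct toy_cred *toy_current_cred(void)
{
	return toy_cur;
}

/* Install new creds and return the old ones so they can be restored. */
static const struct toy_cred *toy_override_creds(const struct toy_cred *new_cred)
{
	const struct toy_cred *old = toy_cur;

	assert(new_cred != NULL);
	toy_cur = new_cred;
	return old;
}

static void toy_revert_creds(const struct toy_cred *old)
{
	toy_cur = old;
}

/* Deep callee: unchanged, still consults the ambient creds. */
static bool toy_may_open(void)
{
	return toy_current_cred()->cap_sys_admin;
}

/* Hypothetical context carrying the creds captured when it was opened. */
struct toy_fs_context {
	const struct toy_cred *cred;
};

/* Top of the call chain: impose the context's creds, then put them back. */
static bool toy_create_super(const struct toy_fs_context *fc)
{
	const struct toy_cred *saved = toy_override_creds(fc->cred);
	bool ok = toy_may_open();

	toy_revert_creds(saved);
	return ok;
}
```

Nothing below the top level changes signature here, which is why it works as
a stopgap; the cost is that the creds in effect are implicit state rather
than something visible in each call.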

> But "write()" simply is *NOT* a good "command" interface. If you want
> to send a command, use an ioctl or a system call.

Okay.

> Because it's not just about credentials. It's not just about fooling a
> suid app into writing an error message to a descriptor you wrote. It's
> also about things like "splice()", which can write to your target
> using a kernel buffer, and thus trick you into doing a command while
> we have the context set to kernel addresses.
>
> Are we trying to get away from that issue? Yes. But it's just another
> example of why "write()" IS NOT TO BE USED FOR COMMANDS.

Btw, do we protect sysfs, debugfs, tracefs, procfs, etc. writes against
splice? Some of the things in debugfs are really icky, allowing you to muck
directly with hardware.

David