Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for superblock creation [ver #9]

From: Andy Lutomirski
Date: Thu Jul 12 2018 - 12:23:29 EST




> On Jul 12, 2018, at 7:54 AM, David Howells <dhowells@xxxxxxxxxx> wrote:
>
> Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
>>> On Jul 11, 2018, at 12:22 AM, David Howells <dhowells@xxxxxxxxxx> wrote:
>>>
>>> Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>
>>>>> sfd = fsopen("ext4", FSOPEN_CLOEXEC);
>>>>> write(sfd, "s /dev/sdb1"); // note I'm ignoring write's length arg
>>>>
>>>> Imagine some malicious program passes sfd as stdout to a setuid
>>>> program. That program gets persuaded to write "s /etc/shadow". What
>>>> happens? Youâre okay as long as *every single fs* gets it right, but
>>>> thatâs asking a lot.
>>>
>>> Do note that you must already have CAP_SYS_ADMIN to be able to call
>>> fsopen().
>>
>> If you're not allowing it already, someone will want user namespace
>> root to be able to use this very, very soon.
>
> Yeah, I'm sure. And I've been thinking on how to deal with it.
>
> I think we *have* to open the source files/devices with the creds of whoever
> called fsopen() or fspick() - that way you can't upgrade your privs by passing
> your context fd to a suid program. To enforce this, I think it's simplest for
> fscontext_write() to call override_creds() right after taking the uapi_mutex
> and then call revert_creds() right before dropping the mutex.
>

If you make a syscall that attaches a block device to an fscontext, you donât need any of this. Heck, someone might actually *want* to grab a block device from a different namespace.

All this override_creds() stuff is maybe okay if we were fixing an old broken thing. But this is brand new. And having write() call override_creds() and do nontrivial things is a fascinating attack surface.

Just imagine what blows up if I abuse fscontext to open a block device on a path that traverses an AFS mount or /proc/.../fd or similar. Or if I splice() from a network filesystem into fscontext.

(Al- canât we just stop allowing splice() at all on things that donât use iov_iter?)

> Another thing we might want to look at is to allow a supervisory process to
> examine the context before permitting the create/reconfigure action to
> proceed. It might also be possible to do this through the LSM.

Cc Tycho. Heâs working on this exact idea using seccomp. And heâd probably much, much prefer if configuration of an fscontext didnât use a performance-critical syscall like write().

As a straw man, I suggest:

fsconfigure(contextfd, ADD_BLOCKDEV, dfd, path, flags);

fsconfigure(contextfd, ADD_OPTION, 0, âfoo=barâ, flags);

Etc.