Re: [PATCH 5.4 033/222] io_uring: only allow submit from owning task

From: Jens Axboe
Date: Fri Jan 24 2020 - 16:41:48 EST


On 1/24/20 12:11 PM, Stefan Metzmacher wrote:
> Am 24.01.20 um 17:58 schrieb Jens Axboe:
>> On 1/24/20 3:38 AM, Stefan Metzmacher wrote:
>>> Am 22.01.20 um 10:26 schrieb Greg Kroah-Hartman:
>>>> From: Jens Axboe <axboe@xxxxxxxxx>
>>>>
>>>> commit 44d282796f81eb1debc1d7cb53245b4cb3214cb5 upstream.
>>>>
>>>> If the credentials or the mm doesn't match, don't allow the task to
>>>> submit anything on behalf of this ring. The task that owns the ring can
>>>> pass the file descriptor to another task, but we don't want to allow
>>>> that task to submit an SQE that then assumes the ring mm and creds if
>>>> it needs to go async.
>>>>
>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>> Suggested-by: Stefan Metzmacher <metze@xxxxxxxxx>
>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>>>>
>>>>
>>>> ---
>>>> fs/io_uring.c | 6 ++++++
>>>> 1 file changed, 6 insertions(+)
>>>>
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -3716,6 +3716,12 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned
>>>> wake_up(&ctx->sqo_wait);
>>>> submitted = to_submit;
>>>> } else if (to_submit) {
>>>> + if (current->mm != ctx->sqo_mm ||
>>>> + current_cred() != ctx->creds) {
>>>> + ret = -EPERM;
>>>> + goto out;
>>>> + }
>>>> +
>>>
>>> I thought about this a bit more.
>>>
>>> I'm not sure if this is actually to restrictive,
>>> because it means applications like Samba won't
>>> be able to use io-uring at all.
>>>
>>> As even if current_cred() and ctx->creds describe the same
>>> set of uid,gids the != won't ever match again and
>>> makes the whole ring unuseable.
>>>
>>> I'm not sure about what the best short term solution could be...
>>>
>>> 1. May just doing the check for path based operations?
>>> and fail individual requests with EPERM.
>>>
>>> 2. Or force REQ_F_FORCE_ASYNC for path based operations,
>>> so that they're always executed from within the workqueue
>>> with were ctx->creds is active.
>>>
>>> 3. Or (as proposed earlier) do the override_creds/revert_creds dance
>>> (and similar for mm) if needed.
>>>
>>> To summaries the problem again:
>>>
>>> For path based operations like:
>>> - IORING_OP_CONNECT (maybe also - IORING_OP_ACCEPT???)
>>> - IORING_OP_SEND*, IORING_OP_RECV* on DGRAM sockets
>>> - IORING_OP_OPENAT, IORING_OP_STATX, IORING_OP_OPENAT2
>>> it's important under which current_cred they are called.
>>>
>>> Are IORING_OP_MADVISE, IORING_OP_FADVISE and IORING_OP_FALLOCATE
>>> are only bound to the credentials of the passed fd they operate on?
>>>
>>> The current assumption is that the io_uring_setup() syscall captures
>>> the current_cred() to ctx->cred and all operations on the ring
>>> are executed under the context of ctx->cred.
>>> Therefore all helper threads do the override_creds/revert_creds dance.
>>
>> But it doesn't - we're expecting them to match, and with this change,
>> we assert that it's the case or return -EPERM.
>>
>>> But the possible non-blocking line execution of operations in
>>> the io_uring_enter() syscall doesn't do the override_creds/revert_creds
>>> dance and execute the operations under current_cred().
>>>
>>> This means it's random depending on filled cached under what
>>> credentials an operation is executed.
>>>
>>> In order to prevent security problems the current patch is enough,
>>> but as outlined above it will make io-uring complete unuseable
>>> for applications using any syscall that changes current_cred().
>>>
>>> Change 1. would be a little bit better, but still not really useful.
>>>
>>> I'd actually prefer solution 3. as it's still possible to make
>>> use of non-blocking operations, while the security is the
>>> same as solution 2.
>>
>> For your situation, we need to extend it anyway, and provide a way
>> to swap between personalities. So yeah it won't work as-is for your
>> use case, but we can work on making that the case.
>
> That's only for the OPENAT2 case, which we might want to use in future,
> but there's a lot of work required to have async opens in Samba.
>
> But I have a experimental module that, just use READV, WRITEV and FSYNC
> with io-uring in order to avoid our userspace helper threads.
>
> And that won't work anymore with the change as Samba change
> current_cred() very often switch between (at least) 2 identities
> root and the user. That will change the pointer of current_cred() each time.
>
> I mean I could work around the check by using IORING_SETUP_SQPOLL,
> but I'd like to avoid that.

It's easy enough to support the current creds from io_uring_enter(),
where we need a bit of plumbing is if we have to go async for that
particular operation. We currently have that static as well, which is
why the current patch is needed.

What I'm trying to say is that'll we'll need code changes to support
this in any case, even just reverting that change isn't going to make
the problem go away for you.

Hence we just need to decide on what the best way to do this would be!

--
Jens Axboe