Re: [PATCH 29/38] vfs: syscall: Add fsconfig() for configuring and managing a context [ver #10]

From: Jann Horn
Date: Sun Jul 29 2018 - 07:14:53 EST


On Sun, Jul 29, 2018 at 10:50 AM David Howells <dhowells@xxxxxxxxxx> wrote:
>
> Jann Horn <jannh@xxxxxxxxxx> wrote:
>
> > [...]
> > > +	case fsconfig_set_binary:
> > > +		param.type = fs_value_is_blob;
> > > +		param.size = aux;
> > > +		param.blob = memdup_user_nul(_value, aux);
> > > +		if (IS_ERR(param.blob)) {
> > > +			ret = PTR_ERR(param.blob);
> > > +			goto out_key;
> > > +		}
> > > +		break;
> >
> > This means that a namespace admin (iow, an unprivileged user) can
> > allocate 1MB of unswappable kmalloc memory per userspace task, right?
> > Using userfaultfd or FUSE, you can then stall the task as long as you
> > want while it has that allocation. Is that problematic, or is that
> > normal?
>
> That's not exactly the case. A userspace task can make a temporary
> allocation, but unless the filesystem grabs it, it's released again on exit
> from the system call.

That's what I said. Each userspace task can make a 1MB allocation by
calling this syscall, and this temporary allocation stays allocated
until the end of the syscall. But the runtime of the syscall is
unbounded - even just the memdup_user_nul() can stall forever if the
copy_from_user() call inside it faults on e.g. a userfault region or a
memory-mapped file from a FUSE filesystem.
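
To make the window concrete, here is a rough userspace model of the
allocate-then-copy pattern. It is only a sketch, not the kernel code:
model_memdup_nul() and MODEL_BLOB_MAX are made-up names, malloc() and
memcpy() stand in for the kernel allocator and copy_from_user(), and
the 1MB cap is folded into the helper purely to keep the example
self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap, standing in for the 1MB figure discussed above. */
#define MODEL_BLOB_MAX (1024 * 1024)

/*
 * Userspace model of the memdup_user_nul() pattern: allocate len + 1
 * bytes, copy the caller's data, then NUL-terminate the copy.
 * Returns NULL and stores an errno value in *err on failure.
 */
char *model_memdup_nul(const void *src, size_t len, int *err)
{
	char *p;

	assert(err != NULL);

	if (len > MODEL_BLOB_MAX) {
		*err = EINVAL;
		return NULL;
	}

	p = malloc(len + 1);	/* the buffer is held from here on */
	if (!p) {
		*err = ENOMEM;
		return NULL;
	}

	/*
	 * In the kernel, this step is copy_from_user(). If the source
	 * page is backed by userfaultfd or a FUSE mapping, the fault can
	 * be held open by userspace for an arbitrary time, and the
	 * buffer allocated above stays pinned for that whole duration.
	 */
	memcpy(p, src, len);
	p[len] = '\0';

	*err = 0;
	return p;
}
```

The point is the ordering: the allocation happens before the copy, so
whoever controls how long the copy takes also controls how long the
memory is held.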

> Note that I should probably use vmalloc() rather than kmalloc(), but that
> doesn't really affect your point. I could also pass the user pointer through
> to the filesystem instead - I wanted to avoid that for this interface, but it
> makes sense in this instance.