Re: [PATCH] binderfs: implement "max" mount option

From: Christian Brauner
Date: Sun Dec 23 2018 - 08:06:28 EST


On Sun, Dec 23, 2018 at 12:29:44PM +0100, Greg KH wrote:
> On Sat, Dec 22, 2018 at 10:18:06PM +0100, Christian Brauner wrote:
> > Since binderfs can be mounted by userns root in non-initial user namespaces
> > some precautions are in order. First, a way to set a maximum on the number
> > of binder devices that can be allocated per binderfs instance and second, a
> > way to reserve a reasonable chunk of binderfs devices for the initial ipc
> > namespace.
> > A first approach as seen in [1] used sysctls similiar to devpts but was
> > shown to be flawed (cf. [2] and [3]) since some aspects were unneeded. This
> > is an alternative approach which avoids sysctls completely and instead
> > switches to a single mount option.
> >
> > Starting with this commit binderfs instances can be mounted with a limit on
> > the number of binder devices that can be allocated. The max=<count> mount
> > option serves as a per-instance limit. If max=<count> is set then only
> > <count> number of binder devices can be allocated in this binderfs
> > instance.
>
> Ok, this is fine, but why such a big default? You only need 4 to run a
> modern android system, and anyone using binder outside of android is
> really too crazy to ever be using it in a container :)
>
> > Additionally, the binderfs instance in the initial ipc namespace will
> > always have a reserve of at least 1024 binder devices unless explicitly
> > capped via max=<count>.
>
> Again, why so many? And why wouldn't that initial ipc namespace already
> have their device nodes created _before_ anything else is mounted?

Right, my issue is with re-creating devices, like if binderfs gets
unmounted or if devices get removed via rm. But we can lower the number
to 4 (see below).

>
> Some comments on the patch below:

Thanks!

>
> > +/*
> > + * Ensure that the initial ipc namespace always has a good chunk of devices
> > + * available.
> > + */
> > +#define BINDERFS_MAX_MINOR_CAPPED (BINDERFS_MAX_MINOR - 1024)
>
> Again that seems crazy big, how about splitting this into two different
> patches, one for the max= stuff, and one for this "reserve some minors"
> thing, so we can review them separately.

Yes, let's do that. I will also lower this to 4 reserved devices.

>
> >
> > static struct vfsmount *binderfs_mnt;
> >
> > @@ -46,6 +52,24 @@ static dev_t binderfs_dev;
> > static DEFINE_MUTEX(binderfs_minors_mutex);
> > static DEFINE_IDA(binderfs_minors);
> >
> > +/**
> > + * binderfs_mount_opts - mount options for binderfs
> > + * @max: maximum number of allocatable binderfs binder devices
> > + */
> > +struct binderfs_mount_opts {
> > + int max;
> > +};
> > +
> > +enum {
> > + Opt_max,
> > + Opt_err
> > +};
> > +
> > +static const match_table_t tokens = {
> > + { Opt_max, "max=%d" },
> > + { Opt_err, NULL }
> > +};
> > +
> > /**
> > * binderfs_info - information about a binderfs mount
> > * @ipc_ns: The ipc namespace the binderfs mount belongs to.
> > @@ -55,13 +79,16 @@ static DEFINE_IDA(binderfs_minors);
> > * created.
> > * @root_gid: gid that needs to be used when a new binder device is
> > * created.
> > + * @mount_opts: The mount options in use.
> > + * @device_count: The current number of allocated binder devices.
> > */
> > struct binderfs_info {
> > struct ipc_namespace *ipc_ns;
> > struct dentry *control_dentry;
> > kuid_t root_uid;
> > kgid_t root_gid;
> > -
> > + struct binderfs_mount_opts mount_opts;
> > + atomic_t device_count;
>
> Why atomic?
>
> You should already have the lock held every time this is accessed,
> so no need to use an atomic value, just use an int.
>
> > /* Reserve new minor number for the new device. */
> > mutex_lock(&binderfs_minors_mutex);
> > - minor = ida_alloc_max(&binderfs_minors, BINDERFS_MAX_MINOR, GFP_KERNEL);
> > + if (atomic_inc_return(&info->device_count) < info->mount_opts.max)
>
> No need for atomic, see, your lock is held :)

Habit, to be honest.

Thanks, fixed version to follow in a bit.
Christian