Re: [RFCv2][PATCH 1/2] fs proc: make pagemap a privileged interface

From: Eric W. Biederman
Date: Tue Mar 17 2015 - 14:28:13 EST


Dave Hansen <dave@xxxxxxxx> writes:

> On 03/17/2015 06:04 AM, Eric W. Biederman wrote:
>> Dave Hansen <dave@xxxxxxxx> writes:
>>
>>> From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>>>
>>> Changes from v1:
>>> * Do not allow a child pid namespace to unset paranoid
>>> when its parent had it set.
>>> * Update description text to clarify the options we
>>> have to solve this problem.
>>
>> Again.
>>
>> Nacked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>>
>> The option name "paranoid" is entirely too general. Who knows what
>> it refers to.
>
> /proc exposes a lot of sensitive information that could be used to
> determine physical memory ordering (not as bad as actual physical
> addresses, but still). My hope was that instead of adding an option for
> every single proc file, we'd have a single one that folks could turn on.
>
> In the end, I'm perfectly fine doing a s/paranoid/pidpagemap/g, I just
> wanted to point out that there may well be more patches like this down
> the line.
>
>> A mount option is not an appropriate place to control one small bit of
>> policy like this. Proc mount options are a real pain in the butt to
>> deal with and to maintain.
>
> It sounds like you are of the opinion that this commit:
>
>> commit 0499680a42141d86417a8fbaa8c8db806bea1201
>> Author: Vasiliy Kulikov <segooon@xxxxxxxxx>
>> Date: Tue Jan 10 15:11:31 2012 -0800
>>
>> procfs: add hidepid= and gid= mount options
>
> was inappropriate.

Actually, I am. It is a bloody nightmare to maintain. But at least it
deals with the core business of proc: displaying processes.

>> Further, a per-pid-namespace decision does not actually work for
>> applying a restricted policy to only a small set of processes, because
>> only a very careful container setup would expose this policy.
>
> I would hope that the folks doing the fancy container setup tools would
> add this when they mount the container /proc and care about exposing
> physical addresses to it.
>
> I did model this after the _existing_ /proc options (introduced in the
> commit referenced above). Those also use the pid namespace to store
> mount options. I assumed they are used out in the real world and that
> they do not require any kind of careful container setup.

The use cases are different enough that the comparison does not hold.
For the hidepid nonsense, a small leak doesn't give you much.

For restricting pagemap, in many cases a single leak of a single value
means you can infer everything else you want to know. Since a single
leak of pagemap gives the game away, something that does not restrict
access on a broad basis is a problem.
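
To make concrete what a single leaked value looks like, here is a minimal
sketch of decoding one pagemap entry. It assumes the documented entry
layout (one 64-bit word per virtual page, bits 0-54 holding the page frame
number when the page is present, bit 63 as the "present" flag). The helper
names (pagemap_present, pagemap_pfn, pagemap_read_entry) are made up for
illustration and are not part of any kernel or libc API.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Assumed layout of a /proc/<pid>/pagemap entry: bits 0-54 are the PFN
 * (valid when the page is present), bit 63 is the "present" flag. */
#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)
#define PM_PRESENT  (UINT64_C(1) << 63)

/* Hypothetical helper: nonzero if the entry describes a page in RAM. */
static int pagemap_present(uint64_t entry)
{
    return (entry & PM_PRESENT) != 0;
}

/* Hypothetical helper: extract the page frame number field. */
static uint64_t pagemap_pfn(uint64_t entry)
{
    return entry & PM_PFN_MASK;
}

/* Hypothetical helper: read the entry covering vaddr from our own
 * pagemap. Returns 0 on success and -1 on any failure, including the
 * file being absent or access being refused. */
static int pagemap_read_entry(const void *vaddr, uint64_t *entry)
{
    long page_size = sysconf(_SC_PAGESIZE);
    off_t offset;
    ssize_t n;
    int fd;

    if (page_size <= 0)
        return -1;

    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;

    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    offset = (off_t)((uintptr_t)vaddr / (uintptr_t)page_size) *
             (off_t)sizeof(*entry);
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    n = read(fd, entry, sizeof(*entry));
    close(fd);
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

A caller would pass the address of any mapped object and, if the entry is
present, multiply the PFN by the page size to get a physical address. On a
kernel that restricts the interface, the open may fail or the PFN field
may read as zero, so a zero PFN should not be taken as meaningful.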

>> If you really need a subset of processes with a restricted policy make
>> it a prctl, and bloat struct task. Then disallow a process with the
>> prctl set from reading the file.
>
> Let's say we add the prctl(), and we set it up to block
> /proc/$pid/pagemap by default at boot. We run for a couple of weeks and
> an (unprivileged) app breaks. With the mount option, an administrator
> at least has the option to fall back to a less secure mode for the whole
> system with a remount.
>
> With a prctl(), I don't think that would be feasible, short of a reboot.
>
> Would such a prctl() also have the feature that it could never be set to
> a less-restrictive policy?

Your choice. For system-wide behavior, a sysctl or a boot option is
likely better.

For the case of just enabling this for a pid namespace or a container,
prctl() seems reasonable.
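
As an illustration of how a prctl() that can only move toward the more
restrictive setting behaves, here is a sketch using the existing
PR_SET_NO_NEW_PRIVS flag as an analogy. It is not the pagemap control
discussed here, which does not exist as a prctl. The wrapper names
(restrict_self, is_restricted, try_unrestrict) are hypothetical.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Set the one-way flag on the calling thread. Returns 0 on success,
 * -1 with errno set on failure. */
static int restrict_self(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Query the flag: returns 1 if set, 0 if not, -1 on error. */
static int is_restricted(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}

/* Attempt to clear the flag. The kernel accepts only the value 1 for
 * this option, so the call is expected to fail with EINVAL. */
static int try_unrestrict(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

Once set, the flag is inherited across fork() and preserved across
execve(), so a container manager could set it before starting its
payload. A pagemap-specific prctl would need its own option number and a
per-task bit, which is the struct task bloat mentioned earlier.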

I am a bit puzzled, though. I thought that as part of the kernel virtual
address randomization effort we had a bunch of similar patches come
through that I could refer you to. But for whatever reason I am not
seeing them in the source tree now. If those kinds of changes actually
exist, that class of blinding might be useful for your changes as well.

Eric
