Re: [PATCH RFC] mm: account VMA before forced-COW via/proc/pid/mem

From: Hugh Dickins
Date: Mon Apr 09 2012 - 20:34:49 EST


On Sat, 7 Apr 2012, Konstantin Khlebnikov wrote:
> Hugh Dickins wrote:
> >
> > I've long detested that behaviour of GUP write,force, and my strong
> > preference would be not to layer more strangeness upon strangeness,
> > but limit the damage by making GUP write,force fail in that case,
> > instead of inserting a PageAnon page into a VM_SHARED mapping.
> >
> > I think it's unlikely that it will cause a regression in real life
> > (it already fails if you did not open the mmap'ed file for writing),
> > but it would be a user-visible change in behaviour, and I've research
> > to do before arriving at a conclusion.
>
> Agree, but this stuff is very weak. Even if sysctl vm.overcommit_memory=2,
> probably we should fixup accounting in /proc/pid/mem only for this case,
> because vm.overcommit_memory=2 supposed to protect against overcommit, but it
> does not.

You are right (and it's not the first time I've had to say so!).

At first I was puzzled by your answer, then went back to your initial
mail, which clearly says "Currently kernel does not account read-only
private mappings into memory commitment. But these mappings can be
force-COW-ed in get_user_pages()" and realized (a) that I'd forgotten
about that weakness in the overcommit stuff, and (b) that I'd therefore
let my obsession with the anon-in-shared issue blind me to what you were
actually saying. "private", yes, sorry about that. Let's set aside the
shared issue for the moment, then, though I'll need to answer Oleg after
(and writing to /proc/pid/mem makes that more serious than before too).

Yes, GUP force-write into private-readonly can violate no-overcommit.

Force-write was originally intended just for setting breakpoints with
ptrace(2), and we've taken the view that fixing the no-overcommit issue
is simply more trouble than it's worth. But I agree with you that once
writing to /proc/pid/mem was enabled a year ago, it became a more serious
defect: I don't think /proc/pid/mem makes anything new possible, but it
does make it easier - I imagine dd(1) could achieve the same as your
little program.

I was hoping to find that writing to /proc/pid/mem was enabled solely
because "why not?", and the 198214a7ee commit message does suggest so;
but I see there was a 0/12 mail which never reached git, which makes
clear that it was intended for debuggers to use instead of ptrace(2).
So I think my first reaction, to disallow write-force on readonly via
/proc/pid/mem, would not be helpful.

I think all solutions to this are unsatifactory, and most ugly.
I'd better pull up your original mail to review in detail, but I'd
say yours is no exception: ugly and unsatisfactory, but quite possibly
less ugly and less unsatisfactory than most. I was quite happy to be
doing nothing at all about this, but now that you've raised the matter,
I can understand that you won't want it to rest there.

Of course, the issue can be dealt with by an additional data structure;
but extending vm_area_struct or anon_vma (if only by one usually-NULL
pointer), and adding the code to handle it, would itself be
unsatisfactory - and I guess you felt the same.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/