Re: [PATCH] munmap() don't check sysctl_max_mapcount

From: KAMEZAWA Hiroyuki
Date: Mon Oct 12 2009 - 20:57:44 EST


On Mon, 12 Oct 2009 16:04:08 +0100 (BST)
Hugh Dickins <hugh.dickins@xxxxxxxxxxxxx> wrote:

> On Mon, 12 Oct 2009, KOSAKI Motohiro wrote:
> > And I suspect I have not caught your point. May I ask some questions?
> > Honestly, I don't think max_map_count is an important knob. It is a strange
> > mutant of a limit on the virtual address space of the process.
> > A very long time ago (probably the stone age), Linux did not have
> > vma rb_tree handling, so many vmas directly caused find_vma to slow down.
> > However, current Linux scales well; it can handle the many-vma case.
>
> I think there are probably several different reasons for the limit,
> some perhaps buried in prehistory, yes, and others forgotten.
>
> One reason is well-known to your colleague, KAMEZAWA-san:
> the ELF core dump format only supports a ushort number of sections.
>
yes.

> One reason will be to limit the amount of kernel memory which can
> be pinned by a user program - why limit their ability to lock down
> user pages, if we let them run wild with kernel data structures?
> The more important on 32-bit machines with more than 1GB of memory, as
> the lowmem restriction comes to bite. But I probably should not have
> mentioned that, I fear you'll now go on a hunt for other places where
> we impose no such limit, and embarrass me greatly with the result ;)
>
> And one reason will be the long vma->vm_next searches: less of an
> issue nowadays, yes, and preemptible if you have CONFIG_PREEMPT=y;
> but still might be something of a problem.
>
> > So, why do you think max_mapcount should be strictly kept?
>
> I don't believe it's the most serious limit we have, and I'm no
> expert on its origins; but I do believe that if we profess to have
> some limit, then we have to enforce it. If we're going to allow
> anybody to get around the limit, better just throw the limit away.
>
> >
> > Honestly, I doubt anybody would suffer from removing sysctl_max_mapcount.
>
> I expect Kame to disagree with you on that.
>
> >
> > And yes, stack unmapping has exceptional characteristics. The guard zone
> > guarantees it never raises map_count.
> > So, I think the attached patch (0001-Don-t...) is the same as you talked about, right?
>
> Yes, I've not tested but that looks right to me (I did have to think a
> bit to realize that the case where the munmap spans more than one vma
> is fine with the check you've added). In the version below I've just
> changed your code comment.
>
> > I can accept it. I haven't tested it on ia64; however, at least it works
> > well on x86.
> >
> > BUT, I still think the kernel shouldn't refuse any resource deallocation.
> > Otherwise, people are discouraged from proper resource deallocation and
> > encouraged toward a brutal, intentional-memory-leak programming style. What do you think?
>
> I think you're a little too trusting. It's common enough that in order
> to free one resource, we need just a little of another resource; and
> it is frustrating when that other resource is tightly limited. But if
> somebody owes you 10000 yen, and asks to borrow just another 1000 yen
> to make some arrangement to pay you back, then the next day asks to
> borrow just another 1000 yen to enhance that arrangement, then....
>
> That's what I'm asking to guard against here. But if you're so
> strongly against having that limit, please just get your customers
> to raise it to INT_MAX: that should be enough to keep away from
> its practical limitations, shouldn't it?
>
>
I discussed this with Kosaki. Here is a report of our status.

- Even if we regard a program that exceeds max_map_count and then goes to abort()
as a buggy program, we don't think abort() (in a library) is a very good outcome.
So, we want to avoid this.

- We hear that one of our colleagues (debugger team) is now preparing ELF-extension
patches for the kernel and gdb. We hear that Solaris has an ELF extension for handling
more than 65535 program headers, and that a recent AMD64 ABI draft includes it.
We now think this extension should go first. We will discuss the schedule with him.

- Considering a "consume too much memory" attack, we need some limit.
So we are wondering about adding either
- a system-wide max_map_count (large enough)
or
- a per-process max_map_count determined from the host's memory size.
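
To illustrate the ELF point above: the program header count in the ELF
header (e_phnum) is a 16-bit field, which is where the ushort ceiling comes
from. Below is a minimal sketch of how an extended-numbering scheme can lift
it, assuming the PN_XNUM convention found in elf.h (e_phnum holds the escape
value 0xffff and the real count is stored in sh_info of section header 0).
I am assuming this is the scheme the extension refers to. The helper
fake_image() is hypothetical and only builds a synthetic little-endian ELF64
image for the demonstration; it is not taken from any patch.

```python
import struct

PN_XNUM = 0xFFFF  # escape value: "real count lives in section header 0"

# Little-endian ELF64 layouts (assumption: 64-bit LSB files only).
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # 64 bytes
ELF64_SHDR = struct.Struct("<IIQQQQIIQQ")        # 64 bytes


def program_header_count(image: bytes) -> int:
    """Return the number of program headers, honouring PN_XNUM."""
    fields = ELF64_EHDR.unpack_from(image, 0)
    e_shoff = fields[6]    # file offset of the section header table
    e_phnum = fields[10]   # 16-bit count, or PN_XNUM as an escape
    if e_phnum != PN_XNUM:
        return e_phnum
    # Extended numbering: the 32-bit sh_info of section 0 holds the count.
    shdr0 = ELF64_SHDR.unpack_from(image, e_shoff)
    return shdr0[7]


def fake_image(e_phnum: int, sh_info: int = 0) -> bytes:
    """Hypothetical helper: a synthetic ELF64 header plus one section header."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # ELFCLASS64, LSB, v1
    ehdr = ELF64_EHDR.pack(
        ident, 4, 62, 1,          # ET_CORE, EM_X86_64, EV_CURRENT
        0, 0, ELF64_EHDR.size,    # e_entry, e_phoff, e_shoff
        0, ELF64_EHDR.size, 56,   # e_flags, e_ehsize, e_phentsize
        e_phnum, ELF64_SHDR.size, 1, 0,
    )
    shdr0 = ELF64_SHDR.pack(0, 0, 0, 0, 0, 0, 0, sh_info, 0, 0)
    return ehdr + shdr0


if __name__ == "__main__":
    print(program_header_count(fake_image(3)))
    print(program_header_count(fake_image(PN_XNUM, 70000)))
```

With such a scheme a core file can describe more than 65535 segments, so
the core dump format alone no longer forces the limit on the number of vmas.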

BTW, looking at sysctl, there is threads-max.

[kamezawa@bluextal ~]$ cat /proc/sys/kernel/threads-max
409600

This number is system-wide and is determined automatically at boot.
But in fact there is also max_map_count, and the per-process limit on threads
is determined by it. We think this is not very neat.
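
A rough sketch of that interaction, under loud assumptions: each thread is
taken to cost two mappings (a stack plus a guard area), and the
max_map_count value used as a fallback is only an illustrative figure, not
a number taken from this thread. The function names are hypothetical.

```python
from pathlib import Path

ASSUMED_MAX_MAP_COUNT = 65530  # illustrative fallback only (assumption)


def read_sysctl(path: str, fallback: int) -> int:
    """Read an integer sysctl from /proc, or return fallback if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return fallback


def per_process_thread_bound(max_map_count: int, vmas_per_thread: int = 2,
                             baseline_vmas: int = 0) -> int:
    """Upper bound on threads in one process if each thread needs
    vmas_per_thread mappings (assumption: stack + guard area)."""
    return max(0, (max_map_count - baseline_vmas) // vmas_per_thread)


def effective_thread_limit(threads_max: int, max_map_count: int,
                           vmas_per_thread: int = 2) -> int:
    """The tighter of the system-wide and the mapping-derived limits."""
    return min(threads_max,
               per_process_thread_bound(max_map_count, vmas_per_thread))


if __name__ == "__main__":
    threads_max = read_sysctl("/proc/sys/kernel/threads-max", 409600)
    map_count = read_sysctl("/proc/sys/vm/max_map_count",
                            ASSUMED_MAX_MAP_COUNT)
    print(effective_thread_limit(threads_max, map_count))
```

Under these assumptions a single process hits the mapping limit long before
it reaches the threads-max value shown above, which is the untidy coupling
we mean.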

We'll consider this more. Probably, we'll start with the ELF extension.


Thanks,
-Kame








