Re: Getting rid of dynamic TASK_SIZE (on x86, at least)

From: Cyrill Gorcunov
Date: Tue May 10 2016 - 13:49:25 EST


On Tue, May 10, 2016 at 10:26:05AM -0700, Andy Lutomirski wrote:
...
> >>
> >> It's annoying and ugly. It also makes the idea of doing 32-bit CRIU
> >> restore by starting in 64-bit mode and switching to 32-bit more
> >> complicated because it requires switching TASK_SIZE.
> >
> > Well, you know I'm not sure it's that annoying. It serves as it should
> > for task limit. Sure we can add one more parameter into get-unmapped-addr
> > but same time the task-size will be present in say page faulting code
> > (the helper might be renamed but it will be here still).
>
> Why should the page faulting code care at all what type of task it is?
> If there's a vma there, fault it in. If there isn't, then don't.

__bad_area_nosemaphore
	...
	/* Kernel addresses are always protection faults: */
	if (address >= TASK_SIZE)
		error_code |= PF_PROT;

The page faulting code surely has to consider what kind of fault it is.
Or are we going to drop such code altogether?
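
To make the dependency concrete, here is a minimal user-space sketch (not
kernel code) of the check quoted above. It compares a per-task limit with a
single fixed limit. The struct, the sketch_* names and the two limit values
are assumptions for illustration only; the real limits depend on the kernel
configuration and the task's personality.

```c
#include <stdbool.h>
#include <stdint.h>

#define PF_PROT (1u << 0)	/* protection-fault bit in the error code */

/* Assumed limits, for illustration only. */
#define SKETCH_TASK_SIZE64 0x00007ffffffff000ULL
#define SKETCH_TASK_SIZE32 0x00000000ffffe000ULL

/* Hypothetical stand-in for the per-task state that TASK_SIZE looks at. */
struct sketch_task {
	bool compat;
};

/* Dynamic variant: the limit depends on what kind of task is running. */
static unsigned int sketch_fixup_dynamic(const struct sketch_task *t,
					 uint64_t address,
					 unsigned int error_code)
{
	uint64_t limit = t->compat ? SKETCH_TASK_SIZE32 : SKETCH_TASK_SIZE64;

	if (address >= limit)
		error_code |= PF_PROT;
	return error_code;
}

/* Fixed variant: one limit for every task, no per-task lookup. */
static unsigned int sketch_fixup_fixed(uint64_t address,
				       unsigned int error_code)
{
	if (address >= SKETCH_TASK_SIZE64)
		error_code |= PF_PROT;
	return error_code;
}
```

The two variants differ only for a compat task touching an address between
the 32-bit limit and the 64-bit limit. Kernel addresses are flagged by both.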

> > Same applies
> > to arch_get_unmapped_area_topdown, should there be some argument
> > passed instead of open-coded TASK_SIZE helper?
> >
> > Don't get me wrong please, just trying to figure out how many code
> > places need to be patched if we start this procedure.
> >
> > As to starting restore in 64 bit and switch into 32 bit -- should
> > not we simply scan for "current" memory map and test if all areas
> > mapped belong to compat limit?
>
> I don't see what's wrong with leaving a high vma around. The task is
> unlikely to use it, but, if the task does use it (via long jump, for
> example), it'll work.

True, from the CPU's perspective there is nothing wrong if some high
memory areas are left mapped in compat (kernel compat) mode. I just
thought that in the first iteration we wanted to keep the behaviour unchanged.
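
For reference, the scan mentioned in the quoted text could look roughly like
the user-space sketch below, which reads lines in /proc/<pid>/maps format
("start-end perms ...") and checks that every area ends at or below a limit.
The sketch_* names and the limit value are assumptions for illustration, not
existing CRIU or kernel interfaces.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed 32-bit limit, for illustration only. */
#define SKETCH_COMPAT_LIMIT 0xffffe000ULL

/*
 * Parse one maps line and report whether the area ends at or below
 * the limit. Lines that cannot be parsed count as "does not fit".
 */
static bool sketch_area_below_limit(const char *line,
				    unsigned long long limit)
{
	unsigned long long start, end;

	if (sscanf(line, "%llx-%llx", &start, &end) != 2)
		return false;
	return start < end && end <= limit;
}

/* Walk a maps stream; true only if every area is below the limit. */
static bool sketch_all_areas_below_limit(FILE *maps,
					 unsigned long long limit)
{
	char buf[256];
	bool line_start = true;

	while (fgets(buf, sizeof(buf), maps)) {
		if (line_start && !sketch_area_below_limit(buf, limit))
			return false;
		/* No '\n' in this chunk: the line continues in the next read. */
		line_start = strchr(buf, '\n') != NULL;
	}
	return true;
}
```

A caller would open /proc/self/maps (or the target's maps file) and pass the
stream in. A 64-bit process would normally fail this test because of its own
high mappings, which is exactly the case discussed above.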

> > And that's all. (Sorry I didn't
> > follow your and Dmitry's conversation precisely so I'm quite
> > probably missing something obvious here).
>
> It's not all. We'd need an API to allow the task to cause TASK_SIZE
> to change from TASK_SIZE64 to TASK_SIZE32. I don't want to add that
> API because I think its sole purpose is to work around kernel
> silliness, and I'd rather we just fixed the silliness.

I implied the change of task-size. Anyway, I see what you mean, thanks
for the clarification. Still, I think we won't be able to completely
replace task-size with task-size-mask. Some places, such as the base
for elf-dynload, use it as a part of the api (not directly though),
and at least in load_elf_binary the choice of base address should
be preserved.
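
As an illustration of why the base address depends on task-size, here is a
minimal user-space sketch. It assumes the ET_DYN base is taken as two thirds
of the task size and then rounded down to a page boundary, which to my
knowledge is what the x86 ELF_ET_DYN_BASE definition did at the time. The
sketch_* names and the limit values are assumptions for illustration.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE   0x1000ULL

/* Assumed limits, for illustration only. */
#define SKETCH_TASK_SIZE64 0x00007ffffffff000ULL
#define SKETCH_TASK_SIZE32 0x00000000ffffe000ULL

/*
 * Base address for a position-independent binary under the assumption
 * above: two thirds of the task size, rounded down to a page boundary.
 */
static uint64_t sketch_et_dyn_base(uint64_t task_size)
{
	uint64_t base = task_size / 3 * 2;

	return base & ~(SKETCH_PAGE_SIZE - 1);
}
```

With the assumed 64-bit limit this yields 0x555555554000, and with the
assumed 32-bit limit 0xaaaa9000. So whatever replaces task-size here still
has to produce a different base for the two kinds of binaries.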