Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)

From: Andy Lutomirski
Date: Tue Sep 20 2016 - 20:55:40 EST


On Tue, Sep 20, 2016 at 5:32 PM, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
> On 09/20/16 17:22, H. Peter Anvin wrote:
>> The more I'm thinking about this, why don't we simply have these (the
>> various possible vdsos as well as vvar) as actual files in sysfs instead
>> of introducing a new filesystem? I don't believe sysfs actually has to
>> be mounted in order for sysfs files to have an inode.
>>
>> It could also be in procfs, I guess, but sysfs probably makes more sense.
>>
>> I'm thinking something like:
>>
>> /sys/kernel/vdso/{i386,x86_64,x32,vvar}
>>
>> Not only would this let the container people and so on do weird things
>> much easier, but it ought to eliminate a whole slew of special cases.
>>
>
> Even crazier idea: instead of a separate vvar file, have the vvar page
> just be a part of these files (as a shared page)... I'm wondering if we
> can even use load_elf_interp() since after all it is an ELF shared
> library image...

I think that may be too crazy:

- If vvar is in the same inode, then that inode won't be a valid ELF
image, because the ELF header won't be in the right place.

- vvar is highly magical. IMO letting it get mapped with VM_MAYWRITE
is asking for trouble: anything that writes to it will COW it, leaving
that process with a private copy that the kernel no longer updates,
which leads to strange malfunctions.

- vvar can have, and has had, IO pages in it (the HPET page on x86,
for example). This means that the cache type can vary from page to
page within the vvar area, which is not something that ordinary files
do.
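To make the first point concrete, here is a minimal userspace sketch.
The helpers is_elf_image() and current_vdso() are hypothetical names,
not kernel or libc API. An ELF image must begin with the e_ident magic
at offset 0, so a file laid out as a vvar page followed by the vdso
image fails this check. It also shows that processes currently find
the vdso by address through the auxiliary vector (AT_SYSINFO_EHDR),
with no fd involved.

```c
#include <assert.h>
#include <elf.h>       /* ELFMAG, SELFMAG, AT_SYSINFO_EHDR */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>  /* getauxval(), glibc >= 2.16 */

/* Hypothetical helper: nonzero if buf begins with the ELF magic
 * ("\177ELF") that every ELF image carries at offset 0. */
static int is_elf_image(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= SELFMAG && memcmp(buf, ELFMAG, SELFMAG) == 0;
}

/* Hypothetical helper: address of this process's vdso image as
 * reported in the auxiliary vector, or NULL if the kernel gave none. */
static const unsigned char *current_vdso(void)
{
    return (const unsigned char *)(uintptr_t)getauxval(AT_SYSINFO_EHDR);
}
```

A buffer that starts with a zeroed 4096-byte "vvar" page followed by
an otherwise valid image makes is_elf_image() return 0, which is the
layout problem in the first point.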
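A rough userspace analogy for the VM_MAYWRITE point, assuming Linux
MAP_PRIVATE semantics and using an ordinary temporary file as a
stand-in for a kernel-maintained page. private_view_after_update()
and the 'K'/'U'/'N' marker bytes are made up for illustration. Once a
process writes through a private mapping it owns a COW copy, and later
updates to the underlying page are no longer visible to it, which is
the kind of silent staleness a COWed vvar page would produce.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * 1. The file page starts out holding 'K' ("kernel-maintained" data).
 * 2. If write_first, the process stores 'U' through its MAP_PRIVATE
 *    mapping, which triggers copy-on-write.
 * 3. The "kernel" then updates the file page to 'N'.
 * Returns the byte the process now sees, or -1 on error.
 */
static int private_view_after_update(int write_first)
{
    long page = sysconf(_SC_PAGESIZE);
    int result = -1;

    assert(page > 0);
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);

    if (ftruncate(fd, page) == 0 && pwrite(fd, "K", 1, 0) == 1) {
        unsigned char *map = mmap(NULL, (size_t)page,
                                  PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED) {
            if (write_first)
                map[0] = 'U';   /* process gets its own private copy */
            if (pwrite(fd, "N", 1, 0) == 1)   /* "kernel" update */
                result = ((volatile unsigned char *)map)[0];
            munmap(map, (size_t)page);
        }
    }
    fclose(f);
    return result;
}
```

Without the write, the mapping still shows the shared page and sees
'N'; after the write it keeps showing 'U' forever.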

Also, if we let users get an fd pointing to the vdso, then we're more
or less committing to never having vdso text whose contents vary
per-process. Are we okay with that?

Dmitry's patches back the vdso with the page cache, and I'm not sure
even that is needed. I think a file with no backing address_space
that simply provides a vm_ops->fault handler instead may be
sufficient, especially for vvar. I don't know whether uprobes would be
okay with that, though.
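For anyone less familiar with fault-driven mappings, a loose userspace
analogy (not kernel code, and not what Dmitry's patches do): pages of
an anonymous PROT_NONE region are populated on first touch by a
SIGSEGV handler that computes their contents, so nothing is read from
a file or page cache. In the kernel version of the idea, a
vm_operations_struct fault handler would play the role of
demo_on_fault(); all demo_* names are hypothetical. mprotect() is not
on POSIX's async-signal-safe list, but this pattern works on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define DEMO_PAGES 4

static unsigned char *demo_region;
static size_t demo_page;
static volatile sig_atomic_t demo_faults;

/* Hypothetical "fault handler": fills one page on first touch.
 * Contents come from the page index, not from any backing file. */
static void demo_on_fault(int sig, siginfo_t *si, void *uctx)
{
    (void)sig;
    (void)uctx;
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)demo_region;

    if (addr < base || addr >= base + DEMO_PAGES * demo_page)
        _exit(128 + SIGSEGV);   /* a genuine crash elsewhere */

    size_t idx = (addr - base) / demo_page;
    unsigned char *page = demo_region + idx * demo_page;

    mprotect(page, demo_page, PROT_READ | PROT_WRITE);
    memset(page, 'A' + (int)idx, demo_page);
    mprotect(page, demo_page, PROT_READ);   /* read-only afterwards */
    demo_faults++;
}

/* Reserves an inaccessible region whose pages are filled lazily.
 * Returns 0 on success, -1 on failure. */
static int demo_setup(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = demo_on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    demo_page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, DEMO_PAGES * demo_page, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    demo_region = p;
    return 0;
}

/* Reads the first byte of a page, faulting it in if needed. */
static unsigned char demo_read(size_t page_index)
{
    assert(page_index < DEMO_PAGES);
    return ((volatile unsigned char *)demo_region)[page_index * demo_page];
}
```

Reading page 2 and then page 0 takes exactly two faults; touching
page 2 again takes none, and pages 1 and 3 are never populated.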

My personal preference is to let them both be real struct file *
objects (possibly shared between all processes of the same vdso ABI)
but to prevent user code from ever creating an fd referring to one of
these files.

--Andy