Re: Move to unshared VMAs in NOMMU mode?

From: Robin Getz
Date: Mon Mar 12 2007 - 17:50:42 EST


On Fri 9 Mar 2007 09:12, David Howells pondered:
> I've been considering how to deal with the SYSV SHM problem, and I think we
> may have to move to unshared VMAs in NOMMU mode to deal with this.

Thanks for putting some good thoughts down.

> Currently, what we have is each mm_struct has in its arch-specific context
> argument a list of VMLs. Take the FRV context for example:
>
> [include/asm-frv/mmu.h]
> typedef struct {
> #ifdef CONFIG_MMU
> ...
> struct vm_list_struct *vmlist;
> unsigned long end_brk;
>
> #endif
> ...
> } mm_context_t;
>
> Each VML struct containes a pointer to a systemwide VMA and the next VML in
> the list:
>
> struct vm_list_struct {
> struct vm_list_struct *next;
> struct vm_area_struct *vma;
> };
>
> The VMAs themselves are kept in an rb-tree in mm/nommu.c:
>
> /* list of shareable VMAs */
> struct rb_root nommu_vma_tree = RB_ROOT;
>
> which can then be displayed through /proc/maps.
>
> There are some restrictions of this system, mainly due to the NOMMU
> constraints:
>
> (*) mmap() may not be used to overlay one mapping upon another
>
> (*) mmap() may not be used with MAP_FIXED.
>
> (*) mmap()'s of the same part of the same file will result in multiple
> mappings returning the same base address, assuming the maps are
> shareable. If they aren't shareable, they'll be at different base
> addresses.
>
> (*) for normal shareable file mappings, two mappings will only be shared
> if they precisely match offset, size and protection, otherwise a new
> mapping will be created (this is because VMAs will be shared). Splitting
> VMAs would reduce the this restriction, though subsequent mappings would
> have to be bounded by the first mapping, but wouldn't have to be the same
> size.
>
> (*) munmap() may only unmap a precise match amongst the mappings made; it
> may not be used to cut down or punch a hole in an existing mapping.
>
> The VMAs for private file mappings, private blockdev mappings and anonymous
> mappings, be they shared[*] or unshared, hold a pointer to the kmalloc()'d
> region of memory in which the mapping contents reside. This region is
> discarded when the VMA is deleted. When a region can be shared the VMA is
> also shared, and so no reference counting need take place on the mapping
> contents as that is implied by the VMA.
>
> [*] MAP_PRIVATE+!PROT_WRITE+!PT_PTRACED regions may be shared
>
> Note that for mappable chardevs with special BDI capability flags, extra
> VMAs may be allocated because (a) they may need to overlap non-exactly, and
> (b) the chardev itself pins the backing storage, if the backing storage is
> potentially transient.
>
>
> If VMAs are not shared for shared memory regions then some other means of
> retaining the actual allocated memory region must be found. The obvious
> way to do this is to have the VMA point to a shared, refcounted record that
> keeps track of the region:
>
> struct vm_region {
> /* the first parameters define the region as for the VMA */
> pgprot_t vm_page_prot;
> unsigned long vm_start;
> unsigned long vm_end
> unsigned long vm_pgoff;
> struct file *vm_file;
>
> atomic_t vm_usage; /* region usage count */
> struct rb_node vm_rb; /* region tree */
> };
>
> The VMA itself would then have to be modified to include a pointer to this,
> but wouldn't then need its own refcount. VMAs would belong, once again, to
> the mm_struct, the VML struct would vanish, and the VML list rooted in
> mm_context_t would vanish.
>
> For R/O shareable file mappings, it might be possible to actually use the
> target file's pagecache for the mapping. I do something of that sort for
> shared-writable mappings on ramfs files (to support POSIX SHM and SYSV
> SHM).
>
> The downside of allocating all these extra VMAs is that, of course, it
> takes up more memory, though that may not be too bad, especially if it's at
> the gain of additional consistency with the MM code.

I guess I don't look at it as consistency with the MM code as being the
primary request, but consistency in operation with the MM code from a user
space perspective - hopefully the two goals are not divergent.

> However, consistency isn't for the most part a real issue. As I see it,
> drivers and filesystems should not concern themselves with anything other
> than the VMA they're given, and so it doesn't matter if these are shared or
> not.
>
> That brings us on to the problem with SYSV SHM which keeps an attachment
> count that the VMA mmap(), open() and release() ops manipulate. This means
> that the nattch count comes out wrong on NOMMU systems. Note that on MMU
> systems, doing a munmap() in the middle of an attached region will *also*
> break the nattch count, though this is self-correcting.
>
> Another way of dealing with the nattch count on NOMMU systems is to do it
> through the VML list, but that then needs more special casing in the SHM
> driver and perhaps others.

We (noMMU) folks need to have special code anyway - so why not put it there,
and try not to increase memory footprint?

-Robin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/