Re: [PATCH] Fix OOPS in mmap_region() when merging adjacentVM_LOCKED file segments

From: Lee Schermerhorn
Date: Fri Jan 30 2009 - 16:37:07 EST


On Fri, 2009-01-30 at 21:12 +0000, Hugh Dickins wrote:
> On Fri, 30 Jan 2009, Linus Torvalds wrote:
> > On Fri, 30 Jan 2009, Lee Schermerhorn wrote:
> > >
> > > So happens, I'm mapping with MAP_SHARED, so the VM_ACCOUNT flag gets
> > > cleared later in mmap_region(). Comments say that this is for checking
> > > memory availability during shmem_file_setup(). Maybe we can move the
> > > temporary setting of VM_ACCOUNT until just before the call to
> > > shmem_zero_setup()?
> >
> > Yeah, that would probably fix it, and looks like the right thing to do.
>
> I do need to refresh my memory on that in a moment...
>
> >
> > It all looks pretty confused wrong to set the whole VM_ACCOUNT flag for a
> > file-backed file AT ALL in the first place, but the code knows that it
> > won't matter for a shared file, and will be cleared again later.
> >
> > So it plays these temporary games with vm_flags, and it didn't matter
> > because of how we used to call "vma_merge()" either early only for the
> > anonymous memory case (that had VM_ACCOUNT stable and didn't have that
> > temporary case at all) or much later (after having undone the temporary
> > flag setting) for files.
>
> I'm to blame for those games, and now they've given trouble,
> the right thing may be to put an end to them.
>
> >
> > Why do we pass in that "accountable" flag, btw? It's only ever set to 0 by
> > a MAP_PRIVATE mapping that hits is_file_hugepages() (see do_mmap_pgoff),
> > and we could just do that decision all inside mmap_region(). So the flag
> > doesn't really seem to have any real meaning, and is just passed around
> > for some odd historical reason?
>
> It looks like the "accountable" flag dates from before Miklos separated
> mmap_region() out from do_mmap_pgoff(): so he just passed it on down to
> mmap_region() as an additional argument, preferring to leave the more
> complex MAP_PRIVATE/is_file_hugepages test behind in do_mmap_pgoff().
>
> It seemed rather a random refactoring to me. Looking at it again,
> I wonder if we should be getting do_brk() to use mmap_region() too;
> but my appetite for cleanups is low at present, bugs we have enough.
>
> By the way, there's an argument to say that you should add
> VM_MIXEDMAP to VM_CAN_NONLINEAR in VM_MERGEABLE_FLAGS: I don't
> really care whether we merge the odd filemap_xip vma or not,
> but it used to do so and now won't.
>
> By the same (used to merge, now won't) argument, one could say
> VM_INSERTPAGE should be there too; but whereas VM_MIXEDMAP is used
> in one place only, quite a lot of drivers use vm_insert_page(), so
> I feel more comfortable with the idea that it's stopping merges -
> though in that case, shouldn't we add it to VM_SPECIAL?
>
> But I'm caring more about that VM_ACCOUNT...

I just verified that adding VM_ACCOUNT to VM_MERGEABLE does allow the
merge to happen with the test program. And the system didn't come
crashing down around me. But, I wouldn't trust that simple test as the
last word. A short run of a stress load I use held up/still running,
but I can't tell whether it's merging as expected there.

I am running a slightly modified version of Maksim's test program under
the harness. I modified it to mmap the entire region to reserve space,
then MAP_FIXED at each page address in the range returned by the first
mmap. I saw that it was leaving holes between some of the pages w/o
this. I'm going to automate the check for merging [read map and verify
a single segment at expected range] and leave that running with the
load.

Lee

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/