Re: [PATCH 10 of 20] ipath - support for userspace apps using coredriver

From: Bryan O'Sullivan
Date: Thu Mar 16 2006 - 10:30:52 EST


On Thu, 2006-03-16 at 14:24 +0000, Hugh Dickins wrote:

> I think Bryan's problem is probably that he has allocated a
> high-order page (via pci_alloc_consistent or dma_alloc_coherent),
> and is then exposing the constituents of that high-order page to
> userspace via nopage: when the final munmap or exit comes,
> the constituent pages get freed without realizing they're
> supposed to be part of one high-order page.

Yes, that sounds about right. Some of the allocations in question are
order-1, and some are higher.

> We had the same problem in sound/core/memalloc.c, and fixed it
> by using a compound page (a high-order page in which each constituent
> knows it's part of the larger whole): see the two uses of __GFP_COMP
> in that file, and in particular the one where snd_malloc_dev_pages
> calls dma_alloc_coherent.

Will do, thanks. Will the allocator give me a compound page if I ask
for a single page but pass __GFP_COMP, or do I need to be sure I always
make a higher-order request if I pass that flag?

> No. That isn't how VM_DONTCOPY behaves, and I don't know what you were
> thinking to bring it into the discussion there - if you have pages which
> are supposed to be shared across forks, that happens without any special
> flag, doesn't it? We only teach VM_DONTCOPY in the second year,
> I think Bryan would best forget it for now ;)

Heh. Actually, in this case, I really don't want the vma to get shared
with a child. If the child gets a SEGV after a fork, that's intended
behaviour, as far as I'm concerned.

> RDMA
> is finding it a useful flag to get around a peculiarly frustrating issue
> with get_user_pages: though that successfully pins pages, a fork followed
> by parent writing to pinned private page will put a _copy_ of that page
> into the parent's userspace (if child holds a reference to the original),
> which likely defeats the purpose of pinning.

That's a useful aside, thanks. It sounds like I want to set VM_DONTCOPY
on the pages I'm doing get_user_pages on too, then.

> He shouldn't set VM_SHM, it means nothing.

Is that the case for kernels prior to 2.6.15, too?

The problem is that I need to keep a version of the driver working on
kernels back as far as 2.6.9, in addition to the current mainline. If
VM_SHM meant nothing back then, too, I can remove it entirely, otherwise
I should drop it only in the for-mainline driver, and leave the backport
version as it stands.

> If he's dealing with real
> allocated pages, he shouldn't set VM_IO (but remap_pfn_range will set
> it for him if it's used). He shouldn't set VM_RESERVED (won't do any
> harm, but we're quite likely to remove it soon).

Yes, I actually dropped both of these last night.

> He shouldn't set VM_LOCKED (it would make the locked_vm accounting
> go wrong - unless he's careful to participate in that accounting,
> as Infiniband's uverbs_mem.c creditably does).

Well, uverbs_mem.c only touches locked_vm for pages it's pinned using
get_free_pages. We do much the same thing in that case.

I'll try turning off VM_LOCKED for the pages we allocate, and see what
breaks :-)

<b

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/