Re: [PATCH v7 0/8] iov_iter: Improve page extraction (ref, pin or just list)

From: Matthew Wilcox
Date: Mon Jan 23 2023 - 11:43:40 EST


On Mon, Jan 23, 2023 at 04:38:47PM +0000, David Howells wrote:
> Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> > Why do we want to track that information on a per-page basis? Wouldn't it
> > be easier to have a VM_NOCOW flag in vma->vm_flags? Set it the first
> > time somebody does an O_DIRECT read or RDMA pin. That's it. Pages in
> > that VMA will now never be COWed, regardless of their refcount/mapcount.
> > And the whole "did we pin or get this page" problem goes away. Along
> > with folio->pincount.
>
> Wouldn't that potentially make someone's entire malloc() heap NOCOW
> if they did a single DIO to/from it?

Yes. Would that be an actual problem for any real application?

We could do this with a vm_pincount instead if it's essential to count
how often it has happened, and to be able to fork() without COW if the
pinning happened in the past but is not happening now.
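To make the difference between the two variants concrete, here is a minimal userspace model. It is a sketch under stated assumptions, not kernel code: VM_NOCOW and vm_pincount exist only as proposals in this thread, so HYP_VM_NOCOW, struct hyp_vma and all hyp_* functions are hypothetical. It also assumes that the point of the counter is to let a VMA return to ordinary COW behaviour once no pins are outstanding. It models only the decision "is this VMA COW-shared at fork()?", not what happens to the pages instead.

```c
#include <assert.h>
#include <stdbool.h>

/* HYPOTHETICAL: a stand-in for the proposed VM_NOCOW bit in vma->vm_flags.
 * The bit value is arbitrary and does not correspond to any real flag. */
#define HYP_VM_NOCOW (1UL << 0)

/* HYPOTHETICAL: a cut-down VMA holding only what this model needs. */
struct hyp_vma {
	unsigned long vm_flags;   /* per-VMA flags, like vma->vm_flags */
	unsigned int vm_pincount; /* outstanding O_DIRECT/RDMA pins (variant 2) */
};

/* An O_DIRECT read or RDMA pin targets this VMA. */
static void hyp_vma_pin(struct hyp_vma *vma)
{
	vma->vm_flags |= HYP_VM_NOCOW; /* variant 1: sticky flag */
	vma->vm_pincount++;            /* variant 2: counted pins */
}

/* The I/O completes and the pin is dropped. */
static void hyp_vma_unpin(struct hyp_vma *vma)
{
	assert(vma->vm_pincount > 0);
	vma->vm_pincount--;
	/* Variant 1 never clears the flag: once NOCOW, always NOCOW. */
}

/* Variant 1: after any pin, this VMA is never COWed again. */
static bool hyp_fork_cow_by_flag(const struct hyp_vma *vma)
{
	return !(vma->vm_flags & HYP_VM_NOCOW);
}

/* Variant 2 (ASSUMPTION): COW resumes once all pins are released. */
static bool hyp_fork_cow_by_count(const struct hyp_vma *vma)
{
	return vma->vm_pincount == 0;
}
```

With a single DIO into a malloc() heap VMA, variant 1 leaves the whole VMA NOCOW for the life of the process, which is the cost raised above. Variant 2 pays for a counter update on every pin and unpin, but puts the VMA back to normal once the I/O has completed.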

> Also, you only mention DIO read, but what about "start DIO write; fork();
> touch buffer in the parent"? Now the write buffer belongs to the child,
> and the child can affect the parent's write.

I'm struggling to see the problem here. If the child hasn't exec'd, the
parent and child are still in the same security domain. The parent
could have modified the buffer before calling fork().