Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

From: Linus Torvalds
Date: Thu Jan 10 2019 - 16:59:34 EST


On Thu, Jan 10, 2019 at 1:44 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> GUP does page fault on user buffer which is a mmapped region of same
> file. page fault sets up for buffered IO, tries to take rwsem for
> write, deadlocks.
>
> Most of the schemes we come up with fall down at this point - you
> can't hold a lock over gup that is also used in the buffered IO
> path. That's why XFS (and now ext4) have the IOLOCK and MMAPLOCK
> for truncation serialisation - we can't lock out both read()/write()
> and mmap IO paths with the same lock...

Side note: a somewhat similar version of that is true even in the absence
of GUP and dio, for the case of doing a mmap of a file, and then
reading or writing from the mapped region into the file itself.

There are "interesting" locking scenarios wrt just holding the page
locked, and trying to then fill that page with information with just a
regular "copy_from_user()".

Page fault -> try to read the file -> oops, the page we're trying to
read from is locked because we're trying to write to it.

So we have that odd dance in generic_perform_write() which does

- touch the first user byte without holding any lock

- do write_begin() (which gets the page lock)

- copy from user space using the "atomic" copy (which just gives up on fault)

- if nothing got copied, go back and try again with a smaller copy
that can't cross a page. We might have raced with pageout.
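
As a rough illustration of the steps above, here is a minimal userspace
simulation. It is not the kernel code: every sim_* name, the tiny page
size, and the pending_faults counter (standing in for a race with
pageout) are assumptions made up for this sketch. It only shows the
shape of the loop: touch the user byte with no lock held, take the page
lock, do a copy that gives up rather than faulting, and retry with a
copy that cannot cross a user page if nothing got copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 16u   /* tiny hypothetical "page" for readability */

struct sim_write_ctx {
    const char *data;       /* source bytes in simulated user space */
    size_t len;             /* number of source bytes */
    size_t page_off;        /* offset of data[0] inside its user page */
    int pending_faults;     /* atomic copies that will still "fault" */
    int page_locked;        /* nonzero while the file page is "locked" */
};

/* Step 1: touch the first user byte without holding any lock. */
static void sim_fault_in(const struct sim_write_ctx *ctx, size_t pos)
{
    /* A real fault here might need the page lock, so it must be free. */
    assert(!ctx->page_locked);
    volatile char c = ctx->data[pos];
    (void)c;
}

/* Step 3: "atomic" copy that returns 0 instead of taking a fault. */
static size_t sim_copy_atomic(char *dst, struct sim_write_ctx *ctx,
                              size_t pos, size_t n)
{
    if (ctx->pending_faults > 0) {
        ctx->pending_faults--;      /* simulated race with pageout */
        return 0;
    }
    memcpy(dst, ctx->data + pos, n);
    return n;
}

/* Returns the number of bytes written into the simulated file. */
static size_t sim_perform_write(char *file, size_t file_len,
                                struct sim_write_ctx *ctx)
{
    size_t total = ctx->len < file_len ? ctx->len : file_len;
    size_t pos = 0;

    while (pos < total) {
        /* Never cross a file page in one iteration. */
        size_t bytes = SIM_PAGE_SIZE - (pos % SIM_PAGE_SIZE);
        size_t copied;

        if (bytes > total - pos)
            bytes = total - pos;

        for (;;) {
            size_t in_user_page;

            sim_fault_in(ctx, pos);     /* no lock held */
            ctx->page_locked = 1;       /* stands in for write_begin() */
            copied = sim_copy_atomic(file + pos, ctx, pos, bytes);
            ctx->page_locked = 0;       /* stands in for write_end() */
            if (copied)
                break;

            /* Nothing copied: retry with a copy that stays inside one
             * user page, so the touch above covers all of it. */
            in_user_page = SIM_PAGE_SIZE -
                           ((ctx->page_off + pos) % SIM_PAGE_SIZE);
            if (bytes > in_user_page)
                bytes = in_user_page;
        }
        pos += copied;
    }
    return pos;
}
```

The point of the structure is that the only place a fault can actually
be serviced is sim_fault_in(), where no page lock is held.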

It might be possible to do something similar for direct IO, although
simpler: just do the GUP entirely atomically (and in the fault case
just fall back to non-direct IO).
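
To make that idea concrete, a hedged userspace sketch follows. Nothing
here is a real kernel interface: sim_gup_atomic(), sim_dio_write() and
the "resident" flag are hypothetical stand-ins. The sketch only shows
the control flow of trying a GUP that never faults and taking the
buffered path when it fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum sim_io_path { SIM_IO_DIRECT, SIM_IO_BUFFERED };

struct sim_user_pages {
    const char *data;   /* simulated user buffer */
    size_t len;         /* buffer length in bytes */
    bool resident;      /* true if pinning needs no page fault */
};

/* Hypothetical atomic GUP: reports failure instead of faulting. */
static bool sim_gup_atomic(const struct sim_user_pages *up)
{
    return up->resident;
}

/* Writes to "dev" directly when the pages can be pinned without a
 * fault, otherwise goes through "cache" (the buffered path). */
static enum sim_io_path sim_dio_write(char *dev, char *cache,
                                      const struct sim_user_pages *up)
{
    /* Imagine an IO lock held here that the fault path also wants. */
    if (sim_gup_atomic(up)) {
        memcpy(dev, up->data, up->len);
        return SIM_IO_DIRECT;
    }

    /* Would have faulted: give up on direct IO rather than fault
     * while holding the lock, and use the buffered path instead. */
    memcpy(cache, up->data, up->len);
    return SIM_IO_BUFFERED;
}
```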

Linus