Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

From: Dave Chinner
Date: Thu Jan 10 2019 - 20:47:40 EST


On Wed, Jan 09, 2019 at 09:26:41PM -0800, Andy Lutomirski wrote:
> Since direct IO has been brought up, I have a question. I've wondered
> for years why direct IO works the way it does. If I were implementing
> it from scratch, my first inclination would be to use the page cache
> instead of fighting it. To do a single-page direct read, I would look
> that page up in the page cache (i.e. i_pages these days). If the page
> is there, I would do a normal buffered read. If the page is not

Therein lies the problem. Copying data is prohibitively expensive,
and that's the primary reason for O_DIRECT existing. i.e. O_DIRECT
is a low-overhead, zero-copy data movement interface.
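To make the zero-copy point concrete, here is a minimal C sketch of a direct read on Linux. It assumes glibc (`_GNU_SOURCE` exposes `O_DIRECT`) and a 4096-byte alignment requirement, which is a common safe choice. The real requirement depends on the device and filesystem. `direct_read()` and `DIO_ALIGN` are hypothetical names used only for illustration. The buffer has to be aligned because the device typically transfers data straight into it, rather than the kernel copying it out of a page cache page.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment; the real requirement is the device/filesystem
 * logical block size, so 4096 is a safe guess, not a guarantee. */
#define DIO_ALIGN 4096

/* Hypothetical helper: read `len` bytes at `off` from `path` with
 * O_DIRECT, bypassing the page cache. On success returns the number
 * of bytes read and stores a DIO_ALIGN-aligned buffer in *out (caller
 * frees it). On any error, including filesystems that reject
 * O_DIRECT, returns -1 and sets *out to NULL. */
ssize_t direct_read(const char *path, off_t off, size_t len, void **out)
{
    /* O_DIRECT needs aligned offset, length and buffer address. */
    assert(len % DIO_ALIGN == 0 && off % DIO_ALIGN == 0);
    *out = NULL;

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    void *buf;
    if (posix_memalign(&buf, DIO_ALIGN, len) != 0) {
        close(fd);
        return -1;
    }

    /* Data typically lands directly in buf; no page cache copy. */
    ssize_t n = pread(fd, buf, len, off);
    close(fd);
    if (n < 0) {
        free(buf);
        return -1;
    }
    *out = buf;
    return n;
}
```

Some filesystems refuse `O_DIRECT` at `open()` with `EINVAL`, so the helper treats that as an ordinary failure rather than silently falling back to a buffered, copying read.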

The moment we switch from using the CPU to dispatch IO to using it
to copy data, performance goes down because we can no longer keep
the storage pipelines full. IOWs, any rework of O_DIRECT that
involves copying data is a non-starter.

But let's bring this back to the issue at hand - observability of
page cache residency of file pages. If the page is cache resident,
then it will have the latency of copying that data out of the page
(i.e. very low latency). If the page is not resident, then it will
do IO and take much, much longer to complete. i.e. we have clear
timing differences between cache hit and cache miss IO. This is
exactly the timing information needed for observing page cache
residency.
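Here is a minimal C sketch of the timing measurement described above. It assumes Linux with POSIX `clock_gettime()` and `pread()`. `time_buffered_read()` is a hypothetical helper, not an existing API. Timing the same buffered read twice on a file whose pages are not yet cached would typically show the first call (a miss that has to do IO) taking much longer than the second (a hit that only copies from the page cache). No particular numbers or threshold are implied, since they depend entirely on the storage and on system load.

```c
#define _POSIX_C_SOURCE 200809L /* for pread() and clock_gettime() */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: time one buffered pread() of `len` bytes at
 * `off` from `fd` into `buf`. Returns the elapsed wall-clock time in
 * nanoseconds, or -1 if the read itself failed. */
int64_t time_buffered_read(int fd, off_t off, char *buf, size_t len)
{
    struct timespec t0, t1;

    assert(buf != NULL);
    /* CLOCK_MONOTONIC is unaffected by wall-clock adjustments. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    ssize_t n = pread(fd, buf, len, off);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (n < 0)
        return -1;
    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (t1.tv_nsec - t0.tv_nsec);
}
```

Note that this kind of probe changes what it measures. A buffered read of a non-resident page brings that page into the page cache, so a second probe of the same page will usually look like a hit.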

We need to work out how to make page cache residency less
observable, not add new, near perfect observation mechanisms that
third parties can easily exploit...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx