Re: [PATCH] fs: add fincore(2) (mincore(2) for file descriptors)

From: Steve VanDeBogart
Date: Thu Jan 28 2010 - 02:42:54 EST


On Tue, 26 Jan 2010, Andrew Morton wrote:

On Wed, 20 Jan 2010 13:57:12 -0800
Chris Frost <frost@xxxxxxxxxxx> wrote:

In this patch find_get_page() is called for each page, which in turn
calls rcu_read_lock(), for each page. We have found that amortizing
these RCU calls, e.g., by introducing a variant of find_get_pages_contig()
that does not skip missing pages, can speed up the above microbenchmark
by 260x when querying many pages per system call. But we have not observed
noticeable improvements to our macrobenchmarks. I'd be happy to also post
this change or look further into it, but this seems like a reasonable
first patch, at least.

I must say, the syscall appeals to my inner geek. Lots of applications
are leaving a lot of time on the floor due to bad disk access patterns.
A really smart library which uses this facility could help all over
the place.

Is it likely that these changes to SQLite and Gimp would be merged into
the upstream applications?

Changes to the GIMP fit nicely into the code structure, so it's feasible
to push this kind of optimization upstream. The changes in SQLite are
a bit more focused on the benchmark, but a more general approach is not
conceptually difficult. SQLite may not want the added complexity, but
other databases may be interested in the performance improvement.

Of course, these kernel changes are needed before any application can
optimize its IO as we did with libprefetch.

+	if (pgoff >= file_npages || pgend > file_npages) {
+		retval = -EINVAL;
+		goto done;
+	}

Should this return -EINVAL, or should it just return "0": nothing there?

Bear in mind that this code is racy against truncate (I think?), and
this is "by design". If that race does occur, we want to return
something graceful to userspace and I suggest that "nope, nothing
there" is a more graceful result than "erk, you screwed up". Because
the application _didn't_ screw up: the pages were there when the
syscall was first performed.

That's a good point. Reporting "not in core" seems like the right answer
for pgoff >= file_npages.
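
As a rough userspace sketch of that behavior (not the patch itself:
fincore_sketch, resident_fn, and even_pages_resident are made-up names,
and the callback only stands in for the page cache lookup), the range
check could report pages past EOF as absent instead of failing:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page cache lookup: nonzero if resident. */
typedef int (*resident_fn)(unsigned long pgoff);

/*
 * Fill vec[0..npages-1] with 1 for resident pages and 0 otherwise.
 * Pages at or beyond file_npages (for instance after a racing
 * truncate) are reported as "not in core" rather than as an error.
 */
static int fincore_sketch(unsigned long pgoff, unsigned long npages,
                          unsigned long file_npages,
                          resident_fn is_resident, unsigned char *vec)
{
    unsigned long i;

    assert(vec != NULL && is_resident != NULL);

    for (i = 0; i < npages; i++) {
        unsigned long pg = pgoff + i;

        /* pg >= pgoff guards against wraparound of pgoff + i. */
        if (pg >= pgoff && pg < file_npages && is_resident(pg))
            vec[i] = 1;
        else
            vec[i] = 0;   /* past EOF or not cached: nothing there */
    }
    return 0;
}

/* Toy residency oracle for illustration: even-numbered pages are cached. */
static int even_pages_resident(unsigned long pgoff)
{
    return (pgoff % 2) == 0;
}
```

With a four-page file, asking for pages 2 through 5 would succeed and
mark pages 4 and 5 as not in core, so a caller that raced with truncate
sees an ordinary result rather than -EINVAL.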

--
Steve