Re: statfs() / statvfs() syscall ballsup...

From: Andrea Arcangeli
Date: Fri Oct 10 2003 - 13:21:45 EST


On Fri, Oct 10, 2003 at 05:01:44PM +0100, Jamie Lokier wrote:
> Joel Becker wrote:
> > Where I work doesn't change the need for O_DIRECT. If your Big
> > App has it's own cache, why copy the cache in the kernel? That just
> > wastes RAM.
>
> Why don't you _share_ the App's cache with the kernel's? That's what
> mmap() and remap_file_pages() are for.

I covered this some time ago in the remap_file_pages threads with Wil.

remap_file_pages requires pte modifications and tlb flushes.

O_DIRECT only walk the pagetables, no pte mangling, no tlb flushes, the
TLB is preserved fully.

thinking only 64bit in the above of course, 32bit is different but still
mmap+remap_file_pages can't beat O_DIRECT if you dedicate your machine
for the database task.

> > If your app is sharing data, whether physical disk, logical
> > disk, or via some network filesystem or storage device, you must
> > absolutely guarantee that reads and writes hit the storage, not the
> > kernel cache which has no idea whether another node wrote an update or
> > needs a cache flush.
>
> That's tough to guarantee at the platter level regardless of O_DIRECT,
> but otherwise: you have fdatasync() and msync().
>
> > If Linux came up with a better, cleaner method, Oracle might change.
>
> Take a look at remap_file_pages() and write a note here to say if it
> fits the bill. I thought remap_file_pages() was added for Oracle, but
> perhaps it was for a more modern database ;)

no way, it has the disavantages I mentioned above, it would be a bad
idea to use remap_file_pages on any 64bit system out there.

we know remap_file_pages has a chance to improve the /dev/shm mappings
from 32bit systems, but that has nothing to do with the long run 64bit
machines, remap_file_pages is mostly a 32bit hack for ia32 with PAE.

About the in-memory databases, that's really the big iron
non-mass-market, not the other way around. only the big irons have
enough money to buy that much ram, you sure can't compare the price of
the ram with the price of disk, or at least not yet in this market AFIK.

Andrea - If you prefer relying on open source software, check these links:
rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
http://www.cobite.com/cvsps/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/