Re: statfs() / statvfs() syscall ballsup...

From: Joel Becker
Date: Fri Oct 10 2003 - 11:34:15 EST


On Fri, Oct 10, 2003 at 05:01:44PM +0100, Jamie Lokier wrote:
> Why don't you _share_ the App's cache with the kernel's? That's what
> mmap() and remap_file_pages() are for.

Because you can't force flush/read. You can't say "I need you
to go to disk for this." If you do, you're doing O_DIRECT through mmap
(yes, I've pondered it) and you end up with perhaps the same races folks
worry about. Doesn't mean it can't be done.
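
For reference, a minimal sketch of the mmap() side of this, assuming Linux/POSIX. A shared mapping can be written back with msync(MS_SYNC), but nothing in it forces a later load through the mapping to come from storage rather than from an already-cached page, which is the gap described above. The function name and the REGION size are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION 4096   /* hypothetical region size, one page on most systems */

/* Store `len` bytes through a shared mapping of `path`, then ask the
 * kernel to write the dirty page back before returning. */
int mmap_store_and_flush(const char *path, const void *data, size_t len)
{
    assert(len <= REGION);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, REGION) != 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, REGION, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (map == MAP_FAILED)
        return -1;

    memcpy(map, data, len);         /* dirties the page in the page cache */

    /* MS_SYNC blocks until write-back completes. This is the flush half;
     * there is no matching per-access "read from disk now" for the mapping. */
    int rc = msync(map, REGION, MS_SYNC);

    munmap(map, REGION);
    return rc;
}
```

A reader on the same machine would see the data through the shared page cache either way; the question in this thread is what a reader on another node sees.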

> That's tough to guarantee at the platter level regardless of O_DIRECT,
> but otherwise: you have fdatasync() and msync().

Platter level doesn't matter. Storage access level matters.
Node1 and Node2 have to see the same thing. As long as I am absolutely
sure that when Node1's write() returns, any subsequent read() on Node2
will see the change (normal barrier stuff, really), it doesn't matter
what happened on the Storage. The data could be in storage cache, on
platter, or passed back to some other entity.
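
A rough sketch of that write-then-read contract, assuming Linux and O_DIRECT | O_SYNC. The names node_write() and node_read() and the 4096-byte BLOCK alignment are hypothetical. Two opens on one machine only stand in for Node1 and Node2 here, and the fallback to buffered I/O exists so the sketch still runs on filesystems that refuse O_DIRECT.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* platform without O_DIRECT: buffered I/O only */
#endif

#define BLOCK 4096             /* assumed alignment; real devices may differ */

/* Open `path`, preferring O_DIRECT; retry buffered if the open is refused. */
static int open_prefer_direct(const char *path, int flags)
{
    int fd = open(path, flags | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, flags, 0600);
    return fd;
}

/* Transfer one block at offset 0; on EINVAL clear O_DIRECT and retry once. */
static ssize_t block_io(int fd, void *buf, int writing)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        ssize_t n = writing ? pwrite(fd, buf, BLOCK, 0)
                            : pread(fd, buf, BLOCK, 0);
        if (n >= 0 || errno != EINVAL)
            return n;
        int fl = fcntl(fd, F_GETFL);
        if (fl < 0 || fcntl(fd, F_SETFL, fl & ~O_DIRECT) < 0)
            return -1;
    }
    return -1;
}

static int node_io(const char *path, void *data, size_t len, int writing)
{
    void *buf = NULL;
    assert(len <= BLOCK);
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0)   /* O_DIRECT wants aligned buffers */
        return -1;
    memset(buf, 0, BLOCK);
    if (writing)
        memcpy(buf, data, len);

    int fd = open_prefer_direct(path, writing ? O_WRONLY | O_CREAT | O_SYNC
                                              : O_RDONLY);
    ssize_t n = fd < 0 ? -1 : block_io(fd, buf, writing);
    if (fd >= 0)
        close(fd);
    if (!writing && n == BLOCK)
        memcpy(data, buf, len);
    free(buf);
    return n == BLOCK ? 0 : -1;
}

/* "Node1": returns 0 only after the storage layer has accepted the block. */
int node_write(const char *path, const void *data, size_t len)
{
    return node_io(path, (void *)data, len, 1);
}

/* "Node2": reads the block back without relying on a local cached copy. */
int node_read(const char *path, void *out, size_t len)
{
    return node_io(path, out, len, 0);
}
```

The ordering is the point: node_read() is only issued after node_write() has returned, and neither side depends on a host page cache that the other node cannot see.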

> Take a look at remap_file_pages() and write a note here to say if it
> fits the bill. I thought remap_file_pages() was added for Oracle, but
> perhaps it was for a more modern database ;)

remap_file_pages() was indeed something Oracle wanted, but as a
way to create 8GB shmfs files and map them into the x86 crappy address
space. It still does not have the ability to force reads and writes to
the storage, and it even has other issues.
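
To illustrate the windowing idea only: a small, fixed window of address space is re-pointed at different offsets of a file larger than the window. This sketch swaps in plain mmap() with MAP_FIXED instead of calling remap_file_pages() itself, and uses a small regular file rather than an 8GB shmfs file. All function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a file of `npages` pages and stamp byte 0 of page i with 'A' + i. */
int backing_file(const char *path, int npages)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)npages * page) != 0) {
        close(fd);
        return -1;
    }
    for (int i = 0; i < npages; i++) {
        char stamp = (char)('A' + i);
        if (pwrite(fd, &stamp, 1, (off_t)i * page) != 1) {
            close(fd);
            return -1;
        }
    }
    return fd;
}

/* Reserve a window of `win_size` bytes, initially showing the file start. */
void *window_open(int fd, size_t win_size)
{
    void *w = mmap(NULL, win_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return w == MAP_FAILED ? NULL : w;
}

/* Re-point the same addresses at file page `pgoff`.
 * On 32-bit builds, offsets past 2GB need -D_FILE_OFFSET_BITS=64. */
int window_slide(void *win, size_t win_size, int fd, size_t pgoff)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && win_size % (size_t)page == 0);

    /* MAP_FIXED replaces whatever was mapped at `win` in one step. */
    void *p = mmap(win, win_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, (off_t)pgoff * page);
    return p == MAP_FAILED ? -1 : 0;
}
```

As in the text above, nothing here forces reads or writes to the storage; sliding the window only changes which cached file pages the addresses refer to.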

Joel

--

Life's Little Instruction Book #511

"Call your mother."

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: joel.becker@xxxxxxxxxx
Phone: (650) 506-8127