Re: splice vs execve lockdep trace.

From: Linus Torvalds
Date: Wed Jul 17 2013 - 00:54:16 EST

Next message: Thierry Reding: "Re: [PATCH V3 3/4] ARM: dts: tegra: Correct PCIe entry"
Previous message: James Bottomley: "Re: [Ksummit-2013-discuss] [ATTEND] scsi-mq prototype discussion"
In reply to: Dave Chinner: "Re: splice vs execve lockdep trace."
Next in thread: Dave Chinner: "Re: splice vs execve lockdep trace."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Jul 16, 2013 at 9:06 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> Right, and that's one of the biggest problems page based IO has - we
> can't serialise it against other IO and other page cache
> manipulation functions like hole punching. What happens when a
> splice read or mmap page fault races with a hole punch? You get
> stale data being left in the page cache because we can't serialise
> the page read with the page cache invalidation and underlying extent
> removal.

But Dave, that's *good*.

You call it "stale data".

I call it "the data was valid at some point".

This is what "splice()" is fundamentally all about.

Think of it this way: even if you are 100% serialized during the
"splice()" operation, what do you think happens afterwards?

Seriously, think it through.

That data is in a kernel buffer - the pipe. The fact that it was
serialized at the time of the original splice() doesn't make _one_
whit of a difference, because after the splice is over, the data still
sits around in that pipe buffer, and you're no longer serializing it.
Somebody else truncating the file or punching a hole in the file DOES
NOT MATTER. It's too late.
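
To make that concrete, here is a minimal userspace sketch (not part of the original mail). It assumes Linux with glibc or musl, a writable /tmp, and a small payload that fits in one pipe buffer. The helper name splice_then_truncate() is hypothetical. It splices file data into a pipe, truncates the file only after the splice has completed, and then reads the pipe. One would expect the reader to still see the bytes that were valid at splice time, since nothing serializes the pipe contents against the later truncate.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* splice() is a Linux-specific extension */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only: write `msg` to a temporary
 * file, splice it into a pipe, truncate the file to zero length, and only
 * then read the pipe into `out`.
 *
 * Returns the number of bytes read from the pipe, or -1 on any error.
 * Assumes strlen(msg) is non-zero and no larger than `outlen`.
 */
ssize_t splice_then_truncate(const char *msg, char *out, size_t outlen)
{
    char path[] = "/tmp/splice-demo-XXXXXX";   /* assumed writable location */
    size_t len = strlen(msg);
    int pfd[2] = { -1, -1 };
    loff_t off = 0;                            /* read the file from offset 0 */
    ssize_t n, got = -1;
    int fd;

    assert(len > 0 && len <= outlen);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                              /* file lives until fd is closed */

    if (write(fd, msg, len) != (ssize_t)len)
        goto out;
    if (pipe(pfd) < 0)
        goto out;

    /* Move the file data into the pipe. */
    n = splice(fd, &off, pfd[1], NULL, len, 0);
    if (n <= 0)
        goto out;

    /* The "race" arrives after splice() has returned: the file is emptied. */
    if (ftruncate(fd, 0) < 0)
        goto out;

    /* The pipe still holds whatever was spliced into it earlier. */
    got = read(pfd[0], out, (size_t)n);

out:
    if (pfd[0] >= 0)
        close(pfd[0]);
    if (pfd[1] >= 0)
        close(pfd[1]);
    close(fd);
    return got;
}
```

Calling splice_then_truncate("hello, pipe", buf, sizeof buf) with a 64-byte buf is one way to try it. The truncate here happens strictly after the splice, so no amount of locking inside splice() itself could change what the pipe reader sees.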

In other words, trying to "protect" against that kind of race is stupid.

You're missing the big picture because you're concentrating on the
details. Look beyond what happens inside XFS, and think about the
higher-level meaning of splice() itself.

So the only guarantee splice *should* give is entirely per-page. If
you think it gives any other serialization, you're fundamentally
wrong, because it *cannot*. See?

Linus