[PATCH] tcp: do not promote SPLICE_F_NONBLOCK to socket O_NONBLOCK

From: Octavian Purdila
Date: Thu Jul 17 2008 - 09:36:19 EST

commit 11134aa8499b6fd67569e8fd21bde6fc481898d1
Author: Octavian Purdila <opurdila@xxxxxxxxxxx>
Date: Thu Jul 17 16:25:23 2008 +0300

tcp: do not promote SPLICE_F_NONBLOCK to socket O_NONBLOCK

This patch changes tcp_splice_read to the behavior implied by man 2 splice:

SPLICE_F_NONBLOCK - Do not block on I/O. This makes the splice
pipe operations non-blocking, but splice() may nevertheless block
because the file descriptors that are spliced to/from may block
(unless they have the O_NONBLOCK flag set).

This approach also provides a simple solution to the splice
transfer size problem. Say we have the following common sequence:

splice(socket, pipe);
splice(pipe, file);

Unless we specify SPLICE_F_NONBLOCK, we can't use arbitrarily large
transfer sizes with the 1st splice, since otherwise we will deadlock
once the pipe is full. But if we use SPLICE_F_NONBLOCK, the current
implementation makes the underlying socket non-blocking and thus
forces us to use poll or another async I/O notification mechanism.
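
For illustration, the sequence above could be driven from user space
roughly as follows. This is only a sketch, not part of the patch:
relay_socket_to_file() and demo_roundtrip() are made-up names, the chunk
size is arbitrary, and the demo uses an AF_UNIX socketpair in place of a
TCP connection, which may treat SPLICE_F_NONBLOCK differently. The EAGAIN
branch falls back to poll() so the loop also works on kernels where
SPLICE_F_NONBLOCK still makes the socket read non-blocking.

```c
#define _GNU_SOURCE		/* for splice() and SPLICE_F_* */
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: relay everything from sock_fd to out_fd through a
 * private pipe. Returns the number of bytes relayed, or -1 with errno set.
 */
ssize_t relay_socket_to_file(int sock_fd, int out_fd, size_t chunk)
{
	int p[2], saved;
	ssize_t total = 0;

	if (pipe(p) < 0)
		return -1;

	for (;;) {
		/* SPLICE_F_NONBLOCK: return a short count when the pipe
		 * fills instead of sleeping on a pipe nobody drains. */
		ssize_t in = splice(sock_fd, NULL, p[1], NULL, chunk,
				    SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
		if (in == 0)
			break;			/* peer closed the connection */
		if (in < 0) {
			if (errno == EINTR)
				continue;
			if (errno == EAGAIN) {
				/* The pipe is empty at this point, so the
				 * socket had no data: wait for some. */
				struct pollfd pfd = { .fd = sock_fd, .events = POLLIN };

				if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
					goto fail;
				continue;
			}
			goto fail;
		}
		/* 2nd splice: drain the pipe into the file, blocking. */
		while (in > 0) {
			ssize_t out = splice(p[0], NULL, out_fd, NULL,
					     (size_t)in, SPLICE_F_MOVE);
			if (out < 0 && errno == EINTR)
				continue;
			if (out <= 0) {
				if (out == 0)
					errno = EIO;
				goto fail;
			}
			in -= out;
			total += out;
		}
	}
	close(p[0]);
	close(p[1]);
	return total;
fail:
	saved = errno;
	close(p[0]);
	close(p[1]);
	errno = saved;
	return -1;
}

/* Self-check: push a short message through a socketpair into a temp file.
 * Returns 0 if the file ends up holding exactly the message, else -1. */
int demo_roundtrip(void)
{
	static const char msg[] = "hello splice";
	const ssize_t len = (ssize_t)sizeof(msg) - 1;
	char buf[sizeof(msg)] = { 0 };
	int sv[2], ok = -1;
	FILE *tmp = tmpfile();

	if (!tmp)
		return -1;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0) {
		ssize_t sent = write(sv[1], msg, (size_t)len);

		close(sv[1]);	/* closing the writer lets the relay see EOF */
		if (sent == len &&
		    relay_socket_to_file(sv[0], fileno(tmp), 1 << 20) == len &&
		    pread(fileno(tmp), buf, (size_t)len, 0) == len &&
		    memcmp(buf, msg, (size_t)len) == 0)
			ok = 0;
		close(sv[0]);
	}
	fclose(tmp);
	return ok;
}
```

With this patch applied, the 1st splice in such a loop simply sleeps on a
blocking TCP socket until data arrives, so the poll() branch should only
be reached when the socket itself has O_NONBLOCK set.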

Choosing a splice transfer size that won't deadlock is not trivial: we
need to stay under PIPE_BUFFERS packets and since packets can have
arbitrary sizes we will need to be conservative and use a small
transfer size. That can degrade performance due to excessive system calls.

Signed-off-by: Octavian Purdila <opurdila@xxxxxxxxxxx>

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 56a133c..cc5082b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -570,7 +570,7 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,


- timeo = sock_rcvtimeo(sk, flags & SPLICE_F_NONBLOCK);
+ timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
while (tss.len) {
ret = __tcp_splice_read(sk, &tss);
if (ret < 0)
@@ -578,10 +578,6 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
else if (!ret) {
if (spliced)
- if (flags & SPLICE_F_NONBLOCK) {
- ret = -EAGAIN;
- break;
- }
if (sock_flag(sk, SOCK_DONE))
if (sk->sk_err) {