[PATCH 1/2] pipe: check limits only when increasing pipe capacity

From: Michael Kerrisk (man-pages)
Date: Tue Aug 16 2016 - 07:10:57 EST


When changing a pipe's capacity with fcntl(F_SETPIPE_SZ), various
limits defined by /proc/sys/fs/pipe-* files are checked to see
if unprivileged users are exceeding limits on memory consumption.

While documenting and testing the operation of these limits I noticed
that, as currently implemented, these checks can lead to cases where
a user can increase a pipe's capacity and is then unable to decrease
the capacity. The origin of the problem is two-fold:

(1) When increasing the pipe capacity, the checks against the limits
in /proc/sys/fs/pipe-user-pages-{soft,hard} are made against
existing consumption, and exclude the memory required for the
increased pipe capacity. The new increase in pipe capacity
can then push the total memory used by the user for pipes
(possibly far) over a limit.

(2) The limit checks are performed even when the new pipe capacity
is less than the existing pipe capacity. This can lead to
problems if a user sets a large pipe capacity, and then the
limits are lowered, with the result that the user will no
longer be able to decrease the pipe capacity.

The simple solution given by this patch is to perform the checks
only when the pipe capacity is being increased. The patch does not
address the broken check in (1), which allows a user to (one-time)
set a pipe capacity that pushes the user's consumption over the user
pipe limits. A change to fix that check is proposed in a subsequent
patch. I've separated the two fixes because the second fix is a
little more complex, and could possibly (though unlikely) break
existing user-space. The current patch implements the simple fix
that carries little risk and seems obviously correct: allowing an
unprivileged user always to decrease a pipe's capacity.

The program below can be used to demonstrate the problem, and the
effect of the fix. The program takes one or more command-line
arguments. The first argument specifies the number of pipes
that the program should create. The remaining arguments are,
alternately, pipe capacities that should be set using
fcntl(F_SETPIPE_SZ), and sleep intervals (in seconds) between
the fcntl() operations. (The sleep intervals allow the possibility
to change the limits between fcntl() operations.)

Running this program on an unpatched kernel, we first set some limits:

# getconf PAGESIZE
4096
# echo 0 > /proc/sys/fs/pipe-user-pages-soft
# echo 1000000000 > /proc/sys/fs/pipe-max-size
# echo 10000 > /proc/sys/fs/pipe-user-pages-hard # 40.96 MB

Now perform two fcntl(F_SETPIPE_SZ) operations on a single pipe,
first setting a pipe capacity (10MB), sleeping for a few seconds,
during which time the hard limit is lowered, and then set pipe
capacity to a smaller amount (5MB):

# sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 &
[1] 748
# Loop 1: set pipe capacity to 10000000 bytes
F_SETPIPE_SZ returned 16777216
Sleeping 15 seconds

# echo 1000 > /proc/sys/fs/pipe-user-pages-hard # 4.096 MB

# Loop 2: set pipe capacity to 5000000 bytes
Loop 2, pipe 0: F_SETPIPE_SZ failed: fcntl: Operation not permitted

In this case, the user should be able to lower the limit.

With a kernel that has the patch below, the second fcntl()
succeeds:

# echo 0 > /proc/sys/fs/pipe-user-pages-soft
# echo 1000000000 > /proc/sys/fs/pipe-max-size
# echo 10000 > /proc/sys/fs/pipe-user-pages-hard
# sudo -u mtk ./test_F_SETPIPE_SZ 1 10000000 15 5000000 &
[1] 3215
# Loop 1: set pipe capacity to 10000000 bytes
F_SETPIPE_SZ returned 16777216
Sleeping 15 seconds

# echo 1000 > /proc/sys/fs/pipe-user-pages-hard

# Loop 2: set pipe capacity to 5000000 bytes
F_SETPIPE_SZ returned 8388608

8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---

/* test_F_SETPIPE_SZ.c

(C) 2016, Michael Kerrisk; licensed under GNU GPL version 2 or later

Test operation of fcntl(F_SETPIPE_SZ) for setting pipe capacity
and interactions with limits defined by /proc/sys/fs/pipe-* files.
*/

int
main(int argc, char *argv[])
{
int (*pfd)[2];
int npipes;
int pcap, rcap;
int j, p, s, stime, loop;

if (argc < 2) {
fprintf(stderr, "Usage: %s num-pipes "
"[pipe-capacity sleep-time]...\n", argv[0]);
exit(EXIT_FAILURE);
}

npipes = atoi(argv[1]);

pfd = calloc(npipes, sizeof (int [2]));
if (pfd == NULL) {
perror("calloc");
exit(EXIT_FAILURE);
}

for (j = 0; j < npipes; j++) {
if (pipe(pfd[j]) == -1) {
fprintf(stderr, "Loop %d: pipe() failed: ", j);
perror("pipe");
exit(EXIT_FAILURE);
}
}

for (j = 2; j < argc; j += 2 ) {
loop = j / 2;
pcap = atoi(argv[j]);
printf(" Loop %d: set pipe capacity to %d bytes\n", loop, pcap);

for (p = 0; p < npipes; p++) {
s = fcntl(pfd[p][0], F_SETPIPE_SZ, pcap);
if (s == -1) {
fprintf(stderr, " Loop %d, pipe %d: F_SETPIPE_SZ "
"failed: ", loop, p);
perror("fcntl");
exit(EXIT_FAILURE);
}

if (p == 0) {
printf(" F_SETPIPE_SZ returned %d\n", s);
rcap = s;
} else {
if (s != rcap) {
fprintf(stderr, " Loop %d, pipe %d: F_SETPIPE_SZ "
"unexpected return: %d\n", loop, p, s);
exit(EXIT_FAILURE);
}
}

stime = (j + 1 < argc) ? atoi(argv[j + 1]) : 0;
if (stime > 0) {
printf(" Sleeping %d seconds\n", stime);
sleep(stime);
}
}
}

exit(EXIT_SUCCESS);
}

8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---

Cc: Willy Tarreau <w@xxxxxx>
Cc: Vegard Nossum <vegard.nossum@xxxxxxxxxx>
Cc: socketpair@xxxxxxxxx
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Cc: linux-api@xxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Signed-off-by: Michael Kerrisk <mtk.manpages@xxxxxxxxx>
---
fs/pipe.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 4ebe6b2..a98ebca 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1122,14 +1122,23 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
if (!nr_pages)
goto out;

- if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {
- ret = -EPERM;
- goto out;
- } else if ((too_many_pipe_buffers_hard(pipe->user) ||
- too_many_pipe_buffers_soft(pipe->user)) &&
- !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) {
- ret = -EPERM;
- goto out;
+ /*
+ * If trying to increase the pipe capacity, check that an
+ * unprivileged user is not trying to exceed various limits.
+ * (Decreasing the pipe capacity is always permitted, even
+ * if the user is currently over a limit.)
+ */
+ if (nr_pages > pipe->buffers) {
+ if (!capable(CAP_SYS_RESOURCE) && size > pipe_max_size) {
+ ret = -EPERM;
+ goto out;
+ } else if ((too_many_pipe_buffers_hard(pipe->user) ||
+ too_many_pipe_buffers_soft(pipe->user)) &&
+ !capable(CAP_SYS_RESOURCE) &&
+ !capable(CAP_SYS_ADMIN)) {
+ ret = -EPERM;
+ goto out;
+ }
}
ret = pipe_set_size(pipe, nr_pages);
break;
--
2.5.5

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/