Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)

From: Michael Kerrisk (man-pages)
Date: Sat Jan 10 2015 - 02:14:22 EST

Next message: Mike Galbraith: "Re: sched_yield() call on Linux Kernel 2.6.39 is not behaving correct"
Previous message: Darren Hart: "Re: [PATCH] Documentation: Add entry for dell-laptop sysfs interface"
In reply to: Rich Felker: "Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)"
Next in thread: Eric W. Biederman: "Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 01/09/2015 11:13 PM, Eric W. Biederman wrote:
> Rich Felker <dalias@xxxxxxxxxx> writes:
>
>> On Fri, Jan 09, 2015 at 09:09:41PM +0000, Al Viro wrote:
>
>> The "magic open-once magic symlink" approach is really the cleanest
>> solution I can find. In the case where the interpreter does not open
>> the script, nothing terribly bad happens; the magic symlink just
>> sticks around until _exit or exec. In the case where the interpreter
>> opens it more than once, you get a failure, but as far as I know
>> existing interpreters don't do this, and it's arguably bad design. In
>> any case it's a caught error.
>
> And it doesn't work without introducing security vulnerabilities into
> the kernel, because it breaks close-on-exec semantics.
>
> All you have to do is pick a file descriptor, good candidates are 0 and
> 255 and make it a convention that that file descriptor is used for
> fexecve. At least when you want to support scripts. Otherwise you can
> set close-on-exec.
>
> That results in no accumulation of file descriptors because everyone
> always uses the same file descriptor.
>
> Regardless you don't have a patch and you aren't proposing code and the
> code isn't actually broken so please go away.

Eric,

This style of response isn't helpful. Suggesting that people must have
a patch in hand in order to have a conversation about kernel development
means a lot of clever people are going to be excluded from important
conversations. Those clever people include the user-space developers
who write the software that the kernel interacts with--you know, the
user space that is the kernel's raison d'être.

Rich, as far as I've seen, is one of those clever people--he implemented
and maintains a (pretty much complete?) standard C library, so when he
comes to a conversation like this, I think it's best to start with
the assumption that he's thought long and hard about the problem.
Seemingly hostile responses such as the ones you (and Al) make above
don't do much to advance the conversation to a solution.

And there is a problem [*] and nothing I've seen so far in this
conversation seems to provide a solution within the current
kernel implementation (but, maybe I am not clever enough to see it).

==

[*] A summary of the problem for bystanders:

[0.a] Some people want a solution to implementing fexecve()
(http://man7.org/linux/man-pages/man3/fexecve.3.html )
in the absence of /proc (which is currently used for
the implementation). The new execveat() is a stepping
stone to that solution.

[0.b] POSIX permits, but does not require, the FD_CLOEXEC
(close-on-exec) file descriptor flag to be set on the
file descriptor passed to fexecve().

[1] The sequence:
* Open a script file, to get a descriptor, 'fd'
* Set the close-on-exec flag on 'fd'
* execveat(fd, "", argv, envp, AT_EMPTY_PATH)

fails in the execveat() because by the time the script
interpreter has been loaded, 'fd' has been closed because
of the close-on-exec flag.

[2] Omitting the use of close-on-exec on the FD given to
fexecve()/execveat() means that the execed script
receives a superfluous file descriptor that refers to the
script file. The script cannot determine that there is such
an FD or which FD it is without some messy special-case
hacking to inspect its environment (and that hacking must be
based on /proc, AFAICT!)

[3] Scripts won't do the check in [2], with the result that
there'll be descriptor leaks in some cases where
fexecve()/execveat() is used repeatedly.

[4] (As Rich points out in a reply to the parent message, the
solution suggested above of using a fixed file descriptor
for fexecve() does not solve the problem either.)
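
To make [1] concrete, here is a minimal sketch of that sequence in C.
The helper names (set_cloexec(), has_cloexec(), exec_by_fd()) are my
own inventions for illustration, and the sketch assumes kernel headers
new enough to define SYS_execveat. If 'fd' refers to a "#!" script,
the call in exec_by_fd() is expected to fail rather than run the
script, because the interpreter would have no way to open the file
once the descriptor has been closed.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Set the close-on-exec flag on 'fd'.
   Returns 0 on success, or -1 on error. */
int
set_cloexec(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);

    if (fdflags == -1)
        return -1;
    return fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC);
}

/* Returns 1 if close-on-exec is set on 'fd', 0 if it is not,
   or -1 on error. */
int
has_cloexec(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);

    if (fdflags == -1)
        return -1;
    return (fdflags & FD_CLOEXEC) != 0;
}

/* Hypothetical helper showing the sequence in [1]: mark 'fd'
   close-on-exec, then exec the file it refers to. Returns only
   on failure (with -1 and errno set). */
int
exec_by_fd(int fd, char *const argv[], char *const envp[])
{
    if (set_cloexec(fd) == -1)
        return -1;
    return (int) syscall(SYS_execveat, fd, "", argv, envp,
                         AT_EMPTY_PATH);
}
```

For a binary executable the same call works fine, since the kernel has
already mapped the file by the time the descriptor is closed; it is
only the interpreter case that runs into trouble.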

For an example of the leak, consider the following simple program
and script. The program is just a simple command-line interface to
exercise execveat():

=====
/* t_execveat.c
*/
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

#define __NR_execveat 322       /* x86-64 */

static int execveat(int dirfd, const char *pathname, char *const argv[],
                    char *const envp[], int flags)
{
    return syscall(__NR_execveat, dirfd, pathname, argv, envp, flags);
}

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

extern char **environ;

int
main(int argc, char *argv[])
{
    int flags, dirfd;
    char *path;

    flags = 0;

    if (argc < 4) {
        fprintf(stderr, "%s dirfd-path path argv0 [argvN...]\n", argv[0]);
        fprintf(stderr, "\tSpecify 'dirfd' as '-' to get AT_FDCWD\n");
        fprintf(stderr, "\tSpecify 'path' as an empty string to get "
                "AT_EMPTY_PATH\n");
        exit(EXIT_FAILURE);
    }

    if (argv[1][0] == '-')
        dirfd = AT_FDCWD;
    else {
        dirfd = open(argv[1], O_RDONLY);
        if (dirfd == -1)
            errExit("open");
    }

    path = argv[2];
    if (strlen(path) == 0)
        flags = AT_EMPTY_PATH;

    execveat(dirfd, path, &argv[3], environ, flags);
    errExit("execveat");

    exit(EXIT_SUCCESS);
}
=====

And then a simple script (necho.sh) that recursively invokes itself using
the above program demonstrates the problem.

=====
#!/bin/sh
echo
echo '$0 =' $0
ls -l /proc/$$/fd
./t_execveat ./necho.sh "" arg1 # $arg
=====

When we run this script, we see:

=====

# chmod +x necho.sh
# ./t_execveat ./necho.sh "" arg1

$0 = /dev/fd/3
total 0
lrwx------. 1 root root 64 Jan 10 07:59 0 -> /dev/pts/0
lrwx------. 1 root root 64 Jan 10 07:59 1 -> /dev/pts/0
lr-x------. 1 root root 64 Jan 10 07:59 199 -> /home/mtk/necho.sh
lrwx------. 1 root root 64 Jan 10 07:59 2 -> /dev/pts/0
lr-x------. 1 root root 64 Jan 10 07:59 3 -> /home/mtk/necho.sh

$0 = /dev/fd/4
total 0
lrwx------. 1 root root 64 Jan 10 07:59 0 -> /dev/pts/0
lrwx------. 1 root root 64 Jan 10 07:59 1 -> /dev/pts/0
lr-x------. 1 root root 64 Jan 10 07:59 199 -> /home/mtk/necho.sh
lrwx------. 1 root root 64 Jan 10 07:59 2 -> /dev/pts/0
lr-x------. 1 root root 64 Jan 10 07:59 3 -> /home/mtk/necho.sh
lr-x------. 1 root root 64 Jan 10 07:59 4 -> /home/mtk/necho.sh

$0 = /dev/fd/5
total 0
lrwx------. 1 root root 64 Jan 10 07:59 0 -> /dev/pts/0
lrwx------. 1 root root 64 Jan 10 07:59 1 -> /dev/pts/0
lr-x------. 1 root root 64 Jan 10 07:59 199 -> /home/mtk/necho.sh
lrwx------. 1 root root 64 Jan 10 07:59 2 -> /dev/pts/0
lr-x------. 1 root root 64 Jan 10 07:59 3 -> /home/mtk/necho.sh
lr-x------. 1 root root 64 Jan 10 07:59 4 -> /home/mtk/necho.sh
lr-x------. 1 root root 64 Jan 10 07:59 5 -> /home/mtk/necho.sh

$0 = /dev/fd/6
total 0
lrwx------. 1 root root 64 Jan 10 07:59 0 -> /dev/pts/0
lrwx------. 1 root root 64 Jan 10 07:59 1 -> /dev/pts/0
lr-x------. 1 root root 64 Jan 10 07:59 199 -> /home/mtk/necho.sh
lrwx------. 1 root root 64 Jan 10 07:59 2 -> /dev/pts/0
lr-x------. 1 root root 64 Jan 10 07:59 3 -> /home/mtk/necho.sh
lr-x------. 1 root root 64 Jan 10 07:59 4 -> /home/mtk/necho.sh
lr-x------. 1 root root 64 Jan 10 07:59 5 -> /home/mtk/necho.sh
lr-x------. 1 root root 64 Jan 10 07:59 6 -> /home/mtk/necho.sh

$0 = /dev/fd/7
total 0
lrwx------. 1 root root 64 Jan 10 07:59 0 -> /dev/pts/0
lrwx------. 1 root root 64 Jan 10 07:59 1 -> /dev/pts/0
lr-x------. 1 root root 64 Jan 10 07:59 199 -> /home/mtk/necho.sh
lrwx------. 1 root root 64 Jan 10 07:59 2 -> /dev/pts/0
lr-x------. 1 root root 64 Jan 10 07:59 3 -> /home/mtk/necho.sh
lr-x------. 1 root root 64 Jan 10 07:59 4 -> /home/mtk/necho.sh
lr-x------. 1 root root 64 Jan 10 07:59 5 -> /home/mtk/necho.sh
lr-x------. 1 root root 64 Jan 10 07:59 6 -> /home/mtk/necho.sh
lr-x------. 1 root root 64 Jan 10 07:59 7 -> /home/mtk/necho.sh

[and so on until we run out of file descriptors]
=====

(I think the FD 199 in the above output is some bash(1) artifact, unrelated
to the conversation at hand.)

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
