[PATCH] make execve(NULL) re-execute current binary

From: Denys Vlasenko
Date: Mon Jun 29 2009 - 18:00:11 EST


Hi Al, Andrew, folks,

This is version 2 of the re-execution patch.

I replaced the hardcoded "/proc/self/exe" with an execve(NULL)
extension in the hope that this is considered less ugly.
I also tried to format the code according to Andrew's wishes.

Handling execve(NULL) requires adding a bit of code
to the per-architecture sys_execve().
In the attached patch, this is done only for x86.
If this patch is ACKed in principle,
the final version will do it for all architectures.

Description follows.

=========================================================

In some circumstances a running process needs to re-execute
its own image.

Among other useful cases, it is _crucial_ for NOMMU arches.

They need it to perform daemonization. The classic sequence
of "fork, parent dies, child continues" can't be used
due to the lack of fork on NOMMU, and instead we have to do
"vfork, child re-execs itself (with a flag to not daemonize)
and thereby unblocks the parent, parent dies".
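
A minimal userspace sketch of that sequence, for illustration only.
The "--no-daemonize" flag and all function names are hypothetical,
and the sketch re-executes via "/proc/self/exe", which is exactly
the dependency on a mounted /proc that this patch is meant to remove.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical marker telling the re-executed copy not to daemonize again. */
#define NO_DAEMONIZE_FLAG "--no-daemonize"

/* Returns 1 if argv already carries the marker (we are the re-executed child). */
int already_daemonized(int argc, char **argv)
{
	int i;

	for (i = 1; i < argc; i++)
		if (strcmp(argv[i], NO_DAEMONIZE_FLAG) == 0)
			return 1;
	return 0;
}

/*
 * Builds { argv[0], marker, argv[1], ..., argv[argc-1], NULL }.
 * Assumes argc >= 1. The caller frees the returned array (not the strings).
 */
char **build_child_argv(int argc, char **argv)
{
	char **child = malloc((size_t)(argc + 2) * sizeof(*child));
	int i;

	if (!child)
		return NULL;
	child[0] = argv[0];
	child[1] = (char *)NO_DAEMONIZE_FLAG;
	for (i = 1; i < argc; i++)
		child[i + 1] = argv[i];
	child[argc + 1] = NULL;
	return child;
}

/* Returns 0 in the re-executed child, -1 on error; the original parent exits. */
int daemonize_nommu(int argc, char **argv, char **envp)
{
	char **child_argv;
	pid_t pid;

	if (already_daemonized(argc, argv))
		return 0;

	/* Prepare everything before vfork: the child must only exec or _exit. */
	child_argv = build_child_argv(argc, argv);
	if (!child_argv)
		return -1;

	pid = vfork();
	if (pid < 0) {
		free(child_argv);
		return -1;
	}
	if (pid == 0) {
		/* Child: a successful exec unblocks the parent. */
		execve("/proc/self/exe", child_argv, envp);
		_exit(127);
	}
	/* Parent: resumes once the child has exec'ed (or exited), then dies. */
	_exit(0);
}
```

A program would call daemonize_nommu(argc, argv, environ) early in
main() and carry on as the daemon when it returns 0.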

Another crucial use case on NOMMU is POSIX shell support.
Imagine a shell command of the form "func1 | func2 | func3".
This can be implemented on NOMMU by vforking three times,
re-executing the shell in every child in the form
"<shell> -c 'body of funcN'", and letting the parent wait and
collect exit codes and such. As far as I can see, it's the only way
to implement it correctly on NOMMU.
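
Spawning one such pipeline stage might look roughly like the sketch
below. The function names are hypothetical, pipe setup and error
reporting are left out, and the dup2() calls in the vfork child go
beyond what POSIX guarantees for vfork, although they are a common
practice on Linux. Re-execution again goes through "/proc/self/exe".

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Fills out[] with { shell, "-c", body, NULL }. */
void build_stage_argv(char *shell, char *body, char *out[4])
{
	out[0] = shell;
	out[1] = (char *)"-c";
	out[2] = body;
	out[3] = NULL;
}

/*
 * Starts one pipeline stage by re-executing the running shell as
 * "<shell> -c '<body>'" with stdin/stdout redirected to in_fd/out_fd.
 * Returns the child's pid, or -1 if vfork failed.
 */
pid_t spawn_stage(char *shell, char *body, int in_fd, int out_fd, char **envp)
{
	char *argv[4];
	pid_t pid;

	/* Build argv before vfork so the child does as little as possible. */
	build_stage_argv(shell, body, argv);

	pid = vfork();
	if (pid == 0) {
		if (in_fd != STDIN_FILENO)
			dup2(in_fd, STDIN_FILENO);
		if (out_fd != STDOUT_FILENO)
			dup2(out_fd, STDOUT_FILENO);
		execve("/proc/self/exe", argv, envp);
		_exit(127);
	}
	return pid;
}
```

The parent would call spawn_stage() once per function, connect the
stages with pipe(), and then waitpid() for each returned pid.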

A program may re-execute itself by name if it knows the name,
but in general it cannot be sure of it. The binary may be renamed,
or even deleted, while it is running.

A more elegant way is to execute /proc/self/exe.
This works just fine as long as /proc is mounted.

But it breaks if /proc isn't mounted, and this can happen in real-world
usage. For example, when a shell is invoked very early in initrd/initramfs,
or when the program is in a chroot jail.

With this patch, it is possible to re-execute the current binary
even if /proc is not mounted. This is done by calling execve()
with a NULL pointer as the first parameter instead of the filename to exec.
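
From userspace, using the extension could look like the sketch below.
The function names are hypothetical. It assumes a kernel carrying this
patch for the NULL case; on an unpatched kernel that call simply fails,
so the sketch falls back to "/proc/self/exe". The raw syscall() is used
because a libc may declare the first parameter of execve() as non-null.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Tries the proposed execve(NULL) extension.
 * Does not return on success; returns -1 with errno set on failure.
 */
int try_execve_null(char *const argv[], char *const envp[])
{
	return (int)syscall(SYS_execve, (const char *)NULL, argv, envp);
}

/*
 * Re-executes the current binary. Returns only if both attempts fail,
 * with -1 and errno set by the last attempt.
 */
int reexec_self(char *const argv[], char *const envp[])
{
	/* Works only on a kernel with the execve(NULL) extension. */
	try_execve_null(argv, envp);

	/* Fallback for kernels without it; needs /proc to be mounted. */
	return execve("/proc/self/exe", argv, envp);
}
```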

Please comment.

Signed-off-by: Denys Vlasenko <vda.linux@xxxxxxxxxxxxxx>
--
vda
diff -urp ../linux-2.6.30.org/arch/x86/kernel/process_32.c linux-2.6.30-1/arch/x86/kernel/process_32.c
--- ../linux-2.6.30.org/arch/x86/kernel/process_32.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.30-1/arch/x86/kernel/process_32.c 2009-06-29 22:28:38.000000000 +0200
@@ -453,6 +453,15 @@ int sys_execve(struct pt_regs *regs)
int error;
char *filename;

+ if (regs->bx == 0) {
+ /* execme */
+ error = do_execve(NULL,
+ (char __user * __user *) regs->cx,
+ (char __user * __user *) regs->dx,
+ regs);
+ goto out;
+ }
+
filename = getname((char __user *) regs->bx);
error = PTR_ERR(filename);
if (IS_ERR(filename))
@@ -461,12 +470,12 @@ int sys_execve(struct pt_regs *regs)
(char __user * __user *) regs->cx,
(char __user * __user *) regs->dx,
regs);
+ putname(filename);
+out:
if (error == 0) {
/* Make sure we don't return using sysenter.. */
set_thread_flag(TIF_IRET);
}
- putname(filename);
-out:
return error;
}

diff -urp ../linux-2.6.30.org/arch/x86/kernel/process_64.c linux-2.6.30-1/arch/x86/kernel/process_64.c
--- ../linux-2.6.30.org/arch/x86/kernel/process_64.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.30-1/arch/x86/kernel/process_64.c 2009-06-29 22:28:56.000000000 +0200
@@ -504,6 +504,9 @@ long sys_execve(char __user *name, char
long error;
char *filename;

+ if (name == NULL)
+ return do_execve(NULL, argv, envp, regs);
+
filename = getname(name);
error = PTR_ERR(filename);
if (IS_ERR(filename))
diff -urp ../linux-2.6.30.org/fs/exec.c linux-2.6.30-1/fs/exec.c
--- ../linux-2.6.30.org/fs/exec.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6.30-1/fs/exec.c 2009-06-29 22:29:44.000000000 +0200
@@ -644,14 +644,39 @@ EXPORT_SYMBOL(setup_arg_pages);

#endif /* CONFIG_MMU */

+static struct file *open_self(void)
+{
+ struct file *file;
+ struct mm_struct *mm;
+
+ mm = get_task_mm(current);
+ file = NULL;
+ if (mm) {
+ file = get_mm_exe_file(mm);
+ mmput(mm);
+ }
+ if (!file)
+ file = ERR_PTR(-ENOENT);
+ return file;
+}
+
struct file *open_exec(const char *name)
{
struct file *file;
int err;

- file = do_filp_open(AT_FDCWD, name,
+ if (name == NULL) {
+ /*
+ * execve(NULL) execs the binary of the current process.
+ * Unlike execve("/proc/self/exe"), it does not require
+ * mounted /proc.
+ */
+ file = open_self();
+ } else {
+ file = do_filp_open(AT_FDCWD, name,
O_LARGEFILE | O_RDONLY | FMODE_EXEC, 0,
MAY_EXEC | MAY_OPEN);
+ }
if (IS_ERR(file))
goto out;

@@ -1291,8 +1316,8 @@ int do_execve(char * filename,
sched_exec();

bprm->file = file;
- bprm->filename = filename;
- bprm->interp = filename;
+ bprm->filename = filename ? filename : current->comm;
+ bprm->interp = bprm->filename;

retval = bprm_mm_init(bprm);
if (retval)