Re: Porting vfork()

Kenneth Albanowski (kjahds@kjahds.com)
Fri, 8 Jan 1999 15:29:38 -0500 (EST)


On Fri, 8 Jan 1999, Perry Harrington wrote:

> > Make sure that the parent hasn't gone away in the meanwhile.
>
> When it is an issue (before exec/exit) it's still considered a cloned thread and
> the parent can't go away without the child going away too.

How did you manage that? Unless you refer to CLONE_PID, I'm not aware of
any such behaviour.

> >
> > > That's a good question, do you want to temporarily block signals to the parent?
> >
> > I'm not sure, and I'm not sure what traditional behaviour is for this.
> > Letting the parent execute a signal handler is pretty clearly wrong, so I
> > guess signals have to be blocked for the duration. What about SIGKILL? I
> > guess leave it unblocked, otherwise we could have an unkillable parent.
> > That means the parent could go away while the child is still running, so
> > make sure the child won't die if this happens.
>
> If I read the code right, see above, the parent can't go away while the child is
> dependent on it because the mm structure is shared beteen them.

Yes, but the wait_queue is now in the parents task_struct, not the
mm_struct. And sharing the mm doesn't prevent either task going away, it
just means the mm won't get released until all tasks using it exit.

> > > The patch that I sent out (it's gonna hit the list sometime) bypasses the
> > > obvious problems of sharing VMs, by simply recognizing that in an MT app,
> > > your VM area WILL be modified, irregardless. For ST apps, you just have
> > > the parent go to sleep.
> >
> > Sorry, I didn't follow that. My assumption is that MT will work
> > identically to ST: the thread that invokes vfork() will go to sleep until
> > its child execs or exits. No other special behaviour is required. No
> > threads other then the one that vfork()'d will be affected.
>
> Correct, I'm just ignore the fact that the VM will change underneath the thread's
> child, it's immaterial because it's a threading issue, hence it's an app programmer
> issue.

Sorry, I still don't seem to be following you. Are you referring to the
fact that the vfork()'d child will, until it calls exec, be
multi-threading with the co-threads of the parent? Yes, that is an issue,
but it sounds like it can safely be left to the programmer. Everything can
safely wait until the parent resumes execution.

> > This behaviour seems to be required, due to the meaning of vfork(): the
> > parent and child share the stack, so they cannot both execute at the same
> > time (hence the wait_queue). Separate threads do not share the stack
> > (never mind that they share everything else), so each thread can use
> > vfork() independantly and safely.
>
> Since a vforked child is the same as a cloned thread, the stack should get cloned
> too, I think.

The point is that multiple threads aren't running on the _same_ user
stack, despite running in the same VM. The clone() syscall is needed to
set up each thread with an individual stack. vfork() leaves both the
parent and child running on the same stack, meaning A) the child must
complete execution before the parent can resume, B) the child cannot go
further up the stack then the parent, C) you'll probably need to code the
vfork() syscall a macro so that it does not involve a function call, as
this would automatically violate B.

If you want to avoid the stack problems altogether, you'll need to
allocate a new stack for the child. Then the user code needs to look
something like this:

child_stack = malloc(4096);
pid = clone(CLONE_VM|CLONE_VFORK, child_stack);
if (!pid) {
exec();
_exit();
} else {
free(child_stack);
}

The VFORK logic is still needed to keep the parent and child synchronized,
to prevent the parent from freeing the memory before the child is done
executing. I can't think of any trivial way of getting the memory properly
freed up without synchronization of some sort.

The true vfork() implementation avoids the cost of the memory allocation
(which could, admittedly, be handled by a single static allocation), and
looks something like this:

pid = asm(" call SYS_vfork ");
if (!pid) {
exec();
_exit();
}

(Or at least, the above is how I'm familiar with everything from uClinux.
I don't think any of this is incorrect for desktop Linux, but feel free to
correct me.)

> It's the same as clone, it just blocks the parent. It's this semantic which makes
> it better; when the parent wakes up, it can do the wait on the child and wake up in
> the same context, avoiding a superfluous context switch, because the data is immediately
> available.

OK, I'll take your word for it. :-)

-- 
Kenneth Albanowski (kjahds@kjahds.com, CIS: 70705,126)

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/