Re: Killing clones

Linus Torvalds (torvalds@transmeta.com)
Wed, 13 Aug 1997 21:01:13 -0700 (PDT)


On Thu, 14 Aug 1997, Richard Gooch wrote:
> >
> > About a year ago at least there was a discussion about fixing up the
> > CLONE_PID flag so that thread id's would be encoded in the upper bits of
> > the pid. One could have ps list only the proc info for the initial
> > thread (or other thread if the initial has exited), and be able to see
> > what initial process all the threads are associated with.
>
> It's a pity nothing ever went into the kernel. Threads under Linux are
> still a bit clunky. Bit like IRIX threads, actually... At least we
> don't have to contend with arenas!

Indeed.

My hope for the development of clone() was originally:
 - I laid the basic framework (done)
 - people started using it, primarily for
    - pthreads (pretty much done)
    - asynchronous IO (nope)
    - interesting new uses that nobody thought about before (nope)
 - people started looking at what was missing and added kernel support

The basic problem I had was that I had lots of ideas on what I wanted to
be possible, but I was also aware that while I wanted a very specific
basic design, I was by no means sure exactly which details need to be
handled where.

For example, the exact bits in the clone() flags field were made up by me
not because I wanted those exact bits, but because I imagined that those
bits might possibly make sense. For example, I am certain that the
CLONE_VM bit makes sense, but does CLONE_FS make sense? I don't know.
Maybe it would be better to include the current CLONE_FS information into
the CLONE_FILES stuff?

Or take a look at the low 8 bits. I decided that to implement aio_xxx() on
top of clone() we might be better off not using SIGCHLD, but instead have
the death of the clone'd child send SIGIO directly. So now the low eight
bits of clone_flags are the signal to be sent at exit time.

I still don't know if people actually use this feature - or the feature
that when the signal is something other than SIGCHLD, you have to use a
special flag to "waitpid()" to get it to recognize the cloned children.
Again, this was so that "waitpid()" wouldn't end up waiting for
asynchronous IO.
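
The low-eight-bits convention can be sketched in a few lines of C. This is
only an illustration: the helper names and the mask macro are made up here
(they are not kernel identifiers), and the "special flag" is assumed to be
the one Linux headers call __WCLONE.

```c
#include <signal.h>

/* Illustrative name, not a kernel identifier: the low 8 bits of the flags. */
#define EXIT_SIGNAL_MASK 0xffUL

/* The signal the parent receives when the clone'd child exits. */
int clone_exit_signal(unsigned long clone_flags)
{
	return (int)(clone_flags & EXIT_SIGNAL_MASK);
}

/*
 * Non-zero if a plain waitpid() would skip this child, i.e. the exit
 * signal is something other than SIGCHLD and the caller has to pass
 * the special flag (assumed here to be __WCLONE) to see it.
 */
int needs_clone_wait_flag(unsigned long clone_flags)
{
	return clone_exit_signal(clone_flags) != SIGCHLD;
}
```

With flags of (0x100 | SIGIO), where 0x100 is CLONE_VM, the exit signal is
SIGIO and the child is invisible to an ordinary waitpid().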

In short, I tried to make a basic framework that would fit my ideas of
what a light-weight clone() would be good for, but at the same time I
didn't want to set the design in stone - the intention was for it to
develop as people found new ways of (mis)using the new cool thread
feature.

> > * modifications to the linuxthreads package for testing all this out
> > * other things that may be needed -- linuxthreads needs a little bit of
> > help in the kernel in a few places and presumably other more general
> > clone based threads need the same sort of stuff. For example,
> > linuxthreads uses SIGUSR1, SIGUSR2, and it also needs a "manager thread"
> > to handle things like stack cleanup at termination.
>
> None of these address the problem of killing child threads. Although
> it's all good stuff you mention: nice to see someone thinking about
> all this.
> What was Linus' view on encoding IDs in the upper pid bits? It's
> certainly good for grouping processes together, though it may be more
> prone to introducing bugs than a simple CLONE_NO_PROC_ENTRY like I
> suggested, by the simple rule that anything more complicated is likely
> to have more bugs :-)

Encoding the thread ID in the high bits was one of the ideas from the very
beginning. That's what CLONE_PID is there for: the _intent_ was that
CLONE_PID would change only the high bits, and then you could do a global
kill (anything with the high bits zero would send a signal to _all_
threads that shared the same low bits).
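
As a rough userspace sketch of that intent (none of this exists in the
kernel; the function names and the 16/16 bit split are assumptions taken
from the description in this mail), the encoding and the "global kill"
rule could look like:

```c
#include <assert.h>
#include <stdint.h>

#define PID_LOW_BITS 16
#define PID_LOW_MASK 0xffffu

/* Combine the shared low bits with a per-thread counter in the high bits. */
uint32_t make_thread_pid(uint32_t base_pid, uint32_t thread_index)
{
	assert(base_pid <= PID_LOW_MASK && thread_index <= PID_LOW_MASK);
	return (thread_index << PID_LOW_BITS) | base_pid;
}

/* Would a signal aimed at `target` reach the task whose id is `pid`? */
int kill_reaches(uint32_t target, uint32_t pid)
{
	if ((target >> PID_LOW_BITS) == 0)	/* high bits zero: whole group */
		return (pid & PID_LOW_MASK) == target;
	return pid == target;			/* otherwise one exact thread */
}
```

So a target of 155 reaches 155 itself and every thread built on it, while
a target of 0x1009b reaches only the first CLONE_PID child.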

Again, this is still an interesting approach. I'd like to see it done some
day. The fact that the /proc fs makes it a bit harder is a misfeature, but
that's actually due to bad /proc design (which used to make a lot of sense
back when inode numbers were all we had, but we could do better these
days).

My personal favourite for /proc would be that any CLONE_PID threads would
show up _inside_ the original parent (that's kind of the basic idea with
CLONE_PID). So you'd have

	/proc/155/	"original" process (ie something that was
			created without the CLONE_PID bit)
	/proc/155/1	"1st CLONE_PID child"
	/proc/155/2	"2nd CLONE_PID child"

or something like that.

> Can I suggest that, where possible, improvements to thread support in
> Linux are made as a set of separate patches? We've seen before that
> Linus rejects omnibus patches if he doesn't like some bits, even if
> other bits are OK. Giving them as separate (independent) patches makes
> life easier for him (and hence more likely that "good" patches are
> applied quickly).

Indeed.

> * new flag to clone() to allow pids to be "grouped" so that part of
> the bitrange for pids is shared within a group. Also needs pid
> allocation algorithm to change

CLONE_PID is that. It currently has a very limited use: the kernel uses it
to allocate the SMP idle processes for each CPU, and those all have to
have pid 0 (high bits included). But my real intent was to have something
like this:

	if (flags & CLONE_PID) {
		newpid = current->pid;
		/* zero is special - the idle process */
		if (newpid) {
			/*
			 * create linked list of processes
			 * sharing the same low 16 bits,
			 * make "newpid" be the largest to
			 * date plus 0x10000 (ie "increment" the
			 * high 16 bit counter on a per-PID basis)
			 */
		}
	} else
		newpid = traditional_newpid();
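
A small userspace model of that allocation rule, for illustration only:
struct thread_group and both functions are hypothetical names, and a single
"largest so far" field stands in for the linked list of processes sharing
the same low 16 bits.

```c
#include <stdint.h>

/* Hypothetical bookkeeping for one set of CLONE_PID threads. */
struct thread_group {
	uint32_t base_pid;	/* low 16 bits shared by every member */
	uint32_t largest_pid;	/* largest id handed out so far */
};

void thread_group_init(struct thread_group *g, uint32_t base_pid)
{
	g->base_pid = base_pid;
	g->largest_pid = base_pid;	/* the original has high bits zero */
}

/*
 * Store the next id in *newpid and return 0, or return -1 once the
 * 16-bit per-PID counter is used up.
 */
int clone_pid_alloc(struct thread_group *g, uint32_t *newpid)
{
	if (g->base_pid == 0) {		/* zero is special: idle stays pid 0 */
		*newpid = 0;
		return 0;
	}
	if ((g->largest_pid >> 16) == 0xffff)
		return -1;
	g->largest_pid += 0x10000;	/* bump the high 16 bit counter */
	*newpid = g->largest_pid;
	return 0;
}
```

Starting from pid 155, successive calls hand out 0x1009b, 0x2009b and so
on; this model never reuses the ids of threads that have exited, which a
real implementation would presumably want to do.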

> * new flag to clone() to either hide a process in /proc or reflect
> that the flag was set (thinking about this more makes me think that
> this scheme can give much the same flexibility as the above with less
> work in the kernel and possibly a little more work in userspace procps
> tools)

See above about how I'd like this to work. With the /proc/155/1 setup, the
old tools would only ever see the original parent, so to "ps" the threaded
application would look like just one process.

Additionally note that "kill -1 155" would send SIGHUP to _all_ the
threads, and if you wanted to kill just one subthread you'd have to name
it completely in 32 bits (ie "kill -1 $((0x10000+155))" would kill 155/1,
and we'd probably make an extension to bash so that you can say just that:
"kill -1 155/1" would do the math for you).
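
The "155/1" shorthand is just arithmetic, roughly as below. This is a
sketch, not bash code: parse_thread_spec is a made-up helper, and the
16-bit limits on both parts are assumed from the scheme described above.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Turn "155" or "155/1" into the full 32-bit id.
 * Returns 0 and fills *pid_out on success, -1 on malformed input.
 */
int parse_thread_spec(const char *spec, uint32_t *pid_out)
{
	char *end;
	unsigned long base, index = 0;

	errno = 0;
	base = strtoul(spec, &end, 10);
	if (end == spec || errno || base > 0xffff)
		return -1;

	if (*end == '/') {
		const char *idx = end + 1;

		index = strtoul(idx, &end, 10);
		if (end == idx || errno || index > 0xffff)
			return -1;
	}
	if (*end != '\0')
		return -1;

	/* thread number in the high 16 bits, original pid in the low 16 */
	*pid_out = (uint32_t)((index << 16) | base);
	return 0;
}
```

Given "155/1" this yields 65691, the same value as $((0x10000+155)), while
plain "155" stays 155 and so addresses all the threads.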

Yes, this does imply that the first thread is special, but I don't see
anything really wrong with that. If you don't want the first thread to be
special, just don't use CLONE_PID - then all threads will have a full life
of their own.

> * new signals for LinuxThreads support (no more stealing of SIGUSR1
> and SIGUSR2)

Yes. This is separate from threads, though. We need this for RT signals
anyway.

> * new flag to clone() or new syscall prctl() so that when a process's
> parent dies, it is sent a signal

Agreed. It's actually technically very easy to send a SIGPARENT (just do
it in "forget_original_parent()", I think), and it needs another bit in
the clone flags.

Linus