Re: Killing clones

Richard Gooch (rgooch@atnf.CSIRO.AU)
Thu, 14 Aug 1997 14:41:26 +1000


Linus Torvalds writes:
>
> On Thu, 14 Aug 1997, Richard Gooch wrote:
> > >
> > > About a year ago, at least, there was a discussion about fixing up the
> > > CLONE_PID flag so that thread id's would be encoded in the upper bits of
> > > the pid. One could have ps list only the proc info for the initial
> > > thread (or other thread if the initial has exited), and be able to see
> > > what initial process all the threads are associated with.
> >
> > It's a pity nothing ever went into the kernel. Threads under Linux are
> > still a bit clunky. Bit like IRIX threads, actually... At least we
> > don't have to contend with arenas!
>
> Indeed.

Yep, it's so much fun having to modify the code to increase the arena
size when you 1) get more CPUs, 2) launch more threads.

> My hope for the development of clone() was originally:
>  - I laid the basic framework (done)
>  - people started using it, primarily for
>     - pthreads (pretty much done)
>     - asynchronous IO (nope)

In fact, the application that really started me thinking in earnest
about clone() problems is sort-of asynchronous I/O: deep in my support
library I have a "Channel" object, a bit like C library "FILE *"
streams, but properly full duplex (i.e. a separate read and write
buffer) plus a lot more smarts. Set an attribute and suddenly any
blocking writes as well as read-ahead can be done by a child thread,
leaving the main thread to write without blocking but also without
having to worry about rescheduling writes if you get EAGAIN.

>  - interesting new uses that nobody thought about before (nope)
>  - people started looking at what was missing and add kernel support
>
> The basic problem I had was that I had lots of ideas on what I wanted to
> be possible, but I was also aware that while I wanted a very specific
> basic design, I was by no means sure exactly which details need to be
> handled where.
>
> For example, the exact bits in the clone() flags field were made up by me
> not because I wanted those exact bits, but because I imagined that those
> bits might possibly make sense. For example, I am certain that the
> CLONE_VM bit makes sense, but does CLONE_FS make sense? I don't know.
> Maybe it would be better to include the current CLONE_FS information into
> the CLONE_FILES stuff?

Does it matter? It's there now, you don't have to use it, and it only
costs you one bit. Who knows: someone might figure out a reason to use
one and not the other. Actually, now that I think about it, I realise
that I'm now doing just that: I use CLONE_FS but not CLONE_FILES
because of that trick I mentioned earlier with pipes so that threads
can know when the parent dies. I still want to CLONE_FS, since
changing directory should still be global. There you go.

> Or take a look at the low 8 bits. I decided that to implement aio_xxx() on
> top of clone() we might be better off not using SIGCHLD, but instead have
> the death of the clone'd child send SIGIO directly. So now the low eight
> bits of clone_flags is the signal to be sent at exit time.

I like that feature: I use SIGKILL so that if a child dies, it takes
down the parent (there are reasons I don't want to deal with
SIGCHLD). With my "death pipe" scheme, when the parent is killed the
remaining child threads kill themselves off.
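
A minimal sketch of the "death pipe" idea, assuming a plain pipe whose
write end is held only by the parent. The names wait_for_parent_death()
and death_pipe_demo() are hypothetical, fork() stands in for clone()
without CLONE_FILES so that the child gets its own descriptor table, and
the demo closes the parent's write end to stand in for the parent dying.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Block until every write end of the pipe has been closed, i.e. until
 * the parent (assumed to be the only holder of the write end) is gone.
 * Returns 0 on EOF, -1 on a read error. */
int wait_for_parent_death(int death_fd)
{
    char buf[16];

    assert(death_fd >= 0);
    for (;;) {
        ssize_t n = read(death_fd, buf, sizeof buf);
        if (n == 0)
            return 0;               /* EOF: no writers left */
        if (n < 0 && errno != EINTR)
            return -1;              /* real read error */
        /* n > 0 (unexpected data) or EINTR: keep waiting */
    }
}

/* Hypothetical demo: returns 0 if the child saw EOF after the parent
 * side dropped the write end, -1 otherwise. */
int death_pipe_demo(void)
{
    int fds[2];
    int status;
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* The child must close its own copy of the write end,
         * otherwise it would never see EOF. */
        close(fds[1]);
        _exit(wait_for_parent_death(fds[0]) == 0 ? 0 : 1);
    }
    close(fds[0]);
    close(fds[1]);                  /* stands in for the parent exiting */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

In a real thread the caller would kill itself off once
wait_for_parent_death() returns 0. With CLONE_FILES the descriptor table
is shared, so the child closing the write end would close it for the
parent as well, which is why the trick needs separate file tables.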

> I still don't know if people actually use this feature - or the feature
> that when the signal is something else than SIGCHLD, you have to use a
> special flag to "waitpid()" to get it to recognize the cloned children.
> Again, this was so that "waitpid()" wouldn't end up waiting for
> asynchronous IO.
>
> In short, I tried to make a basic framework that would fit my ideas of
> what a light-weight clone() would be good for, but at the same time I
> didn't want to set the design in stone - the intention was for it to
> develop as people found new ways of (mis)using the new cool thread
> feature.

I don't think you've made the interface over-flexible. I think all
those features are being used somehow.
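
As one illustration of the exit-signal bits and the special waitpid()
flag working together, here is a rough sketch, assuming Linux with the
glibc clone() wrapper and the __WCLONE wait flag. The helper names are
hypothetical, and the child simply exits with status 7 so there is
something to check.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

static volatile sig_atomic_t exit_signal_seen;

static void on_exit_signal(int sig)
{
    (void)sig;
    exit_signal_seen = 1;
}

/* Hypothetical child body: its return value becomes the exit status. */
static int child_fn(void *arg)
{
    (void)arg;
    return 7;
}

/* Clone a child whose death sends exit_signal instead of SIGCHLD, then
 * reap it. Returns the child's exit status, or -1 on failure. */
int clone_and_reap(int exit_signal)
{
    const size_t stack_size = 64 * 1024;
    struct sigaction sa;
    char *stack;
    int status;
    pid_t pid, r;

    /* The exit signal lives in the low eight bits of the flags. */
    assert((exit_signal & ~0xff) == 0);

    sa.sa_handler = on_exit_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(exit_signal, &sa, NULL) < 0)
        return -1;

    stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    /* Assumes a downward-growing stack, so pass the top of the buffer. */
    pid = clone(child_fn, stack + stack_size, CLONE_FS | exit_signal, NULL);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    /* A plain waitpid() only sees children that report with SIGCHLD;
     * __WCLONE asks for the ones that use another signal. */
    do {
        r = waitpid(pid, &status, __WCLONE);
    } while (r < 0 && errno == EINTR);

    free(stack);
    if (r != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Called as clone_and_reap(SIGIO), this mirrors the aio case: the parent
gets SIGIO when the child exits, and an ordinary waitpid() elsewhere in
the program would not pick the child up by accident.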

> > > * modifications to the linuxthreads package for testing all this out
> > > * other things that may be needed -- linuxthreads needs a little bit of
> > > help in the kernel in a few places and presumably other more general
> > > clone based threads need the same sort of stuff. For example,
> > > linuxthreads uses SIGUSR1, SIGUSR2, and it also needs a "manager thread"
> > > to handle things like stack cleanup at termination.
> >
> > None of these address the problem of killing child threads. Although
> > it's all good stuff you mention: nice to see someone thinking about
> > all this.
> > What was Linus' view on encoding IDs in the upper pid bits? It's
> > certainly good for grouping processes together, though it may be more
> > prone to introducing bugs than a simple CLONE_NO_PROC_ENTRY like I
> > suggested, by the simple rule that anything more complicated is likely
> > to have more bugs :-)
>
> Encoding the thread ID in the high bits was one of the ideas from the very
> beginning. That's what CLONE_PID is there for: the _intent_ was that
> CLONE_PID would change only the high bits, and then you could do a global
> kill (anything with the high bits zero would send a signal to _all_
> threads that shared the same low bits).
>
> Again, this is still an interesting approach. I'd like to see it done some
> day. The fact that the /proc fs makes it a bit harder is a misfeature, but
> that's actually due to bad /proc design (which used to make a lot of sense
> back when inode numbers was all we had, but we could do better these
> days).
>
> My personal favourite for /proc would be that any CLONE_PID threads would
> show up _inside_ the original parent (that's kind of the basic idea with
> CLONE_PID). So you'd have
>
>     /proc/155/      "original" process (ie something that was
>                     created without the CLONE_PID bit)
>     /proc/155/1     "1st CLONE_PID child"
>     /proc/155/2     "2nd CLONE_PID child"
>
> or something like that.

YES, YES, YES. The way it should be. Better than the Solaris scheme,
where you need a special syscall to dig into the process and get the
LWPs, but without cluttering ps output.

> > Can I suggest that, where possible, improvements to thread support in
> > Linux are made as a set of separate patches? We've seen before that
> > Linus rejects omnibus patches if he doesn't like some bits, even if
> > other bits are OK. Giving them as separate (independent) patches makes
> > life easier for him (and hence more likely that "good" patches are
> > applied quickly).
>
> Indeed.

Thought that approach might help :-)

> > * new flag to clone() to allow pids to be "grouped" so that part of
> > the bitrange for pids are shared within a group. Also needs pid
> > allocation algorithm to change
>
> CLONE_PID is that. It currently has a very limited use: the kernel uses it
> to allocate the SMP idle processes for each task, and those all have to
> have pid 0 (also high bits). But my real intent was to have something like
> this:
>
>     if (flags & CLONE_PID) {
>         newpid = current->pid;
>         /* zero is special - the idle process */
>         if (newpid) {
>             create linked list of processes
>             sharing the same low 16 bits,
>             make "newpid" be the largest to
>             date plus 0x10000 (ie "increment" the
>             high 16 bit counter on a per-PID basis)
>         }
>     } else
>         newpid = traditional_newpid();
>
> > * new flag to clone() to either hide a process in /proc or reflect
> > that the flag was set (thinking about this more makes me think that
> > this scheme can give much the same flexibility as the above with less
> > work in the kernel and possibly a little more work in userspace procps
> > tools)
>
> See above about how I'd like this to work. With the /proc/155/1 setup, the
> old tools would only ever see the original parent, so to "ps" the threaded
> application would look like just one process.
>
> Additionally note that "kill -1 155" would send SIGHUP to _all_ the
> threads, and if you wanted to kill just one subthread you'd have to name
> it completely in 32 bits (ie "kill -1 $((0x10000+155))" would kill 155/1,
> and we'd probably make an extension to bash so that you can say just that:
> "kill -1 155/1" would do the math for you).

If we can get such an extension, that would do, IMHO. Might be messy
otherwise.
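
To make the arithmetic concrete, here is a small userspace sketch of the
proposed encoding, assuming the 16/16 split described above (process in
the low 16 bits, thread counter in the high 16). All the names are
hypothetical; nothing like this exists in the kernel or in bash.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

#define LOW_BITS 16
#define LOW_MASK 0xFFFFu

/* Combine an "original" pid and a CLONE_PID thread number into one
 * 32-bit value. Thread 0 is the original process itself. */
uint32_t encode_thread_pid(uint32_t base, uint32_t thread)
{
    assert(base <= LOW_MASK && thread <= LOW_MASK);
    return (thread << LOW_BITS) | base;
}

/* Low 16 bits: the pid shared by every thread in the group. */
uint32_t thread_pid_base(uint32_t pid)
{
    return pid & LOW_MASK;
}

/* High 16 bits: which CLONE_PID child this is (0 = original). */
uint32_t thread_pid_index(uint32_t pid)
{
    return pid >> LOW_BITS;
}

/* Parse "155" or "155/1" into the encoded form.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
int parse_thread_pid(const char *s, uint32_t *out)
{
    char *end;
    unsigned long base, thread = 0;

    if (!isdigit((unsigned char)s[0]))
        return -1;
    base = strtoul(s, &end, 10);
    if (*end == '/') {
        if (!isdigit((unsigned char)end[1]))
            return -1;
        thread = strtoul(end + 1, &end, 10);
    }
    if (*end != '\0' || base > LOW_MASK || thread > LOW_MASK)
        return -1;
    *out = encode_thread_pid((uint32_t)base, (uint32_t)thread);
    return 0;
}
```

With this, "155/1" parses to 0x10000+155, the value from the kill
example, and a bare "155" keeps the high bits zero, which under the
proposal would address every thread sharing those low bits.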

> Yes, this does imply that the first thread is special, but I don't see
> anything really wrong with that. If you don't want the first thread to be
> special, just don't use CLONE_PID - then all threads will have a full life
> of their own.

I think the first thread being special is fine. Same as in Solaris.

> > * new signals for LinuxThreads support (no more stealing of SIGUSR1
> > and SIGUSR2)
>
> Yes. This is separate from threads, though. We need this for RT signals
> anyway.
>
> > * new flag to clone() or new syscall prctl() so that when a process's
> > parent dies, it is sent a signal
>
> Agreed. It's actually technically very easy to send a SIGPARENT (just do
> it in "forget_original_parent()", I think), and it needs another bit in
> the clone flags.

But then you need a new signal? Won't that take some revamping of the
signal handling and libc interface? Not that I have a problem with the
concept of SIGPARENT. I think adding a flag to clone() or a new
syscall to specify a signal to be sent when the parent dies has more
flexibility: there may be times when you want a different signal than
SIGPARENT. I imagine that the default would be that you get SIGPARENT
(which is ignored by default), but you can use clone() or prctl() to
change the signal. In my apps, I would use SIGKILL.
Doing it the way I suggest (i.e. do both) has the advantage of getting
a quick patch for delivering a signal to a child. Later the more
difficult (?) job of extending the signal list can be done.
Is that reasonable?

Regards,

Richard....