Re: thread group comments

From: Theodore Y. Ts'o (tytso@MIT.EDU)
Date: Fri Sep 01 2000 - 19:48:33 EST


   From: Ulrich Drepper <drepper@redhat.com>
   Date: 01 Sep 2000 14:52:28 -0700

   1st Problem: One signal handler process process-wide

   What is handled correctly now is sending signals to the group. Also
   that every thread has its mask. But there must be exactly one signal
   handler installed. I.e., a sigaction() call to set a new handler has
   consequences process-wide. Since this muse be atomic I think the
   information should be kept in the thread group leader's data
   structures and the other threads should simply use this information
   all the time. Yeah, I know, one more indirection.

Unfortunately, I think you're right. This will mean an extra
indirection and locking, but signals aren't performance critical.

   2nd Problem: Fatal signal handling

   kernel/signal.c contains:

    * Send a thread-group-wide signal.
    *
    * Rule: SIGSTOP and SIGKILL get delivered to _everybody_.

   That's OK. Except that is a signal whose default action is to
   terminate the process is not caught be the application, this signal is
   also handled process-wide. E.g., if there is no SIGSEGV handler the
   whole process is terminated.

True, but this can be handled by having the master thread process catch
SIGSEGV and redistributing the signal to all of its child-threads.

(The assumption I'm making here is that the master thread doesn't do
anything except spawn all threads for the process and monitors its child
processes for death. This is the n+1 model.)

   This will have to go hand in hand with an extension of the core file
   format to include information about all threads but for the time being
   it's enough if only the offending thread is dumped and the rest simply
   killed.

True, although my bias has always been that if you wanted debuggable
programs, you wouldn't have been using threads in the first place. :-)
(That was a joke, for the humor-impaired!)

   3rd Problem: one uid/gid process-wide

   All the ID (uid/guid/euid/egid/...) must be process wide. The problem
   is similar to the signal handler. I think one should again keep the
   information exclusively in the master thread and have all others refer
   to this information.

This will probably have to be emulated in user-mode via IPC's. UID
changes don't happen often, and they're **NOT** performance critical.
The aternative is to put locking around all of the uid/gid checks in the
mainline kernel code, and that's been considered unreasonable.

   4th Problem: thread termination

   In general, thread termination is not of much interest for the rest of
   the system. It is in the moment but if the fatal signal handling is
   done this will change.

   If a thread gets a fatal signal, the whole process is killed. No
   cleanup necessary. Signal handlers can be installed if necessary.

   If a thread terminates naturally and can perform the cleanup itself.

   In any case, the death signal should be ignored. Except for the last
   thread, of course, which has to notify the process starting the MT
   application.

I assume here that when you say "death signal" you mean SIGCHLD? If so,
if all threads are children of the master thread, then the master thread
will get all of the SIGCHLD's, and it can deal with them appropriately.
(In fact, the master thread can get the notification that one of the
threads died due to an unhandled SEGV, and it can use that information
to kill off all over the other threads.)

   5th Problem: suspended starting

   Related to the last problem a good old friend pops up. Depending on
   the solution of the last problem it might be necessary to add
   suspended starting of threads. The problem is that sometimes the
   starter has to modify parameters (e.g., scheduler) of the newly
   started thread before it can actually start working. If this fails,
   the new thread must be terminated immediately. But who will get the
   termination signal? The data structures for the new thread must be
   removed as well and this after the new thread is guaranteed to be
   vanished.

Umm.... why can't this be done in user space? We can do it in kernel
space, but it means adding a new flag to clone() which tells it to
suspend the child immediately and let the parent return. Why can't it
be done in userpsace by having the library which calls clone() as part
of starting a new thread suspend itself if it's the child? I must be
missing something.....

                                                - Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Sep 07 2000 - 21:00:12 EST