Re: Patch: CLONE_PPID, CLONE_WAIT, CLONE_SUSPENDED

Alon Ziv (alonz@cs.technion.ac.il)
Sat, 31 Jul 1999 12:28:59 +0300 (IDT)


On Tue, 27 Jul 1999, Eric PAIRE wrote:

[my attribution deleted :-/]
> > Actually, there _is_ one more unsolved problem: core dumps. When a
> > thread gets killed by the kernel for doing something illegal (SIGILL /
> > SIGSEGV / whatever), the complete process is supposed to be dumped. I
> > still haven't thought up a suitable solution for this...
> >
> Last year, after having developed the linuxthread debug support in GDB,
> I made a small proposal to allow a flexible support for multithreaded
> core while letting the thread manager decide what to do on signal receiving
> (actually whether or not it wants to implement the POSIX semantics):
> The basic idea is to configure for each thread what to do when a core dump
> is required (namely whether to actually dump the core to a file, or to
> write the faulting context in its own thread memory).
>
[mechanism description deleted... refer to original if you wish]

Unfortunately, this won't work so well...

The problem is that when a task receives a trap, it's not processed
immediately; instead, the task receives the signal, and only when it is
scheduled to run again will the signal truly be processed.

Now, this puts us in a real bind when dealing with coredumps. Between the
moment our misbehaving task gets the fatal signal and the moment the manager
gets to act on it, another thread may get a timeslice. It may change data in
our VM, and so the core dump will simply not reflect the real situation.
(And, if our fault was e.g. due to an incorrectly written synchronization
primitive, it's possible we faulted because a data structure was
corrupted, and we now let the culprit cover its tracks before the
coredump... Ouch.)

[ Hmm... This also means that even the _current_ situation is bad; if a
thread is hit by a fault, another thread may get some time to run before
this thread's coredump is produced! ]

So, we must create the coredump immediately when the fault is detected;
and this (IMO) means it must be done in the kernel, as we don't want to
implement some extremely complex inter-task messaging mechanism to make
sure the manager is the next task which will run.

To sum it up, it appears as if we'll have to use something like the patch
that was recently posted that will produce a real threaded coredump for
all tasks sharing the same mm, and possibly we'll also require a change in
the fault handling logic so that after a force_sig the same task will
_always_ go on running to handle the trap...

Lots of work, not enough time...

-az

------------------------+----------------------------------------------------
. __ | Phone: +972 3 5340753 (home), +972 3 9685882 (work)
_| / | email: alonz@usa.net
/ | /_ Alon Ziv | smail: 33 Ha-Rama St., Ganey Tiqwah 55900, Israel
------------------------+----------------------------------------------------
<<<(((this place reserved for that ultra-wise oneliner I haven't found.)))>>>
