Re: Patch: CLONE_PPID, CLONE_WAIT, CLONE_SUSPENDED

Eric Paire (e.paire@opengroup.org)
Mon, 02 Aug 1999 08:53:26 +0200


> On Tue, 27 Jul 1999, Eric PAIRE wrote:
>
> [my attribution deleted :-/]
> > > Actually, there _is_ one more unsolved problem: core dumps. When a
> > > thread gets killed by the kernel for doing something illegal (SIGILL /
> > > SIGSEGV / whatever), the complete process is supposed to be dumped. I
> > > still haven't thought up a suitable solution for this...
> > >
> > Last year, after having developed the linuxthread debug support in GDB,
> > I made a small proposal to allow a flexible support for multithreaded
> > core while letting the thread manager decide what to do on signal receiving
> > (actually whether or not it wants to implements the POSIX semantics):
> > The basic idea is to configure for each thread what to do when a core dump
> > is required (namely whether to actually dump the core to a file, or to
> > write the faulting context in its own thread memory).
> >
> [mechanism description deleted... refer to original if you wish]
>
> Unfortunately, this won't work so well...
>
> The problem is that when a task receives a trap, it's not processed
> immediately; instead, the task recieves the signal, and only when it is
> scheduled to run again will the signal truly be processed.
>
> Now, this puts us in a real bind when dealing with coredumps. Between the
> moment our misbhaving task got the fatal signal until the manager gets to
> act on it, another thread may get a timeslice. It may change data in our
> VM, and so the core dump will simply not reflect the real situation.
> (And, if our fault was e.g. due to an incorrectly written synchronization
> primitive, it's possible we faulted because a data structure was
> corrupted--- and we now let the culprit cover its tracks before the
> coredump... Ouch.)
>
> [ Hmm... This also means that even the _current_ situation is bad; if a
> thread is hit by a fault, another thread may get some time to run before
> this thread's coredump is produced! ]
>
If you think in real multiprocessor mode, then if two threads of your program
are running concurrently, then the fact that one is faulting does not impact
at the same time the other, so that the memory can be modified by the other
while the faulty one is handling the fault, which leads to a wrong core. So,
I think there is *NO* real solution for real synchronized faulting thread.

> So, we must create the coredump immediately when the fault is detected;
> and this (IMO) means it must be done in the kernel, as we don't want to
> implement some extremely complex inter-task messaging mechanism to make
> sure the manager is the next task which will run.
>
> To sum it up, it appears as if we'll have to use something like the patch
> that was recently posted that will produce a real threaded coredump for
> all tasks sharing the same mm, and possibly we'll also require a change in
> the fault handing logic so that after a force_sig the same task will
> _always_ go on running to handle the trap...
>
I have nothing against this patch (and really nothing if Linus puts it in
the standard kernel), except that
1) it unconditionally enforces the POSIX thread signal behaviour inside
the kernel (this could be avoided by having a new sysctl() for that).
2) it may mislead people working on a multiprocessor with a core that may not
be the exact memory image when the thread actually faulted. (Is it
acceptable that the kernel produce potential wrong core files ?).

-Eric
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ Eric PAIRE
Web : http://www.ri.silicomp.com/~paire | Group SILICOMP - Research Institute
Email: eric.paire@ri.silicomp.com | 2, avenue de Vignate
Phone: +33 (0) 476 63 48 71 | F-38610 Gieres
Fax : +33 (0) 476 51 05 32 | FRANCE

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/