Re: Multithreaded coredump patch where?

From: Roberto Fichera (kernel@tekno-soft.it)
Date: Thu Dec 19 2002 - 04:24:45 EST


At 08.13 18/12/02 -0800, mgross wrote:

>On Tuesday 17 December 2002 03:05 am, Roberto Fichera wrote:
> > >I haven't rebased the patches I posted back in June for a while now.
> > >
> > >Attached is the patch I posted for the 2.4.18 vanilla kernel. Its a bit
> > >controversial, but it seems to work for a number of folks. Let me know if
> > >you have any troubles re-basing it.
> >
> > Only one hunk failed on include/asm-ia64/elf.h but fixed by hand.
> > Why do you say a bit controversial ? One difference that I have
> > notice is in coredump size after your patch. However seem to be
> > working well for now. I'll try later on a SMP machine.
>
>There are 2 issues with this implementation that will likely not effect you.
>
>
>First, when dumping core of MT applications with LOTS of threads the pthread
>library signals all the threads in the application to exit. Sometimes
>the process that is dumping core fails to suspend other threads in the
>application before some exit. The result of this is that for such
>applications you will not see them in the core file.
>
>You have to work at it to see this failure. The way I reproduce this is to
>run a test application with about 555 pthread threads in it and send it a
>sig_quit. When I look at the core file wont have all 555 threads. SMP makes
>this effect a bit more noticeable.

This could be a problem for me. I haven't tried yet your patch with my SMP
machine but I'll try today, I hope.

>Ingo's design to fix this change the exit path for thread to wait for the
>core file to get dumped before finishing the process clean up. I like this
>approach, I just wish I thought of it ;)

Yes seem to be a good solution.

>Second, the controversial issue is in the way my design pauses the other
>threads in the MT application. Its not semaphore lock safe. Although no
>instance of the following failure has been seen, it is possible with new
>kernel code.
>
>If one of the processes in the MT application is currently holding semaphore
>lock when the dumping process pauses it, AND the dumping process does any
>blocking operation that could attempt to grab this same semaphore, THEN the
>core dump will deadlock. Boom.

This problem could be reproducible under load doing some IO (fs & net),
cpu bound process and swapping (vm IO). In this way we could have some
possibility to catch it.

>My patch is good for developers, pending the back port of Ingo's version.
>
>Do let me know how it works out for you.
>
>--mgross

Roberto Fichera.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Dec 23 2002 - 22:00:23 EST