Re: [PATCH 5/5] coredump: abort core dump piping only due to a fatal signal

From: Oleg Nesterov
Date: Sat Feb 16 2013 - 12:06:53 EST


On 02/15, Mandeep Singh Baines wrote:
>
> On Fri, Feb 15, 2013 at 7:01 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> >
> > It is not enough and imho not good. Damn, I'll try very much to make the
> > patches on weekend...
> >
> >> - while ((pipe->readers > 1) && (!signal_pending(current))) {
> >> + while ((pipe->readers > 1) && (!fatal_signal_pending(current))) {
> >
> > This turns pipe_wait() below into a busy-wait loop if signal_pending().
>
> D'oh. Thanks for catching that.
>
> Fixed in v3 by blocking non-fatal signals.

Doesn't look correct...
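
To illustrate the busy-wait noted in the quoted hunk above, here is a minimal
user-space model, not kernel code. The names fake_task, fake_pipe,
fake_pipe_wait() and count_spins() are all hypothetical. The only kernel
behaviour assumed is that an interruptible sleep returns at once while a
signal is pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct fake_task {
	bool sig_pending;   /* models signal_pending(current) */
	bool fatal_pending; /* models fatal_signal_pending(current) */
};

struct fake_pipe {
	int readers;
};

/*
 * Models an interruptible sleep. Returns true if the caller really slept;
 * with any signal pending it returns at once, without sleeping.
 */
static bool fake_pipe_wait(const struct fake_task *t, struct fake_pipe *p)
{
	if (t->sig_pending)
		return false;
	p->readers = 1; /* model: the helper closed its end while we slept */
	return true;
}

/*
 * Runs the wait loop and returns how many iterations came back from
 * fake_pipe_wait() without sleeping. max_spins only bounds the model so
 * that it terminates; the real loop has no such cap.
 */
static int count_spins(const struct fake_task *t, struct fake_pipe *p,
		       bool fatal_only, int max_spins)
{
	int spins = 0;

	assert(max_spins > 0);
	while (p->readers > 1 &&
	       !(fatal_only ? t->fatal_pending : t->sig_pending)) {
		if (!fake_pipe_wait(t, p) && ++spins >= max_spins)
			break;
	}
	return spins;
}
```

With a pending non-fatal signal, count_spins(&t, &p, true, 1000) hits the
1000 cap without ever sleeping, while the signal_pending() condition
(fatal_only == false) never enters the loop at all.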

> > Not good. And not enough, there are other reasons why coredump can fail
> > if the signal is pending.
>
> What other reasons did you have in mind?

Say, pipe_write() can fail if signal_pending() == T.

> Since applying an earlier version of this patch, truncated/missing
> coredumps are no longer an issue for us.

Sure, this "almost works". But this doesn't really work.

And more importantly, we should fix another problem: SIGKILL should
really stop the coredumping. I do not see a simple solution; the
main problem is the races with the exiting threads...

> Could the other reasons be addressed in another patch?

Well. Personally I believe we should fix the problems with signals
first, then add the freezer changes...

> >> wake_up_interruptible_sync(&pipe->wait);
> >> kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
> >> pipe_wait(pipe);
> >> + pipe_unlock(pipe);
> >> + try_to_freeze();
> >
> > Oh, yes. One of the problems with coredump/signals is the freezer. Not sure
> > what we should do...
> >
> > But if we add try_to_freeze() here, we need to add more try_to_freeze()s,
> > think about dumping a huge core on slow media.
> >
>
> We could add more try_to_freeze()s in the dump_write paths to work
> even better with the freezer. Do you see any issues with just adding it
> here for a start? It fixes the non-slow media case.

The only issue is that, again, this change pretends to work but it doesn't ;)
IOW, imho you fix the symptom only.

Let's forget about the slow media and consider the piped coredump (the case
you are trying to fix). Suppose that try_to_freeze_tasks() is in progress,
the user-space coredump handler is already frozen, and the dumping thread
does pipe_write()->pipe_wait().

If only we could change pipe_wait() to do freezable_schedule()...

Oleg.
