Re: [PATCH v2] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd

From: Michal Hocko
Date: Wed Aug 31 2016 - 02:42:26 EST


On Wed 24-08-16 17:37:16, Michal Hocko wrote:
> On Wed 24-08-16 17:32:00, Oleg Nesterov wrote:
> > On 08/24, Michal Hocko wrote:
> > >
> > > Sounds better?
> > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > index b89f0eb99f0a..ddde5849df81 100644
> > > --- a/kernel/fork.c
> > > +++ b/kernel/fork.c
> > > @@ -914,7 +914,8 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> > >
> > >  	/*
> > >  	 * Signal userspace if we're not exiting with a core dump
> > > -	 * or a killed vfork parent which shouldn't touch this mm.
> > > +	 * because we want to leave the value intact for debugging
> > > +	 * purposes.
> > >  	 */
> > >  	if (tsk->clear_child_tid) {
> > >  		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
> >
> > Yes, thanks Michal!
> >
> > Acked-by: Oleg Nesterov <oleg@xxxxxxxxxx>
>
> OK, thanks.

ping

> ---
> From 39cad7842660e0261c27f75702d49458a1f3cea1 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@xxxxxxxx>
> Date: Mon, 30 May 2016 20:20:32 +0200
> Subject: [PATCH] kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
>
> fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
> has caused a subtle regression in nscd, which uses CLONE_CHILD_CLEARTID
> to clear the nscd_certainly_running flag in the shared databases, so
> that the clients are notified when nscd is restarted. Now, when nscd
> uses a non-persistent database, clients that have it mapped keep
> thinking the database is being updated by nscd, when in fact nscd has
> created a new (anonymous) one (for non-persistent databases it uses an
> unlinked file as the backend).
>
> The original proposal for the CLONE_CHILD_CLEARTID change claimed
> (https://lkml.org/lkml/2006/10/25/233):
> "
> The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
> on behalf of pthread_create() library calls. This feature is used to
> request that the kernel clear the thread-id in user space (at an address
> provided in the syscall) when the thread disassociates itself from the
> address space, which is done in mm_release().
>
> Unfortunately, when a multi-threaded process incurs a core dump (such as
> from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
> the other threads, which then proceed to clear their user-space tids
> before synchronizing in exit_mm() with the start of core dumping. This
> misrepresents the state of process's address space at the time of the
> SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
> problems (misleading him/her to conclude that the threads had gone away
> before the fault).
>
> The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
> core dump has been initiated.
> "
>
> The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
> seems to have a larger scope than the original patch asked for. It seems
> that limiting the scope of the check to core dumping should work for
> the SIGSEGV issue described above.
>
> [Changelog partly based on Andreas' description]
> Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
> Tested-by: William Preston <wpreston@xxxxxxxx>
> Cc: Roland McGrath <roland@xxxxxxxxxxxxx>
> Cc: Andreas Schwab <schwab@xxxxxxxx>
> Acked-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> ---
> kernel/fork.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 52e725d4a866..ddde5849df81 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -913,14 +913,12 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  	deactivate_mm(tsk, mm);
>
>  	/*
> -	 * If we're exiting normally, clear a user-space tid field if
> -	 * requested. We leave this alone when dying by signal, to leave
> -	 * the value intact in a core dump, and to save the unnecessary
> -	 * trouble, say, a killed vfork parent shouldn't touch this mm.
> -	 * Userland only wants this done for a sys_exit.
> +	 * Signal userspace if we're not exiting with a core dump
> +	 * because we want to leave the value intact for debugging
> +	 * purposes.
>  	 */
>  	if (tsk->clear_child_tid) {
> -		if (!(tsk->flags & PF_SIGNALED) &&
> +		if (!(tsk->signal->flags & SIGNAL_GROUP_COREDUMP) &&
>  		    atomic_read(&mm->mm_users) > 1) {
>  			/*
>  			 * We don't check the error code - if userspace has
> --
> 2.8.1
>
> --
> Michal Hocko
> SUSE Labs
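
For anyone catching up on the thread, here is a rough userspace model of
what the one-line change does to the condition in mm_release(). It is
only a sketch: struct model_task, the MODEL_* constants and the helper
names are made up for illustration, the flag values are arbitrary rather
than the kernel's, and it ignores everything else mm_release() does.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins only; these are NOT the kernel's flag values. */
#define MODEL_PF_SIGNALED           0x1u
#define MODEL_SIGNAL_GROUP_COREDUMP 0x2u

/* Hypothetical, flattened view of the state the check looks at. */
struct model_task {
	unsigned int task_flags;   /* models tsk->flags */
	unsigned int signal_flags; /* models tsk->signal->flags */
	int mm_users;              /* models atomic_read(&mm->mm_users) */
};

/* Check before the patch: skip the clear for any death by signal. */
static bool clears_tid_before(const struct model_task *t)
{
	return !(t->task_flags & MODEL_PF_SIGNALED) && t->mm_users > 1;
}

/* Check after the patch: skip the clear only while core dumping. */
static bool clears_tid_after(const struct model_task *t)
{
	return !(t->signal_flags & MODEL_SIGNAL_GROUP_COREDUMP) &&
	       t->mm_users > 1;
}

/* The nscd-like case: killed by a signal, no core dump, mm still shared. */
static void model_check_killed_without_coredump(void)
{
	struct model_task killed = { MODEL_PF_SIGNALED, 0, 2 };

	assert(!clears_tid_before(&killed)); /* old check left the tid set */
	assert(clears_tid_after(&killed));   /* new check clears it again */
}
```

In this model a task that dies from a plain signal while the mm is still
shared gets its tid cleared only with the new check, which is the nscd
case from the changelog. A task whose group is core dumping is left alone
by both checks, so the debugging use case from 2006 is preserved.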

--
Michal Hocko
SUSE Labs