Re: [PATCH] taskstats: fix data-race

From: Christian Brauner
Date: Wed Oct 09 2019 - 07:53:29 EST


On Wed, Oct 09, 2019 at 01:48:27PM +0200, Marco Elver wrote:
> On Wed, 9 Oct 2019 at 13:31, Christian Brauner
> <christian.brauner@xxxxxxxxxx> wrote:
> >
> > When assiging and testing taskstats in taskstats_exit() there's a race
> > when writing and reading sig->stats when a thread-group with more than
> > one thread exits:
> >
> > cpu0:
> > thread catches fatal signal and whole thread-group gets taken down
> > do_exit()
> > do_group_exit()
> > taskstats_exit()
> > taskstats_tgid_alloc()
> > The tasks reads sig->stats without holding sighand lock seeing garbage.
> >
> > cpu1:
> > task calls exit_group()
> > do_exit()
> > do_group_exit()
> > taskstats_exit()
> > taskstats_tgid_alloc()
> > The task takes sighand lock and assigns new stats to sig->stats.
> >
> > Fix this by using smp_load_acquire() and smp_store_release().
> >
> > Reported-by: syzbot+c5d03165a1bd1dead0c1@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Fixes: 34ec12349c8a ("taskstats: cleanup ->signal->stats allocation")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Christian Brauner <christian.brauner@xxxxxxxxxx>
> > Reviewed-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> > ---
> > /* v1 */
> > Link: https://lore.kernel.org/r/20191005112806.13960-1-christian.brauner@xxxxxxxxxx
> >
> > /* v2 */
> > Link: https://lore.kernel.org/r/20191006235216.7483-1-christian.brauner@xxxxxxxxxx
> > - Dmitry Vyukov <dvyukov@xxxxxxxxxx>, Marco Elver <elver@xxxxxxxxxx>:
> > - fix the original double-checked locking using memory barriers
> >
> > /* v3 */
> > Link: https://lore.kernel.org/r/20191007110117.1096-1-christian.brauner@xxxxxxxxxx/
> > - Andrea Parri <parri.andrea@xxxxxxxxx>:
> > - document memory barriers to make checkpatch happy
> >
> > /* v4 */
> > - Andrea Parri <parri.andrea@xxxxxxxxx>:
> > - use smp_load_acquire(), not READ_ONCE()
> > - update commit message
>
> Acked-by: Marco Elver <elver@xxxxxxxxxx>
>
> Note that this now looks almost like what I suggested, except the

Right, I think we all just needed to get our heads clear about what
exactly is happening here. This codepath is not a very prominent one. :)

> return at the end of the function is accessing sig->stats again. In
> this case, it seems it's fine assuming sig->stats cannot be written
> elsewhere. Just wanted to point it out to make sure it's considered.

Yes, I considered that but thanks for mentioning it.

Note that this patch has a bug. It should be
smp_load_acquire(&sig->stats) and not smp_load_acquire(sig->stats).
I accidently didn't automatically recompile the patchset after the last
change I made. Andrea thankfully caught this.

Thanks!
Christian