Re: [PATCH] exit: move exit_task_namespaces() after exit_task_work()

From: Eric W. Biederman
Date: Fri Dec 15 2017 - 01:57:08 EST


Cong Wang <xiyou.wangcong@xxxxxxxxx> writes:

> syzbot reported we have a use-after-free when mqueue_evict_inode()
> is called on __cleanup_mnt() path, where the ipc ns is already
> freed by the previous exit_task_namespaces(). We can just move
> it after exit_task_work() to avoid this use-after-free.

How does that possibly work? (I haven't seen this syzbot report.)

Looking at the code, we have get_ns_from_inode(), which takes the mq_lock,
checks whether the pointer is NULL, and takes a reference if it is non-NULL.

Meanwhile put_ipc_ns calls mq_clear_sbinfo(ns) with the mq_lock held
when the count drops to zero.

Where is the race in that?

The rest of mqueue_evict_inode() uses the returned pointer and
tests that the pointer is non-NULL before using it.
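
For reference, the scheme described above can be modelled in userspace
roughly as follows. This is only a sketch, not the ipc/mqueue.c code: a
pthread mutex stands in for the mq_lock spinlock, the count is a plain int
that is only ever changed under that lock, and the names ipc_ns_model,
sb_model, ns_create, get_ns_from_sb, put_ns and evict_model are
hypothetical stand-ins. Under those assumptions a lookup either sees NULL
or gets a reference on an object that has not yet been freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the ipc namespace and the mqueue superblock. */
struct ipc_ns_model {
	int count;                      /* reference count, protected by mq_lock in this model */
};

struct sb_model {
	struct ipc_ns_model *s_fs_info; /* back-pointer, cleared on the last put */
};

static pthread_mutex_t mq_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allocate a namespace with one reference and publish it via the superblock. */
static struct ipc_ns_model *ns_create(struct sb_model *sb)
{
	struct ipc_ns_model *ns = malloc(sizeof(*ns));

	if (!ns)
		return NULL;
	ns->count = 1;
	pthread_mutex_lock(&mq_lock);
	sb->s_fs_info = ns;
	pthread_mutex_unlock(&mq_lock);
	return ns;
}

/* Lookup side: take the lock, test the pointer, take a reference if non-NULL. */
static struct ipc_ns_model *get_ns_from_sb(struct sb_model *sb)
{
	struct ipc_ns_model *ns;

	pthread_mutex_lock(&mq_lock);
	ns = sb->s_fs_info;
	if (ns)
		ns->count++;
	pthread_mutex_unlock(&mq_lock);
	return ns;
}

/* Release side: when the count drops to zero, clear the back-pointer
 * while still holding the lock, then free outside the lock. */
static void put_ns(struct ipc_ns_model *ns, struct sb_model *sb)
{
	int last;

	pthread_mutex_lock(&mq_lock);
	assert(ns->count > 0);
	last = (--ns->count == 0);
	if (last)
		sb->s_fs_info = NULL;
	pthread_mutex_unlock(&mq_lock);
	if (last)
		free(ns);
}

/* Evict side: only touch the namespace if the lookup returned non-NULL.
 * Returns 1 if the namespace was still alive, 0 otherwise. */
static int evict_model(struct sb_model *sb)
{
	struct ipc_ns_model *ns = get_ns_from_sb(sb);

	if (!ns)
		return 0;
	/* ... per-namespace accounting would go here ... */
	put_ns(ns, sb);
	return 1;
}
```

In this model the only way to reach freed memory is to dereference a
pointer obtained without going through get_ns_from_sb(), which is why a
use-after-free report against this path needs a more specific explanation.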

So either syzbot is giving you a bad report or there is a subtle race
there that I am not seeing. Either way, the change below is not at all the
proper way to fix a subtle race.

Eric


>
> Reported-by: syzbot <syzkaller@xxxxxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Cong Wang <xiyou.wangcong@xxxxxxxxx>
> ---
> kernel/exit.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 6b4298a41167..909e43c45158 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -861,8 +861,8 @@ void __noreturn do_exit(long code)
>  	exit_fs(tsk);
>  	if (group_dead)
>  		disassociate_ctty(1);
> -	exit_task_namespaces(tsk);
>  	exit_task_work(tsk);
> +	exit_task_namespaces(tsk);
>  	exit_thread(tsk);
>
> /*