Re: [PATCH] exit: move exit_task_namespaces() after exit_task_work()

From: Dmitry Vyukov
Date: Fri Dec 15 2017 - 02:43:42 EST


On Fri, Dec 15, 2017 at 7:56 AM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> Cong Wang <xiyou.wangcong@xxxxxxxxx> writes:
>
>> syzbot reported we have a use-after-free when mqueue_evict_inode()
>> is called on __cleanup_mnt() path, where the ipc ns is already
>> freed by the previous exit_task_namespaces(). We can just move
>> it after exit_task_work() to avoid this use-after-free.
>
> How does that possibly work? (I haven't seen this syzbot report.)
>
> Looking at the code we have get_ns_from_inode(), which takes the mq_lock,
> checks whether the pointer is NULL, and takes a reference if it is non-NULL.
>
> Meanwhile put_ipc_ns calls mq_clear_sbinfo(ns) with the mq_lock held
> when the count drops to zero.
>
> Where is the race in that?
>
> The rest of mqueue_evict_inode() uses the returned pointer and
> tests that the pointer is non-NULL before using it.
>
> So either syzbot is giving you a bad report or there is a subtle race
> there that I am not seeing. The change below is not at all the proper way
> to fix a subtle race.
>
> Eric
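
For reference while reading the description above, here is a rough
userspace model of the scheme Eric describes: the lookup side takes a
reference only while holding mq_lock, and the final put clears the
back-pointer under the same lock. This is only a sketch under stated
assumptions, not the kernel code. The names ipc_ns_model, sb_model,
get_ns_model() and put_ns_model() are hypothetical, the refcount is a
plain int that is decremented under the lock, and "freeing" is modelled
by setting a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the ipc namespace; not the kernel struct. */
struct ipc_ns_model {
	int count;	/* reference count */
	int freed;	/* set instead of calling free(), so it can be inspected */
};

/* Hypothetical stand-in for the superblock's back-pointer to the namespace. */
struct sb_model {
	struct ipc_ns_model *ns;
};

pthread_mutex_t mq_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lookup side: take a reference only if the pointer is still set. */
struct ipc_ns_model *get_ns_model(struct sb_model *sb)
{
	struct ipc_ns_model *ns;

	pthread_mutex_lock(&mq_lock);
	ns = sb->ns;
	if (ns)
		ns->count++;
	pthread_mutex_unlock(&mq_lock);
	return ns;
}

/* Put side: on the last reference, clear the back-pointer under mq_lock. */
void put_ns_model(struct sb_model *sb, struct ipc_ns_model *ns)
{
	int last;

	pthread_mutex_lock(&mq_lock);
	assert(ns->count > 0);
	last = (--ns->count == 0);
	if (last)
		sb->ns = NULL;	/* the step described above as mq_clear_sbinfo(ns) */
	pthread_mutex_unlock(&mq_lock);

	if (last)
		ns->freed = 1;	/* models freeing the namespace */
}
```

In this model a caller that goes through get_ns_model() either gets NULL
or holds a reference that keeps the namespace alive, which is why the
race is not obvious from these two functions alone.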

Cong, what was that report? Searching by
"exit_task_work|exit_task_namespaces" there are too many of them:
https://groups.google.com/forum/#!searchin/syzkaller-bugs/%22exit_task_work$7Cexit_task_namespaces%22%7Csort:date

I can only say that syzbot does not make up reports. That's something
that actually happened and was provoked by userspace.



>> Reported-by: syzbot <syzkaller@xxxxxxxxxxxxxxxx>
>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Signed-off-by: Cong Wang <xiyou.wangcong@xxxxxxxxx>
>> ---
>> kernel/exit.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/exit.c b/kernel/exit.c
>> index 6b4298a41167..909e43c45158 100644
>> --- a/kernel/exit.c
>> +++ b/kernel/exit.c
>> @@ -861,8 +861,8 @@ void __noreturn do_exit(long code)
>>  	exit_fs(tsk);
>>  	if (group_dead)
>>  		disassociate_ctty(1);
>> -	exit_task_namespaces(tsk);
>>  	exit_task_work(tsk);
>> +	exit_task_namespaces(tsk);
>>  	exit_thread(tsk);
>>
>>  	/*