Re: Questions about the patch 054aa8d439b9 ("fget: check that the fd still exists after getting a ref to it")

From: Jann Horn
Date: Mon Jan 10 2022 - 04:09:59 EST


On Wed, Dec 22, 2021 at 11:32 AM libaokun (A) <libaokun1@xxxxxxxxxx> wrote:
> > From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > Date: Wed, 1 Dec 2021 10:06:14 -0800
> > Subject: fget: check that the fd still exists after getting a ref to it
> >
> > Jann Horn points out that there is another possible race wrt Unix domain
> > socket garbage collection, somewhat reminiscent of the one fixed in
> > commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK").
> >
> > See the extended comment about the garbage collection requirements added
> > to unix_peek_fds() by that commit for details.
> >
> > The race comes from how we can locklessly look up a file descriptor just
> > as it is in the process of being closed, and with the right artificial
> > timing (Jann added a few strategic 'mdelay(500)' calls to do that), the
> > Unix domain socket garbage collector could see the reference count
> > decrement of the close() happen before fget() took its reference to the
> > file and the file was attached onto a new file descriptor.
>
> I analyzed this CVE and tried to reproduce it.
>
> I guess he triggered it like the stack below.
>
>
> close_fd |
> pick_file |
> | __fget_files
> file = files_lookup_fd_rcu(files, fd); |
> |
> rcu_assign_pointer(fdt->fd[fd], NULL);
> filp_close |
> fput |
> | get_file_rcu_many // ned ref>=1
> fput_many(file, 1); |
> file_free(file); |
> | return file
> | // read-after-free

The race is more complicated than that; you also need to add unix_gc()
to the race. And if you want to get to memory corruption, you need one
or two more races involving unix_stream_read_generic() on top of that.

> If you want to successfully execute the get_file_rcu_many function,
>
> the reference counting of the file is greater than or equal to 1 and
>
> is greater than or equal to 2 after the execution.
>
> However, close releases only one reference count and does not release
> the file,
>
> so read-after-free does not occur. So how is the race triggered here?

This bug does not lead to a UAF of the file, it leads to a locking
inconsistency between the unix stream read path and the GC.

> The question has been pondered for a long time without any results.
>
> Could I get more details (e.g. reproduction methods or stacks) from you ?

See https://bugs.chromium.org/p/project-zero/issues/detail?id=2247 for
the original bug report. I'm also working on a more detailed blog
post, but that isn't finished yet.