Re: [PATCH v2] perf/core: Fix data race between perf_event_set_output and perf_mmap_close

From: Yang Jihong
Date: Fri Jul 08 2022 - 22:00:36 EST


Hello,

On 2022/7/6 20:29, Yang Jihong wrote:
Hello,

On 2022/7/5 21:07, Peter Zijlstra wrote:
On Mon, Jul 04, 2022 at 05:26:04PM +0200, Peter Zijlstra wrote:
On Mon, Jul 04, 2022 at 08:00:06PM +0800, Yang Jihong wrote:
A data race exists between perf_event_set_output and perf_mmap_close.
The scenario is as follows:

CPU1                                                CPU2
                                                    perf_mmap_close(event2)
                                                      if (atomic_dec_and_test(&event2->rb->mmap_count))  // mmap_count 1 -> 0
                                                        detach_rest = true;
ioctl(event1, PERF_EVENT_IOC_SET_OUTPUT, event2)
  perf_event_set_output(event1, event2)
                                                      if (!detach_rest)
                                                        goto out_put;
                                                      list_for_each_entry_rcu(event, &event2->rb->event_list, rb_entry)
                                                        ring_buffer_attach(event, NULL)
                                                      // because event1 has not been added to event2->rb->event_list,
                                                      // event1->rb is not set to NULL in these loops

  ring_buffer_attach(event1, event2->rb)
    list_add_rcu(&event1->rb_entry, &event2->rb->event_list)

As a result of this race, event1->rb is not NULL, but event1->rb->mmap_count is 0.
If perf_mmap is then invoked on the fd of event1, the kernel loops forever in perf_mmap:

again:
         mutex_lock(&event->mmap_mutex);
         if (event->rb) {
<SNIP>
                 if (!atomic_inc_not_zero(&event->rb->mmap_count)) {
                         /*
                          * Raced against perf_mmap_close() through
                          * perf_event_set_output(). Try again, hope for better
                          * luck.
                          */
                         mutex_unlock(&event->mmap_mutex);
                         goto again;
                 }
<SNIP>
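
To make the failure mode concrete, here is a minimal userspace sketch of the state the race leaves behind. It is not kernel code: struct model_rb, struct model_event, model_inc_not_zero(), replay_race() and model_mmap_retry() are hypothetical stand-ins written with C11 atomics, the interleaving is replayed sequentially in the order of the diagram, and the retry loop is bounded so that it terminates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_rb {
	atomic_int mmap_count;
};

struct model_event {
	struct model_rb *rb;
};

enum model_result { MODEL_MAPPED, MODEL_NO_BUFFER, MODEL_GAVE_UP };

/* Model of atomic_inc_not_zero(): increment unless the value is 0. */
static bool model_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Replay the reported interleaving one step at a time. 'rb' is the
 * buffer of event2, which CPU1 looked up before CPU2 started closing.
 * Returns detach_rest, i.e. whether the last mapping was dropped.
 */
static bool replay_race(struct model_event *event1, struct model_rb *rb)
{
	bool detach_rest;

	assert(event1->rb == NULL);

	/* CPU2: perf_mmap_close(event2) drops the last mapping, 1 -> 0. */
	detach_rest = (atomic_fetch_sub(&rb->mmap_count, 1) == 1);

	/*
	 * CPU2: the detach loop runs here. event1 is not on the list
	 * yet, so its rb pointer is left alone.
	 */

	/* CPU1: ring_buffer_attach(event1, event2->rb). */
	event1->rb = rb;

	return detach_rest;
}

/* Bounded model of the "goto again" loop in perf_mmap(). */
static enum model_result model_mmap_retry(struct model_event *event,
					  int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (!event->rb)
			return MODEL_NO_BUFFER;
		if (model_inc_not_zero(&event->rb->mmap_count))
			return MODEL_MAPPED;
		/* The real code unlocks mmap_mutex and jumps to again. */
	}
	return MODEL_GAVE_UP;
}
```

After replay_race() on a buffer whose mmap_count starts at 1, event1->rb is set while mmap_count is 0, so model_mmap_retry() exhausts any number of tries. Nothing in the loop can change either value, which is why the unbounded kernel loop never exits.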

Too tired, must look again tomorrow, little feedback below.

With brain more awake I ended up with the below. Does that work?
I have verified that this patch can solve the problem.

Should I submit this patch, or will you submit it?

Thanks,
Yang


Yes, I applied the patch to kernel version 5.10 and to mainline,
and it fixed the problem on both.

Tested-by: Yang Jihong <yangjihong1@xxxxxxxxxx>

Thanks,
Yang