Re: [REGRESSION] bisected: perf: hang when using async-profiler caused by perf: Fix the POLL_HUP delivery breakage

From: Mi, Dapeng

Date: Mon Oct 13 2025 - 04:38:21 EST



On 10/13/2025 4:05 PM, Peter Zijlstra wrote:
> On Mon, Oct 13, 2025 at 10:34:27AM +0800, Mi, Dapeng wrote:
>
>> It looks the issue described in the link
>> (https://lore.kernel.org/all/20250606192546.915765-1-kan.liang@xxxxxxxxxxxxxxx/T/#u)
>> happens again but in a different way. :(
>>
>> As the commit message above link described,  cpu-clock (and task-clock) is
>> a specific SW event which rely on hrtimer. The hrtimer handler calls
>> __perf_event_overflow() and then event_stop (cpu_clock_event_stop()) and
>> eventually call hrtimer_cancel() which traps into a dead loop which waits
>> for the calling hrtimer handler finishes.
>>
>> As the
>> change (https://lore.kernel.org/all/20250606192546.915765-1-kan.liang@xxxxxxxxxxxxxxx/T/#u),
>> it should be enough to just disable the event and don't need an extra event
>> stop.
>>
>> @Octavia, could you please check if the change below can fix this issue?
>> Thanks.
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 7541f6f85fcb..883b0e1fa5d3 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -10343,7 +10343,20 @@ static int __perf_event_overflow(struct perf_event
>> *event,
>>                 ret = 1;
>>                 event->pending_kill = POLL_HUP;
>>                 perf_event_disable_inatomic(event);
>> -               event->pmu->stop(event, 0);
>> +
>> +               /*
>> +                * The cpu-clock and task-clock are two special SW events,
>> +                * which rely on the hrtimer. The __perf_event_overflow()
>> +                * is invoked from the hrtimer handler for these 2 events.
>> +                * Avoid to call event_stop()->hrtimer_cancel() for these
>> +                * 2 events since hrtimer_cancel() waits for the hrtimer
>> +                * handler to finish, which would trigger a deadlock.
>> +                * Only disabling the events is enough to stop the hrtimer.
>> +                * See perf_swevent_cancel_hrtimer().
>> +                */
>> +               if (event->attr.config != PERF_COUNT_SW_CPU_CLOCK &&
>> +                   event->attr.config != PERF_COUNT_SW_TASK_CLOCK)
>> +                       event->pmu->stop(event, 0);
> This is broken though; you cannot test config without first knowing
> which PMU you're dealing with.

Ah, yes. Just ignore this.


>
> Also, that timer really should get stopped, we can't know for certain
> this overflow is of the timer itself or not, it could be a related
> event.
>
> Something like the below might do -- but please carefully consider the
> cases where hrtimer_try_to_cancel() might fail; in those cases we'll
> have set HES_STOPPED and the hrtimer callback *SHOULD* observe this and
> NORESTART.
>
> But I didn't check all the details.

The only reason that hrtimer_try_to_cancel() could fail is that the hrtimer
callback is currently executing, so current change should be fine. 


>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 820127536e62..a91481d57841 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -11756,7 +11756,8 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
>
> event = container_of(hrtimer, struct perf_event, hw.hrtimer);
>
> - if (event->state != PERF_EVENT_STATE_ACTIVE)
> + if (event->state != PERF_EVENT_STATE_ACTIVE ||
> + event->hw.state & PERF_HES_STOPPED)
> return HRTIMER_NORESTART;
>
> event->pmu->read(event);
> @@ -11810,7 +11811,7 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
> ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
> local64_set(&hwc->period_left, ktime_to_ns(remaining));
>
> - hrtimer_cancel(&hwc->hrtimer);
> + hrtimer_try_to_cancel(&hwc->hrtimer);
> }
> }
>
> @@ -11854,12 +11855,14 @@ static void cpu_clock_event_update(struct perf_event *event)
>
> static void cpu_clock_event_start(struct perf_event *event, int flags)
> {
> + event->hw.state = 0;
> local64_set(&event->hw.prev_count, local_clock());
> perf_swevent_start_hrtimer(event);
> }
>
> static void cpu_clock_event_stop(struct perf_event *event, int flags)
> {
> + event->hw.state = PERF_HES_STOPPED;
> perf_swevent_cancel_hrtimer(event);
> if (flags & PERF_EF_UPDATE)
> cpu_clock_event_update(event);

Besides cpu-clock, task-clock should need similar change as well. I would
post a complete change later.