Re: [PATCH 2/2] perf/x86/amd: Don't allow pre-emption in amd_pmu_lbr_reset()

From: Peter Zijlstra
Date: Tue Oct 24 2023 - 12:00:22 EST


On Tue, Oct 24, 2023 at 10:32:27AM -0500, Mario Limonciello wrote:
> On 10/24/2023 03:02, Ingo Molnar wrote:
> >
> > * Mario Limonciello <mario.limonciello@xxxxxxx> wrote:
> >
> > > Fixes a BUG reported during suspend to ram testing.
> > >
> > > ```
> > > [ 478.274752] BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2948
> > > [ 478.274754] caller is amd_pmu_lbr_reset+0x19/0xc0
> > > ```
> > >
> > > Cc: stable@xxxxxxxxxxxxxxx # 6.1+
> > > Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support")
> > > Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
> > > ---
> > > arch/x86/events/amd/lbr.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
> > > index eb31f850841a..5b98e8c7d8b7 100644
> > > --- a/arch/x86/events/amd/lbr.c
> > > +++ b/arch/x86/events/amd/lbr.c
> > > @@ -321,7 +321,7 @@ int amd_pmu_lbr_hw_config(struct perf_event *event)
> > > void amd_pmu_lbr_reset(void)
> > > {
> > > - struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> > > + struct cpu_hw_events *cpuc = get_cpu_ptr(&cpu_hw_events);
> > > int i;
> > > if (!x86_pmu.lbr_nr)
> > > @@ -335,6 +335,7 @@ void amd_pmu_lbr_reset(void)
> > > cpuc->last_task_ctx = NULL;
> > > cpuc->last_log_id = 0;
> > > + put_cpu_ptr(&cpu_hw_events);
> > > wrmsrl(MSR_AMD64_LBR_SELECT, 0);
> > > }
> >
> > Weird, amd_pmu_lbr_reset() is called from these places:
> >
> > - amd_pmu_lbr_sched_task(): during task sched-in during
> > context-switching, this should already have preemption disabled.
> >
> > - amd_pmu_lbr_add(): this gets indirectly called by amd_pmu::add
> > (amd_pmu_add_event()), called by event_sched_in(), which too should have
> > preemption disabled.
> >
> > I clearly must have missed some additional place it gets called in.
> >
> > Could you please cite the full log of the amd_pmu_lbr_reset() call that
> > caused the critical section warning?
> >
> > Thanks,
> >
> > Ingo
>
> Below is the call trace in case you think it's better to disable preemption
> by the caller instead. If you think it's better to keep it in
> amd_pmu_lbr_reset() I'll add this trace to the commit message.

You cut too much; what task is running this?

IIRC this is the hotplug thread running a teardown function on that CPU
itself. It being a strict per-cpu thread should not trip
smp_processor_id() warnings.
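
For reference, a rough userspace model of what the debug check looks at,
loosely following check_preemption_disabled() in lib/smp_processor_id.c.
The struct and helper names below are made up for illustration and are not
kernel API, and the early-boot system_state exemption is left out:

```c
/*
 * Userspace model only, NOT kernel code. The fields are simplified
 * stand-ins for preempt_count(), irqs_disabled(), is_percpu_thread()
 * and current->migration_disabled.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task_model {
	int  preempt_count;       /* non-zero: preemption is disabled */
	bool irqs_disabled;       /* interrupts off on this CPU */
	bool no_setaffinity;      /* models PF_NO_SETAFFINITY */
	int  nr_cpus_allowed;     /* size of the allowed CPU mask */
	bool migration_disabled;  /* models migrate_disable() being held */
};

/* A strict per-cpu thread: affinity is fixed to exactly one CPU. */
static bool is_percpu_thread_model(const struct task_model *t)
{
	return t->no_setaffinity && t->nr_cpus_allowed == 1;
}

/* Returns true when the check would stay quiet, false when it would warn. */
static bool smp_processor_id_ok(const struct task_model *t)
{
	assert(t != NULL);

	if (t->preempt_count)
		return true;
	if (t->irqs_disabled)
		return true;
	if (is_percpu_thread_model(t))
		return true;
	if (t->migration_disabled)
		return true;

	return false;	/* preemptible and migratable: would warn */
}
```

In this model a task pinned to a single CPU with PF_NO_SETAFFINITY passes,
while an ordinary preemptible task, such as the rtcwake task named in the
BUG line, does not.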

>
> Call Trace:
> <TASK>
> dump_stack_lvl+0x44/0x60
> check_preemption_disabled+0xce/0xf0
> ? __pfx_x86_pmu_dead_cpu+0x10/0x10
> amd_pmu_lbr_reset+0x19/0xc0
> ? __pfx_x86_pmu_dead_cpu+0x10/0x10
> amd_pmu_cpu_reset.constprop.0+0x51/0x60
> amd_pmu_cpu_dead+0x3e/0x90
> x86_pmu_dead_cpu+0x13/0x20
> cpuhp_invoke_callback+0x169/0x4b0
> ? __pfx_virtnet_cpu_dead+0x10/0x10
> __cpuhp_invoke_callback_range+0x76/0xe0
> _cpu_down+0x112/0x270
> freeze_secondary_cpus+0x8e/0x280
> suspend_devices_and_enter+0x342/0x900
> pm_suspend+0x2fd/0x690
> state_store+0x71/0xd0
> kernfs_fop_write_iter+0x128/0x1c0
> vfs_write+0x2db/0x400
> ksys_write+0x5f/0xe0
> do_syscall_64+0x59/0x90
>