Re: [PATCH] perf/x86/intel/uncore: Fix oops when counting IMC uncore events on some TGL

From: Liang, Kan
Date: Wed May 27 2020 - 12:02:59 EST

Next message: Johannes Weiner: "Re: [PATCH 04/12] mm: add support for async page locking"
Previous message: Greg Kroah-Hartman: "Re: Linux 5.6.15"
In reply to: David Laight: "RE: [PATCH] perf/x86/intel/uncore: Fix oops when counting IMC uncore events on some TGL"

On 5/27/2020 11:17 AM, David Laight wrote:

From: Liang, Kan

Sent: 27 May 2020 16:01
On 5/27/2020 10:51 AM, David Laight wrote:

From: Liang, Kan

Sent: 27 May 2020 15:47
On 5/27/2020 8:59 AM, David Laight wrote:

From: kan.liang@xxxxxxxxxxxxxxx

Sent: 27 May 2020 13:31

From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>

When counting IMC uncore events on some TGL machines, an oops will be
triggered.
[ 393.101262] BUG: unable to handle page fault for address: ffffb45200e15858
[ 393.101269] #PF: supervisor read access in kernel mode
[ 393.101271] #PF: error_code(0x0000) - not-present page

The current perf uncore driver still uses the IMC MAP SIZE inherited
from SNB, which is 0x6000.
However, the offset of IMC uncore counters for some TGL machines is
larger than 0x6000, e.g. 0xd8a0.

Enlarge the IMC MAP SIZE for TGL to 0xe000.
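
To make the numbers above concrete, here is a minimal sketch of the range
arithmetic. The three constants are the values quoted in the patch
description. The macro names, the helper offset_in_window() and the 8-byte
access width are assumptions made for illustration; they are not taken
from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values quoted in the patch description (macro names are hypothetical). */
#define SNB_IMC_MAP_SIZE   0x6000u  /* window size inherited from SNB */
#define TGL_IMC_MAP_SIZE   0xe000u  /* enlarged window proposed for TGL */
#define TGL_IMC_CTR_OFFSET 0xd8a0u  /* example counter offset on some TGL */

/* The example offset lies beyond the old window but inside the new one. */
static_assert(TGL_IMC_CTR_OFFSET >= SNB_IMC_MAP_SIZE,
	      "offset is outside the SNB-sized window");
static_assert(TGL_IMC_CTR_OFFSET < TGL_IMC_MAP_SIZE,
	      "offset is inside the enlarged window");

/*
 * Hypothetical helper: true if a `width`-byte access starting at `offset`
 * fits entirely inside a window of `map_size` bytes. Written so that the
 * comparison cannot overflow.
 */
static bool offset_in_window(uint32_t offset, size_t width, size_t map_size)
{
	return width <= map_size && offset <= map_size - width;
}
```

With these values, an 8-byte access at 0xd8a0 ends at 0xd8a8, which is past
0x6000 but still below 0xe000.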

Replacing one 'random' constant with a different one
doesn't seem like a proper fix.

Surely the actual bounds of the 'memory' area are properly
defined somewhere.
Or at least should come from a table.

You also need to verify that the offsets are within the mapped area.
An unexpected offset shouldn't try to access an invalid address.

Thanks for the review.

I agree that we should add a check before mapping the area to prevent
the issue from happening again.

I think the check should be a generic check for all platforms which try
to map an area, not just for TGL. I will submit a separate patch for the
check.

You need a check that the actual access is within the mapped area.
So instead of getting an OOPS you get an error.

This is after you've mapped it.
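
The kind of per-access check being asked for here could look roughly like
the user-space sketch below. It is not the driver code: struct
mapped_window, window_read64() and the choice of -EINVAL as the error are
all hypothetical, and a plain memory buffer stands in for the mapped
region.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a mapped counter window. */
struct mapped_window {
	const uint8_t *base;  /* start of the mapped region */
	size_t size;          /* number of bytes actually mapped */
};

/*
 * Read a 64-bit value at `offset`. Returns 0 on success, or -EINVAL if
 * any byte of the access would fall outside the mapped region, so an
 * unexpected offset produces an error instead of a fault.
 */
static int window_read64(const struct mapped_window *win, size_t offset,
			 uint64_t *val)
{
	assert(win != NULL && val != NULL);

	/* Compare against size - width so the check itself cannot overflow. */
	if (win->size < sizeof(*val) || offset > win->size - sizeof(*val))
		return -EINVAL;

	memcpy(val, win->base + offset, sizeof(*val));
	return 0;
}
```

The point of the design is that the check runs on every access, after the
mapping exists, and depends only on the size that was really mapped.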

Sure. Will add a WARN_ONCE() before the actual access.

No, that will still panic some systems.
pr_warn() is all you need.

If we print a warning for each access, there will be too many warnings.
I think I will use pr_warn_once() instead.
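
For illustration only, the warn-once idea can be imitated in user space as
below. This is a sketch, not the kernel's pr_warn_once(): the warn_once()
macro, the read_ok() helper and the warnings_printed counter are
hypothetical names, and the message goes to stderr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Counts emitted messages; exists only so the demo can be checked. */
static unsigned int warnings_printed;

/*
 * Hypothetical user-space imitation of a once-per-call-site warning:
 * each expansion gets its own static flag, so only the first hit prints.
 */
#define warn_once(...)                        \
	do {                                  \
		static bool warned_;          \
		if (!warned_) {               \
			warned_ = true;       \
			fprintf(stderr, __VA_ARGS__); \
			warnings_printed++;   \
		}                             \
	} while (0)

/* Simulated counter access that rejects offsets outside the window. */
static bool read_ok(size_t offset, size_t map_size)
{
	assert(map_size > 0);

	if (offset >= map_size) {
		warn_once("uncore: offset 0x%zx outside mapped area\n", offset);
		return false;
	}
	return true;
}
```

Every bad access is still rejected, but the log receives a single line
however often the counter is polled.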

Thanks,
Kan
