Re: [PATCH 4.9 033/120] perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code

From: Prarit Bhargava
Date: Thu Jan 19 2017 - 07:19:04 EST




On 01/18/2017 12:20 PM, Greg Kroah-Hartman wrote:
> On Wed, Jan 18, 2017 at 11:55:58AM -0500, Prarit Bhargava wrote:
>>
>>
>> On 01/18/2017 11:33 AM, Greg Kroah-Hartman wrote:
>>> On Wed, Jan 18, 2017 at 09:38:07AM -0500, Prarit Bhargava wrote:
>>>>
>>>>
>>>> On 01/18/2017 05:45 AM, Greg Kroah-Hartman wrote:
>>>>> 4.9-stable review patch. If anyone has any objections, please let me know.
>>>>>
>>>>
>>>> Nack.
>>>>
>>>> The value of boot_cpu_data.logical_proc_id may be uninitialized and set to
>>>> default -1 on systems that pick a random core as boot cpu. This was
>>>> inadvertently fixed by 9d85eb9119f4 ("x86/smpboot: Make logical package
>>>> management more robust") which is in 4.10-rc1.
>>>>
>>>> Before 9d85eb9119f4:
>>>>
>>>> [ 3.971539] hswep_uncore_cpu_init: cpu 5 pkg 0 boot_cpu_data.logical_proc_id
>>>> 65535
>>>> [ 3.976504] hswep_uncore_cpu_init: cpu 5 pkg 0 cpu_data(0).logical_proc_id 0
>>>>
>>>> After 9d85eb9119f4:
>>>>
>>>> [ 3.919112] hswep_uncore_cpu_init: cpu 5 pkg 0 boot_cpu_data.logical_proc_id 0
>>>> [ 3.923391] hswep_uncore_cpu_init: cpu 5 pkg 0 cpu_data(0).logical_proc_id 0
>>>>
>>>> This patch should not be applied to any stable branch.
>>>
>>> So the fixes: line lies? This isn't needed at all for 4.9?
>>
>> No, the fixes: line does not lie. This patch is not needed at all for 4.9.
>> Other patches are required beyond this patch in order for 4.9 to remain stable.
>>
>> This patch is 6d6daa20945f ("perf/x86/intel/uncore: Fix hardcoded socket 0
>> assumption in the Haswell init code") which was tested on and applied to
>> 4.10-rc3 IIRC. This patch was applied to 4.10-rc4.
>>
>> [prarit@prarit linux]$ git describe --contains 6d6daa20945f
>> v4.10-rc4~9^2~5
>
> Yes, but the patch says it is fixing a bug since 4.6-rc1. The fixes
> line says:
> Fixes: cf6d445f6897 ("perf/x86/uncore: Track packages, not per CPU data")
>
> $ git dc cf6d445f6897
> v4.6-rc1~165^2~28
>
> (dc is my git alias for "describe --contains" as I type it so often...)
>
>> 4.9 is broken and requires additional patches beyond this patch. Applying this
>> patch to 4.9 stable without those additional fixes will result in kernel panics
>> on some Haswell systems that boot on random cores.
>
> So, does 4.9 on its own work properly on these systems? If not, what
> are the commits that are needed to fix it?

The system is being used by someone else atm. I know of at least two commits
that are required but I have not tested them on 4.9 or 4.9 stable. I do know
that 4.10-rc2-ish was broken on these systems when booting on any other cpu than
cpu 0.

>
> If 4.9 is fine as-is, great, we should drop this patch then, correct?
> But then that fixes: line lied :(

Let me see if I can explain it this way:

(The commits are displayed as commit hash, version that contains commit, and my
own description)

cf6d445f6897 v4.6-rc1~165^2~28 <- what my commit "Fixes:"

d49597fd3bc v4.9-rc6~33^2 <- required for 9d85eb9119f, causes
boot_cpu_data.logical_proc_id is -1.

9d85eb9119f v4.10-rc1~51^2~8 <- This patch unintentionally fixes code so that
boot_cpu_data.logical_proc_id is greater than or equal to 0.

6d6daa20945f v4.10-rc4~9^2~5 <- my commit, expects boot_cpu_data.logical_proc_id
greater than or equal to 0.



cf6d445f6897 v4.6-rc1~165^2~28 caused the bug to happen.

When I found the bug the upstream tree was at 4.10-rc2-ish. My fix for the bug
is 6d6daa20945f v4.10-rc4~9^2~5 which has "Fixes: cf6d445f6897" because that
commit causes this to happen.

If my commit 6d6daa20945f v4.10-rc4~9^2~5 is applied to 4.9 stable without
9d85eb9119f v4.10-rc1~51^2~8 things will be *worse* because
boot_cpu_data.logical_proc_id is -1, and this will result in a NULL pointer
panic during boot on *all* Haswell systems.

If we take d49597fd3bc, 9d85eb9119f, and 6d6daa20945f things should work but as
I said above, I have not built or tested that. I can test this combination of
patches if you think it is worthwhile for 4.9 stable.

I think the fixes: line is correct. The git bisect points at that commit. The
problem with backporting my patch to 4.9 is that my fix requires other patches
that are not in 4.9.

P.