Re: [PATCH 2/2] x86/mce/amd: Ensure quirks are applied in resume path as well

From: S
Date: Thu Jan 10 2019 - 03:04:56 EST


Hi Borislav,

On 1/10/2019 3:38 AM, Borislav Petkov wrote:
> On Mon, Jan 07, 2019 at 10:37:24AM +0000, S, Shirish wrote:
>> This patch adds threshold quirk applicable for family 15
> Same issue with "This patch" here.
Agree, have avoided the phrase in both patches that originate from this
discussion.
>
>> in resume path as well, since mce_amd_feature_init()
>> does not have quirks applied when originating from mce_syscore_resume(),
>> resulting in the below message at every successful resume:
>>
>> "[Firmware Bug]: cpu 0, invalid threshold interrupt offset ..."
>>
>> Signed-off-by: Shirish S <shirish.s@xxxxxxx>
>> ---
>> arch/x86/kernel/cpu/mce/amd.c | 34 ++++++++++++++++++++++++++++++++++
>> 1 file changed, 34 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index 89298c8..27cbf66 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -545,6 +545,34 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
>> return offset;
>> }
>>
>> +void disable_err_thresholding(struct cpuinfo_x86 *c)
>> +{
>> + int i;
>> + u64 hwcr;
>> + bool need_toggle;
>> + u32 msrs[] = {
>> + 0x00000413, /* MC4_MISC0 */
>> + 0xc0000408, /* MC4_MISC1 */
>> + };
>> +
>> + if (c->x86_model >= 0x10 && c->x86_model <= 0x7f) {
> You can save yourself an indentation level by reversing the logic here:
>
> if (c->x86 != 0x15)
> return;
>
> Also, I'm wondering if you simply can't do
>
> if (c->x86_model < 0x10)
> return;
>
> The assumption being that all the models - even after 0x7f - are highly
> unlikely to get MC4_MISC thresholding supported, all of a sudden. Might
> wanna run it by HW guys first though.

Perhaps no need to check the model altogether, this is applicable to
entire family 15, since there is no 8th Gen.

>
>> + rdmsrl(MSR_K7_HWCR, hwcr);
>> +
>> + /* McStatusWrEn has to be set */
>> + need_toggle = !(hwcr & BIT(18));
>> +
>> + if (need_toggle)
>> + wrmsrl(MSR_K7_HWCR, hwcr | BIT(18));
>> +
>> + /* Clear CntP bit safely */
>> + for (i = 0; i < ARRAY_SIZE(msrs); i++)
>> + msr_clear_bit(msrs[i], 62);
>> +
>> + /* restore old settings */
>> + if (need_toggle)
>> + wrmsrl(MSR_K7_HWCR, hwcr);
>> + }
> So you copied the same code from __mcheck_cpu_apply_quirks().
>
> No.
>
> In a first patch, you carve that CntP clearing code in a separate
> function disable_err_thresholding() like you've done before.
Agree, first patch here:Â https://lkml.org/lkml/2019/1/10/70
> Then, in a second patch, you call it from mce/amd.c and you move all the
> family/model checks inside the function so that you have a sole
>
> disable_err_thresholding();
>
> calls where you need them.
>
> You don't have to pass in struct cpuinfo_x86 *c - you can use
> boot_cpu_data in the function.
Agree, second patch here: https://lkml.org/lkml/2019/1/10/72
>
>> +}
>> /* cpu init entry point, called from mce.c with preempt off */
>> void mce_amd_feature_init(struct cpuinfo_x86 *c)
>> {
>> @@ -552,6 +580,12 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
>> unsigned int bank, block, cpu = smp_processor_id();
>> int offset = -1;
>>
>> + /* Disable error thresholding bank in S3 resume path as well,
> What S3 resume path? That's the CPU init path.

I have also put in the commit message, but for discussion sake, there
are 2 code paths leading to mce_amd_feature_init() as below.
1) S5 -> S0: (boot)
secondary_startup_64 -> start_kernel -> identify_boot_cpu ->
identify_cpu ->
mcheck_cpu_init (calls __mcheck_cpu_apply_quirks before) ->
mce_amd_feature_init

2) S3 -> S0: (resume)
syscore_resume -> mce_syscore_resume -> mce_amd_feature_init

I have summarized this sequence after capturing the call trace in both
scenarios.

Its clear that the quirks are not applied during S3-> S0 transition,
hence the patch.

I cannot move the quirk to mce_amd_feature_init() since amd.c is guarded
by CONFIG_X86_MCE_AMD and the quirk should be applicable in case the
config is off via

__mcheck_cpu_apply_quirks() at boot.

> Also, kernel comments style is:
>
> /*
> * A sentence ending with a full-stop.
> * Another sentence. ...
> * More sentences. ...
> */
>
Agree, followed it in the above mentioned patches.

Thanks.

Regards,

Shirish S

> Good luck!
>
> :-)
>
--
Regards,
Shirish S