Re: [Patch V1 1/3] x86, mce: MCE log size not enough for high core parts

From: Borislav Petkov
Date: Fri Sep 25 2015 - 04:29:34 EST


+ x86@xxxxxxxxxx

On Thu, Sep 24, 2015 at 02:25:41PM -0700, Raj, Ashok wrote:
> Hi Boris
>
> I should have expanded on it..
>
> On Thu, Sep 24, 2015 at 11:07:33PM +0200, Borislav Petkov wrote:
> >
> > How are you ever going to call into those from an offlined CPU?!
> >
> > And that's easy:
> >
> > 	if (!cpu_online(cpu))
> > 		return;
> >
>
> The last patch of that series had 2 changes.
>
> 1. Allow offline CPUs to participate in the rendezvous. That way, on the odd
> chance the offline CPUs have collected any errors, we can still report them.
> (We changed mce_start/mce_end to use cpu_present_mask instead of just the
> online map.)

This is not necessarily wrong - it is just unusual.

> Without this change today, if I were to inject a broadcast MCE
> it ends up hanging, since the offline CPU is also incrementing mce_callin.
> It will always end up more than cpu_online_mask by the number of CPUs
> logically offlined.

Yeah, I'd like to have a bit somewhere which says "don't report MCEs on this
core." But we talked about this already.
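
To make the counting problem above concrete, here is a tiny userspace model.
It is not the kernel code: the function names are made up for illustration,
and it assumes the rendezvous waits for the call-in counter to match the
expected CPU count exactly, which is how I read the description.

```c
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code: every present CPU that takes the
 * broadcast MCE bumps the call-in counter, whether it is online or not.
 */
static unsigned int callin_after_broadcast(unsigned int online,
					   unsigned int offline_present)
{
	unsigned int callin = 0;
	unsigned int i;

	for (i = 0; i < online + offline_present; i++)
		callin++;

	return callin;
}

/* Assumption: the wait only ends when the counter equals the expected count. */
static bool rendezvous_complete(unsigned int callin, unsigned int expected)
{
	return callin == expected;
}
```

With 6 online and 2 offlined-but-present CPUs the counter ends at 8. Waiting
for the online count (6) never matches, waiting for the present count (8)
does.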

> Consider e.g. if 2 threads of the core are offline. And the MLC picks up

What is MLC?

> an error. Other cpus in the socket can't access them. Only way is to let those
> CPUs read and report their own banks as they are core scoped. In upcoming CPUs
> we have some banks that can be thread scoped as well.
>
> It's understood the OS doesn't execute any code on those CPUs. But SMI can
> still run on them, and could collect errors that can be logged.

Well, that is not our problem, is it?

I mean, SMM wants to stay undetected. When all of a sudden offlined
cores start reporting MCEs, that's going to raise some brows.

Regardless, there are other reasons why offlined cores might report MCEs
- the fact that logical cores share functional units and data flow goes
through them might trip the reporting on those cores. Yadda yadda...

> 2. If the CPU is offline, we copied its errors to the mce_log buffer, and then
> copy those out from the rendezvous master during mce_reign().
>
> If we were to replace this mce_log_add() with gen_pool_add(), then i would
> have to call mce_gen_pool_add() from the offline CPU. This will end up calling
> RCU functions.
>
> We don't want to leave any errors reported by the offline CPU for the purpose
> of logging. It is rare, but we are still interested in capturing those errors
> if they were to happen.
>
> Does this help?

So first of all, we need to write this down somewhere, maybe in
Documentation/, to explain why we're running on offlined cores. This is
certainly unusual code and people will ask WTF is going on there.

Then, I really really don't like a static buffer which we will have
to increase with each new bigger machine configuration. This is just
clumsy.

It'd probably be much better to make that MCE buffer per CPU. We can
say we're allowed to log 2-3, hell, 5 errors in it and when we're done
with the rendezvous, an online core goes and flushes out the error
records to gen_pool.

This scales much better than any artificial MCE_LOG_LEN size.

Oh, and we either overwrite old errors when we fill up the percpu buffer
or we return. That's something we can discuss later. Or we come up with
a bit smarter strategy of selecting which ones to overwrite.
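
Roughly what I mean, as a userspace sketch only. All names here
(percpu_mce_log, percpu_mce_flush_all, the sizes, the plain arrays) are made
up for illustration; real code would presumably use DEFINE_PER_CPU and flush
into the gen_pool instead of a static array. This version picks the "or we
return" policy when a per-CPU buffer is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS_SKETCH		8	/* hypothetical CPU count */
#define PERCPU_MCE_SLOTS	4	/* "2-3, hell, 5" records per CPU */
#define POOL_CAPACITY		64	/* stand-in for the gen_pool */

struct mce_rec {
	unsigned int cpu;
	unsigned int bank;
	uint64_t status;
};

struct percpu_mce_buf {
	struct mce_rec recs[PERCPU_MCE_SLOTS];
	size_t count;
};

static struct percpu_mce_buf percpu_buf[NR_CPUS_SKETCH];
static struct mce_rec pool[POOL_CAPACITY];
static size_t pool_count;

/*
 * Runs on the CPU that saw the error, online or not. It only touches that
 * CPU's own buffer, so no locking and no RCU is needed here.
 */
static bool percpu_mce_log(unsigned int cpu, unsigned int bank, uint64_t status)
{
	struct percpu_mce_buf *b;

	assert(cpu < NR_CPUS_SKETCH);
	b = &percpu_buf[cpu];

	if (b->count == PERCPU_MCE_SLOTS)
		return false;	/* full: drop the new record */

	b->recs[b->count++] = (struct mce_rec){ cpu, bank, status };
	return true;
}

/*
 * Runs on one online CPU after the rendezvous: drain every per-CPU buffer
 * into the pool. Returns the number of records moved.
 */
static size_t percpu_mce_flush_all(void)
{
	size_t moved = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		struct percpu_mce_buf *b = &percpu_buf[cpu];
		size_t i;

		for (i = 0; i < b->count && pool_count < POOL_CAPACITY; i++)
			pool[pool_count++] = b->recs[i];
		moved += i;

		/* Keep whatever did not fit for the next flush. */
		memmove(b->recs, b->recs + i, (b->count - i) * sizeof(b->recs[0]));
		b->count -= i;
	}

	return moved;
}
```

The storage grows with the number of CPUs instead of a single constant, and
the overwrite-vs-drop decision lives in one place (percpu_mce_log) where we
can change it later.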

Just artificially increasing a static buffer is not good design IMO.

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.