Re: [PATCH v3 0/9] Parallel CPU bringup for x86_64

From: David Woodhouse
Date: Wed Dec 29 2021 - 08:55:27 EST


On Wed, 2021-12-29 at 14:18 +0100, Paul Menzel wrote:
> > Or the one in
> > https://lore.kernel.org/lkml/d4cde50b4aab24612823714dfcbe69bc4bb63b60.camel@xxxxxxxxxxxxx
> >
> > which makes it do nothing except prepare all the CPUs before bringing
> > them up one at a time?
>
> I applied it on top of the other one, and it made no difference either.

It's possible I missed something else in the prepare stage that doesn't
cope with all CPUs being prepared first.

My next attempt might be to change the loop in bringup_nonboot_cpus()
so that it brings all the CPUs not to the CPUHP_BP_PARALLEL_DYN state(s)
but only as far as something like CPUHP_RCUTREE_PREP, which is roughly
midway between CPUHP_OFFLINE and CPUHP_BRINGUP_CPU.

Then a binary chop: if that one boots, try a later state, maybe
CPUHP_TOPOLOGY_PREPARE. If not, try an earlier one such as
CPUHP_PROFILE_PREPARE. And so on.

> > My current theory (not that I've spent that much time thinking about it
> > in the last week) is that there's something about the existing CPU
> > bringup, possibly a CPU bug or something special about the AMD CPUs,
> > which is triggered by just making it a little bit *faster*, which is
> > why bringing them up from kexec (especially in qemu) can cause it too?
>
> Would having the serial console enabled make a difference?
>
Yes. I couldn't make this fail in my EC2 m6a instance (for clean boots;
I have never managed to kexec it) until I turned off the serial console
to make things go faster.

> > Tom seemed to find that it was in load_TR_desc(), so if you could try
> > this hack on a machine that doesn't magically wink out of existence on
> > a triplefault before even flushing its serial output, that would be
> > much appreciated...

> Unfortunately, no more messages were printed on the serial console.

I suppose we need to litter those outputs somewhere earlier in the
trampoline then. Perhaps it *isn't* getting as far as load_TR_desc()
in your case?

Will be back online properly next week and can actually provide some of
the above suggestions in patch form if you're willing to keep testing.
Thanks!
