Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

From: Thomas Gleixner
Date: Sat Jun 23 2018 - 20:55:38 EST


On Sat, 23 Jun 2018, Fenghua Yu wrote:
> On Sat, Jun 23, 2018 at 11:17:03AM +0200, Thomas Gleixner wrote:
> > On Fri, 22 Jun 2018, Fenghua Yu wrote:
> > > Should I add a kernel parameter or control knob to opt out of the feature?
> >
> > A simple command line option 'acoff' or something more sensible should be
> > ok. No sysfs knobs or whatever please. The Kconfig option is not required
> > either.
>
> Ok. I will have a command line option.
>
> BTW, I have a Kconfig option to enable the split lock test in kernel mode in
> patch #15. Are the Kconfig option and the kernel test code still needed
> in the next version?

Unless you do not trust #AC to work everywhere it is advertised, it's
pretty much pointless.

Btw, please also get rid of this bloated control_ac() stuff. We have
msr_set_bit()/msr_clear_bit(), so there is no need to reinvent the wheel.
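For illustration only, a userspace simulation of why a separate control_ac() wrapper is redundant: helpers with the msr_set_bit()/msr_clear_bit() return convention (< 0 on failure, 0 if the bit was already in the requested state, > 0 if the MSR was changed) already do everything such a wrapper would. The fake MSR array, the sim_* names and the values 0x33 / bit 29 for the test-control MSR and its split-lock #AC enable are assumptions of this sketch, not taken from the patch set.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: test-control MSR and its split-lock #AC enable bit. */
#define MSR_TEST_CTRL		0x33u
#define SPLIT_LOCK_DETECT_BIT	29u

/* Simulated MSR space for illustration; index = MSR number. */
static uint64_t fake_msrs[0x40];

/*
 * Mirrors the kernel helpers' return convention:
 * < 0 if the MSR access fails, 0 if the bit already had the requested
 * value, > 0 if the MSR was actually modified.
 */
static int sim_msr_flip_bit(uint32_t msr, uint8_t bit, int set)
{
	uint64_t old, val;

	assert(bit < 64);	/* caller bug, not a runtime condition */
	if (msr >= sizeof(fake_msrs) / sizeof(fake_msrs[0]))
		return -1;	/* stands in for a faulting rdmsr/wrmsr */

	old = fake_msrs[msr];
	val = set ? (old | (1ULL << bit)) : (old & ~(1ULL << bit));
	if (val == old)
		return 0;
	fake_msrs[msr] = val;
	return 1;
}

static int sim_msr_set_bit(uint32_t msr, uint8_t bit)
{
	return sim_msr_flip_bit(msr, bit, 1);
}

static int sim_msr_clear_bit(uint32_t msr, uint8_t bit)
{
	return sim_msr_flip_bit(msr, bit, 0);
}

/* With such helpers, enabling or disabling the feature is one call each. */
static int split_lock_ac_enable(void)
{
	return sim_msr_set_bit(MSR_TEST_CTRL, SPLIT_LOCK_DETECT_BIT);
}

static int split_lock_ac_disable(void)
{
	return sim_msr_clear_bit(MSR_TEST_CTRL, SPLIT_LOCK_DETECT_BIT);
}
```

In kernel code the equivalent would presumably be a single msr_set_bit() call at init and an msr_clear_bit() call when the feature is turned off, with no driver-local wrapper in between.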

> > > I'm afraid firmware may hang the system after handling a split lock if
> > > the feature is enabled by the kernel, e.g. "reboot" hits a split lock in
> > > firmware and firmware hangs the system after handling #AC.
> >
> > Have you observed the problem in reality? I mean, why would 'reboot' be the
> > critical path? I'd rather expect that EFI callbacks or SMM 'value add'
> > would trip over it.
> >
> > Vs. reboot. If that is the only problem then we might just have to clear
> > #AC enable before issuing it, but that does not need to be part of the
> > initial patch set. It's an orthogonal issue.
>
> Yes, I do see a real firmware hang after hitting and handling a split lock
> in firmware during "reboot" in one simulation test environment. Apparently
> the split lock (and misaligned access) is treated as a failure in firmware.

It's not treated as a failure. The firmware simply does not have a handler
for #AC installed and dies. I hope you yelled at the firmware people
already.
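A toy model of the failure described above and of the "clear #AC enable before issuing reboot" workaround mentioned earlier in the thread. Everything here is simulated and hypothetical: test_ctrl_msr stands in for the rebooting CPU's test-control MSR, bit 29 is an assumed position for the split-lock #AC enable, and fake_firmware_reboot() models firmware that performs a split-locked access with no #AC handler installed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of the split-lock #AC enable. */
#define SPLIT_LOCK_DETECT_BIT	29u

/* Simulated test-control MSR of the rebooting CPU, #AC initially armed. */
static uint64_t test_ctrl_msr = 1ULL << SPLIT_LOCK_DETECT_BIT;

static bool split_lock_ac_armed(void)
{
	return (test_ctrl_msr >> SPLIT_LOCK_DETECT_BIT) & 1;
}

/*
 * Models firmware reboot code that does a split-locked access and has no
 * #AC handler installed: returns false ("hung") whenever #AC is armed.
 */
static bool fake_firmware_reboot(void)
{
	return !split_lock_ac_armed();
}

/* Hypothetical reboot path: optionally disarm #AC before entering firmware. */
static bool reboot_via_firmware(bool clear_ac_first)
{
	if (clear_ac_first) {
		/* In the kernel this would be an msr_clear_bit() call. */
		test_ctrl_msr &= ~(1ULL << SPLIT_LOCK_DETECT_BIT);
		assert(!split_lock_ac_armed());
	}
	return fake_firmware_reboot();
}
```

The point of the model is only the ordering: the bit has to be cleared before control is handed to firmware, because once the firmware takes the #AC without a handler there is nothing left to recover.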

> This real case triggered my concern that a split lock in any future
> firmware may happen in any path, including runtime services, S3/S4/S5 and
> hotplug. If we don't have an opt-out option or something similar, a system
> hang from a split lock in firmware can be a blocking issue on some
> platforms. If that happens, bisect will always blame the split lock patch.

That's fine. The changelog will hopefully explain it along with the text
that people should use the commandline option and yell at their firmware
supplier. So what? Move on....
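A minimal userspace sketch of matching a boot option such as 'acoff' as a whole word on the command line, so that users with broken firmware can turn the feature off. In the kernel this would more likely be an early_param() handler; the function names and the default-on policy below are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true if `opt` occurs in `cmdline` as a whole,
 * space-separated word, so "acoff" matches neither "noacoff" nor "acoffx".
 */
static bool cmdline_has_flag(const char *cmdline, const char *opt)
{
	assert(cmdline != NULL && opt != NULL && opt[0] != '\0');

	size_t len = strlen(opt);
	const char *p = cmdline;

	while ((p = strstr(p, opt)) != NULL) {
		bool starts_word = (p == cmdline) || (p[-1] == ' ');
		bool ends_word = (p[len] == '\0') || (p[len] == ' ');

		if (starts_word && ends_word)
			return true;
		p++;	/* partial match only: keep scanning */
	}
	return false;
}

/* Hypothetical policy: split lock #AC stays on unless "acoff" is given. */
static bool split_lock_ac_wanted(const char *cmdline)
{
	return !cmdline_has_flag(cmdline, "acoff");
}
```

Keeping the feature on by default and making the opt-out a plain command line switch matches the approach argued for here: no sysfs knob and no Kconfig option.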

If that is a real, widespread issue in practice, then we might have to go
for some ugly workarounds, but we won't find out if we add them
upfront. So testing will show what's wrong in firmware land and we can
handle it from there. It's a completely orthogonal issue and has nothing to
do with the core functionality.

Thanks,

tglx