Re: [PATCH 4.19 36/55] drivers/net/b44: Change to non-atomic bit operations on pwol_mask

From: Peter Zijlstra
Date: Fri Jan 31 2020 - 09:04:06 EST


On Fri, Jan 31, 2020 at 01:57:31PM +0100, Pavel Machek wrote:
> On Thu 2020-01-30 19:39:17, Greg Kroah-Hartman wrote:
> > From: Fenghua Yu <fenghua.yu@xxxxxxxxx>
> >
> > [ Upstream commit f11421ba4af706cb4f5703de34fa77fba8472776 ]
>
> This is not suitable for stable. It does not fix anything.

It fixes the code for BE at the very least.

> It prepares
> for theoretical bug that author claims might be introduced to BIOS in
> future... I doubt it, even BIOS authors boot their machines from time
> to time.

BIOS authors might not enable this (optional) feature.

> > Atomic operations that span cache lines are super-expensive on x86
> > (not just to the current processor, but also to other processes as all
> > memory operations are blocked until the operation completes). Upcoming
> > x86 processors have a switch to cause such operations to generate a #AC
> > trap. It is expected that some real time systems will enable this mode
> > in BIOS.
>
> And I wonder if this is even good idea for mainline. x86 architecture
> is here for long time, and I doubt Intel is going to break it like
> this. Do you have documentation pointer?

Or you could, you know, like google it. Try "intel split lock
detection". It is a feature the OS can enable which will result in #AC
exceptions when the memory operand of a LOCK prefixed instruction
crosses a cache line (because the performance of such accesses sucks
and it impacts execution across the machine, not just the local CPU).

> > In preparation for this, it is necessary to fix code that may execute
> > atomic instructions with operands that cross cachelines because the #AC
> > trap will crash the kernel.
>
> How does single bit operation "cross cacheline"? How is this going to
> impact non-x86 architectures?

The actual instruction is "LOCK BTSQ", which is a 64-bit wide instruction
(LOCK BTSL on 32-bit kernels). The memory operand of that instruction is
(stupidly IMO) allowed to be unaligned.
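
To make the "cross cacheline" part concrete, here is a minimal userspace
sketch, not kernel code. It assumes 64-byte cache lines, and the helper
name crosses_cacheline() is made up for illustration. It checks whether
an access of a given width starting at a given address spans two lines,
which is the case for an 8-byte locked access on a byte array that
happens to start near the end of a line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on typical current x86 parts. */
#define CACHE_LINE_SIZE 64u

/*
 * Hypothetical helper: returns true if the access [addr, addr + size)
 * touches more than one cache line. size must be non-zero.
 */
static bool crosses_cacheline(uintptr_t addr, size_t size)
{
	uintptr_t first_line = addr / CACHE_LINE_SIZE;
	uintptr_t last_line = (addr + size - 1) / CACHE_LINE_SIZE;

	return first_line != last_line;
}
```

An 8-byte access at offset 56 still fits in one line; the same access
at offset 60 does not, and that second case is the split lock.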

Any sane architecture (i.e., pretty much everyone else) will already trap
when you try unaligned atomic ops (or even unaligned anything for most
RISCs).

> > Since "pwol_mask" is local and never exposed to concurrency, there is
> > no need to set bits in pwol_mask using atomic operations.
> >
> > Directly operate on the byte which contains the bit instead of using
> > __set_bit() to avoid any big endian concern due to type cast to
> > unsigned long in __set_bit().
>
> What concerns? Is __set_bit() now useless and are we going to open-code
> it everywhere? Is set_bit() now unusable on x86?

As David already explained, the bitops are defined on long[], are
employed on u8[] here (cue the (unsigned long *) cast) and would do
completely the wrong thing on BE.
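
A rough sketch of why, in plain userspace C rather than the kernel's
bitops. It assumes a 64-bit long, and all the names are made up for
illustration. It computes which byte of the buffer ends up holding bit
nr for a direct u8[] access and for a long-based bitop on little-endian
and on big-endian.

```c
#include <stddef.h>

/* Assumption: 64-bit kernel, so a long covers 8 bytes of the buffer. */
#define ASSUMED_BITS_PER_LONG 64u
#define ASSUMED_BYTES_PER_LONG (ASSUMED_BITS_PER_LONG / 8u)

/* Byte touched by the open-coded form: mask[nr / 8] |= 1 << (nr % 8). */
static size_t byte_index_u8(unsigned int nr)
{
	return nr / 8;
}

/* Byte touched by a long-based bitop on a little-endian machine. */
static size_t byte_index_long_le(unsigned int nr)
{
	size_t word = nr / ASSUMED_BITS_PER_LONG;
	size_t bit = nr % ASSUMED_BITS_PER_LONG;

	/* The least significant byte of each long is stored first. */
	return word * ASSUMED_BYTES_PER_LONG + bit / 8;
}

/* Byte touched by a long-based bitop on a big-endian machine. */
static size_t byte_index_long_be(unsigned int nr)
{
	size_t word = nr / ASSUMED_BITS_PER_LONG;
	size_t bit = nr % ASSUMED_BITS_PER_LONG;

	/* The least significant byte of each long is stored last. */
	return word * ASSUMED_BYTES_PER_LONG +
	       (ASSUMED_BYTES_PER_LONG - 1 - bit / 8);
}
```

On LE the long-based index matches the u8[] index for every nr, which
is why the cast appears to work there. On BE bit 0 lands in byte 7
instead of byte 0, so the mask handed to the hardware is scrambled.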

set_bit() works as advertised when used as specified; on long[].