Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic

From: H. Peter Anvin
Date: Fri May 22 2009 - 19:30:31 EST


If there is a driver which relies on locked operations to be atomic with
respect to the I/O subsystem, it needs to use true locks, not LOCK_PREFIX.

An interrupt cannot be taken between the two parts of a lockable
instruction, even if it isn't locked. (There are non-atomic instructions
in the x86 architecture, but they can never be locked.)
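
To make that concrete, here is a minimal user-space sketch of the idea
behind LOCK_PREFIX. It is not the kernel's actual definition: DEMO_SMP
stands in for CONFIG_SMP, and DEMO_LOCK_PREFIX, demo_atomic_t,
demo_atomic_inc and demo_count_to are made-up names for illustration.
On non-x86 targets it falls back to a compiler builtin so that it still
builds.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's LOCK_PREFIX: "lock; " on an
 * SMP build, empty on a uniprocessor build. */
#if defined(DEMO_SMP)
#define DEMO_LOCK_PREFIX "lock; "
#else
#define DEMO_LOCK_PREFIX ""
#endif

typedef struct { int counter; } demo_atomic_t;

static inline void demo_atomic_inc(demo_atomic_t *v)
{
#if defined(__i386__) || defined(__x86_64__)
    /* A single read-modify-write instruction. On one CPU an interrupt
     * is taken before or after it, never between its read and write. */
    __asm__ __volatile__(DEMO_LOCK_PREFIX "incl %0" : "+m" (v->counter));
#else
    /* Non-x86 fallback, only so the sketch compiles elsewhere. */
    __atomic_fetch_add(&v->counter, 1, __ATOMIC_SEQ_CST);
#endif
}

/* Increment a fresh counter n times and return its final value. */
static int demo_count_to(int n)
{
    demo_atomic_t v = { 0 };

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        demo_atomic_inc(&v);
    return v.counter;
}
```

Note that neither variant says anything about a device that writes the
same memory on its own; that case is what needs a true lock.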

The other thing that you might be seeing is that a locked operation may
be slow enough to keep an otherwise-present race condition from being
triggered.

> That tells us nothing, since the CPU technical details are under NDA.

Have you considered that you might be running into a CPU bug or design
error? There was the out-of-order store bug on the Winchip that needed
workarounds (CONFIG_X86_OOSTORE); I don't think those were ever well
tested, and they might very well have bitrotted.

> All that can be done in this case is report behavior differences from
> the closest publicly described processor (Pentium-M).
>
> For that purpose, I suggest that a single-processor box using a
> processor older than the P-4, with other hardware that accesses
> memory independently of the processor's control, is a potential
> test bed. I previously termed "other hardware that accesses
> memory..." as "bus master DMA", which is overly specific. It
> misleads people into thinking I am seeing hardware control issues
> rather than non-exclusive memory access.
>
> My earlier comments about taking an interrupt between the memory read
> and the memory write operations are from a different manual than the
> one posted, a manual that only applies to processors older than
> the ones supported by the Linux kernel.
> Sorry, my bad: I grabbed the wrong book but posted the correct link (SH).
>
> Until one or more specific usages of the LOCK_PREFIX macro can be
> demonstrated to be incorrect (at least for some of the processors
> using this code), making the posted change is a single-point change
> that gives a pair of builds (one with, one without) whose behavior
> can be compared on the test bed.
>
> It is *not* the preferred change for a general release kernel, the
> preferred change would be one that makes a specific rather than
> general correction.
> Perhaps only for some functions, perhaps only for some of the
> processors that currently select this code.
>
> The observation that executing an unnecessary 'lock' opcode in some
> cases slows down the machine is not felt by myself to be significant
> to duplicating my observations. Note: I have been wrong before.

What makes you draw that conclusion, in particular? A lock prefix
typically slows down the following instruction dramatically, on some
processors by many hundreds of cycles.
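
For anyone who wants to measure this on their own hardware, a rough
user-space sketch follows. It is not from the kernel tree;
demo_inc_plain, demo_inc_locked and demo_time_incs are made-up names.
It only times an uncontended counter on one thread, so treat whatever
it reports as a crude estimate that will vary widely by processor.

```c
#include <assert.h>
#include <time.h>

/* Increment without a lock prefix (plain volatile add off x86). */
static inline void demo_inc_plain(int *p)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("incl %0" : "+m" (*p));
#else
    *(volatile int *)p += 1;
#endif
}

/* Same increment with a lock prefix (atomic builtin off x86). */
static inline void demo_inc_locked(int *p)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("lock; incl %0" : "+m" (*p));
#else
    __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);
#endif
}

/* Run iters increments, store the final count in *result, and return
 * the CPU time used in seconds as reported by clock(). */
static double demo_time_incs(int locked, long iters, int *result)
{
    int counter = 0;
    clock_t start, end;

    assert(iters >= 0);
    start = clock();
    for (long i = 0; i < iters; i++) {
        if (locked)
            demo_inc_locked(&counter);
        else
            demo_inc_plain(&counter);
    }
    end = clock();
    *result = counter;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call demo_time_incs(0, n, &r) and demo_time_incs(1, n, &r) with the
same n and compare the returned times; both calls leave r equal to n.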

> This is as informative as I can make the message.
>
> PS: *not* a single machine failure, tested on five machines, owned
> by four different people, two brands, with different use histories.

What do they have in common?

-hpa