Re: spin_unlock optimization(i386)

Erich Boleyn (esboleyn@ichips.intel.com)
Wed, 24 Nov 1999 17:33:01 -0800

Next message: Andrea Arcangeli: "Re: spin_unlock optimization(i386)"
Previous message: Brad.Larden@solution6.com: "SMP kernel with single processor ?"

> >Uhh, if this is to a cacheable memory type, then the read memory barrier is
> >irrelevant, as all writes by other processors are guaranteed to be observed
> >in order, so, if some other processor stores into 0x20, then 0x0, your
> >program will never execute as if it had gotten the wrong data sequence.
>
> Does this mean that an IA32 SMP system _always_ provides strong ordering
> when accessing RAM through WB or WT caches? On a single CPU we always see
> strong ordering even if the accesses to RAM are speculative, but on SMP on
> most CPUs we have to serialize instructions by hand. In the common code
> (the one Manfred described) a mb() is definitely necessary, as not only IA32
> has to run such code.

IA32 SMP is Processor Ordered for WB memory (why would you use WT for
normal RAM anyway?), which essentially means the same thing as strong
ordering with the following caveat:

Stores from one processor may not all be seen by other processors
yet, though if any one store from processor A is observed by
processor B, any earlier stores (in program order) from processor
A are guaranteed to be observed by processor B as well.

In loose terms, each processor has a queue of committed stores that haven't
made it to the outside world yet. Once stores are observed by a processor,
the normal program ordering constraints apply.
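
As an illustration of the guarantee above, here is a minimal sketch of the
usual "message passing" test written with C11 atomics and POSIX threads. All
names (data_word, flag_word, run_litmus) are hypothetical and not from the
kernel. The release/acquire pair is what makes the test portable; the
assumption is that on IA32 with WB memory the compiler can emit plain stores
and loads for it, which is exactly what Processor Ordering permits.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared variables for the test. */
static atomic_int data_word;   /* the earlier store (in program order) */
static atomic_int flag_word;   /* the later store */

/* "Processor A": two stores in program order. */
static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&data_word, 42, memory_order_relaxed);
    atomic_store_explicit(&flag_word, 1, memory_order_release);
    return NULL;
}

/* "Processor B": wait until the later store is observed, then read the
 * earlier one. The flag may be delayed arbitrarily, hence the spin. */
static void *reader(void *arg)
{
    int *seen = arg;
    while (atomic_load_explicit(&flag_word, memory_order_acquire) == 0)
        ;
    *seen = atomic_load_explicit(&data_word, memory_order_relaxed);
    return NULL;
}

/* Runs the test `rounds` times and returns the number of rounds in which
 * the reader saw the flag but not the earlier data store. */
int run_litmus(int rounds)
{
    int violations = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_t a, b;
        int seen = 0;
        int rc;

        atomic_store(&data_word, 0);
        atomic_store(&flag_word, 0);
        rc = pthread_create(&b, NULL, reader, &seen);
        assert(rc == 0);
        rc = pthread_create(&a, NULL, writer, NULL);
        assert(rc == 0);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (seen != 42)
            violations++;
    }
    return violations;
}
```

Calling run_litmus(1000) should report zero violations. Note that the test
says nothing about how long the reader spins: the stores may sit in the
writer's queue for a while before anyone else sees them.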

IA32 is strongly ordered for UC (uncached) memory.

You could have memory-ordering #defines, e.g. "#ifdef PROCESSOR_ORDERING",
"#ifdef WEAK_ORDERING", etc. In general, as you go to the weaker memory
orderings you have to add more code anyway.
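
A rough sketch of what such #defines might look like follows. Only the two
macro names PROCESSOR_ORDERING and WEAK_ORDERING come from the suggestion
above; compiler_barrier(), store_barrier(), publish() and consume() are
made-up names, and the barriers use GCC/Clang extensions (empty inline asm
with a "memory" clobber, and the __sync_synchronize() builtin). This is not
how the kernel actually spells its barriers.

```c
#include <assert.h>

/* Select one model at build time, e.g. -DPROCESSOR_ORDERING.
 * Assumption: with nothing selected, fall back to the weakest model,
 * since that is the one that needs the most code. */
#if !defined(PROCESSOR_ORDERING) && !defined(WEAK_ORDERING)
#define WEAK_ORDERING 1
#endif

/* Compiler-only barrier: emits no instruction, but the compiler may not
 * move memory accesses across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

#ifdef PROCESSOR_ORDERING
/* Stores become visible in program order, so restraining the compiler
 * is enough. */
#define store_barrier() compiler_barrier()
#else
/* Weaker models also need a real hardware fence. */
#define store_barrier() __sync_synchronize()
#endif

static volatile int payload;
static volatile int ready;

/* Writer side: the payload must be visible before the ready flag. */
void publish(int value)
{
    payload = value;
    store_barrier();
    ready = 1;
}

/* Reader side: returns the payload, or -1 if nothing was published yet. */
int consume(void)
{
    return ready ? payload : -1;
}

/* Single-threaded usage example. */
int ordering_demo(void)
{
    assert(consume() == -1);
    publish(7);
    return consume();
}
```

The point of the layout is that common code only ever calls store_barrier();
the per-architecture cost is decided in one place.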

> Assume this piece of code that is basically the wait_event interface:
>
> unsigned long event_happened = 0;
>
> wait_for_event()
> {
>         current_task_state = WANT_TO_GO_TO_SLEEP;
>         if (!event_happened)
>                 schedule(); /* schedule will put the task to sleep if
>                                current_task_state is set to
>                                WANT_TO_GO_TO_SLEEP or it will
>                                return immediately if current_task_state
>                                is set to RUNNING */
> }
>
> event_callback() /* this is called when the event happens
>                     and it may run on a CPU different than
>                     the one where wait_for_event() runs */
> {
>         event_happened = 1;
>         current_task_state = RUNNING;
> }
>
> The above interface relies on the ordering as written above.
>
> Even the compiler should be allowed to reorder the above instructions as
> the compiler always assumes that the program is single threaded. So for
> example the event_callback() could be implemented in ASM this way without
> a _compiler_ barrier:
...
> I hope I have been clear enough; if not, please ask for more details.
>
> So at least a barrier() for the compiler between the read/writes is
> necessary also for IA32.

Make the variables "volatile" and the compiler is required by the ANSI C
standard not to reorder accesses to them across sequence points (the
end-of-statement delimiter is one).

ALL SMP-visible or memory-mapped hardware variables should in general be
"volatile" to make sure you get exactly what you're asking for from the
compiler.
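
Applied to the quoted wait_event pseudocode, that might look like the sketch
below. The variable and function names follow the quoted code; the enum
values, the schedule() stub, the slept counter and volatile_demo() are
hypothetical additions so that the fragment compiles and runs on its own.
Keep in mind that "volatile" only constrains the compiler; what other
processors observe is still governed by the memory ordering rules for the
memory type in use.

```c
#include <assert.h>

enum { RUNNING = 0, WANT_TO_GO_TO_SLEEP = 1 };

/* volatile: the compiler must emit each access, and must keep volatile
 * accesses in source order relative to each other. */
static volatile unsigned long event_happened;
static volatile int current_task_state = RUNNING;

static int slept;   /* hypothetical bookkeeping for the stub below */

/* Hypothetical stand-in for the scheduler: it only records whether the
 * task would have gone to sleep, then marks it runnable again. */
static void schedule(void)
{
    if (current_task_state == WANT_TO_GO_TO_SLEEP)
        slept++;
    current_task_state = RUNNING;
}

void wait_for_event(void)
{
    current_task_state = WANT_TO_GO_TO_SLEEP;  /* volatile store first */
    if (!event_happened)                       /* then the volatile load */
        schedule();
}

void event_callback(void)
{
    event_happened = 1;             /* emitted first */
    current_task_state = RUNNING;   /* emitted second */
}

/* Single-threaded usage example: returns how often the stub "slept". */
int volatile_demo(void)
{
    wait_for_event();   /* no event yet: the stub sleeps once */
    event_callback();
    wait_for_event();   /* event already flagged: no second sleep */
    assert(event_happened == 1);
    return slept;
}
```

Without the qualifiers, a compiler that assumes single-threaded code would
be free to swap the two stores in event_callback().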

> What is interesting to know is whether IA32 enforces strong ordering in
> software on SMP across multiple CPUs, as if both current_task_state and
> event_happened had been allocated in _non_-cacheable memory. If that is
> true then it seems to me we can drop some _more_ locks on the bus from the
> IA32 port... (first of all, the xchg in set_mb() can be removed and replaced
> with a normal write on IA32, for example). NOTE: set_mb() and in turn
> set_current_state() (which uses set_mb()) are still necessary (for other
> archs like Alpha) in order to write correct (SMP-safe) common code, of
> course.

In IA32, Uncacheable (UC) memory is Strongly Ordered, compared to the
weaker case of Processor Ordering for Writeback (WB) memory (where stores
can be arbitrarily delayed from the other processors). This means you
generally get the appropriately intuitive result when surrounding, for
example, driver reads/writes (reads/writes to I/O space are also strongly
ordered) with normal loads and stores.
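
For the driver case, the pattern in question is roughly the one sketched
below: fill in a descriptor with ordinary stores to normal RAM, then poke a
device register. Everything here is hypothetical (the descriptor layout, the
doorbell register, the mmio_* helper names), and the "register" is just a
variable in RAM so that the fragment can run anywhere; a real driver would
use an address mapped as UC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor living in normal (WB) memory. */
struct descriptor {
    uint32_t addr;
    uint32_t len;
};

static struct descriptor ring_entry;
static uint32_t fake_doorbell;   /* stand-in for a UC-mapped register */

/* Register accessors: the volatile pointer forces exactly one access. */
static inline void mmio_write32(volatile uint32_t *reg, uint32_t val)
{
    *reg = val;
}

static inline uint32_t mmio_read32(volatile uint32_t *reg)
{
    return *reg;
}

/* Writes the descriptor, rings the doorbell, and reads the register back. */
uint32_t submit(uint32_t addr, uint32_t len)
{
    assert(len != 0);

    ring_entry.addr = addr;   /* ordinary stores to normal memory */
    ring_entry.len = len;

    /* Compiler barrier (GCC/Clang extension): keep the stores above from
     * being moved below the register write. */
    __asm__ __volatile__("" ::: "memory");

    mmio_write32(&fake_doorbell, 1);   /* tell the device */
    return mmio_read32(&fake_doorbell);
}
```

On an architecture with weaker ordering, the compiler barrier alone would
presumably not be enough and a hardware barrier would be needed before the
doorbell write.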

Erich Boleyn
PMD IA32 Architecture
Intel

