Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()

From: Mathieu Desnoyers
Date: Fri Nov 07 2008 - 15:07:57 EST


* Peter Zijlstra (a.p.zijlstra@xxxxxxxxx) wrote:
> On Fri, 2008-11-07 at 14:18 -0500, Mathieu Desnoyers wrote:
> > * Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
> > >
> > > On Fri, 7 Nov 2008, Mathieu Desnoyers wrote:
> > > >
> > > > __m_cnt_hi
> > > > is read before
> > > > mmio cnt_lo read
> > > >
> > > > for the detailed reasons explained in my previous discussion with
> > > > Nicolas here :
> > > > http://lkml.org/lkml/2008/10/21/1
> > > >
> > > > I use smp_rmb() to do this on SMP systems (hrm, actually, a rmb() could
> > > > be required so it works also on UP systems safely wrt interrupts).
> > >
> > > smp_rmb turns into a compiler barrier on UP and should prevent the below
> > > description.
> > >
> >
> > Ah, right, preserving program order on UP should be enough. smp_rmb()
> > then.
>
>
> I'm not quite sure I'm following here. Is this a global hardware clock
> you're reading from multiple cpus, if so, are you sure smp_rmb() will
> indeed be enough to sync the read?
>
> (In which case the smp_wmb() is provided by the hardware increasing the
> clock?)
>
> If these are per-cpu clocks then even in the smp case we'd be good with
> a plain barrier() because you'd only ever want to read your own cpu's
> clock (and have a separate __m_cnt_hi per cpu).
>
> Or am I totally missing out on something?
>

This is the global hardware clock scenario.

We have to order an uncached mmio read with respect to a cached variable
read/write. The ordering of the uncached mmio read relative to the smp_rmb()
barrier (e.g. an lfence instruction) should be ensured by program order,
because the read skips the cache and goes directly to the bus. Luckily we
only do an mmio read and no mmio write, so mmiowb() is not required.

You might be right in that it could require more barriers.

Given adequate program order, we can assume the mmio read will happen
"on the spot", but that the cached read may be delayed.

What we want is:

readl(io_addr)
read __m_cnt_hi
write __m_cnt_hi

The two reads must stay in that order. Now consider two consecutive
executions on the same CPU:

readl(io_addr)
read __m_cnt_hi
write __m_cnt_hi

readl(io_addr)
read __m_cnt_hi
write __m_cnt_hi
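To make the sequence above concrete, here is a minimal single-threaded userspace sketch of one execution, assuming the update rule of the cnt32_to_63() macro: bit 31 of the high word is a flag that remembers bit 31 of the low word, and the remaining 31 bits count wraps. read_counter_lo() is a hypothetical stand-in for readl(io_addr) that just loads a volatile variable, and cnt_hi plays the role of __m_cnt_hi. Nothing here is concurrent, so the sketch does not exercise the ordering question itself; it only shows what each step computes and how each execution depends on the __m_cnt_hi value written by the previous one.

```c
#include <stdint.h>

/* Plays the role of __m_cnt_hi: bits 0..30 count wraps of the low word,
 * bit 31 remembers bit 31 of the low word as of the last update. */
static uint32_t cnt_hi;

/* Hypothetical stand-in for readl(io_addr): one 32-bit volatile load. */
static uint32_t read_counter_lo(const volatile uint32_t *io_addr)
{
	return *io_addr;
}

/* One execution of: readl(io_addr); read __m_cnt_hi; write __m_cnt_hi.
 * Must run at least once per half period of the 32-bit counter, otherwise
 * a flip of bit 31 of the low word can be missed. */
static uint64_t cnt32_to_63_sketch(const volatile uint32_t *io_addr)
{
	uint32_t lo = read_counter_lo(io_addr); /* readl(io_addr) */
	uint32_t hi = cnt_hi;                   /* read __m_cnt_hi */

	/* Bit 31 of lo differs from the flag: half a period has elapsed. */
	if ((hi ^ lo) & 0x80000000u) {
		/* Flip the flag; if it was set, bit 31 of lo went from 1 to 0,
		 * i.e. the low word wrapped, so add one to the wrap count. */
		hi = (hi ^ 0x80000000u) + (hi >> 31);
		cnt_hi = hi;                    /* write __m_cnt_hi */
	}
	/* Bit 63 holds the flag, not count bits: mask it off. */
	return (((uint64_t)hi << 32) | lo) & 0x7fffffffffffffffULL;
}
```

Calling it three times while the simulated register goes from 0x00000010 to 0x80000010 and then wraps to 0x00000005 yields 0x10, 0x80000010 and 0x100000005: the second call records the half-period flip in cnt_hi, and the third call relies on that write to count the wrap.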

We might have to order the read/write pair with respect to the following
readl(), for example:

smp_rmb(); /* Wait for all prior cached memory reads to complete */
readl(io_addr);
barrier(); /* Keep the compiler from moving the cached read before the mmio read */
read __m_cnt_hi
write __m_cnt_hi

smp_rmb(); /* Wait for all prior cached memory reads to complete */
readl(io_addr);
barrier(); /* Keep the compiler from moving the cached read before the mmio read */
read __m_cnt_hi
write __m_cnt_hi

Would that make more sense?
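The proposed barrier placement can be approximated in userspace C11. This is only a sketch under stated assumptions: smp_rmb_sim() and barrier_sim() are hypothetical names mapped to an acquire atomic_thread_fence() and an atomic_signal_fence() (which constrains only the compiler), the device register is a plain volatile variable, and cnt_hi is a relaxed atomic standing in for __m_cnt_hi. C11 fences formally order only atomic accesses, so this captures where the barriers go and what each one is meant to prevent, not the behavior of a real uncached mmio read on the bus, and it does not settle whether this placement is sufficient on SMP.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed userspace approximations of the kernel primitives. */
#define smp_rmb_sim()  atomic_thread_fence(memory_order_acquire) /* CPU + compiler */
#define barrier_sim()  atomic_signal_fence(memory_order_seq_cst)  /* compiler only */

/* Stand-in for __m_cnt_hi; atomic so that concurrent readers/writers
 * would not be a data race in the C11 model. Starts at zero. */
static _Atomic uint32_t cnt_hi;

static uint64_t cnt32_to_63_fenced(const volatile uint32_t *io_addr)
{
	uint32_t lo, hi;

	smp_rmb_sim();  /* order earlier cached reads before the mmio read */
	lo = *io_addr;  /* readl(io_addr) */
	barrier_sim();  /* compiler keeps the mmio read before the cached read */
	hi = atomic_load_explicit(&cnt_hi, memory_order_relaxed); /* read __m_cnt_hi */

	if ((hi ^ lo) & 0x80000000u) {
		/* Same update rule as cnt32_to_63(): flip flag, count wraps. */
		hi = (hi ^ 0x80000000u) + (hi >> 31);
		atomic_store_explicit(&cnt_hi, hi, memory_order_relaxed); /* write __m_cnt_hi */
	}
	/* Bit 63 is the flag, not part of the count. */
	return (((uint64_t)hi << 32) | lo) & 0x7fffffffffffffffULL;
}
```

On a single CPU with no interrupts the fences change nothing observable: reading 0x00000010, 0x80000010 and then 0x00000005 from the register yields 0x10, 0x80000010 and 0x100000005. They only matter once another CPU or an interrupt handler updates cnt_hi concurrently.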

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68