Re: [this_cpu_xx V8 11/16] Generic support for this_cpu_cmpxchg

From: Mathieu Desnoyers
Date: Tue Jan 05 2010 - 17:29:15 EST


* Christoph Lameter (cl@xxxxxxxxxxxxxxxxxxxx) wrote:
> On Tue, 22 Dec 2009, Mathieu Desnoyers wrote:
>
> > > > I am a bit concerned about the "generic" version of this_cpu_cmpxchg.
> > > > Given that what LTTng needs is basically an atomic, nmi-safe version of
> > > > the primitive (on all architectures that have something close to a NMI),
> > > > this means that it could not switch over to your primitives until we add
> > > > the equivalent support we currently have with local_t to all
> > > > architectures. The transition would be faster if we create an
> > > > atomic_cpu_*() variant which would map to local_t operations in the
> > > > initial version.
> > > >
> > > > Or have I missed something in your patchset that addresses this?
> > >
> > > NMI safeness is not covered by this_cpu operations.
> > >
> > > We could add nmi_safe_.... ops?
> > >
> > > The atomic_cpu reference makes me think that you want full (LOCK)
> > > semantics? Then use the regular atomic ops?
> >
> > nmi_safe would probably make sense here.
>
> I am not sure how to implement fallback for nmi_safe operations though
> since there is no way of disabling NMIs.
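To make the fallback problem concrete, here is a userspace sketch (not the kernel's actual implementation; `irq_disable`/`irq_enable` are hypothetical stand-ins for local_irq_save()/local_irq_restore()) of what a generic, irq-disable-based this_cpu_cmpxchg fallback does, and why it cannot be made NMI-safe: an NMI cannot be masked, so it may still fire between the load and the store.

```c
/*
 * Sketch of a generic this_cpu_cmpxchg fallback.  In the kernel the
 * critical section would be bracketed by local_irq_save() and
 * local_irq_restore(); the stubs below only mark where that happens.
 */
static void irq_disable(void) { /* local_irq_save() stand-in */ }
static void irq_enable(void)  { /* local_irq_restore() stand-in */ }

static unsigned long generic_cpu_cmpxchg(unsigned long *ptr,
                                         unsigned long old,
                                         unsigned long new)
{
        unsigned long prev;

        irq_disable();
        prev = *ptr;            /* an NMI landing here ...            */
        if (prev == old)
                *ptr = new;     /* ... or here can corrupt the update */
        irq_enable();
        return prev;
}
```

Interrupt-disable excludes IRQs but not NMIs, which is why a true nmi_safe variant needs an architecture-level cmpxchg (or local_t-style) primitive rather than this generic fallback.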
>
> > But given that we have to disable preemption to add precision in terms
> > of trace clock timestamp, I wonder if we would really gain something
> > considerable performance-wise.
>
> Not sure what exactly you attempt to do there.
>
> > I also thought about the design change this requires for the per-cpu
> > buffer commit count pointer which would have to become a per-cpu pointer
> > independent of the buffer structure, and I foresee a problem with
> > Steven's irq off tracing which needs to perform buffer exchanges while
> > tracing is active. Basically, having only one top-level pointer for the
> > buffer makes it possible to exchange it atomically, but if we have to
> > have two separate pointers (one for per-cpu buffer, one for per-cpu
> > commit count array), then we are stuck.
>
> You just need to keep percpu pointers that are offsets into the percpu
> area. They can be relocated as needed to the processor specific addresses
> using the cpu ops.
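The relocation idea Christoph describes can be sketched in userspace as follows (names like `pcpu_area` and `pcpu_ptr` are hypothetical; in the kernel this role is played by the per-cpu allocator and per_cpu_ptr()-style operations): a "percpu pointer" is stored as a single offset, and is turned into a real address only when combined with a given CPU's per-cpu base.

```c
#include <stddef.h>

#define NR_CPUS         4
#define PCPU_AREA_SIZE  4096

/* One per-cpu memory area per CPU (stand-in for the kernel's
 * per-cpu chunks). */
static char pcpu_area[NR_CPUS][PCPU_AREA_SIZE];

/* Relocate an offset into the area of a specific CPU.  A structure
 * then only needs to store one offset, not NR_CPUS addresses. */
static void *pcpu_ptr(size_t offset, int cpu)
{
        return &pcpu_area[cpu][offset];
}
```

With this layout, the same stored offset resolves to a distinct commit-count slot on every CPU, which is how a per-cpu commit count can live outside the buffer structure while still being reachable through one field.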
>
> > So given that per-cpu ops limit us in terms of data structure layout, I
> > am less and less sure they are the best fit for ring buffers, especially
> > if we don't gain much performance-wise.
>
> I don't understand how exactly the ring buffer logic works and what you are
> trying to do here.
>
> The ringbuffers are per-cpu structures, right, and you do not change cpus
> while performing operations on them? If not, then the per-cpu ops are not
> useful to you.

Trying to make a long story short:

In the scheme where Steven moves buffers from one CPU to another, he
only performs the move when some other exclusion mechanism ensures that
the buffer is not being written to by its CPU while the move is in
progress.

It is therefore correct, and it needs the local_t type to deal with the
data structure hierarchy vs. atomic exchange issue I pointed out in my
email.
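The single-top-level-pointer point can be sketched like this (a userspace illustration using C11 atomics as a stand-in for the kernel's xchg(); `struct rb_percpu`, `rb_slot`, and `rb_swap` are hypothetical names, not the actual ring buffer API): because the commit counts live inside the same structure as the buffer, one atomic exchange of one pointer swaps everything consistently.

```c
#include <stdatomic.h>

/* Hypothetical per-cpu ring buffer: keeping the commit count inside
 * the same structure means one pointer covers everything. */
struct rb_percpu {
        unsigned long commit_count;
        /* ... data pages, write offset, etc. ... */
};

/* Top-level per-cpu slot: a single pointer, so a tracer swapping in a
 * spare buffer can exchange the whole thing atomically. */
static struct rb_percpu *_Atomic rb_slot;

static struct rb_percpu *rb_swap(struct rb_percpu *spare)
{
        /* Writers see either the old or the new buffer, never an old
         * buffer paired with a new commit count (or vice versa). */
        return atomic_exchange(&rb_slot, spare);
}
```

If the commit-count array were instead a second, independent per-cpu pointer, there would be no single atomic operation that swaps both together, which is the problem described above.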

Mathieu

>
> If you don't: How can you safely use the local_t operations for the
> ringbuffer logic?
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/