Re: [PATCH UPDATED] percpu: use dynamic percpu allocator as the default percpu allocator

From: Christoph Lameter
Date: Wed Apr 01 2009 - 22:20:49 EST


On Wed, 1 Apr 2009, Ingo Molnar wrote:

> > __read_mostly should be packed as tightly as possible to increase
> > the chance that one cacheline includes multiple of the critical
> > variables for the hot code paths. Too much __read_mostly defeats
> > its purpose.
>
> That stance is commonly held but quite wrong and harmful IMHO.

Well that is the reason I introduced __read_mostly.

> It stifles the proper identification of read-mostly variables _AND_
> it hurts the proper identification of critical write-often variables
> as well. Not good.

Well, then let's create another way of annotating variables that does not
move them into a separate section.

> The solution for critical write-often variables is what we always
> used: to identify them explicitly and to place them consciously into
> separate cachelines. (Or to per-cpu-ify or object-ify them where
> possible/sensible.)

Right. But there are none here.
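
For reference, the explicit placement described in the quoted paragraph
can be sketched in user-space C roughly as follows. This is only an
illustration, not code from the patch: the names demo_counters,
demo_stats, demo_same_cacheline and the 64-byte DEMO_CACHELINE are
assumptions made for the example. In the kernel the same job is done
with annotations such as ____cacheline_aligned.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cacheline size; real hardware may use a different value. */
#define DEMO_CACHELINE 64

/*
 * Two hypothetical write-often counters. Aligning each member to the
 * cacheline size forces them onto different lines, so a writer of one
 * does not invalidate the line holding the other.
 */
struct demo_counters {
	alignas(DEMO_CACHELINE) unsigned long rx_packets;
	alignas(DEMO_CACHELINE) unsigned long tx_packets;
};

struct demo_counters demo_stats;

/* Returns nonzero if both addresses fall into the same cacheline. */
int demo_same_cacheline(const void *a, const void *b)
{
	return (uintptr_t)a / DEMO_CACHELINE == (uintptr_t)b / DEMO_CACHELINE;
}
```

The cost is padding: each counter now occupies a full line, which is
why this is only worth doing for variables that really are written
often.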

> Then annotate everything that is read-mostly and accessed-frequently
> with the __read_mostly attribute.

None of that is the case here. These variables are rarely used; they
only serve the allocation and freeing of percpu variables.

> - Thinking that this solves false cacheline sharing reliably is
> wrong: there's nothing that guarantees and enforces that slapping
> a few variables between two critical variables puts them on
> separate cachelines:

__read_mostly significantly reduces cacheline bouncing by marking
variables that are rarely updated but frequently read in critical
paths. Hence the special placement.
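
As a rough user-space sketch of what the special placement means: the
annotation is essentially a section attribute, so all annotated
variables end up packed together, away from ordinary data. The macro
DEMO_READ_MOSTLY, the section name demo_read_mostly and the demo_*
variables below are made up for the example. It assumes GCC or Clang on
an ELF target such as Linux, where the linker provides __start_/__stop_
symbols for sections whose names are valid C identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __read_mostly annotation. */
#define DEMO_READ_MOSTLY __attribute__((__section__("demo_read_mostly")))

/* Rarely written, read in hot paths: grouped into one section. */
int demo_hot_limit DEMO_READ_MOSTLY = 128;
int demo_hot_shift DEMO_READ_MOSTLY = 7;

/* Ordinary variable: stays in the normal data/bss area. */
int demo_rare_counter;

/* Section bounds generated by the ELF linker (GNU ld, lld). */
extern const char __start_demo_read_mostly[];
extern const char __stop_demo_read_mostly[];

/* Returns nonzero if p lies inside the demo_read_mostly section. */
int demo_in_read_mostly(const void *p)
{
	uintptr_t a = (uintptr_t)p;

	return a >= (uintptr_t)__start_demo_read_mostly &&
	       a < (uintptr_t)__stop_demo_read_mostly;
}
```

Here demo_in_read_mostly() is true for the two annotated variables and
false for demo_rare_counter. The denser that section is with genuinely
hot variables, the more of them share a cacheline.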

> - It actually prevents true read-mostly variables from being
> annotated properly. (In such a case a true read-mostly variable
> bouncing around with a frequently-written variable cache line is
> almost as bad in terms of MESI latencies and costs as false
> cacheline sharing between two write-mostly variables.)

What I have often thought we need is another percpu read-mostly section
that is always NUMA-local. The percpu read-mostly section would then be
replicated per node. Updating such a read-mostly variable would require
a loop over all of the per-node segments, which makes updates more
expensive. However, reads would always be node-local, and that is where
the advantage lies.
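
A very rough user-space sketch of that idea, only to show the shape of
the read and update paths. Nothing like this exists in the patch: the
names demo_node_copy, demo_copies, demo_read, demo_update and the fixed
DEMO_NR_NODES are assumptions. A real version would place each copy in
memory local to its node and would need some ordering or locking for
concurrent updates; here the copies are just cacheline-aligned array
slots.

```c
#include <assert.h>
#include <stdalign.h>

#define DEMO_NR_NODES  4	/* assumed number of NUMA nodes */
#define DEMO_CACHELINE 64	/* assumed cacheline size */

/* One replica per node, each on its own cacheline. */
struct demo_node_copy {
	alignas(DEMO_CACHELINE) int value;
};

struct demo_node_copy demo_copies[DEMO_NR_NODES];

/* Read path: touches only the replica belonging to the caller's node. */
int demo_read(int node)
{
	return demo_copies[node].value;
}

/* Update path: has to walk every per-node replica, hence more costly. */
void demo_update(int new_value)
{
	int node;

	for (node = 0; node < DEMO_NR_NODES; node++)
		demo_copies[node].value = new_value;
}
```

The trade is the one described above: demo_update() scales with the
number of nodes, while demo_read() never leaves the local replica.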
