Re: [patch] tlb flush_data: replace per_cpu with an array

From: Ingo Molnar
Date: Mon Jan 12 2009 - 18:01:25 EST



* Ravikiran G Thirumalai <kiran@xxxxxxxxxxxx> wrote:

> Hi Frederik,
>
> On Mon, Jan 12, 2009 at 10:35:42PM +0100, Frederik Deweerdt wrote:
> >
> >Signed-off-by: Frederik Deweerdt <frederik.deweerdt@xxxxxxxx>
> >
> >diff --git a/arch/x86/kernel/tlb_64.c b/arch/x86/kernel/tlb_64.c
> >index f8be6f1..c177a1f 100644
> >--- a/arch/x86/kernel/tlb_64.c
> >+++ b/arch/x86/kernel/tlb_64.c
> >@@ -33,7 +33,7 @@
> > * To avoid global state use 8 different call vectors.
> > * Each CPU uses a specific vector to trigger flushes on other
> > * CPUs. Depending on the received vector the target CPUs look into
> >- * the right per cpu variable for the flush data.
> >+ * the right array slot for the flush data.
> > *
> > * With more than 8 CPUs they are hashed to the 8 available
> > * vectors. The limited global vector space forces us to this right now.
> >@@ -54,7 +54,7 @@ union smp_flush_state {
> > /* State is put into the per CPU data section, but padded
> > to a full cache line because other CPUs can access it and we don't
> > want false sharing in the per cpu data segment. */
> >-static DEFINE_PER_CPU(union smp_flush_state, flush_state);
> >+static union smp_flush_state flush_state[NUM_INVALIDATE_TLB_VECTORS];
>
> Since flush_state has a spinlock, an array of flush_states would end up
> having multiples spinlocks on the same L1 cacheline. However, I see
> that the union smp_flush_state is already padded to SMP_CACHE_BYTES.

Correct.

> With per-cpu areas, locks belonging to different array elements did not
> end up on the same internode cacheline due to per-node allocation of
> per-cpu areas, but with a simple array, two locks could end up on the
> same internode cacheline.
>
> Hence, can you please change the padding in smp_flush_state to
> INTERNODE_CACHE_BYTES (you have to derive this off
> INTERNODE_CACHE_SHIFT) and align it using
> ____cacheline_internodealigned_in_smp?

Yes, that matters to vsmp, which has ... 4K node cache-lines.

Note that there's already CONFIG_X86_INTERNODE_CACHE_BYTES which is set
properly on vsmp.

So ... something like the commit below?

Ingo

--------------->