Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7

From: Joel Schopp
Date: Wed Jan 20 2010 - 17:36:40 EST

Next message: Lennart Sorensen: "Re: Strange EHCI/OHCI IRQ and other PCI errors on boot with 2.6.32(which 2.6.26 did not see) on Geode LX/CS5536."
Previous message: Jarod Wilson: "[PATCH] staging/crystalhd: fix build on powerpc"
In reply to: Benjamin Herrenschmidt: "Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

+
+static inline int thread_in_smt4core(int x)
+{
+ return x % 4;
+}

Needs a whitespace here though I don't really like the above. Any reason
why you can't use the existing cpu_thread_in_core() ?

I will change it to cpu_thread_in_core()

+unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
+{
+ int cpu2;
+ int idle_count = 0;
+
+ struct cpumask *cpu_map = sched_domain_span(sd);
+
+ unsigned long weight = cpumask_weight(cpu_map);
+ unsigned long smt_gain = sd->smt_gain;

More whitespace damage above.

You are better than checkpatch.pl, will fix.

+ if (cpu_has_feature(CPU_FTRS_POWER7) && weight == 4) {
+ for_each_cpu(cpu2, cpu_map) {
+ if (idle_cpu(cpu2))
+ idle_count++;
+ }

I'm not 100% sure about the use of the CPU feature above. First I wonder
if the right approach is to instead do something like

if (!cpu_has_feature(...) !! weigth < 4)
return default_scale_smt_power(sd, cpu);

Though we may be better off using a ppc_md. hook here to avoid
calculating the weight etc... on processors that don't need any
of that.

I also dislike your naming. I would suggest you change cpu_map to
sibling_map() and cpu2 to sibling (or just c). One thing I wonder is how
sure we are that sched_domain_span() is always going to give us the
threads btw ? If we introduce another sched domain level for NUMA
purposes can't we get confused ?

Right now it's 100% always giving us threads. My development version of the patch had a BUG_ON() to check this. I expect this to stay the case in the future as the name of the function is arch_scale_smt_power(), which clearly denotes threads are expected.

I am not stuck on the names, I'll change it to sibling instead of cpu2 and sibling_map instead of cpu_map. It seems clear to me either way.

As for testing the ! case it seems funcationally equivalent, and mine seems less confusing.

Having a ppc.md hook with exactly 1 user is pointless, especially since you'll still have to calculate the weight with the ability to dynamically disable smt.

Also, how hot is this code path ?

It's every load balance, which is to say not hot, but fairly frequent. I haven't been able to measure an impact from doing very hairy calculations (without actually changing the weights) here vs not having it at all in actual end workloads.

+ /* the following section attempts to tweak cpu power based
+ * on current idleness of the threads dynamically at runtime
+ */
+ if (idle_count == 2 || idle_count == 3 || idle_count == 4) {

if (idle_count > 1) ? :-)

Yes :) Originally I had done different weightings for each of the 3 cases, which gained in some workloads but regressed some others. But since I'm not doing that anymore I'll fold it down to > 1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Lennart Sorensen: "Re: Strange EHCI/OHCI IRQ and other PCI errors on boot with 2.6.32(which 2.6.26 did not see) on Geode LX/CS5536."
Previous message: Jarod Wilson: "[PATCH] staging/crystalhd: fix build on powerpc"
In reply to: Benjamin Herrenschmidt: "Re: [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]