Re: [LKP] [mm] 3484b2de949: -46.2% aim7.jobs-per-min

From: Huang Ying
Date: Fri Mar 27 2015 - 04:49:27 EST


On Wed, 2015-03-25 at 10:54 +0000, Mel Gorman wrote:
> On Mon, Mar 23, 2015 at 04:46:21PM +0800, Huang Ying wrote:
> > > My attention is occupied by the automatic NUMA regression at the moment
> > > but I haven't forgotten this. Even with the high client count, I was not
> > > able to reproduce this so it appears to depend on the number of CPUs
> > > available to stress the allocator enough to bypass the per-cpu allocator
> > > enough to contend heavily on the zone lock. I'm hoping to think of a
> > > better alternative than adding more padding and increasing the cache
> > > footprint of the allocator but so far I haven't thought of a good
> > > alternative. Moving the lock to the end of the freelists would probably
> > > address the problem but still increases the footprint for order-0
> > > allocations by a cache line.
> >
> > Any update on this? Do you have some better idea? I guess this may be
> > fixed via putting some fields that are only read during order-0
> > allocation with the same cache line of lock, if there are any.
> >
>
> Sorry for the delay, the automatic NUMA regression took a long time to
> close and it potentially affected anybody with a NUMA machine, not just
> stress tests on large machines.
>
> Moving it beside other fields shifts the problems. The lock is related
> to the free areas so it really belongs nearby and from my own testing,
> it does not affect mid-sized machines. I'd rather not put the lock in its
> own cache line unless we have to. Can you try the following patch
> instead? It is untested but builds and should be safe.
>
> It'll increase the footprint of the page allocator but so would padding.
> It means it will contend with high-order free page breakups but that
> is not likely to happen during stress tests. It also collides with flags
> but they are relatively rarely updated.
>
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index f279d9c158cd..2782df47101e 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -474,16 +474,15 @@ struct zone {
> unsigned long wait_table_bits;
>
> ZONE_PADDING(_pad1_)
> -
> - /* Write-intensive fields used from the page allocator */
> - spinlock_t lock;
> -
> /* free areas of different sizes */
> struct free_area free_area[MAX_ORDER];
>
> /* zone flags, see below */
> unsigned long flags;
>
> + /* Write-intensive fields used from the page allocator */
> + spinlock_t lock;
> +
> ZONE_PADDING(_pad2_)
>
> /* Write-intensive fields used by page reclaim */

Stress page allocator tests here show that performance is restored to
its previous level with the patch above. I applied your patch on the latest
upstream kernel. The result is as below:

testbox/testcase/testparams: brickland1/aim7/performance-6000-page_test

c875f421097a55d9 dbdc458f1b7d07f32891509c06
---------------- --------------------------
%stddev %change %stddev
\ | \
84568 ± 1% +94.3% 164280 ± 1% aim7.jobs-per-min
2881944 ± 2% -35.1% 1870386 ± 8% aim7.time.voluntary_context_switches
681 ± 1% -3.4% 658 ± 0% aim7.time.user_time
5538139 ± 0% -12.1% 4867884 ± 0% aim7.time.involuntary_context_switches
44174 ± 1% -46.0% 23848 ± 1% aim7.time.system_time
426 ± 1% -48.4% 219 ± 1% aim7.time.elapsed_time
426 ± 1% -48.4% 219 ± 1% aim7.time.elapsed_time.max
468 ± 1% -43.1% 266 ± 2% uptime.boot
13691 ± 0% -24.2% 10379 ± 1% softirqs.NET_RX
931382 ± 2% +24.9% 1163065 ± 1% softirqs.RCU
407717 ± 2% -36.3% 259521 ± 9% softirqs.SCHED
19690372 ± 0% -34.8% 12836548 ± 1% softirqs.TIMER
2442 ± 1% -28.9% 1737 ± 5% vmstat.procs.b
3016 ± 3% +19.4% 3603 ± 4% vmstat.procs.r
104330 ± 1% +34.6% 140387 ± 0% vmstat.system.in
22172 ± 0% +48.3% 32877 ± 2% vmstat.system.cs
1891 ± 12% -48.2% 978 ± 10% numa-numastat.node0.other_node
1785 ± 14% -47.7% 933 ± 6% numa-numastat.node1.other_node
1790 ± 12% -47.8% 935 ± 10% numa-numastat.node2.other_node
1766 ± 14% -47.0% 935 ± 12% numa-numastat.node3.other_node
426 ± 1% -48.4% 219 ± 1% time.elapsed_time.max
426 ± 1% -48.4% 219 ± 1% time.elapsed_time
5538139 ± 0% -12.1% 4867884 ± 0% time.involuntary_context_switches
44174 ± 1% -46.0% 23848 ± 1% time.system_time
2881944 ± 2% -35.1% 1870386 ± 8% time.voluntary_context_switches
7831898 ± 4% +31.8% 10325919 ± 5% meminfo.Active
7742498 ± 4% +32.2% 10237222 ± 5% meminfo.Active(anon)
7231211 ± 4% +28.7% 9308183 ± 5% meminfo.AnonPages
7.55e+11 ± 4% +19.6% 9.032e+11 ± 8% meminfo.Committed_AS
14010 ± 1% -17.4% 11567 ± 1% meminfo.Inactive(anon)
668946 ± 4% +40.8% 941815 ± 27% meminfo.PageTables
15392 ± 1% -15.9% 12945 ± 1% meminfo.Shmem
1185 ± 0% -4.4% 1133 ± 0% turbostat.Avg_MHz
3.29 ± 6% -64.5% 1.17 ± 14% turbostat.CPU%c1
0.10 ± 12% -90.3% 0.01 ± 0% turbostat.CPU%c3
2.95 ± 3% +73.9% 5.13 ± 3% turbostat.CPU%c6
743 ± 9% -70.7% 217 ± 17% turbostat.CorWatt
300 ± 0% -9.4% 272 ± 0% turbostat.PKG_%
1.58 ± 2% +59.6% 2.53 ± 20% turbostat.Pkg%pc2
758 ± 9% -69.3% 232 ± 16% turbostat.PkgWatt
15.08 ± 0% +5.4% 15.90 ± 1% turbostat.RAMWatt
105729 ± 6% -47.0% 56005 ± 25% cpuidle.C1-IVT-4S.usage
2.535e+08 ± 12% -62.7% 94532092 ± 22% cpuidle.C1-IVT-4S.time
4.386e+08 ± 4% -79.4% 90246312 ± 23% cpuidle.C1E-IVT-4S.time
83425 ± 6% -71.7% 23571 ± 23% cpuidle.C1E-IVT-4S.usage
14237 ± 8% -79.0% 2983 ± 19% cpuidle.C3-IVT-4S.usage
1.242e+08 ± 7% -87.5% 15462238 ± 18% cpuidle.C3-IVT-4S.time
87857 ± 7% -71.1% 25355 ± 5% cpuidle.C6-IVT-4S.usage
2.359e+09 ± 2% -38.2% 1.458e+09 ± 2% cpuidle.C6-IVT-4S.time
1960460 ± 3% +31.7% 2582336 ± 4% proc-vmstat.nr_active_anon
5548 ± 2% +53.2% 8498 ± 3% proc-vmstat.nr_alloc_batch
1830492 ± 3% +28.4% 2349846 ± 3% proc-vmstat.nr_anon_pages
3514 ± 1% -17.7% 2893 ± 1% proc-vmstat.nr_inactive_anon
168712 ± 4% +40.3% 236768 ± 27% proc-vmstat.nr_page_table_pages
3859 ± 1% -16.1% 3238 ± 1% proc-vmstat.nr_shmem
1997823 ± 5% -27.4% 1450005 ± 5% proc-vmstat.numa_hint_faults
1413076 ± 6% -25.3% 1056268 ± 5% proc-vmstat.numa_hint_faults_local
7213 ± 6% -47.3% 3799 ± 7% proc-vmstat.numa_other
406056 ± 3% -41.9% 236064 ± 6% proc-vmstat.numa_pages_migrated
7242333 ± 3% -29.2% 5130788 ± 10% proc-vmstat.numa_pte_updates
406056 ± 3% -41.9% 236064 ± 6% proc-vmstat.pgmigrate_success
484141 ± 3% +32.7% 642529 ± 5% numa-vmstat.node0.nr_active_anon
1.509e+08 ± 0% -12.6% 1.319e+08 ± 3% numa-vmstat.node0.numa_hit
452041 ± 3% +29.9% 587214 ± 5% numa-vmstat.node0.nr_anon_pages
1484 ± 1% +36.5% 2026 ± 24% numa-vmstat.node0.nr_alloc_batch
1.509e+08 ± 0% -12.6% 1.319e+08 ± 3% numa-vmstat.node0.numa_local
493672 ± 8% +30.5% 644195 ± 11% numa-vmstat.node1.nr_active_anon
1481 ± 9% +52.5% 2259 ± 8% numa-vmstat.node1.nr_alloc_batch
462466 ± 8% +27.4% 589287 ± 10% numa-vmstat.node1.nr_anon_pages
485463 ± 6% +29.1% 626539 ± 4% numa-vmstat.node2.nr_active_anon
422 ± 15% -63.1% 156 ± 38% numa-vmstat.node2.nr_inactive_anon
32587 ± 9% +71.0% 55722 ± 32% numa-vmstat.node2.nr_page_table_pages
1365 ± 5% +68.7% 2303 ± 11% numa-vmstat.node2.nr_alloc_batch
453583 ± 6% +26.1% 572097 ± 4% numa-vmstat.node2.nr_anon_pages
1.378e+08 ± 2% -8.5% 1.26e+08 ± 2% numa-vmstat.node3.numa_local
441345 ± 10% +28.4% 566740 ± 6% numa-vmstat.node3.nr_anon_pages
1.378e+08 ± 2% -8.5% 1.261e+08 ± 2% numa-vmstat.node3.numa_hit
471252 ± 10% +31.9% 621440 ± 7% numa-vmstat.node3.nr_active_anon
1359 ± 4% +75.1% 2380 ± 16% numa-vmstat.node3.nr_alloc_batch
1826489 ± 0% +30.0% 2375174 ± 4% numa-meminfo.node0.AnonPages
2774145 ± 8% +26.1% 3497281 ± 9% numa-meminfo.node0.MemUsed
1962338 ± 0% +32.5% 2599292 ± 4% numa-meminfo.node0.Active(anon)
1985987 ± 0% +32.0% 2621356 ± 4% numa-meminfo.node0.Active
2768321 ± 6% +27.7% 3534224 ± 11% numa-meminfo.node1.MemUsed
1935382 ± 5% +34.2% 2597532 ± 11% numa-meminfo.node1.Active
1913696 ± 5% +34.6% 2575266 ± 11% numa-meminfo.node1.Active(anon)
1784346 ± 6% +31.7% 2349891 ± 10% numa-meminfo.node1.AnonPages
1678 ± 15% -62.7% 625 ± 39% numa-meminfo.node2.Inactive(anon)
2532834 ± 4% +27.4% 3227116 ± 8% numa-meminfo.node2.MemUsed
132885 ± 9% +67.9% 223159 ± 32% numa-meminfo.node2.PageTables
2004439 ± 5% +26.1% 2528019 ± 5% numa-meminfo.node2.Active
1856674 ± 5% +23.0% 2283461 ± 5% numa-meminfo.node2.AnonPages
1981962 ± 5% +26.4% 2505422 ± 5% numa-meminfo.node2.Active(anon)
1862203 ± 8% +33.0% 2476954 ± 6% numa-meminfo.node3.Active(anon)
1883841 ± 7% +32.6% 2498686 ± 6% numa-meminfo.node3.Active
2572461 ± 11% +24.2% 3195556 ± 8% numa-meminfo.node3.MemUsed
1739646 ± 8% +29.4% 2250696 ± 6% numa-meminfo.node3.AnonPages

Best Regards,
Huang, Ying

