kernel compiling performance challenge

From: Nick Piggin
Date: Fri Oct 07 2005 - 05:45:22 EST


I was recently disappointed to find that 2.4 kernels very slightly
edge out the latest 2.6 kernels in kernel compiling performance on
my dual Xeon, even when 2.6 has HZ set at 100.

So I spent a few days trying some different mm/ optimisations, and
managed to reduce total kernel residency (excluding idle time) by 7%
on that workload, and now manage to beat 2.4 by a good third of a
second.

Here's a diffprofile versus a plain 2.6.14-rc3 kernel:
123 384.4% __get_zone_counts
54 0.0% __page_set_anon_rmap
37 17.0% find_lock_page
28 31.8% lru_cache_add_active
26 42.6% path_lookup
25 22.3% kmem_cache_alloc
20 6.2% find_vma
19 17.3% _atomic_dec_and_lock
18 26.5% __copy_from_user_ll
17 188.9% shmem_nopage
17 58.6% unmap_vmas
15 0.0% __page_state
14 31.1% copy_pte_range
14 13.6% __wake_up_bit
14 0.0% remove_vma
14 46.7% exit_notify
13 61.9% sys_close
13 27.1% anon_vma_prepare
13 0.0% unlink_file_vma
13 92.9% do_generic_mapping_read
12 109.1% free_pgd_range
12 0.0% vm_stat_account
12 100.0% sys_mmap2
10 17.5% get_empty_filp
.
.
-10 -21.7% _spin_unlock_irq
-10 -90.9% flush_old_exec
-10 -25.6% vfs_read
-10 -4.6% __handle_mm_fault
-12 -17.1% do_shmem_file_read
-12 -60.0% cond_resched
-12 -60.0% number
-12 -46.2% sys_open
-12 -100.0% __vm_stat_account
-13 -46.4% inotify_inode_queue_event
-13 -33.3% dput
-13 -2.8% release_pages
-14 -26.9% kmem_cache_free
-14 -17.7% pte_alloc_map
-14 -3.2% __link_path_walk
-16 -100.0% __rmqueue
-19 -10.9% sysenter_past_esp
-19 -55.9% vfs_getattr
-20 -25.0% zone_watermark_ok
-21 -17.9% may_open
-21 -15.8% strnlen_user
-21 -8.8% __pagevec_lru_add_active
-23 -100.0% remove_vm_struct
-23 -5.1% zap_pte_range
-28 -0.6% do_page_fault
-33 -15.8% do_anonymous_page
-40 -7.1% __d_lookup
-44 -5.9% _spin_lock
-75 -43.1% page_remove_rmap
-79 -98.8% set_page_dirty
-81 -23.5% free_hot_cold_page
-81 -10.0% __copy_to_user_ll
-94 -98.9% page_add_anon_rmap
-108 -26.2% do_no_page
-111 -72.1% prep_new_page
-143 -100.0% bad_range
-444 -87.2% __mod_page_state
-548 -10.1% buffered_rmqueue
-1634 -7.1% total

Depending on how much interest there is, I might keep a tree around
to collect performance improvements. If you have any more[*] I could
look at, please send them over. I'll eventually try to get things
merged.

* Not just for kbuild, or only mm related, but preferably something
that I can easily measure on my little system.

Attached is a rollup against 2.6.14-rc3. I don't currently have any
webspace handy, so I can't host a broken-out tarball anywhere yet.

Sorry for the big attachment (actually most of it is Hugh's pagefault
scalability prep and my lockless pagecache prep that I'm working on
top of).

Nick

--
SUSE Labs, Novell Inc.

Attachment: 2.6.14-rc3-kc1.patch.gz
Description: Unix tar archive