[ ... ]
00122690 sync_buffers 13365 33.7500 2.59
00137650 dev_tint 15681 105.9527 3.04
001376e4 dev_ifconf 16710 72.0259 3.24
0011b534 filemap_nopage 18389 26.8845 3.56
001a0250 vortex_rx 18855 36.2596 3.65
0011af64 generic_file_read 20578 13.8293 3.98
00118be8 do_no_page 20953 25.0634 4.06
00195fc8 stli_write 23428 24.7131 4.54
00137518 dev_transmit 31039 705.4318 6.01
0014bca0 ip_chk_addr 58691 232.9008 11.36
(The last column is the percentage of total kernel CPU time each function is using.)
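For anyone who wants to compare their own numbers, here is a rough sketch of how the listing above can be parsed and sanity-checked. It assumes the five columns are address, symbol, ticks, load and percentage; only the meaning of the last column is stated above, the rest is my reading of the data. The names parse_profile and estimate_total_ticks are made up for this example.

```python
# Rows copied from the listing above.
# Assumed column order: address, symbol, ticks, load, percent of kernel CPU.
ROWS = """\
00122690 sync_buffers 13365 33.7500 2.59
00137650 dev_tint 15681 105.9527 3.04
001376e4 dev_ifconf 16710 72.0259 3.24
0011b534 filemap_nopage 18389 26.8845 3.56
001a0250 vortex_rx 18855 36.2596 3.65
0011af64 generic_file_read 20578 13.8293 3.98
00118be8 do_no_page 20953 25.0634 4.06
00195fc8 stli_write 23428 24.7131 4.54
00137518 dev_transmit 31039 705.4318 6.01
0014bca0 ip_chk_addr 58691 232.9008 11.36
"""


def parse_profile(text):
    """Turn each five-field line into a dict; skip anything else."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        addr, name, ticks, load, pct = fields
        rows.append({
            "addr": int(addr, 16),
            "name": name,
            "ticks": int(ticks),
            "load": float(load),
            "pct": float(pct),
        })
    return rows


def estimate_total_ticks(rows):
    """Each row implies total = ticks / (pct / 100); average the estimates."""
    estimates = [r["ticks"] * 100.0 / r["pct"] for r in rows]
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    rows = parse_profile(ROWS)
    print("rows parsed:", len(rows))
    print("estimated total kernel ticks:", round(estimate_total_ticks(rows)))
```

Every row implies roughly the same total (around 516,000 ticks), so the percentage column is at least consistent with the tick counts.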
get_empty_filp() is now down to 0.17% of total kernel CPU, down from
16%. So for my usage patterns at least, it's now approx 100 times
faster. Anyone else running a profiled kernel in a production
environment to compare against?
One of the questions that keeps popping up is: why is dev_transmit()
using so much CPU time? As far as I can see, it only gets called from
net_bh(), and net_bh() is only using 0.38%. dev_transmit() is basically a
loop around dev_tint(), and even dev_tint() isn't using that much, even
though it's a much more complex function. Very, very odd. Anyone got any
ideas on why it's using so much? Note that on the numbers above,
dev_transmit() is using 1.6% of wall clock time!
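To put a number on how odd this is, here is a small sketch that backs out each function's size from the listing. It assumes the fourth column is ticks divided by the function's length in bytes, the way readprofile's load column is computed; that is a guess from the numbers, not something stated above. implied_size is a made-up helper name.

```python
# (symbol, ticks, load) taken from the listing above.
# Assumption: load = ticks / function size in bytes.
SAMPLES = [
    ("dev_tint", 15681, 105.9527),
    ("dev_transmit", 31039, 705.4318),
    ("ip_chk_addr", 58691, 232.9008),
]


def implied_size(ticks, load):
    """Recover the function size in bytes implied by ticks / load."""
    return round(ticks / load)


if __name__ == "__main__":
    for name, ticks, load in SAMPLES:
        print(f"{name:14s} ~{implied_size(ticks, load)} bytes")
```

If that reading of the column is right, dev_transmit() is only about 44 bytes long against about 148 for dev_tint(), yet it collects roughly twice as many ticks.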
Also, has anyone ever managed to compile a call tracer into the
kernel? I.e. something like the equivalent of gprof? Doesn't look
too hard as far as I can see, but I don't know how gcc -p would handle
inlined assembly (and vice versa).
(PS: be aware that this machine currently has 214 network
interfaces, which is why ip_chk_addr is so large.)