Re: [patch] speed up / fix the new generic semaphore code (fix AIM7 40% regression with 2.6.26-rc1)

From: Ingo Molnar
Date: Thu May 08 2008 - 18:03:09 EST



* Ingo Molnar <mingo@xxxxxxx> wrote:

> yeah, i captured a trace of all the down()s that happen in the
> workload, using ftrace/sched_switch + stacktrace + a tracepoint in
> down(). Here's the semaphore activities in the trace:

i've updated the trace with a better one:

> http://redhat.com/~mingo/misc/aim7-ftrace.txt

the first one had some idle time in it as well plus the effects of a
2000-task Ctrl-Z - which skewed the histograms.

the new stats are more straightforward. down() callsite histogram:

42 down <= lock_kernel <= de_put <= proc_delete_inode <
42 down <= lock_kernel <= proc_lookup_de <= proc_lookup <
78 down <= lock_kernel <= proc_lookup_de <= proc_lookup <
310 down <= lock_kernel <= sys_fcntl <= system_call_after_swapgs <
332 down <= lock_kernel <= vfs_ioctl <= do_vfs_ioctl <
380 down <= lock_kernel <= tty_release <= __fput <
422 down <= lock_kernel <= chrdev_open <= __dentry_open <

hm, why is chrdev_open() called all that often?

top-5 wakeups:

4 default_wake_function <= __wake_up_common <= complete <= migration_thread <= kthread <= child_rip <
4 wake_up_process <= sched_exec <= do_execve <= sys_execve <= stub_execve <= <
8 wake_up_process <= __mutex_unlock_slowpath <= mutex_unlock <= do_filp_open <= do_sys_open <= sys_open <
40 default_wake_function <= autoremove_wake_function <= __wake_up_common <= __wake_up <= sock_def_wakeup <= tcp_rcv_state_process <
98 default_wake_function <= __wake_up_common <= __wake_up_sync <= do_notify_parent <= do_exit <= do_group_exit <

i.e. very little wakeup activity. Top 10 scheduling points:

10 schedule <= kjournald <= kthread <= child_rip <
12 schedule <= __down_write_nested <= __down_write <= down_write <
29 schedule <= worker_thread <= kthread <= child_rip <
40 schedule <= schedule_timeout <= inet_stream_connect <= sys_connect <
59 schedule <= __cond_resched <= _cond_resched <= generic_file_buffered_write <
111 schedule <= ksoftirqd <= kthread <= child_rip <
119 schedule <= do_wait <= sys_wait4 <= system_call_after_swapgs <
659 schedule <= do_exit <= do_group_exit <= sys_exit_group <
781 schedule <= sysret_careful <= <= 0 <
1347 schedule <= retint_careful <= <= 0 <
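
for reference, a minimal sketch of how histograms like the ones above could be
built from a trace. this is not the script used for the numbers in this mail.
it assumes (hypothetically) that each input line already holds just a call
chain in the "a <= b <= c <" form shown above, innermost function first; the
function name callsite_histogram and its parameters are made up for
illustration.

```python
from collections import Counter


def callsite_histogram(lines, leaf, depth=4):
    """Count call chains whose innermost frame is `leaf`.

    Each chain is truncated to `depth` frames before counting, so chains
    that differ only further up the stack collapse into one entry.
    """
    counts = Counter()
    for line in lines:
        # Assumed format: frames separated by "<=", with a trailing "<".
        frames = [f.strip(" <") for f in line.split("<=")]
        if not frames or frames[0] != leaf:
            continue
        counts[" <= ".join(frames[:depth])] += 1
    # Sort ascending by count, the same order as the listings above.
    return sorted(counts.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    sample = [
        "down <= lock_kernel <= sys_fcntl <= system_call_after_swapgs <",
        "down <= lock_kernel <= vfs_ioctl <= do_vfs_ioctl <",
        "down <= lock_kernel <= sys_fcntl <= system_call_after_swapgs <",
    ]
    for chain, count in callsite_histogram(sample, "down"):
        print(count, chain, "<")
```

calling it with leaf="schedule" or leaf="wake_up_process" would give the
scheduling-point and wakeup listings in the same way. truncating to a fixed
depth keeps the output short, at the cost of merging callers that only differ
deeper in the stack.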

> and here's a few seconds worth of NMI driven readprofile output:

the NMI profiling results were accurate.

Ingo