Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

From: Waiman Long
Date: Tue Sep 03 2013 - 11:15:19 EST


On 09/03/2013 02:01 AM, Ingo Molnar wrote:
> * Waiman Long <waiman.long@xxxxxx> wrote:
>
>> Yes, that patch worked. It eliminated the lglock as a bottleneck in the AIM7 workload. The lg_global_lock did not show up in the perf profile, whereas the lg_local_lock was only 0.07%.
>
> Just curious: what's the worst bottleneck now in the optimized kernel? :-)
>
> Thanks,
>
> 	Ingo

With the following patches on v3.11:
1. Linus's version of the lockref patch
2. Al's lglock patch
3. My preliminary patch to convert prepend_path to run under RCU
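Item 1 is the lockref structure named in the subject line. As a rough illustration of the idea only (not the kernel's code or API), here is a user-space sketch in C11 atomics: a lock and a reference count share one 64-bit word, so a reference can be taken or dropped with a single compare-and-swap whenever the lock is free, falling back to taking the lock otherwise. The layout (lock word in the low 32 bits, count in the high 32 bits) and all `lr_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout (not the kernel's struct lockref): the low 32 bits
 * hold a lock word (0 = free, 1 = held), the high 32 bits hold the count. */
struct lr_sketch {
    _Atomic uint64_t lock_count;
};

#define LR_LOCKED    ((uint64_t)1)
#define LR_COUNT_ONE ((uint64_t)1 << 32)

static int lr_is_unlocked(uint64_t v) { return (v & 0xffffffffu) == 0; }

static void lr_init(struct lr_sketch *lr, uint32_t count)
{
    atomic_init(&lr->lock_count, (uint64_t)count << 32);
}

static uint32_t lr_count(struct lr_sketch *lr)
{
    return (uint32_t)(atomic_load(&lr->lock_count) >> 32);
}

/* Slow path: a simple spinlock living in the low half of the same word. */
static void lr_lock(struct lr_sketch *lr)
{
    uint64_t old = atomic_load(&lr->lock_count);
    for (;;) {
        if (!lr_is_unlocked(old)) {
            old = atomic_load(&lr->lock_count); /* spin until it is free */
            continue;
        }
        if (atomic_compare_exchange_weak(&lr->lock_count, &old, old | LR_LOCKED))
            return;
    }
}

static void lr_unlock(struct lr_sketch *lr)
{
    assert(!lr_is_unlocked(atomic_load(&lr->lock_count))); /* must be held */
    atomic_fetch_and(&lr->lock_count, ~LR_LOCKED);
}

/* Take a reference: one CAS on the whole word while the lock is free,
 * otherwise update the count while holding the lock. */
static void lr_get(struct lr_sketch *lr)
{
    uint64_t old = atomic_load(&lr->lock_count);
    while (lr_is_unlocked(old)) {
        if (atomic_compare_exchange_weak(&lr->lock_count, &old, old + LR_COUNT_ONE))
            return;
        /* CAS failed: 'old' now holds the current value; re-check and retry. */
    }
    lr_lock(lr);
    atomic_fetch_add(&lr->lock_count, LR_COUNT_ONE);
    lr_unlock(lr);
}

/* Drop a reference unless it is the last one. Returns 1 on success, or 0
 * (count unchanged) when count <= 1, leaving the final put to the caller. */
static int lr_put_not_last(struct lr_sketch *lr)
{
    uint64_t old = atomic_load(&lr->lock_count);
    while (lr_is_unlocked(old)) {
        if ((old >> 32) <= 1)
            return 0;
        if (atomic_compare_exchange_weak(&lr->lock_count, &old, old - LR_COUNT_ONE))
            return 1;
    }
    lr_lock(lr);
    int ok = lr_count(lr) > 1;
    if (ok)
        atomic_fetch_sub(&lr->lock_count, LR_COUNT_ONE);
    lr_unlock(lr);
    return ok;
}
```

In this sketch the common get/put path never acquires the spinlock, so concurrent reference updates only contend on one CAS instead of queueing on the lock; the lock is still available for callers that need to examine or change the object together with its count.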

The perf profile of the kernel portion of the short workload on an 80-core system looked like this:

29.87% reaim [kernel.kallsyms] [k] _raw_spin_lock_irqsave
       |--50.00%-- tty_ldisc_deref
       |--49.01%-- tty_ldisc_try
        --0.99%-- [...]

7.55% swapper [kernel.kallsyms] [k] intel_idle
1.03% reaim [kernel.kallsyms] [k] copy_user_generic_string
0.91% reaim [kernel.kallsyms] [k] _raw_spin_lock
       |--15.88%-- __rcu_process_callbacks
       |--6.55%-- load_balance
       |--6.02%-- sem_lock
       |--4.77%-- enqueue_to_backlog
       |--4.21%-- task_rq_lock
       |--3.97%-- process_backlog
       |--3.35%-- unix_dgram_sendmsg
       |--3.28%-- kmem_cache_free
       |--3.16%-- tcp_v4_rcv
       |--2.77%-- unix_stream_sendmsg
       |--2.36%-- rcu_accelerate_cbs
       |--2.02%-- do_wp_page
       |--2.02%-- unix_create1
       |--1.83%-- unix_peer_get
       |--1.67%-- udp_lib_get_port
       |--1.66%-- unix_stream_recvmsg
       |--1.63%-- handle_pte_fault
       |--1.63%-- udp_queue_rcv_skb
       |--1.54%-- unix_release_sock
       |--1.48%-- try_to_wake_up
       |--1.37%-- do_anonymous_page
       |--1.37%-- new_inode_pseudo
       |--1.33%-- __d_lookup
       |--1.20%-- free_one_page
       |--1.11%-- __do_fault
       |--1.06%-- scheduler_tick
       |--0.90%-- __drain_alien_cache
       |--0.81%-- inet_csk_get_port
       |--0.76%-- sock_alloc
       |--0.76%-- shmem_lock
       |--0.75%-- __d_instantiate
       |--0.70%-- __inet_hash_connect
       |--0.69%-- __inet_hash_nolisten
       |--0.68%-- ip_local_deliver_finish
       |--0.64%-- inet_hash
       |--0.64%-- kfree
       |--0.60%-- d_path
       |--0.58%-- __close_fd
       |--0.51%-- evict
        --11.76%-- [...]

0.51% reaim [ip_tables] [k] ipt_do_table
0.46% reaim [kernel.kallsyms] [k] __alloc_skb
0.38% reaim [kernel.kallsyms] [k] kfree
0.36% reaim [kernel.kallsyms] [k] kmem_cache_free
0.34% reaim [kernel.kallsyms] [k] system_call_after_swapg
0.32% reaim [kernel.kallsyms] [k] fsnotify
0.32% reaim [kernel.kallsyms] [k] ip_finish_output
0.27% reaim [kernel.kallsyms] [k] system_call

Other than the global tty_ldisc_lock (the _raw_spin_lock_irqsave
hotspot reached through tty_ldisc_deref and tty_ldisc_try), there is
no other major bottleneck. I am not too worried about the
tty_ldisc_lock bottleneck, as real-world applications probably won't
make that many calls to set the tty driver.
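To put the dominant entry in absolute terms: assuming the caller percentages listed under each symbol are relative to that symbol's own share (each caller list above sums to about 100%), the two tty_ldisc paths each account for roughly 15% of the profile, about twice the 7.55% spent in intel_idle, while all _raw_spin_lock callers together stay under 1%:

```latex
\begin{aligned}
\text{tty\_ldisc\_deref} &: 29.87\% \times 0.5000 \approx 14.9\% \\
\text{tty\_ldisc\_try}   &: 29.87\% \times 0.4901 \approx 14.6\% \\
\text{\_raw\_spin\_lock (all callers)} &: 0.91\%
\end{aligned}
```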

Regards,
Longman