Re: [PATCH] ALSA: pcm: Convert snd_pcm_ioctl_sync_ptr_{compat/x32} to user_access_begin/user_access_end()
From: Christophe Leroy
Date: Fri Jun 13 2025 - 01:50:47 EST
Le 12/06/2025 à 13:02, Takashi Iwai a écrit :
On Thu, 12 Jun 2025 12:39:39 +0200,
Christophe Leroy wrote:
With user access protection (Called SMAP on x86 or KUAP on powerpc)
each and every call to get_user() or put_user() performs heavy
operations to unlock and lock kernel access to userspace.
SNDRV_PCM_IOCTL_SYNC_PTR ioctl is a hot path that needs to be
optimised. To do that, perform user accesses by blocks using
user_access_begin/user_access_end() and unsafe_get_user()/
unsafe_put_user() and alike.
Before the patch the 9 calls to put_user() at the
end of snd_pcm_ioctl_sync_ptr_compat() imply the following set of
instructions about 9 times (access_ok - enable user - write - disable
user):
0.00 : c057f858: 3d 20 7f ff lis r9,32767
0.29 : c057f85c: 39 5e 00 14 addi r10,r30,20
0.77 : c057f860: 61 29 ff fc ori r9,r9,65532
0.32 : c057f864: 7c 0a 48 40 cmplw r10,r9
0.36 : c057f868: 41 a1 fb 58 bgt c057f3c0 <snd_pcm_ioctl+0xbb0>
0.30 : c057f86c: 3d 20 dc 00 lis r9,-9216
1.95 : c057f870: 7d 3a c3 a6 mtspr 794,r9
0.33 : c057f874: 92 8a 00 00 stw r20,0(r10)
0.27 : c057f878: 3d 20 de 00 lis r9,-8704
0.28 : c057f87c: 7d 3a c3 a6 mtspr 794,r9
...
A perf profile shows that in total the 9 put_user() represent 36% of
the time spent in snd_pcm_ioctl() and about 80 instructions.
With this patch everything is done in 13 instructions and represent
only 15% of the time spent in snd_pcm_ioctl():
0.57 : c057f5dc: 3d 20 dc 00 lis r9,-9216
0.98 : c057f5e0: 7d 3a c3 a6 mtspr 794,r9
0.16 : c057f5e4: 92 7f 00 04 stw r19,4(r31)
0.63 : c057f5e8: 93 df 00 0c stw r30,12(r31)
0.16 : c057f5ec: 93 9f 00 10 stw r28,16(r31)
4.95 : c057f5f0: 92 9f 00 14 stw r20,20(r31)
0.19 : c057f5f4: 92 5f 00 18 stw r18,24(r31)
0.49 : c057f5f8: 92 bf 00 1c stw r21,28(r31)
0.27 : c057f5fc: 93 7f 00 20 stw r27,32(r31)
5.88 : c057f600: 93 36 00 00 stw r25,0(r22)
0.11 : c057f604: 93 17 00 00 stw r24,0(r23)
0.00 : c057f608: 3d 20 de 00 lis r9,-8704
0.79 : c057f60c: 7d 3a c3 a6 mtspr 794,r9
Note that here the access_ok() in user_write_access_begin() is skipped
because the exact same verification has already been performed at the
beginning of the fonction with the call to user_read_access_begin().
Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
---
This is a lighter version of previous patch "[PATCH v2] ALSA: pcm: Convert multiple {get/put}_user to user_access_begin/user_access_end()" focussing on identified hot path.
Moved and nested the failure labels closer in order to increase readability
Thanks for the revised patch!
Although it's now much lighter, I still believe that we can reduce
get_user() / put_user() calls significantly by adjusting the struct
usage.
Could you check whether the patch below can improve?
(Zero-ing of sstatus can be an overhead here, but there are some
holes, and these need to be initialized before copying back...)
Thanks for the proposed patch. Unfortunately it doesn't improve the
situation. The problem with copy_from_user() and copy_to_user() is that
they perform quite bad on small regions. And for the from_user side we
still get two user access enable/disable instead 3 and for the to_user
side we still get two as well (Allthough we had 7 previously). Those 4
user access enable/disable still have a cost.
Nowadays the tendency is really to go for the unsafe_put/get_user style,
see some significant exemples below. And as mentioned in those commits,
behind the performance improvement it also lead to much cleaner code
generation.
- 38ebcf5096a8 ("scm: optimize put_cmsg()")
- 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()")
- ef0ba0553829 ("poll: fix performance regression due to out-of-line
__put_user()")
Commit 38ebcf5096a8 is explicit about copy_to_user() being bad for small
regions.
Here below is some comparison between the three way of doing
snd_pcm_ioctl_sync_ptr_compat(), output is from 'perf top':
Initially I got the following. That 12%+ is the reason why I started
investigating.
14.20% test_perf [.] engine_main
==> 12.86% [kernel] [k] snd_pcm_ioctl
11.91% [kernel] [k] finish_task_switch.isra.0
4.15% [kernel] [k] snd_pcm_group_unlock_irq.part.0
4.07% libc.so.6 [.] __ioctl_time64
3.58% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin_generic
3.37% [kernel] [k] sys_ioctl
2.96% libasound.so.2.0.0 [.] snd_pcm_hw_avail_update
2.73% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin
2.58% [kernel] [k] system_call_exception
1.93% libasound.so.2.0.0 [.] sync_ptr1
1.85% libasound.so.2.0.0 [.] snd_pcm_unlock
1.84% libasound.so.2.0.0 [.] snd_pcm_mmap_begin
1.83% libasound.so.2.0.0 [.] bad_pcm_state
1.68% libasound.so.2.0.0 [.] snd_pcm_mmap_avail
1.67% libasound.so.2.0.0 [.] snd_pcm_avail_update
With _your_ patch I get the following. copy_from_user() calls
_copy_from_user() and copy_to_user() calls _copy_to_user(). Both then
call __copy_tofrom_user(). In total it is 16.4% so it is worse than before.
14.47% test_perf [.] engine_main
12.00% [kernel] [k] finish_task_switch.isra.0
==> 8.37% [kernel] [k] snd_pcm_ioctl
5.44% libc.so.6 [.] __ioctl_time64
5.03% [kernel] [k] snd_pcm_group_unlock_irq.part.0
==> 4.86% [kernel] [k] __copy_tofrom_user
4.62% [kernel] [k] sys_ioctl
3.22% [kernel] [k] system_call_exception
2.42% libasound.so.2.0.0 [.] snd_pcm_mmap_begin
2.31% [kernel] [k] fdget
2.23% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin_generic
2.19% [kernel] [k] syscall_exit_prepare
1.92% libasound.so.2.0.0 [.] snd_pcm_mmap_avail
1.86% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin
1.68% libasound.so.2.0.0 [.] snd_pcm_hw_avail_update
==> 1.67% [kernel] [k] _copy_from_user
1.66% libasound.so.2.0.0 [.] bad_pcm_state
==> 1.53% [kernel] [k] _copy_to_user
1.40% libasound.so.2.0.0 [.] sync_ptr1
With my patch I get the following:
17.46% test_perf [.] engine_main
9.14% [kernel] [k] finish_task_switch.isra.0
==> 4.92% [kernel] [k] snd_pcm_ioctl
3.99% [kernel] [k] snd_pcm_group_unlock_irq.part.0
3.71% libc.so.6 [.] __ioctl_time64
3.61% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin_generic
2.72% libasound.so.2.0.0 [.] sync_ptr1
2.65% [kernel] [k] system_call_exception
2.46% [kernel] [k] sys_ioctl
2.43% [kernel] [k] __rseq_handle_notify_resume
2.34% [kernel] [k] do_epoll_wait
2.30% libasound.so.2.0.0 [.] __snd_pcm_mmap_commit
2.14% libasound.so.2.0.0 [.] __snd_pcm_avail
2.04% libasound.so.2.0.0 [.] snd_pcm_hw_avail_update
1.89% libasound.so.2.0.0 [.] snd_pcm_lock
1.84% libasound.so.2.0.0 [.] snd_pcm_mmap_avail
1.76% libasound.so.2.0.0 [.] __snd_pcm_avail_update
1.61% libasound.so.2.0.0 [.] bad_pcm_state
1.60% libasound.so.2.0.0 [.] __snd_pcm_mmap_begin
1.49% libasound.so.2.0.0 [.] query_status_data
Christophe