Re: Performance regression in write() syscall
From: Ingo Molnar
Date: Tue Feb 24 2009 - 11:52:19 EST
* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > No, but I think it should be in arch code, and the
> > "_nocache" suffix should just be a hint to the architecture
> > that the destination is not so likely to be used.
>
> Yes. Especially since arch code is likely to need various
> arch-specific checks anyway (like the x86 code does about
> aligning the destination).
I'm inclined to do this in two or three phases: first apply the
fix from Salman in the form below.
In practice, if we get a 4K copy request, the likelihood is large
that this is for a larger write and for a full pagecache page.
The target is very unlikely to be misaligned; the source might
be, as it comes from user-space.
This portion of __copy_user_nocache() becomes largely
unnecessary:
ENTRY(__copy_user_nocache)
	CFI_STARTPROC
	cmpl $8,%edx
	jb 20f		/* less then 8 bytes, go to byte copy loop */
	ALIGN_DESTINATION
	movl %edx,%ecx
	andl $63,%edx
	shrl $6,%ecx
	jz 17f
And the tail portion becomes unnecessary too. Together that is
over a dozen instructions, so probably worth optimizing out.
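To illustrate the arithmetic behind that, here is a rough sketch in plain user-space C (not kernel code). The names split_len() and struct copy_split are made up for this example; they only mirror the movl/andl/shrl sequence quoted above, which splits the length into 64-byte blocks plus a remainder.

```c
#include <stddef.h>

/* Hypothetical illustration only: not part of the kernel or of any patch. */
struct copy_split {
	size_t blocks;	/* number of 64-byte blocks (shrl $6) */
	size_t tail;	/* leftover bytes after the blocks (andl $63) */
};

/* Mirror the length split done at the top of the quoted assembly. */
static struct copy_split split_len(size_t len)
{
	struct copy_split s;

	s.blocks = len >> 6;	/* len / 64 */
	s.tail = len & 63;	/* len % 64 */
	return s;
}
```

For len == 4096 this gives 64 blocks and a tail of 0, so for a full-page copy the "less than 8 bytes" check, the zero-blocks branch and the tail handling never do any work.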
But I'd rather express this in terms of a separate
__copy_user_page_nocache() function and keep the generic
implementation too.
I.e. like the second patch below. (not tested)
With this __copy_user_nocache() becomes unused - and once we are
happy with the performance characteristics of 4K non-temporal
copies, we can remove this more generic implementation.
Does this sound reasonable, or do you think we can be smarter
than this?
Ingo
-------------------->