RE: [RFC PATCH] [X86/mem] Handle unaligned case by avoiding store crossing cache line

From: Ma, Ling
Date: Tue Oct 12 2010 - 10:06:25 EST


>> * Use 32bit CMP here to avoid long NOP padding.
>> */
>> - cmp $0x20, %edx
>> + cmp $0x28, %rdx
>
>Well, look above your change. The comment says "Use 32bit CMP".
>If you really want to go to 64-bit one, then change comment too.

Yes, I will change the comment in the next version. In this version we use a
64-bit CMP so that .Lcopy_forward_loop becomes 16-byte aligned without NOP padding.

>> + /*
>> + * We append data to avoid store crossing cache.
>> + */
>> + movq (%rsi), %rcx
>> + movq %rdi, %r8
>> + addq $8, %rdi
>> + andq $-8, %rdi
>> + movq %rcx, (%r8)
>> + subq %rdi, %r8
>> + addq %r8, %rdx
>> + subq %r8, %rsi

>The comment doesn't really help to understand what you are doing here.
>Maybe "Align store location to 32 bytes to avoid crossing cachelines"?

Ok, I will change it in the next version. Here, by an overlapping write, we force
the later stores to operate on 8-byte-aligned addresses. If a store accesses
8-byte-aligned memory, it can never cross a cache line in a single operation,
because the cache line size is a multiple of 8 bytes.
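The overlap trick in the hunk above can be sketched in C (a hypothetical
illustration of the same idea, not the kernel code; copy_align8 and the
memcpy-based 8-byte load/store are my own stand-ins for the movq instructions):

```c
#include <stdint.h>
#include <string.h>

/* Copy the first 8 bytes unconditionally, then round the destination up to
 * the next 8-byte boundary and advance src/len by the same amount. The bytes
 * between the old and new destination were already written by the head store,
 * so the overlap is harmless, and every store in the main loop is 8-byte
 * aligned and therefore never crosses a cache line. */
void *copy_align8(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (len >= 8) {
        uint64_t head;
        memcpy(&head, s, 8);              /* movq (%rsi), %rcx */
        memcpy(d, &head, 8);              /* movq %rcx, (%r8)  */

        /* advance d to the next 8-byte boundary (1..8 bytes), as in
         * addq $8, %rdi; andq $-8, %rdi; the difference is folded
         * into src and len just like subq/addq on %r8 above */
        size_t adv = 8 - ((uintptr_t)d & 7);
        d += adv;
        s += adv;
        len -= adv;
    }

    /* main loop: every store through d here is 8-byte aligned */
    while (len >= 8) {
        uint64_t w;
        memcpy(&w, s, 8);
        memcpy(d, &w, 8);
        d += 8; s += 8; len -= 8;
    }
    while (len--)                         /* byte tail */
        *d++ = *s++;
    return dst;
}
```

The head store may rewrite up to 7 bytes that the aligned loop writes again;
since both copies come from the same source offsets, the result is identical
to a plain memcpy.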

Thanks
Ling
