Re: [patch] speed up / fix the new generic semaphore code (fix AIM7 40% regression with 2.6.26-rc1)

From: Linus Torvalds
Date: Thu May 08 2008 - 19:14:53 EST

On Thu, 8 May 2008, Linus Torvalds wrote:
>
> Btw, sparse will complain about those, because the source code *looks*
> really cheap.

Sometimes you can fix it.

For example, this change:

- if (pte_present(*pte) && page_to_pfn(page) == pte_pfn(*pte)) {
+ if (pte_present(*pte) && page == pfn_to_page(pte_pfn(*pte))) {

can simplify things: instead of going from a 'struct page' to a pfn, it
goes from a pfn to a 'struct page', and that is generally cheaper
(a multiply rather than a divide by the size of struct page). The two
forms aren't always interchangeable, but I think in this case they are.
For me, the code generation changes:

- movabsq $7905747460161236407, %rdx #, tmp111
- movabsq $32985348833280, %rax #, tmp107
- leaq (%r12,%rax), %rax #, tmp106
- sarq $3, %rax #, tmp106
- imulq %rdx, %rax # tmp111, tmp106
- movabsq $70368744177663, %rdx #, tmp113
- andq %rdx, %rcx # tmp113, pte$pte
- shrq $12, %rcx #, pte$pte
- cmpq %rcx, %rax # pte$pte, tmp106
+ movabsq $70368744177663, %rax #, tmp107
+ andq %rax, %rdx # tmp107, pte$pte
+ shrq $12, %rdx #, pte$pte
+ imulq $56, %rdx, %rax #, pte$pte, tmp109
+ movabsq $-32985348833280, %rdx #, tmp111
+ addq %rdx, %rax # tmp111, tmp110
+ cmpq %rax, %r13 # tmp110, page

which isn't a *huge* deal, but it certainly looks better. One less big
constant, and one less shift.
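
To make the arithmetic concrete, here is a rough user-space sketch of the
two directions. It is not the kernel code: 'struct fake_page',
'fake_mem_map' and the fake_* helpers are made-up stand-ins, and the
56-byte size is only assumed from the "imulq $56" in the listing above.
Since 56 = 8 * 7, the exact divide in the page-to-pfn direction can be done
as a shift by 3 followed by a multiply with the inverse of 7 modulo 2^64,
which is what the "sarq $3" plus the big "movabsq" constant in the old
code correspond to.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for 'struct page', sized to 56 bytes (assumption). */
struct fake_page {
	uint64_t words[7];
};
_Static_assert(sizeof(struct fake_page) == 56, "model assumes 56-byte pages");

#define FAKE_NPAGES 1024
static struct fake_page fake_mem_map[FAKE_NPAGES];

/* pfn -> page: one multiply by sizeof(struct fake_page) plus a base add. */
static struct fake_page *fake_pfn_to_page(uint64_t pfn)
{
	return fake_mem_map + pfn;
}

/* page -> pfn: pointer subtraction, i.e. an exact division by 56. */
static uint64_t fake_page_to_pfn(const struct fake_page *page)
{
	return (uint64_t)(page - fake_mem_map);
}

/* Inverse of 7 modulo 2^64: 7 * INV7_MOD_2_64 wraps around to 1. */
#define INV7_MOD_2_64 UINT64_C(7905747460161236407)

/* The same division spelled out: shift right by 3, then multiply by 1/7. */
static uint64_t fake_page_to_pfn_by_inverse(const struct fake_page *page)
{
	uint64_t byte_off =
		(uint64_t)((const char *)page - (const char *)fake_mem_map);

	/* The trick is only valid for exact multiples of the struct size. */
	assert(byte_off % sizeof(struct fake_page) == 0);
	return (byte_off >> 3) * INV7_MOD_2_64;
}
```

Comparing "page == fake_pfn_to_page(pfn)" only needs the multiply-and-add
path, while "fake_page_to_pfn(page) == pfn" needs the subtract, shift and
inverse multiply, which is the difference visible in the two listings.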

It's not going to make a huge difference, though. That function is just
called too much, and it would still be entirely data-dependent all the way
through.

Linus