Re: Performance regression in scsi sequential throughput (iozone)due to "e084b - page-allocator: preserve PFN ordering when __GFP_COLDis set"

From: Nick Piggin
Date: Mon Feb 15 2010 - 01:59:27 EST


On Fri, Feb 12, 2010 at 09:05:19PM +1100, Nick Piggin wrote:
> > This leaves me with two ideas what the real issue could be:
> > 1. either really the 6th parameter as this is the first one that needs to go on stack and that way might open a race and rob gcc a big pile of optimization chances
>
> It must be something to do wih code generation AFAIKS. I'm surprised
> the function isn't inlined, giving exactly the same result regardless
> of the patch.
>
> Unlikely to be a correctness issue with code generation, but I'm
> really surprised that a small difference in performance could have
> such a big (and apparently repeatable) effect on behaviour like this.
>
> What's the assembly look like?

Oh, and what happens if you always_inline it? I would have thought
gcc should be able to generate the same code in both cases -- unless
I'm really missing something and there is a functional change in
there?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/