Re: [PATCH] x86: Run checksumming in parallel across multiple ALUs

From: Ingo Molnar
Date: Sat Oct 12 2013 - 13:21:36 EST



* Neil Horman <nhorman@xxxxxxxxxxxxx> wrote:

> Sébastien Dugué reported to me that devices implementing ipoib (which
> don't have checksum offload hardware) were spending a significant amount
> of time computing checksums. We found that by splitting the checksum
> computation into two separate streams, each skipping successive elements
> of the buffer being summed, we could parallelize the checksum operation
> across multiple ALUs. Since neither chain is dependent on the result of
> the other, we get a speedup in execution (on hardware that has multiple
> ALUs available, which is almost ubiquitous on x86), and only a
> negligible decrease on hardware that has only a single ALU (an extra
> addition is introduced). Since addition is commutative, the result is
> the same, only faster.

This patch should really come with measurement numbers: what performance
increase (and drop) did you get, and on which CPUs?
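
For reference, the two-stream idea described in the quoted changelog can be
sketched in portable C roughly as follows. This is only an illustration under
stated assumptions, not the code from the patch: the function names
(fold64, csum_single, csum_dual) are hypothetical, the buffer is assumed to
be a whole number of 32-bit words, and the accumulators are plain 64-bit
integers rather than the add-with-carry chains an x86 assembly version would
likely use.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 64-bit running sum down to a 16-bit ones'-complement style sum. */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* at most 33 bits now */
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* fits in 32 bits */
    sum = (sum & 0xffff) + (sum >> 16);       /* at most 17 bits now */
    sum = (sum & 0xffff) + (sum >> 16);       /* fits in 16 bits */
    return (uint16_t)sum;
}

/* Baseline: one accumulator, so every add depends on the previous one. */
static uint16_t csum_single(const uint32_t *buf, size_t nwords)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < nwords; i++)
        sum += buf[i];
    return fold64(sum);
}

/*
 * Two independent accumulators: 'a' takes the even-indexed words and 'b'
 * the odd-indexed ones. Neither chain waits on the other, so a CPU with
 * more than one ALU can execute the two adds of an iteration in parallel.
 */
static uint16_t csum_dual(const uint32_t *buf, size_t nwords)
{
    uint64_t a = 0, b = 0;
    size_t i = 0;

    for (; i + 1 < nwords; i += 2) {
        a += buf[i];
        b += buf[i + 1];
    }
    if (i < nwords)          /* leftover word when nwords is odd */
        a += buf[i];
    return fold64(a + b);    /* the one extra addition: merge the streams */
}
```

Both functions return the same value for the same input, since the order in
which the words are added does not change the total. Whether the dual version
is actually faster depends on the CPU and on what the compiler does with the
loop, which is why measurements matter here.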

Thanks,

Ingo