Re: [PATCH] x86: Run checksumming in parallel across multiple ALUs

From: Neil Horman
Date: Tue Oct 29 2013 - 07:20:48 EST

On Tue, Oct 29, 2013 at 09:25:42AM +0100, Ingo Molnar wrote:
> * Neil Horman <nhorman@xxxxxxxxxxxxx> wrote:
> > Here's my data for running the same test with taskset restricting
> > execution to only cpu0. I'm not quite sure what's going on here,
> > but doing so resulted in a 10x slowdown of the runtime of each
> > iteration, which I can't explain. As before, however, both the
> > parallel ALU run and the prefetch run resulted in speedups, but
> > the two together were not in any way additive. I'm going to keep
> > playing with the prefetch stride, unless you have an alternate
> > theory.
> Could you please cite the exact command-line you used for running
> the test?
> Thanks,
> Ingo

Sure, it was this:
for i in `seq 0 1 3`
do
    echo $i > /sys/module/csum_test/parameters/module_test_mode
    taskset -c 0 perf stat --repeat 20 -C 0 -ddd perf bench sched messaging -- /root/
done >> counters.txt 2>&1

where is:
echo 1 > /sys/module/csum_test/parameters/test_fire

As before, module_test_mode selects a case in a switch statement I added in
do_csum to test one of the 4 csum variants we've been discussing (base, prefetch,
parallel ALU, or both), and test_fire is a callback trigger I use in the test
module to run 100000 iterations of a checksum operation. As you requested, I
ran the above on cpu 0 (-C 0 on perf and -c 0 on taskset), and I set the IRQ
affinities so that no interrupts were routed to cpu 0.
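For readers without the test module, the harness described above can be roughly approximated in userspace. The C sketch below is an assumption-laden stand-in, not the actual test module or the patch: `do_csum_mode`, `fire`, `csum_sketch`, the mode numbering (0 = base, 1 = prefetch, 2 = parallel ALU, 3 = both) and `PREFETCH_STRIDE` are all hypothetical names and values. The "parallel ALU" case is modelled with two independent accumulators rather than the patch's real instruction sequence, words are read big-endian instead of in native byte order, and since a C compiler may reassociate integer additions, the sketch illustrates the dispatch and the dependency-chain structure rather than serving as a faithful benchmark.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical mode numbers mirroring module_test_mode (order is assumed). */
enum csum_mode { CSUM_BASE = 0, CSUM_PREFETCH = 1, CSUM_PARALLEL = 2, CSUM_BOTH = 3 };

/* Assumed prefetch distance in bytes; the real stride was still being tuned. */
#define PREFETCH_STRIDE 256

/* Fold a wide accumulator into a 16-bit ones'-complement sum (end-around carry). */
static uint16_t fold(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Read the 16-bit word at byte offset i, big-endian (assumption, for portability). */
static uint64_t word_at(const uint8_t *buf, size_t i)
{
    return ((uint64_t)buf[i] << 8) | buf[i + 1];
}

/* One loop covering all four variants: optional prefetch, optional second accumulator. */
static uint16_t csum_sketch(const uint8_t *buf, size_t len, int prefetch, int parallel)
{
    uint64_t sum0 = 0, sum1 = 0;
    size_t i = 0;

    for (; i + 4 <= len; i += 4) {
        /* Prefetch PREFETCH_STRIDE bytes ahead, once per 64 bytes, never past the end. */
        if (prefetch && i % 64 == 0 && i + PREFETCH_STRIDE < len)
            __builtin_prefetch(buf + i + PREFETCH_STRIDE);
        if (parallel) {
            /* Two independent dependency chains. */
            sum0 += word_at(buf, i);
            sum1 += word_at(buf, i + 2);
        } else {
            /* A single dependency chain. */
            sum0 += word_at(buf, i);
            sum0 += word_at(buf, i + 2);
        }
    }
    for (; i + 2 <= len; i += 2)
        sum0 += word_at(buf, i);
    if (i < len)
        sum0 += (uint64_t)buf[i] << 8; /* odd trailing byte, zero-padded */
    return fold(sum0 + sum1);
}

/* Stand-in for the switch added to do_csum: select a variant by mode. */
uint16_t do_csum_mode(enum csum_mode mode, const uint8_t *buf, size_t len)
{
    switch (mode) {
    case CSUM_PREFETCH: return csum_sketch(buf, len, 1, 0);
    case CSUM_PARALLEL: return csum_sketch(buf, len, 0, 1);
    case CSUM_BOTH:     return csum_sketch(buf, len, 1, 1);
    case CSUM_BASE:
    default:            return csum_sketch(buf, len, 0, 0);
    }
}

/* Rough analogue of writing 1 to test_fire: run iters checksums, report elapsed ns. */
uint16_t fire(enum csum_mode mode, const uint8_t *buf, size_t len,
              unsigned long iters, long long *elapsed_ns)
{
    struct timespec t0, t1;
    uint16_t result = 0;

    assert(buf != NULL && elapsed_ns != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long n = 0; n < iters; n++) {
        /* Compiler barrier: keeps the checksum from being hoisted out of the loop. */
        __asm__ __volatile__("" ::: "memory");
        result = do_csum_mode(mode, buf, len);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *elapsed_ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                  + (t1.tv_nsec - t0.tv_nsec);
    return result;
}
```

All four modes should return the same checksum for a given buffer; only the time reported through `fire()` is expected to differ. Calling `fire(mode, buf, len, 100000, &ns)` once per mode loosely mirrors looping module_test_mode over 0..3 and triggering test_fire each time.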

