Don't guess; measure.
$ time ./a.out | dd bs=1048576 count=10000 of=/dev/null
0+10000 records in
0+10000 records out
Broken pipe
real 0m4.010s
user 0m3.560s
sys 0m0.240s
Your program generates roughly 2.5 GB/s on an unloaded Celeron 333.
(gcc 2.7.2.3, -O2) This exceeds the RAM bandwidth.
There is no need to optimize the generator speed further.
-- Shields.- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/