Yes, you're right. I've reordered some members of struct
task_struct and I've been able to take the cost of extra processes
on the run queue from 0.2 us per process to 0.15 us on a PPro 180.
Assuming a cache line is 32 bytes (IIRC), that takes the
accesses from 4 cache lines down to 2 cache lines per process.
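
For anyone who wants to see the effect of that kind of reordering, here is a rough sketch. It is not the real task_struct layout: the two structs, their field names and the helper functions are all made up for illustration, and the 32-byte line size is the assumption from above. It only counts how many cache lines the "hot" fields span when they are scattered versus grouped at the front, assuming the struct itself starts on a cache line boundary.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 32 /* assumed line size, as guessed in the message */

/* Hypothetical layout: hot scheduler fields mixed in with cold data. */
struct task_scattered {
    long counter;      /* hot */
    char cold_a[40];
    long priority;     /* hot */
    char cold_b[40];
    void *next_run;    /* hot */
    char cold_c[40];
    void *mm;          /* hot */
};

/* Same fields, with the hot ones grouped at the front. */
struct task_grouped {
    long counter;      /* hot */
    long priority;     /* hot */
    void *next_run;    /* hot */
    void *mm;          /* hot */
    char cold[120];
};

/* Count the distinct cache lines covered by n fields, given each
 * field's offset and length in bytes. Assumes the struct starts on a
 * cache line boundary and is smaller than 64 lines. */
static unsigned count_lines(const size_t *off, const size_t *len, size_t n)
{
    unsigned long long seen = 0;
    unsigned count = 0;

    for (size_t i = 0; i < n; i++) {
        size_t first = off[i] / CACHE_LINE;
        size_t last = (off[i] + len[i] - 1) / CACHE_LINE;

        for (size_t line = first; line <= last; line++) {
            assert(line < 64);
            if (!(seen & (1ULL << line))) {
                seen |= 1ULL << line;
                count++;
            }
        }
    }
    return count;
}

/* Cache lines touched by the hot fields in the scattered layout. */
unsigned scattered_hot_lines(void)
{
    const size_t off[] = {
        offsetof(struct task_scattered, counter),
        offsetof(struct task_scattered, priority),
        offsetof(struct task_scattered, next_run),
        offsetof(struct task_scattered, mm),
    };
    const size_t len[] = { sizeof(long), sizeof(long),
                           sizeof(void *), sizeof(void *) };
    return count_lines(off, len, 4);
}

/* Cache lines touched by the hot fields in the grouped layout. */
unsigned grouped_hot_lines(void)
{
    const size_t off[] = {
        offsetof(struct task_grouped, counter),
        offsetof(struct task_grouped, priority),
        offsetof(struct task_grouped, next_run),
        offsetof(struct task_grouped, mm),
    };
    const size_t len[] = { sizeof(long), sizeof(long),
                           sizeof(void *), sizeof(void *) };
    return count_lines(off, len, 4);
}
```

Calling scattered_hot_lines() and grouped_hot_lines() from a small main() and printing the results shows the difference in lines touched. It says nothing about whether the difference is measurable in practice, which is the open question in this thread.
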
I did all these cache tricks on the Ultra once as an experiment, a
machine where it should matter a lot. It was all lost in the noise,
so I never explored it further. I would like to think I know what I
am doing here, and it still didn't matter: it never showed up on the radar.
Later,
David S. Miller
davem@dm.cobaltmicro.com