How to measure enable_kernel_fpu overhead?

From: George Spelvin
Date: Fri Jun 03 2011 - 13:26:59 EST


I'm working on some crypto primitives and have MMX- and SSE2-accelerated
versions. I plan on writing AltiVec (PPC) and NEON (ARM) versions, too.

But their relative performance varies from machine to machine, so I think
I need to do some run-time benchmarking, like the RAID6 code does.
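
Roughly what I have in mind, loosely modeled on the selection loop in
lib/raid6/algos.c. This is only a sketch; struct prim_impl, prim_select()
and so on are names I've made up for illustration:

#include <linux/init.h>
#include <linux/preempt.h>
#include <linux/types.h>
#include <asm/timex.h>		/* get_cycles() */

/* One candidate implementation of the primitive. */
struct prim_impl {
	const char *name;
	void (*fn)(void *dst, const void *src, size_t len);
};

static const struct prim_impl *prim_best;

/* Boot-time selection: time each candidate over a large buffer and
 * keep the fastest.  The SIMD candidates do their own
 * enable_kernel_fpu()-style bracketing inside fn(). */
static void __init prim_select(const struct prim_impl **impls,
			       void *buf, size_t len)
{
	cycles_t best = ~(cycles_t)0;
	int i;

	for (i = 0; impls[i]; i++) {
		cycles_t t0, t1;

		preempt_disable();
		t0 = get_cycles();
		impls[i]->fn(buf, buf, len);
		t1 = get_cycles();
		preempt_enable();

		if (t1 - t0 < best) {
			best = t1 - t0;
			prim_best = impls[i];
		}
	}
}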

But what's even more annoying is that, unlike the RAID6 code, I can't
assume I'll always be working on large blocks. So it's not so much about
choosing one version as about choosing a size threshold at which to
switch over.
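
In other words, the dispatch would end up looking something like the
sketch below, where prim_simd_threshold is whatever number the benchmark
produces (all names made up):

#include <linux/types.h>

static void prim_simd(void *dst, const void *src, size_t len);
static void prim_int(void *dst, const void *src, size_t len);

/* Below the crossover point the enable_kernel_fpu()-style overhead
 * makes SIMD a net loss, so fall back to the integer version. */
static size_t prim_simd_threshold __read_mostly;

static void prim(void *dst, const void *src, size_t len)
{
	if (len >= prim_simd_threshold)
		prim_simd(dst, src, len);  /* brackets with enable/disable */
	else
		prim_int(dst, src, len);
}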

Which leads to the overhead of enable_kernel_fpu(),
enable_kernel_altivec(), and whatever the ARM equivalent is.
(A little searching didn't turn one up... does it even exist?)

To complicate it a little more, there are at least two timing cases,
depending on the value of current_thread_info()->status & TS_USEDFPU.
(For PowerPC, it's current->thread.regs->msr & MSR_VEC.)
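
(On x86 the enable/disable pair actually appears to be
kernel_fpu_begin()/kernel_fpu_end(). Assuming I'm reading a ~2.6.39
tree correctly, a minimal harness for the TS_USEDFPU-clear case would
be:)

#include <linux/preempt.h>
#include <asm/i387.h>		/* kernel_fpu_begin()/kernel_fpu_end() */
#include <asm/timex.h>		/* get_cycles() */

/* Time one enable/disable round trip in the "no live user FPU state"
 * case (TS_USEDFPU clear), which should be the cheap path. */
static cycles_t time_fpu_round_trip(void)
{
	cycles_t t0, t1;

	preempt_disable();	/* nests fine inside kernel_fpu_begin()'s own */
	t0 = get_cycles();
	kernel_fpu_begin();
	kernel_fpu_end();
	t1 = get_cycles();
	preempt_enable();

	return t1 - t0;
}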

There may be additional timing variation depending on how clever XSAVE
is with e.g. the high half of the ymm registers.


So I think the thing to do is benchmark a few different sizes in each of
the two timing cases and fit a linear function to the results. (More
simply: subtract the integer code's timings from the SIMD timings and
find the X-intercept of the difference.)
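
Concretely: fit t(n) = fixed + per_byte * n through two measured sizes
for each implementation, and the crossover is
(fixed_simd - fixed_int) / (per_byte_int - per_byte_simd). In scaled
integer math, something like (a sketch; times in cycles, sizes in
bytes):

#include <linux/types.h>

/* Slopes are scaled by 1000 to keep this in integer arithmetic. */
struct fit {
	long fixed;		/* cycles */
	long per_byte_x1000;	/* cycles per byte, x1000 */
};

static struct fit fit_two(size_t n1, long t1, size_t n2, long t2)
{
	struct fit f;

	f.per_byte_x1000 = (t2 - t1) * 1000 / (long)(n2 - n1);
	f.fixed = t1 - f.per_byte_x1000 * (long)n1 / 1000;
	return f;
}

/* Smallest size at which the SIMD version wins. */
static size_t crossover(struct fit simd, struct fit integer)
{
	long dslope = integer.per_byte_x1000 - simd.per_byte_x1000;

	if (dslope <= 0)	/* SIMD never catches up */
		return ~(size_t)0;
	return (size_t)((simd.fixed - integer.fixed) * 1000 / dslope);
}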

But I'm not sure how to create a suitably dirty user FPU state.
Especially as this might run in early boot, when there are no user
processes yet whose state could be saved.
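
The ugliest idea I've come up with is to fake the dirty state from
inside the kernel, roughly as below. But it assumes the current task
already has an FPU save area allocated for kernel_fpu_begin() to save
into, which is exactly what I can't guarantee in early boot (and I may
be misreading the i387 code entirely):

#include <linux/preempt.h>
#include <linux/sched.h>
#include <asm/i387.h>
#include <asm/system.h>		/* clts(), circa 2.6.39 */
#include <asm/timex.h>

/* Benchmark-only hack: pretend the current task has live user FPU
 * state so that kernel_fpu_begin() takes its slow (save) path. */
static cycles_t time_fpu_round_trip_dirty(void)
{
	cycles_t t0, t1;

	preempt_disable();
	clts();					/* permit FPU use */
	asm volatile("pxor %%xmm0, %%xmm0" ::: "xmm0");
	/* DANGER: if current has no allocated save area, the save
	 * inside kernel_fpu_begin() below will dereference NULL. */
	current_thread_info()->status |= TS_USEDFPU;

	t0 = get_cycles();
	kernel_fpu_begin();		/* should now save the fake state */
	kernel_fpu_end();
	t1 = get_cycles();

	current_thread_info()->status &= ~TS_USEDFPU;
	preempt_enable();

	return t1 - t0;
}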

Does anyone have any suggestions?

Thank you!