When I did a strict replacement, I found ~10% worse memory population
performance.
Running dirty_log_perf_test -v 96 -b 3g -i 5 with the TDP MMU
disabled, I got 119 sec to populate memory as the baseline and 134 sec
with an earlier version of this series which just replaced the
spinlock with an rwlock. I believe this difference is statistically
significant, but I didn't run multiple trials to confirm it.
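
For reference, the arithmetic behind those two numbers can be checked with
a short script. This is only a sketch: relative_slowdown() and summarize()
are hypothetical helper names, not part of the selftest. summarize() is
included as one possible way to compare repeated runs if someone collects
them; only the two single-run timings quoted above are used here.

```python
from statistics import mean, stdev


def relative_slowdown(baseline_sec, candidate_sec):
    """Fractional increase in run time of the candidate over the baseline."""
    if baseline_sec <= 0:
        raise ValueError("baseline must be positive")
    return (candidate_sec - baseline_sec) / baseline_sec


def summarize(trials_sec):
    """Return (mean, sample standard deviation) for repeated timings."""
    if len(trials_sec) < 2:
        raise ValueError("need at least two trials to estimate spread")
    return mean(trials_sec), stdev(trials_sec)


# Single-run populate times quoted above:
# 119 sec with the spinlock (baseline), 134 sec with the rwlock.
slowdown = relative_slowdown(119, 134)
print(f"populate slowdown: {slowdown:.1%}")  # prints "populate slowdown: 12.6%"
```

With only one run per configuration there is no spread to compare that
12.6% against, which is why repeated trials would be needed before calling
the difference significant.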
I didn't take notes when profiling, but I'm pretty sure the rwlock
slowpath showed up a lot. This was a very high contention scenario, so
it's probably not indicative of real-world performance.
In the slowpath, the rwlock is certainly slower than a spinlock.
If the real-world impact doesn't seem too large, I'd be very happy to just
replace the spinlock.