I bet qemu doesn't have a real descriptor cache, unlike real CPUs.
So it is likely some disconnect between changing the backing GDT
and referencing the register. Reload %gs more aggressively?
Comparing with SimNow! (which should behave more like a real CPU)
might also be interesting.
- Measure performance impact. The patch adds a segment register
save/restore on entry/exit to the kernel. This expense should be
offset by savings in using the PDA while in the kernel, but I haven't
measured this yet. Space savings are already appealing though.
- Modify more things to use the PDA. The more that uses it, the more
the cost of the %gs save/restore is amortized. smp_processor_id and
current are the obvious first choices, which are implemented in this
series.
Per-CPU data would be the prime candidate. It is pretty simple.
- Make it a config option? UP systems don't need to do any of this,
other than having a single pre-allocated PDA. Unfortunately, it gets
a bit messy to do this given the changes needed in handling %gs.
Please don't.
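
For anyone following along, here is a rough user-space sketch of the
idea behind routing smp_processor_id and current through the PDA. It is
not the code from the series: struct task, cpu_pda[], pda_base,
pda_init(), load_pda_base(), get_current() and the read_pda/write_pda
macros are all names made up for illustration, and a plain pointer
stands in for the %gs segment base, which in the kernel would make each
access a single %gs-relative load.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the kernel's task structure. */
struct task {
	int pid;
};

/* Per-processor data area: one instance per CPU. */
struct pda {
	int cpu_number;
	struct task *pcurrent;
};

static struct pda cpu_pda[NR_CPUS];

/*
 * Assumption: this pointer plays the role of the %gs base. In the
 * kernel each CPU would have its own GDT entry pointing at its PDA;
 * here a single pointer is switched by hand to simulate that.
 */
static struct pda *pda_base;

/* Illustrative accessors; every PDA user goes through these. */
#define read_pda(field)		(pda_base->field)
#define write_pda(field, val)	(pda_base->field = (val))

/* Fill in the PDA for one CPU. */
static inline void pda_init(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_pda[cpu].cpu_number = cpu;
	cpu_pda[cpu].pcurrent = NULL;
}

/* Simulates loading %gs so that it refers to this CPU's PDA. */
static inline void load_pda_base(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	pda_base = &cpu_pda[cpu];
}

static inline int smp_processor_id(void)
{
	return read_pda(cpu_number);
}

static inline struct task *get_current(void)
{
	return read_pda(pcurrent);
}
```

Once the base is loaded, every additional field placed in the PDA costs
only one more access through the same base, which is the amortization
argument made in the list above.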