Re: [PATCH] kvm: optimize ISR lookups

From: Thomas Gleixner
Date: Wed May 23 2012 - 18:00:36 EST

Next message: Dan Carpenter: "Re: [PATCH] staging: comedi: adl_pci7296: factor out the PCI devicecode"
Previous message: Jonathan Nieder: "Re: [3.2.16 -> 3.2.17 regression] High reported CPU load when idle"
In reply to: H. Peter Anvin: "Re: [PATCH] kvm: optimize ISR lookups"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, 23 May 2012, H. Peter Anvin wrote:

> On 05/23/2012 11:37 AM, Thomas Gleixner wrote:
> >>
> >> That works, but replaces one problem with another: now we have two
> >> sources for the same data, and need to juggle between them depending on
> >> register number (either synchronizing in both directions, or special
> >> casing); so you're simplifying one thing at the expense of the other.
> >> If the microcode starts accessing more registers, then having two
> >> layouts becomes even uglier.
> >
> > Fair enough :)
>
> Yes, the Âcode accessing this data structure directly probably falls
> under the category of a legitimate need to stick to the hardware format.

Thought more about that.

We have a clear distinction between HW accessed data and software
accessed data.

If I look at TPR then it is special cased already and it does:

case APIC_TASKPRI:
report_tpr_access(apic, false|true);
/* fall thru */

And the fall through is using the general accessor for all not special
cased registers.

So all you have to do is

case APIC_TASKPRI:
report_tpr_access(apic, false|true);
+ return access_mapped_reg(...);

Instead of the fall through.

So there is no synchronizing back and forth problem simply because you
already have a special case for that register.

I know you'll argue that the tpr reporting is a special hack for
windows guests, at least that's what the changelog tells.

But even if we have a few more registers accessed by hardware and if
they do not require a special casing, I really doubt that the overhead
of special casing those few regs will be worse than not having the
obvious optimization in place.

And looking deeper it's a total non issue. The apic mapping is 4k. The
register stride is strictly 0x10. That makes a total of 256 possible
registers.

So now you have two possibilites:

1) Create a 256 bit == 64byte bitfield to select the one or the other
representation.

The overhead of checking the bit is not going to be observable.

2) Create a 256 function pointer array == 2k resp. 1k (64 / 32bit)

That's not a lot of memory even if you have to maintain two
separate variants for read and write, but it allows you to get rid
of the already horribly compiled switch case in apic_read/write and
you'll get the optional stuff like report_tpr_access() w/o extra
conditionals just for free.

An extra goodie is that you can catch any access to a non existing
register which you now just silently ignore. And that allows you
to react on any future hardware oddities without adding a single
runtime conditional.

This is stricly x86 and x86 is way better at dealing with indirect
calls than with the mess gcc creates compiling those switch case
constructs.

So I'd go for that and rather spend the memory and the time in
setting up the function pointers on init/ioctl than dealing with
the inconsistency of HW/SW representation with magic hacks.

Thanks,

tglx

Next message: Dan Carpenter: "Re: [PATCH] staging: comedi: adl_pci7296: factor out the PCI devicecode"
Previous message: Jonathan Nieder: "Re: [3.2.16 -> 3.2.17 regression] High reported CPU load when idle"
In reply to: H. Peter Anvin: "Re: [PATCH] kvm: optimize ISR lookups"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]