Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space

From: Avi Kivity
Date: Mon May 10 2010 - 08:07:18 EST


On 05/04/2010 03:56 PM, Takuya Yoshikawa wrote:
> [Performance test]
>
> We measured the TSC cycles consumed by the ioctl()s that retrieve the
> dirty logs from the kernel.
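(The measurement harness itself is not shown in the thread; a minimal
sketch of timing KVM_GET_DIRTY_LOG with the TSC, assuming vm_fd is the
KVM VM file descriptor and bitmap is preallocated to cover the slot,
could look like this:)

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static inline uint64_t rdtsc(void)
    {
            uint32_t lo, hi;

            asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
            return ((uint64_t)hi << 32) | lo;
    }

    /* Return the TSC delta spent in one get-dirty-log call. */
    static uint64_t time_get_dirty_log(int vm_fd, uint32_t slot, void *bitmap)
    {
            struct kvm_dirty_log log;
            uint64_t t0, t1;

            memset(&log, 0, sizeof(log));
            log.slot = slot;            /* memslot to query */
            log.dirty_bitmap = bitmap;  /* one bit per page in the slot */

            t0 = rdtsc();
            ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
            t1 = rdtsc();
            return t1 - t0;
    }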

> Test environment
>
> AMD Phenom(tm) 9850 Quad-Core Processor with 8GB memory
>
>
> 1. GUI test (running an Ubuntu guest in graphical mode)
>
> sudo qemu-system-x86_64 -hda dirtylog_test.img -boot c -m 4192 -net ...
>
> We show a relatively stable part of the measurements to compare how
> much time the basic parts of the dirty log ioctl need.

>                      get.org   get.opt  switch.opt
>
> slots[7].len=32768    278379     66398       64024
> slots[8].len=32768    181246       270         160
> slots[7].len=32768    263961     64673       64494
> slots[8].len=32768    181655       265         160
> slots[7].len=32768    263736     64701       64610
> slots[8].len=32768    182785       267         160
> slots[7].len=32768    260925     65360       65042
> slots[8].len=32768    182579       264         160
> slots[7].len=32768    267823     65915       65682
> slots[8].len=32768    186350       271         160

> At a glance, we can see that our optimizations improve significantly
> on the original get-dirty-log ioctl. This is true for both get.opt
> and switch.opt, and it has a really big impact for personal KVM users
> who run KVM in GUI mode on their everyday PCs.
>
> Next, we notice that switch.opt improves on get.opt by a hundred
> nanoseconds or so for these slots. Although this may sound like a
> tiny improvement, we can feel it in the GUI's responsiveness, e.g. in
> mouse reactions.

100 ns... this is a bit on the low side (and if you can measure it interactively you have much better reflexes than I).

> To feel the difference, please try the GUI on your PC with our patch
> series!

No doubt get.org -> get.opt is measurable, but get.opt -> switch.opt is
problematic. Have you tried profiling to see where the time is spent?
(Well, I can guess: clearing the write access from the sptes.)
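(For context, here is a toy user-space model of the write-protection
pass that kvm_mmu_slot_remove_write_access() performs; the real code in
arch/x86/kvm/mmu.c walks rmap chains per gfn and handles large pages,
but the key point is that the work scales with the number of sptes in
the slot:)

    #include <stdint.h>
    #include <stddef.h>

    #define PT_PRESENT_MASK  (1ULL << 0)   /* illustrative flag bits */
    #define PT_WRITABLE_MASK (1ULL << 1)

    /* Drop the writable bit from every present spte in a slot, so the
     * next guest write faults and can be logged as dirty.  The work is
     * O(npages), which is why the large slots dominate the numbers. */
    static void slot_remove_write_access(uint64_t *sptes, size_t npages)
    {
            size_t i;

            for (i = 0; i < npages; i++)
                    if (sptes[i] & PT_PRESENT_MASK)
                            sptes[i] &= ~PT_WRITABLE_MASK;
    }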


> 2. Live-migration test (4GB guest, write loop with a 1GB buffer)
>
> We also did a live-migration test.

>                            get.org    get.opt  switch.opt
>
> slots[0].len=655360         797383     261144      222181
> slots[1].len=3757047808    2186721    1965244     1842824
> slots[2].len=637534208     1433562    1012723     1031213
> slots[3].len=131072         216858        331         331
> slots[4].len=131072         121635        225         164
> slots[5].len=131072         120863        356         164
> slots[6].len=16777216       121746       1133         156
> slots[7].len=32768          120415        230         278
> slots[8].len=32768          120368        216         149
> slots[0].len=655360         806497     194710      223582
> slots[1].len=3757047808    2142922    1878025     1895369
> slots[2].len=637534208     1386512    1021309     1000345
> slots[3].len=131072         221118        459         296
> slots[4].len=131072         121516        272         166
> slots[5].len=131072         122652        244         173
> slots[6].len=16777216       123226      99185         149
> slots[7].len=32768          121803        457         505
> slots[8].len=32768          121586        216         155
> slots[0].len=655360         766113     211317      213179
> slots[1].len=3757047808    2155662    1974790     1842361
> slots[2].len=637534208     1481411    1020004     1031352
> slots[3].len=131072         223100        351         295
> slots[4].len=131072         122982        436         164
> slots[5].len=131072         122100        300         503
> slots[6].len=16777216       123653        779         151
> slots[7].len=32768          122617        284         157
> slots[8].len=32768          122737        253         149

> For slots other than 0, 1 and 2 we can see a similar improvement.
>
> Considering that switch.opt does not depend on the bitmap length
> except through kvm_mmu_slot_remove_write_access(), that function must
> be what still costs usec to msec on the large slots; there might also
> be some context switches involved.
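(A minimal model of the switch idea as I read it from the series:
userspace and the kernel share two bitmaps, and the switch just flips
which one the kernel writes into; the names below are illustrative, not
the series' actual API:)

    #include <stdint.h>

    /* Illustrative model only.  The kernel sets bits in *active as the
     * guest dirties pages; a "switch" hands the filled buffer to
     * userspace and gives the kernel the already cleared spare: an O(1)
     * pointer swap instead of an O(bitmap_len) copy-and-clear. */
    struct dirty_log_pair {
            uint64_t *active;   /* kernel writes dirty bits here */
            uint64_t *spare;    /* cleared, ready to become active */
    };

    static uint64_t *switch_dirty_bitmap(struct dirty_log_pair *p)
    {
            uint64_t *filled = p->active;

            p->active = p->spare;   /* kernel dirties the spare from now on */
            p->spare  = filled;     /* caller scans this, then clears it */
            return filled;
    }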

> But note that this was done with a workload which dirtied the memory
> endlessly during the live migration.
>
> In a usual workload the number of dirty pages varies a lot from
> iteration to iteration, so we should gain a lot in the relatively
> clean cases.

Can you post such a test, for an idle large guest?

--
error compiling committee.c: too many arguments to function
