Re: [RFC PATCH 00/17] virtual-bus

From: Avi Kivity
Date: Sun Apr 05 2009 - 04:06:35 EST


Rusty Russell wrote:
On Thursday 02 April 2009 02:40:29 Anthony Liguori wrote:
Rusty Russell wrote:
As you point out, 350-450 is possible, which is still bad, and it's at least
partially caused by the exit to userspace and two system calls. If virtio_net
had a backend in the kernel, we'd be able to compare numbers properly.
I doubt the userspace exit is the problem. On a modern system, it takes about 1us to do a light-weight exit and about 2us to do a heavy-weight exit. A transition to userspace is only about ~150ns, the bulk of the additional heavy-weight exit cost is from vcpu_put() within KVM.

Just to inject some facts, servicing a ping via tap (ie host->guest then
guest->host response) takes 26 system calls from one qemu thread, 7 from
another (see strace below). Judging by those futex calls, multiple context
switches, too.

Interesting stuff. Even if amortized over half a ring's worth of packets, that's quite a lot.

Two threads are involved (we complete on the iothread, since we don't know which vcpu will end up processing the interrupt, if any).


Pid 10260:
12:37:40.245785 select(17, [4 6 8 14 16], [], [], {0, 996000}) = 1 (in [6], left {0, 992000}) <0.003995>

Should switch to epoll with its lower wait costs. Unfortunately the relative timeout requires reading the clock.

12:37:40.250226 read(6, "\0\0\0\0\0\0\0\0\0\0RT\0\0224V*\211\24\210`\304\10\0E\0"..., 69632) = 108 <0.000051>
12:37:40.250462 write(1, "tap read: 108 bytes\n", 20) = 20 <0.000197>

I hope this is your addition.

12:37:40.250800 ioctl(7, 0x4008ae61, 0x7fff8cafb3a0) = 0 <0.000223>
12:37:40.251149 read(6, 0x115c6ac, 69632) = -1 EAGAIN (Resource temporarily unavailable) <0.000019>

This wouldn't be necessary with io_getevents().

12:37:40.251292 write(1, "tap read: -1 bytes\n", 19) = 19 <0.000085>

...

12:37:40.251488 clock_gettime(CLOCK_MONOTONIC, {1554, 633304282}) = 0 <0.000020>
12:37:40.251604 clock_gettime(CLOCK_MONOTONIC, {1554, 633413793}) = 0 <0.000019>

Great.

12:37:40.251717 futex(0xb81360, 0x81 /* FUTEX_??? */, 1) = 1 <0.001222>
12:37:40.253037 select(17, [4 6 8 14 16], [], [], {1, 0}) = 1 (in [16], left {1, 0}) <0.000026>
12:37:40.253196 read(16, "\16\0\0\0\0\0\0\0\376\377\377\377\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128 <0.000022>
12:37:40.253324 rt_sigaction(SIGALRM, NULL, {0x406d50, ~[KILL STOP RTMIN RT_1], SA_RESTORER, 0x7f1a842430f0}, 8) = 0 <0.000018>
12:37:40.253477 write(5, "\0", 1) = 1 <0.000022>

The write is to wake someone up. Who?

12:37:40.253585 read(16, 0x7fff8cb09440, 128) = -1 EAGAIN (Resource temporarily unavailable) <0.000020>

Clearing up signalfd...

12:37:40.253687 clock_gettime(CLOCK_MONOTONIC, {1554, 635496181}) = 0 <0.000019>
12:37:40.253798 writev(6, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"*\211\24\210`\304RT\0\0224V\10\0E\0\0T\255\262\0\0@\1G"..., 98}], 2) = 108 <0.000062>
12:37:40.253993 ioctl(7, 0x4008ae61, 0x7fff8caff460) = 0 <0.000161>

Injecting the interrupt.

12:37:40.254263 clock_gettime(CLOCK_MONOTONIC, {1554, 636077540}) = 0 <0.000019>
12:37:40.254380 futex(0xb81360, 0x81 /* FUTEX_??? */, 1) = 1 <0.000394>
12:37:40.254861 select(17, [4 6 8 14 16], [], [], {1, 0}) = 1 (in [4], left {1, 0}) <0.000022>
12:37:40.255001 read(4, "\0", 512) = 1 <0.000021>

Great, the write() was to wake ourselves up.

12:37:40.255109 read(4, 0x7fff8cb092d0, 512) = -1 EAGAIN (Resource temporarily unavailable) <0.000018>
12:37:40.255211 clock_gettime(CLOCK_MONOTONIC, {1554, 637020677}) = 0 <0.000019>
12:37:40.255314 clock_gettime(CLOCK_MONOTONIC, {1554, 637123483}) = 0 <0.000019>
12:37:40.255416 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0 <0.000018>
12:37:40.255524 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 14000000}}, NULL) = 0 <0.000021>
12:37:40.255635 clock_gettime(CLOCK_MONOTONIC, {1554, 637443915}) = 0 <0.000019>
12:37:40.255739 clock_gettime(CLOCK_MONOTONIC, {1554, 637547001}) = 0 <0.000018>
12:37:40.255847 select(17, [4 6 8 14 16], [], [], {1, 0}) = 1 (in [16], left {0, 988000}) <0.014303>


This is the vcpu thread:

Pid 10262:
12:37:40.252531 clock_gettime(CLOCK_MONOTONIC, {1554, 634339051}) = 0 <0.000018>
12:37:40.252631 timer_gettime(0, {it_interval={0, 0}, it_value={0, 17549811}}) = 0 <0.000021>
12:37:40.252750 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0 <0.000024>
12:37:40.252868 ioctl(11, 0xae80, 0) = 0 <0.001171>
12:37:40.254128 futex(0xb81360, 0x80 /* FUTEX_??? */, 2) = 0 <0.000270>
12:37:40.254490 ioctl(7, 0x4008ae61, 0x4134bee0) = 0 <0.000019>
12:37:40.254598 futex(0xb81360, 0x81 /* FUTEX_??? */, 1) = 0 <0.000017>
12:37:40.254693 ioctl(11, 0xae80 <unfinished ...>

Looks like the interrupt from the iothread was injected and delivered before the iothread could give up the mutex, so we needed to wait here.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/