Re: [benchmark] 1% performance overhead of paravirt_ops on native kernels

From: Linus Torvalds
Date: Thu Jun 04 2009 - 11:04:20 EST

On Thu, 4 Jun 2009, Rusty Russell wrote:
> >
> > Turn off HIGHMEM64G, please (and HIGHMEM4G too, for that matter - you
> > can't compare it to a no-highmem case).
>
> Thanks, your point is demonstrated below. I don't think HIGHMEM4G is
> unreasonable for a distro tho, so I turned that on instead.

Well, I agree that HIGHMEM4G is a _reasonable_ thing to turn on.

The thing I disagree with is that it's at all valid to then compare to
some all-software feature thing. HIGHMEM doesn't expand any esoteric
capability that some people might use - it's about regular RAM for regular
users.

And don't get me wrong - I don't like HIGHMEM. I detest the damn thing. I
hated having to merge it, and I still hate it. It's a stupid, ugly, and
very invasive config option. It's just that it's there to support a
stupid, ugly and very annoying fundamental hardware problem.

So I think your minimum and maximum configs should at least _match_ in
HIGHMEM. Limiting memory so that there is no actual highmem (with "mem=880M") will
avoid the TLB flushing impact of HIGHMEM, which is clearly going to be the
_bulk_ of the overhead, but HIGHMEM is still going to be noticeable on at
least some microbenchmarks.

In other words, it's a lot like CONFIG_SMP, but at least CONFIG_SMP has a
damn better reason for existing today than CONFIG_HIGHMEM.

That said, I suspect that now your context-switch test is likely no longer
dominated by that thing, so looking at your numbers:

> minimal config: ~0.001280
> maximal config: ~0.002500 (with actual high mem)
> maximum config: ~0.001925 (with mem=880M)

and I think that change from 0.001280 to 0.001925 (rough averages by
eye-balling it, I didn't actually calculate anything) is still quite
interesting, but I do wonder how much of it ends up being due to just code
generation issues for CONFIG_HIGHMEM and CONFIG_SMP.
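For a sanity check on the percentages being thrown around, here is a minimal sketch that applies relative overhead = (config - minimal) / minimal to the eyeballed averages quoted above. It assumes nothing beyond those rough numbers (the units are whatever the benchmark reported); the function name `overhead` is just illustrative. Rusty's 48% figure presumably came from his actual measurements, so the roughly 50% this gives from eyeballed values is only a ballpark.

```python
# Rough averages quoted in the thread, in whatever unit the
# context-switch benchmark reported. They were eyeballed, not calculated.
minimal = 0.001280
maximal_highmem = 0.002500   # maximal config, real high memory in use
maximal_880m = 0.001925      # maximal config, booted with mem=880M

def overhead(base, other):
    """Relative slowdown of `other` versus `base`, as a percentage."""
    return (other - base) / base * 100.0

print(f"mem=880M: {overhead(minimal, maximal_880m):.1f}%")     # prints 50.4%
print(f"highmem:  {overhead(minimal, maximal_highmem):.1f}%")  # prints 95.3%
```

With real high memory in use, the maximal config is nearly twice as slow, which is consistent with the HIGHMEM TLB-flushing cost being the bulk of the difference.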

> So we're paying a 48% overhead; microbenchmarks always suffer as code is added,
> and we've added a lot of code with these options.

I do agree that microbenchmarks are interesting, and tend to show these
kinds of things clearly. It's just that when you look at the scheduler,
for example, something like SMP support is a _big_ issue, and even if
"maxcpus=1" gets rid of the worst synchronization overhead (at least
removing the "lock" prefixes), I'm not sure how relevant it is to say that
the scheduler is slower with SMP support.

(The same way I don't think it's relevant or interesting to see that it's
slower with HIGHMEM).

They are simply such fundamental features that the two aren't comparable.
Why would anybody compare a UP scheduler with an SMP scheduler? It's simply
not the same problem. What does it mean to say that one is 48% slower?
That's like saying that a squirrel is 48% juicier than an orange - maybe
it's true, but anybody who puts the two in a blender to compare them is
kind of sick. The comparison is ugly and pointless.

Now, other feature comparisons are way more interesting. For example, if
statistics gathering is a noticeable portion of the 48%, then that really
is a very relevant comparison, since scheduler statistics are something
that is in no way "fundamental" to the hardware base, and most people
won't care.

So comparing a "scheduler statistics" overhead vs "minimal config"
overhead is very clearly a sane thing to do. Now we're talking about a
feature that most people - even if it was somehow hardware related -
wouldn't use or care about.

IOW, even if it were to use hardware features (say, something like
oprofile, which is at least partly very much about exposing actual
physical features of the hardware), if it's not fundamental to the whole
usage for a huge percentage of people, then it's an "optional feature", and
any slowdown it causes is a big deal.

Something like CONFIG_HIGHMEM* or CONFIG_SMP is not really what I'd ever
call an "optional feature", although I hope to Dog that CONFIG_HIGHMEM can
be considered that some day.

Linus