Re: [BUG] perf stat: useless output for raw events with new eventparser

From: Peter Zijlstra
Date: Mon May 07 2012 - 08:43:05 EST


On Thu, 2012-04-26 at 19:36 +0200, Robert Richter wrote:
> On 26.04.12 17:39:32, Peter Zijlstra wrote:
> > On Thu, 2012-04-26 at 16:45 +0200, Robert Richter wrote:
> > > It is totally ok to have parser support for this. I simply do not see
> > > why we need to put the encoding into sysfs. We somehow know on which
> > > hardware we run and the parser should already know how to setup the
> > > syscall. So parsing the above finally ends in calling of something
> > > like:
> > >
> > > setup_event_for_some_pmu(event, 0x4e2, 0xf8);
> > >
> > > We don't need any description of bit masks in sysfs for this.
> >
> > Its the kernel side decoding perf_event_attr, so it seems sensible to
> > also describe this encoding from the kernel.
>
> For perfctr we have:
>
> /sys/bus/event_source/devices/cpu/format/inv: config:23
> /sys/bus/event_source/devices/cpu/format/edge: config:18
> /sys/bus/event_source/devices/cpu/format/cmask: config:24-31
> /sys/bus/event_source/devices/cpu/format/event: config:0-7,32-35
> /sys/bus/event_source/devices/cpu/format/umask: config:8-15
>
> The kernel does not en- or decode anything in the config value. It is
> directly passed to the pmu with some validation of the values.

For AMD and most Intel, yes. But there's no requirement for this to be
so. IIRC, Intel P4 doesn't.

Furthermore, extra data required for some events doesn't have a well
defined place in the one ::config and we use config1,2, but how do we
tell what goes where?

> Everything else is in userland since it composes the syscall.

Dependent on what you want to achieve, but with the format/ and event/
stuff you can get a whole way without userspace ever having to know what
particular hardware its running on.

Ideally we'd get all the way to where the user itself doesn't care, once
the user starts caring he/she will have to open up the BKDG/SDM/other
volume of arch magic. At which point they'd better be well qualified to
deal with all that brings.

> The kernel must now contain code like this:
>
> PMU_FORMAT_ATTR(event, "config:0-7,32-35");
> PMU_FORMAT_ATTR(umask, "config:8-15" );
> PMU_FORMAT_ATTR(edge, "config:18" );
> PMU_FORMAT_ATTR(inv, "config:23" );
> PMU_FORMAT_ATTR(cmask, "config:24-31" );
>
> Which is unrelated to anything else, duplicates the effort to maintain
> bit masks and thus is more error-prone. Besides this there is no need
> for it because the values are fix and do not change.

If it doesn't change there's no maintenance overhead is there?

> We simply know
> the format of the config value already, so the format entries are of
> no use.

And yet 'event' isn't the same for Intel and AMD, this means I'd have to
add all kinds of cpu detection code into the parser. By having the
kernel provide this information the parser doesn't have to care about
it.

> One could argue that feeding a generic pmu setup with the format
> configuration reduces the need to modify userland, we have same code
> for various archs. But if I have the choice I rather update my perf
> tool chain than rebooting the kernel to update perf.

You'll have to reboot at some point anyway, you can always frob
perf_event_attr::config* by hand without the aid of this sysfs stuff,
but when the kernel cannot handle the data or otherwise doesn't know how
to talk to the hardware you'll have to go reboot.

I really don't see the problem with reboots and wish people would stop
using that silly argument.

> > Currently we mostly match the hardware encoding, but there's no strict
> > requirement to do so, we can already see some of that with the extra_reg
> > stuff, perf_event_attr::config1 can mean different things depending on
> > the event.
>
> Of course the config values of the syscall could be translated into a
> different hardware configuration. But its layout is always spec'ed
> somewhere and needs no description in sysfs.

You mean there's a spec (kernel source excluded) that says that the
offcore response msr goes in perf_event_attr::config1 ?

> > Keeping all this information in two places just seems like asking for it
> > to get out of sync.
>
> All this could reside in userland at one place too.

Userland would first have to figure out what physical hardware it was
running on to determine if it has an offcore response msr, then it would
have to 'know' this value is supposed to go in perf_event_attr::config1.

By hard-coding this in userspace the kernel is then never allowed to
change it and you'd have to duplicate this knowledge everywhere you'd
want to use the syscall. You could 'optimize' this by using a library to
reduce the userland copies to 1, but the fact is that both kernel and
userland need to 'know' this independently.

Yet by having it in sysfs the kernel can tell userspace than it has an
offcore_rsp field and if we get a value for it it goes wherever it says.
Userspace doesn't need to know anything other than to look up stuff in
sysfs and there's only 1 copy of the knowledge needed -- in the kernel.

So if I try to set offcore_rsp on an AMD machine it'll error since they
don't provide that field.

Same with the event field, without having to have cpu checks, perf can
deal with 'cpu/event=0x4e3/'. On Intel it can tell us this won't go
since the event field isn't wide enough, on AMD it knows to stick that 4
in bits 32-35 of perf_event_attr::config.

> Also, the syscall
> definition is sufficient as interface description and both sides must
> handle any differences of kernel or userland implementations.

You've just posted a patch ([PATCH 4/7] perf/x86-ibs: Add support for
IBS pseudo events) that's the very counter example.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/