Re: [PATCH 2/4] perf: jevents: Program to convert JSON file to C style file

From: Ingo Molnar
Date: Fri May 29 2015 - 03:28:05 EST



* Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:

> > So instead of this flat structure, there should at minimum be broad categorization
> > of the various parts of the hardware they relate to: whether they relate to the
> > branch predictor, memory caches, TLB caches, memory ops, offcore, decoders,
> > execution units, FPU ops, etc., etc. - so that they can be queried via 'perf
> > list'.
>
> The categorization is generally on the stem name, which already works fine with
> the existing perf list wildcard support. For example, if you only want branches:
>
> perf list br*
> ...
> br_inst_exec.all_branches
> [Speculative and retired branches]
> br_inst_exec.all_conditional
> [Speculative and retired macro-conditional branches]
> br_inst_exec.all_direct_jmp
> [Speculative and retired macro-unconditional branches excluding calls and indirects]
> br_inst_exec.all_direct_near_call
> [Speculative and retired direct near calls]
> br_inst_exec.all_indirect_jump_non_call_ret
> [Speculative and retired indirect branches excluding calls and returns]
> br_inst_exec.all_indirect_near_return
> [Speculative and retired indirect return branches]
> ...
>
> Or mid level cache events:
>
> perf list l2*
> ...
> l2_l1d_wb_rqsts.all
> [Not rejected writebacks from L1D to L2 cache lines in any state]
> l2_l1d_wb_rqsts.hit_e
> [Not rejected writebacks from L1D to L2 cache lines in E state]
> l2_l1d_wb_rqsts.hit_m
> [Not rejected writebacks from L1D to L2 cache lines in M state]
> l2_l1d_wb_rqsts.miss
> [Count the number of modified Lines evicted from L1 and missed L2. (Non-rejected WBs from the DCU.)]
> l2_lines_in.all
> [L2 cache lines filling L2]
> ...
>
> There are some exceptions, but generally it works this way.

You are missing my point in several ways:

1)

Firstly, there are _tons_ of 'exceptions' to the 'stem name' grouping, to the
point that it is unusable for high level grouping of events.

Here's the 'stem name' histogram on the SandyBridge event list:

$ grep EventName pmu-events/arch/x86/SandyBridge_core.json | cut -d\. -f1 | cut -d\" -f4 | cut -d\_ -f1 | sort | uniq -c | sort -n

1 AGU
1 BACLEARS
1 EPT
1 HW
1 ICACHE
1 INSTS
1 PAGE
1 ROB
1 RS
1 SQ
2 ARITH
2 DSB2MITE
2 ILD
2 LOAD
2 LOCK
2 LONGEST
2 MISALIGN
2 SIMD
2 TLB
3 CPL
3 DSB
3 INST
3 INT
3 LSD
3 MACHINE
4 CPU
4 OTHER
4 PARTIAL
5 CYCLE
5 ITLB
6 LD
7 L1D
8 DTLB
10 FP
12 RESOURCE
21 UOPS
24 IDQ
25 MEM
37 BR
37 L2
131 OFFCORE

Out of 386 events. This grouping has the following severe problems:

- that's 41 'stem name' groups, far too many for a first-hop, high level
structure. We want the kind of high level categorization I suggested:
cache, decoding, branches, execution pipeline, memory events, vector unit
events - broad categories which exist in all CPUs and are microarchitecture
independent.

- even these 'stem names' are mostly unstructured and unreadable. The two
examples you cited are the best cases, and even they are only borderline
readable; together they cover less than 20% of all events.

- the 'stem name' concept is not even used consistently; the names are
essentially a random collection of Intel internal acronyms, which occasionally
match up with high level concepts. These vendor defined names have very poor
high level structure.

- the 'stem names' are totally imbalanced: there's one 'super' 'stem name'
category, OFFCORE_RESPONSE, with 131 events in it, and then there are many
tiny groups in the list above. That is not well suited to giving a good
overview of what measurement capabilities the hardware has.
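The shares quoted above follow directly from the histogram, with N = 386 events in total (and 10 of the 41 groups contain just a single event):

```latex
\frac{N_{\mathrm{BR}} + N_{\mathrm{L2}}}{N} = \frac{37 + 37}{386} \approx 19.2\%,
\qquad
\frac{N_{\mathrm{OFFCORE}}}{N} = \frac{131}{386} \approx 33.9\%
```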

So forget about using 'stem names' as the high level structure. These events have
no high level structure and we should provide that, instead of dumping 380+ events
on the unsuspecting user.

2)

Secondly, categorization and a higher level hierarchy should be used to keep the
list manageable. The fact that _you_ can list just a subset if you know what to
search for means nothing to a new user trying to discover events.

A simple 'perf list' should list the high level categories by default, with a
count displayed that shows how many further events are within each category.
(Compacted tree output would be usable as well.)

> The stem could be put into a separate header, but it would seem redundant to me.

Higher level categories simply don't exist in these names in any usable form, so
they have to be created. Just redundantly repeating the 'stem name' would be silly,
as stem names are unusable for the purposes of high level categorization.

> > We don't just want to import the unstructured mess that these event files are
> > - we want to turn them into real structure. We can still keep the messy vendor
> > names as well, like IDQ.DSB_CYCLES, but we want to impose structure as well.
>
> The vendor names directly map to the micro architecture, which is the whole point of
> the events. IDQ is a part of the CPU, and is described in the CPU manuals. One
> of the main motivations for adding event lists is to make perf match that
> documentation.

Your argument is a logical fallacy: there is absolutely no conflict between
supporting quirky vendor names and also having good high level structure and
naming, to make it all accessible to the first time user.

> > 3)
> >
> > There should be good 'perf list' visualization for these events: grouping,
> > individual names, with a good interface to query details if needed. I.e. it
> > should be possible to browse and discover events relevant to the CPU the tool
> > is executing on.
>
> I suppose we could change perf list to give the stem names as section headers to
> make the long list a bit more readable.

No, the 'stem names' are crap - instead we want to create sensible high level
categories and categorize the events. I gave you a few ideas above and in
the previous mail.

> Generally you need to have some knowledge of the micro architecture to use these
> events. There is no way around that.

Here your argument again relies on a logical fallacy: there is absolutely no
conflict between good high level structure, and the idea that you need to know
about CPUs to make sense of hardware events that deal with fine internal details.

Also, you are denying the plain fact that the highest level categories _are_
largely microarchitecture independent: can you show me a single modern mainstream
x86 CPU that doesn't have these broad high level categories:

- CPU cache
- memory accesses
- decoding, branch execution
- execution pipeline
- FPU, vector units

?

There's none, and the reason is simple: the high level structure of CPUs is still
dictated by basic physics, and physics is microarchitecture independent.

Lower level structure will inevitably be microarchitecture and sometimes even
model specific - but that's absolutely no excuse to not have good high level
structure.

So these are not difficult concepts at all; please make an honest effort at
understanding them and responding to them, as properly addressing them is a
must-have for this patch submission.

Thanks,

Ingo