Re: [PATCH v3 3/5] perf cs-etm: Correct synthesizing instruction samples

From: Leo Yan
Date: Thu Feb 06 2020 - 03:24:43 EST


On Wed, Feb 05, 2020 at 04:09:01PM +0000, Mike Leach wrote:
> Hi Leo,
>
> There are a couple of typos in the comments below, but I also believe
> that the sample loop could be considerably simplified
>
> On Mon, 3 Feb 2020 at 01:52, Leo Yan <leo.yan@xxxxxxxxxx> wrote:
> >
> > When 'etm->instructions_sample_period' is less than
> > 'tidq->period_instructions', the function cs_etm__sample() cannot handle
> > this case properly with its logic.
> >
> > Let's see below flow as an example:
> >
> > - If we set itrace option '--itrace=i4', then function cs_etm__sample()
> > has variables with initialized values:
> >
> > tidq->period_instructions = 0
> > etm->instructions_sample_period = 4
> >
> > - When the first packet is coming:
> >
> > packet->instr_count = 10; the number of instructions executed in this
> > packet is 10, thus update period_instructions as below:
> >
> > tidq->period_instructions = 0 + 10 = 10
> > instrs_over = 10 - 4 = 6
> > offset = 10 - 6 - 1 = 3
> > tidq->period_instructions = instrs_over = 6
> >
> > - When the second packet is coming:
> >
> > packet->instr_count = 10; in the second pass, assume 10 instructions
> > in the trace sample again:
> >
> > tidq->period_instructions = 6 + 10 = 16
> > instrs_over = 16 - 4 = 12
> > offset = 10 - 12 - 1 = -3 -> the negative value
> > tidq->period_instructions = instrs_over = 12
> >
> > So after handle these two packets, there have below issues:
> >
> > The first issue is that cs_etm__instr_addr() returns the address within
> > the current trace sample of the instruction related to offset, so the
> > offset is supposed to be always unsigned value. But in fact, function
> > cs_etm__sample() might calculate a negative offset value (in handling
> > the second packet, the offset is -3) and pass to cs_etm__instr_addr()
> > with u64 type with a big positive integer.
> >
> > The second issue is it only synthesizes 2 samples for sample period = 4.
> > In theory, every packet has 10 instructions so the two packets have
> > total 20 instructions, 20 instructions should generate 5 samples
> > (4 x 5 = 20). This is because cs_etm__sample() only calls once
> > cs_etm__synth_instruction_sample() to generate instruction sample per
> > range packet.
> >
> > This patch fixes the logic in function cs_etm__sample(); the basic
> > idea is to divide into three parts for handling coming packet:
> >
> > - The first part is for synthesizing the first instruction sample, it
> > combines the instructions from the tail of previous packet and the
> > instructions from the head of the new packet;
> > - The second part is to simply generate samples with sample period
> > aligned;
> > - The third part is the tail of new packet, the rest instructions will
> > be left for the sequential sample handling.
> >
> > Suggested-by: Mike Leach <mike.leach@xxxxxxxxxx>
> > Signed-off-by: Leo Yan <leo.yan@xxxxxxxxxx>
> > ---
> > tools/perf/util/cs-etm.c | 105 ++++++++++++++++++++++++++++++++++-----
> > 1 file changed, 92 insertions(+), 13 deletions(-)
> >
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 3e28462609e7..c5a05f728eac 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -1360,23 +1360,102 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> > * TODO: allow period to be defined in cycles and clock time
> > */
> >
> > - /* Get number of instructions executed after the sample point */
> > - u64 instrs_over = tidq->period_instructions -
> > - etm->instructions_sample_period;
> > + /*
> > + * Below diagram demonstrates the instruction samples
> > + * generation flows:
> > + *
> > + * Instrs Instrs Instrs Instrs
> > + * Sample(n) Sample(n+1) Sample(n+2) Sample(n+3)
> > + * | | | |
> > + * V V V V
> > + * --------------------------------------------------
> > + * ^ ^
> > + * | |
> > + * Period Period
> > + * instructions(Pi) instructions(Pi')
> > + *
> > + * | |
> > + * \---------------- -----------------/
> > + * V
> > + * instrs_executed
> > + *
> > + * Period instructions (Pi) contains the the number of
> > + * instructions executed after the sample point(n). When a new
> > + * instruction packet is coming and generate for the next sample
> > + * (n+1), it combines with two parts instructions, one is the
> > + * tail of the old packet and another is the head of the new
> > + * coming packet. So 'head' variable is used to cauclate the
> typo : s/cauclate/calculate

Used checkpatch.pl but didn't see any complaints for this.

Thanks for pointing out and will fix it.

> > + * instruction numbers in the new packet for sample(n+1).
> > + *
> > + * Sample(n+2) and sample(n+3) consume the instructions with
> > + * sample period, so directly generate samples based on the
> > + * sampe period.
> > + *
> typo: s/sampe/sample

Will fix.

> > + * After sample(n+3), the rest instructions will be used by
> > + * later packet; so use 'instrs_over' to track the rest
> > + * instruction number and it is assigned to
> > + * 'tidq->period_instructions' for next round calculation.
> > + */
> > + u64 head, offset = 0;
> > + u64 addr;
> >
> > /*
> > - * Calculate the address of the sampled instruction (-1 as
> > - * sample is reported as though instruction has just been
> > - * executed, but PC has not advanced to next instruction)
> > + * 'instrs_over' is the number of instructions executed after
> > + * sample points, initialise it to 'instrs_executed' and will
> > + * decrease it for consumed instructions in every synthesized
> > + * instruction sample.
> > */
> > - u64 offset = (instrs_executed - instrs_over - 1);
> > - u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > - tidq->packet, offset);
> > + u64 instrs_over = instrs_executed;
> >
> > - ret = cs_etm__synth_instruction_sample(
> > - etmq, tidq, addr, etm->instructions_sample_period);
> > - if (ret)
> > - return ret;
> > + /*
> > + * 'head' is the instructions number of the head in the new
> > + * packet, it combines with the tail of previous packet to
> > + * generate a sample. So 'head' uses the sample period to
> > + * decrease the instruction number introduced by the previous
> > + * packet.
> > + */
> > + head = etm->instructions_sample_period -
> > + (tidq->period_instructions - instrs_executed);
> > +
> > + if (head) {
> > + offset = head;
> > +
> > + /*
> > + * Calculate the address of the sampled instruction (-1
> > + * as sample is reported as though instruction has just
> > + * been executed, but PC has not advanced to next
> > + * instruction)
> > + */
> > + addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > + tidq->packet, offset - 1);
> > + ret = cs_etm__synth_instruction_sample(
> > + etmq, tidq, addr,
> > + etm->instructions_sample_period);
> > + if (ret)
> > + return ret;
> > +
> > + instrs_over -= head;
> > + }
> > +
> > + while (instrs_over >= etm->instructions_sample_period) {
> > + offset += etm->instructions_sample_period;
> > +
> > + /*
> > + * Calculate the address of the sampled instruction (-1
> > + * as sample is reported as though instruction has just
> > + * been executed, but PC has not advanced to next
> > + * instruction)
> > + */
> > + addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > + tidq->packet, offset - 1);
> > + ret = cs_etm__synth_instruction_sample(
> > + etmq, tidq, addr,
> > + etm->instructions_sample_period);
> > + if (ret)
> > + return ret;
> > +
> > + instrs_over -= etm->instructions_sample_period;
> > + }
> >
> > /* Carry remaining instructions into next sample period */
> > tidq->period_instructions = instrs_over;
> > --
> > 2.17.1
> >
>
> I believe the following change would work and make for easier reading...
>
> .... at the start of the function remove instrs_executed and replace ....
> /* get instructions remainder from previous packet */
> u64 instrs_prev = tidq->period_instructions;
>
> /* set available instructions to previous packet remainder + the
> current packet count */
> tidq->period_instructions += tidq->packet->instr_count;
>
>
> .... within the if(etm->sample_instructions && ...) statement I would
> be more explicit what the elements of the diagram are ....
>
> /*
> * Below diagram demonstrates the instruction samples
> * generation flows:
> *
> * Instrs Instrs Instrs Instrs
> * Sample(n) Sample(n+1) Sample(n+2) Sample(n+3)
> * | | | |
> * V V V V
> * --------------------------------------------------
> * ^ ^
> * | |
> * Period Period
> * instructions(Pi) instructions(Pi')
> *
> * | |
> * \---------------- -----------------/
> * V
> * tidq->packet->instr_count;
> *
> * Instrs Sample(n...) are the synthesised samples occuring every
> etm->instructions_sample_period
> * instructions - as defined on the perf command line. Sample(n) being
> the last sample before the
> * current etm packet, n+1 to n+3 samples generated from the current etm packet.
> *
> * tidq->packet->instr_count represents the number of instructions in
> the current etm packet.
> *
> * Period instructions (Pi) contains the the number of instructions
> executed after the sample point(n)
> * from the previous etm packet. This will always be less than
> etm->instructions_sample_period.
> *
>
> .... continue with explanation here ....
>
>
> .... then we can simplify the loop code removing some of the temporary
> variables ....
>
> /* get the initial offset into the current packet instructions
> (entry conditions ensure that instrs_prev < etm->instructions_sample_period)
> */
> u64 offset = etm->instructions_sample_period - instrs_prev;
> u64 addr;
>
> /* Prepare last branches for instruction sample */
> if (etm->synth_opts.last_branch)
> cs_etm__copy_last_branch_rb(etmq, tidq);
>
> while (tidq->period_instructions >= etm->instructions_sample_period) {
>
> /*
> * Calculate the address of the sampled instruction (-1
> * as sample is reported as though instruction has just
> * been executed, but PC has not advanced to next
> * instruction)
> */
> addr = cs_etm__instr_addr(etmq, trace_chan_id, tidq->packet, offset - 1);
> ret = cs_etm__synth_instruction_sample( etmq, tidq, addr,
> etm->instructions_sample_period);
> if (ret)
> return ret;
>
> offset += etm->instructions_sample_period;
> tidq->period_instructions -= etm->instructions_sample_period;
> }
>
> .....
> I believe the above should work, but cannot claim to have tried it
> out. What do you think?

Agree. To be honest, I considered to use your suggested way, but I
worried about the boundary conditions for 'offset', so went back to
use explict method with two code segments (head and sequential samples).

After review the suggested code, I don't find any issue. Will refine
code as this way and give testing for it.

Very appreciate the suggestions :)

Leo