Re: [PATCH 5/5] irqtime: drop local_irq_save/restore from irqtime_account_irq

From: Peter Zijlstra
Date: Tue Jun 21 2016 - 18:41:22 EST


On Thu, Jun 16, 2016 at 12:06:07PM -0400, riel@xxxxxxxxxx wrote:
> @@ -53,36 +56,72 @@ DEFINE_PER_CPU(seqcount_t, irq_time_seq);
> * softirq -> hardirq, hardirq -> softirq
> *
> * When exiting hardirq or softirq time, account the elapsed time.
> + *
> + * When exiting softirq time, subtract the amount of hardirq time that
> + * interrupted this softirq run, to avoid double accounting of that time.
> */
> void irqtime_account_irq(struct task_struct *curr, int irqtype)
> {
> + u64 prev_softirq_start;
> + u64 prev_hardirq;
> + u64 hardirq_time;
> + s64 delta = 0;

We appear to always assign to delta, so this initialization seems
superfluous.

> int cpu;
>
> if (!sched_clock_irqtime)
> return;
>
> cpu = smp_processor_id();

Per this smp_processor_id() usage, preemption is disabled.

> + /*
> + * Softirq context may get interrupted by hardirq context,
> + * on the same CPU. At softirq entry time the amount of time
> + * spent in hardirq context is stored. At softirq exit time,
> + * the time spent in hardirq context during the softirq is
> + * subtracted.
> + */
> + prev_hardirq = __this_cpu_read(prev_hardirq_time);
> + prev_softirq_start = __this_cpu_read(softirq_start_time);
> +
> + if (irqtype == HARDIRQ_OFFSET) {
> + delta = sched_clock_cpu(cpu) - __this_cpu_read(hardirq_start_time);
> + __this_cpu_add(hardirq_start_time, delta);
> + } else do {
> + u64 now = sched_clock_cpu(cpu);
> + hardirq_time = READ_ONCE(per_cpu(cpu_hardirq_time, cpu));

Which makes this per_cpu(,cpu) usage somewhat curious. What's wrong with
__this_cpu_read() ?

> +
> + delta = now - prev_softirq_start;
> + if (in_serving_softirq()) {
> + /*
> + * Leaving softirq context. Avoid double counting by
> + * subtracting hardirq time from this interval.
> + */
> + s64 hi_delta = hardirq_time - prev_hardirq;
> + delta -= hi_delta;
> + } else {
> + /* Entering softirq context. Note start times. */
> + __this_cpu_write(softirq_start_time, now);
> + __this_cpu_write(prev_hardirq_time, hardirq_time);
> + }
> + /*
> + * If a hardirq happened during this calculation, it may not
> + * have gotten a consistent snapshot. Try again.
> + */
> + } while (hardirq_time != READ_ONCE(per_cpu(cpu_hardirq_time, cpu)));

That whole thing is somewhat hard to read; but its far too late for me
to suggest anything more readable :/

> + irq_time_write_begin(irqtype);
> /*
> * We do not account for softirq time from ksoftirqd here.
> * We want to continue accounting softirq time to ksoftirqd thread
> * in that case, so as not to confuse scheduler with a special task
> * that do not consume any time, but still wants to run.
> */
> + if (irqtype == HARDIRQ_OFFSET && hardirq_count())
> __this_cpu_add(cpu_hardirq_time, delta);
> + else if (irqtype == SOFTIRQ_OFFSET && in_serving_softirq() &&
> + curr != this_cpu_ksoftirqd())
> __this_cpu_add(cpu_softirq_time, delta);
>
> + irq_time_write_end(irqtype);

Maybe split the whole thing on irqtype at the very start, instead of the
endless repeated branches?

> }
> EXPORT_SYMBOL_GPL(irqtime_account_irq);