Re: [PATCH 00/36] cputime: Convert core use of cputime_t to nsecs

From: Martin Schwidefsky
Date: Tue Nov 22 2016 - 09:28:57 EST


On Tue, 22 Nov 2016 14:45:56 +0100
Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:

> On Mon, Nov 21, 2016 at 11:17:28AM +0100, Martin Schwidefsky wrote:
> > On Mon, 21 Nov 2016 07:59:56 +0100
> > Martin Schwidefsky <schwidefsky@xxxxxxxxxx> wrote:
> [...]
> > @@ -110,34 +119,48 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
> > #endif
> > : "=m" (S390_lowcore.last_update_timer),
> > "=m" (S390_lowcore.last_update_clock));
> > - S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
> > - S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock;
> > + clock = S390_lowcore.last_update_clock - clock;
> > + timer -= S390_lowcore.last_update_timer;
> > +
> > + if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
> > + S390_lowcore.guest_timer += timer;
> > + else if (hardirq_count() - hardirq_offset)
> > + S390_lowcore.hardirq_timer += timer;
> > + else if (in_serving_softirq())
> > + S390_lowcore.softirq_timer += timer;
> > + else
> > + S390_lowcore.system_timer += timer;
>
> I initially thought that some code could be shared for that whole accumulation. Now I
> don't know if it would be a good idea. An example would be to deal with the contexts above
> in order to store the accumulation to the appropriate place.

I thought about a common code inline function that returns the index
(CPUTIME_SYSTEM, CPUTIME_IRQ, ..) for the current context. Did not look
too appealing anymore after I type it down.

> > - account_user_time(tsk, user, user_scaled);
> > - account_system_time(tsk, hardirq_offset, system, system_scaled);
> > + hardirq = S390_lowcore.hardirq_timer - tsk->thread.hardirq_timer;
> > + tsk->thread.hardirq_timer = S390_lowcore.hardirq_timer;
> > + softirq = S390_lowcore.softirq_timer - tsk->thread.softirq_timer;
> > + tsk->thread.softirq_timer = S390_lowcore.softirq_timer;
> > + S390_lowcore.steal_timer +=
> > + clock - user - guest - system - hardirq - softirq;
> > +
> > + /* Push account value */
> > + if (user)
> > + account_user_time(tsk, user, scale_vtime(user));
> > + if (guest)
> > + account_guest_time(tsk, guest, scale_vtime(guest));
> > + if (system)
> > + account_sys_time(tsk, system, scale_vtime(system));
> > + if (hardirq)
> > + account_hardirq_time(tsk, hardirq, scale_vtime(hardirq));
> > + if (softirq)
> > + account_softirq_time(tsk, softirq, scale_vtime(softirq));
>
> And doing that would be another part of the shared code.

Right now I would feel more comfortable if that stays architecture code.
The calculation up to the point where accout_xxx_time function can be
called is definitely arch specific. Why try to do the accumulation in
common code? I have the feeling that would just complicate the code for
no good reason.

> >
> > steal = S390_lowcore.steal_timer;
> > if ((s64) steal > 0) {
> > @@ -145,16 +168,22 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
> > account_steal_time(steal);
> > }
> >
> > - return virt_timer_forward(user + system);
> > + return virt_timer_forward(user + guest + system + hardirq + softirq);
> > }
> >
> > void vtime_task_switch(struct task_struct *prev)
> > {
> > do_account_vtime(prev, 0);
> > prev->thread.user_timer = S390_lowcore.user_timer;
> > + prev->thread.guest_timer = S390_lowcore.guest_timer;
> > prev->thread.system_timer = S390_lowcore.system_timer;
> > + prev->thread.hardirq_timer = S390_lowcore.hardirq_timer;
> > + prev->thread.softirq_timer = S390_lowcore.softirq_timer;
> > S390_lowcore.user_timer = current->thread.user_timer;
> > + S390_lowcore.guest_timer = current->thread.guest_timer;
> > S390_lowcore.system_timer = current->thread.system_timer;
> > + S390_lowcore.hardirq_timer = current->thread.hardirq_timer;
> > + S390_lowcore.softirq_timer = current->thread.softirq_timer;
> > }
>
> Ditto.

Same here. The lowcore fields are too arch specific.

> >
> > /*
> > @@ -174,31 +203,22 @@ void vtime_account_user(struct task_struct *tsk)
> > */
> > void vtime_account_irq_enter(struct task_struct *tsk)
> > {
> > - u64 timer, system, system_scaled;
> > + u64 timer;
> >
> > timer = S390_lowcore.last_update_timer;
> > S390_lowcore.last_update_timer = get_vtimer();
> > - S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
> > -
> > - /* Update MT utilization calculation */
> > - if (smp_cpu_mtid &&
> > - time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
> > - update_mt_scaling();
> > -
> > - system = S390_lowcore.system_timer - tsk->thread.system_timer;
> > - S390_lowcore.steal_timer -= system;
> > - tsk->thread.system_timer = S390_lowcore.system_timer;
> > - system_scaled = system;
> > - /* Do MT utilization scaling */
> > - if (smp_cpu_mtid) {
> > - u64 mult = __this_cpu_read(mt_scaling_mult);
> > - u64 div = __this_cpu_read(mt_scaling_div);
> > -
> > - system_scaled = (system_scaled * mult) / div;
> > - }
> > - account_system_time(tsk, 0, system, system_scaled);
> > -
> > - virt_timer_forward(system);
> > + timer -= S390_lowcore.last_update_timer;
> > +
> > + if ((tsk->flags & PF_VCPU) && (irq_count() == 0))
> > + S390_lowcore.guest_timer += timer;
> > + else if (hardirq_count())
> > + S390_lowcore.hardirq_timer += timer;
> > + else if (in_serving_softirq())
> > + S390_lowcore.softirq_timer += timer;
> > + else
> > + S390_lowcore.system_timer += timer;
>
> And Ditto.

It would be nice if we can find a solution to make the decision
tree where to put the cputime delta into common code.

> We could put together the accumulation in a common struct in s390_lowcore,
> and its mirror in thread struct then have helpers take care of the contexts.
>
> How does that sound to you, would it help or hurt?

My gut feeling is that the try to make the accumulation code common will
hurt more than it helps. But we can certainly try and look at the result.

I spent some more time on this, here is my current patch. For my part
the patch is close to the final solution if we can agree on it.
--