Re: INFO: rcu detected stall in do_idle

From: luca abeni
Date: Thu Oct 18 2018 - 08:36:15 EST


Hi Juri,

On Thu, 18 Oct 2018 14:21:42 +0200
Juri Lelli <juri.lelli@xxxxxxxxxx> wrote:
[...]
> > > > I missed the original emails, but maybe the issue is that the
> > > > task blocks before the tick, and when it wakes up again
> > > > something goes wrong with the deadline and runtime assignment?
> > > > (maybe because the deadline is in the past?)
> > >
> > > No, the problem is that the task won't be throttled at all,
> > > because its replenishing instant is always way in the past when
> > > tick occurs. :-/
> >
> > Ok, I see the issue now: the problem is that the "while
> > (dl_se->runtime <= 0)" loop is executed at replenishment time, but
> > the deadline should be postponed at enforcement time.
> >
> > I mean: in update_curr_dl() we do:
> > dl_se->runtime -= scaled_delta_exec;
> > if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
> > ...
> > enqueue replenishment timer at dl_next_period(dl_se)
> > But dl_next_period() is based on a "wrong" deadline!
> >
> >
> > I think that inserting a
> > while (dl_se->runtime <= -pi_se->dl_runtime) {
> > dl_se->deadline += pi_se->dl_period;
> > dl_se->runtime += pi_se->dl_runtime;
> > }
> > immediately after "dl_se->runtime -= scaled_delta_exec;" would fix
> > the problem, no?
>
> Mmm, I also thought of letting the task "pay back" its overrunning.
> But, doesn't this get us quite far from what one would expect. I mean,
> enforcement granularity will be way different from task period, no?

Yes, the granularity will be what the kernel can provide (due to the HZ
value and to the hrtick on/off state). But at least the task will not
starve non-deadline tasks (which is bug that originated this
discussion, I think).

If I understand well, there are two different (and orthogonal) issues
here:
1) Due to a bug in the accounting / enforcement mechanisms
(the wrong placement of the while() loop), the tasks consumes 100% of
the CPU time, starving non-deadline tasks
2) Due to the large HZ value, the small runtime (and period) and the
fact that hrtick is disabled, the kernel cannot provide the
requested scheduling granularity

The second issue can be fixed by imposing limits on minimum and maximum
runtime and the first issue can be fixed by changing the code as I
suggested in my previous email.

I would suggest to address both the two issues, with separate changes
(the current replenishment code looks strange anyway).



Luca