Re: [PATCH RFC] mailbox: move controller timer to per-channel timers

From: Alexey Klimov
Date: Thu May 25 2017 - 13:44:38 EST


Hi Jassi,

Sorry for the delay again.

On Tue, Apr 11, 2017 at 06:30:08PM +0530, Jassi Brar wrote:
> On 11 April 2017 at 18:04, Alexey Klimov <alexey.klimov@xxxxxxx> wrote:
> > On Fri, Apr 07, 2017 at 08:39:35PM +0530, Jassi Brar wrote:
> >> On Thu, Apr 6, 2017 at 11:01 PM, Alexey Klimov <alexey.klimov@xxxxxxx> wrote:
> >> > When mailbox controller provides two or more channels and
> >> > they are actively used by mailbox client(s) it's very easy
> >> > to trigger the warning in hrtimer_forward():
> >> >
> >> > [ 247.853060] WARNING: CPU: 6 PID: 0 at kernel/time/hrtimer.c:805 hrtimer_forward+0x88/0xd8
> >> > [ 247.853549] Modules linked in:
> >> > [ 247.853907] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G W 4.11.0-rc2-00362-g93afaa4513bb-dirty #13
> >> > [ 247.854472] Hardware name: linux,dummy-virt (DT)
> >> > [ 247.854699] task: ffff80001d89d780 task.stack: ffff80001d8c4000
> >> > [ 247.854999] PC is at hrtimer_forward+0x88/0xd8
> >> > [ 247.855280] LR is at txdone_hrtimer+0xd4/0xf8
> >> > [ 247.855551] pc : [<ffff0000081039f0>] lr : [<ffff00000881b874>] pstate: 200001c5
> >> > [ 247.855857] sp : ffff80001efbdeb0
> >> > [ 247.856072] x29: ffff80001efbdeb0 x28: ffff80001efc3140
> >> > [ 247.856358] x27: ffff00000881b7a0 x26: 00000039ac93e8b6
> >> > [ 247.856604] x25: ffff000008e756be x24: ffff80001c4a1348
> >> > [ 247.856882] x23: 0000000000000001 x22: 00000000000000f8
> >> > [ 247.857189] x21: ffff80001c4a1318 x20: ffff80001d327110
> >> > [ 247.857509] x19: 00000000000f4240 x18: 0000000000000030
> >> > [ 247.857808] x17: 0000ffffaecdf370 x16: ffff0000081ccc80
> >> > [ 247.858000] x15: 0000000000000010 x14: 00000000fffffff0
> >> > [ 247.858186] x13: ffff000008f488e0 x12: 000000000002e3eb
> >> > [ 247.858381] x11: ffff000008979690 x10: 0000000000000000
> >> > [ 247.858573] x9 : 0000000000000001 x8 : ffff80001efc66e0
> >> > [ 247.858758] x7 : ffff80001efc6708 x6 : 00000005be7732f2
> >> > [ 247.858943] x5 : 0000000000000001 x4 : ffff80001c4a1348
> >> > [ 247.859130] x3 : 00000039ac94952a x2 : 00000000000f4240
> >> > [ 247.859315] x1 : 00000039ac98243c x0 : 0000000000038f12
> >> > [ 247.859582] ---[ end trace d61812426ec3c30b ]---
> >> >
> >> > To fix this current patch migrates hr timers to be per-channel
> >> > instead of using only one timer per-controller.
> >> >
> >> I think we can do by just checking if hrtimer_active() returns false
> >> before we do hrtimer_start() in msg_submit() ?
> >
> > It looks like it can be easily broken:
> >
> > 1) let's say first thread executes timer callback and already checked last_tx_done
> > on channel 0;
> > 2) second thread submits a message to the controller, say, on channel 0 and with
> > help of hrtimer_active() observes that the timer is active (because timer callback
> > is running) and decides not to (re-)start timer;
> >
> > After this first thread decides not to restart the timer and finishes callback.
> > The thing that first thread executes tx_tick isn't helpful: for example first
> > thread may have no messages to submit on any channel and therefore is not going
> > to deal with timer.
> >
> > Finally, mailbox state machine is stalled. Second thread thinks that timer is
> > active while it's not.
> >
> ... you mean race :) and we have locks for that. You want me to send
> in a patch?

We don't have a separate lock for the timer.
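
To make the interleaving concrete, here is a small userspace model of it.
This is only a sketch and not kernel code or the patch: every name in it
(model_timer, model_interleaving, ...) is made up for illustration, and
the only thing it assumes about the real API is that hrtimer_active()
reports a timer as active both when it is enqueued and while its callback
is running. It replays one fixed ordering of the two threads rather than
running them concurrently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_timer {
	bool queued;		/* enqueued, will fire again */
	bool callback_running;	/* callback is executing right now */
};

struct model_result {
	bool stalled;		/* message in flight, nothing left to poll it */
	bool forward_on_queued;	/* callback forwarded an already queued timer */
};

/* Assumed hrtimer_active() meaning: enqueued OR callback running. */
static bool model_timer_active(const struct model_timer *t)
{
	return t->queued || t->callback_running;
}

/*
 * Replay one fixed interleaving of the poll callback (thread 1) and a
 * submit on channel 0 (thread 2).
 *
 * guard_with_active: submit starts the timer only if it is not active.
 * chan1_busy: channel 1 still has an unfinished message, so the
 *             callback wants to re-arm the timer.
 */
struct model_result model_interleaving(bool guard_with_active, bool chan1_busy)
{
	struct model_timer t = { .queued = false, .callback_running = false };
	struct model_result r = { .stalled = false, .forward_on_queued = false };
	bool chan0_in_flight = false;
	bool resched;

	/* Thread 1: timer fires, it is dequeued and the callback starts. */
	t.callback_running = true;
	assert(model_timer_active(&t));	/* active although not queued */

	/* Thread 1: channel 0 was already checked and is idle. */
	resched = chan1_busy;

	/* Thread 2: submits on channel 0 while the callback is running. */
	chan0_in_flight = true;
	if (!(guard_with_active && model_timer_active(&t)))
		t.queued = true;

	/* Thread 1: decides whether to re-arm, then leaves the callback. */
	if (resched) {
		r.forward_on_queued = t.queued;	/* the case the WARN is about */
		t.queued = true;
	}
	t.callback_running = false;

	r.stalled = chan0_in_flight && !t.queued;
	return r;
}
```

In this model, model_interleaving(true, false) ends with stalled set: the
submit skipped the start because the timer looked active, and the
callback returned without re-arming. model_interleaving(false, true) sets
forward_on_queued instead, which corresponds to the situation the warning
in hrtimer_forward() complains about.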

> > One of the main questions is that there is only one timer per few channels
> > in current code.
> >
> I see that as a good thing because
> a) Polling anyway doesn't provide 'hard' guarantee even if we have one
> timer per channel
> b) The poll period remains same for every channel, so functionality
> wise you only increase timer callbacks.

Do you mean something like this below?

The patch hasn't really been tested in a multi-channel environment yet, but
I will test it. I just want to know if I am on the right track here.

I know there are some adjustments that could be made to the loop in the
hrtimer callback. What I don't like here is the large number of
spin_lock/unlock calls in the timer callback.

Thanks,
Alexey