Re: [PATCH v2 6/7] mailbox: bcm-flexrm-mailbox: Set msg_queue_len for each channel

From: Anup Patel
Date: Fri Jul 28 2017 - 05:48:38 EST


On Fri, Jul 28, 2017 at 2:34 PM, Jassi Brar <jassisinghbrar@xxxxxxxxx> wrote:
> On Fri, Jul 28, 2017 at 2:19 PM, Anup Patel <anup.patel@xxxxxxxxxxxx> wrote:
>> On Thu, Jul 27, 2017 at 5:23 PM, Jassi Brar <jassisinghbrar@xxxxxxxxx> wrote:
>>> On Thu, Jul 27, 2017 at 11:20 AM, Anup Patel <anup.patel@xxxxxxxxxxxx> wrote:
>>>> On Thu, Jul 27, 2017 at 10:29 AM, Jassi Brar <jassisinghbrar@xxxxxxxxx> wrote:
>>>
>>>>>>>>>> Sorry for the delayed response...
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 21, 2017 at 9:16 PM, Jassi Brar <jassisinghbrar@xxxxxxxxx> wrote:
>>>>>>>>>>> Hi Anup,
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 21, 2017 at 12:25 PM, Anup Patel <anup.patel@xxxxxxxxxxxx> wrote:
>>>>>>>>>>>> The Broadcom FlexRM ring (i.e. mailbox channel) can handle
>>>>>>>>>>>> larger number of messages queued in one FlexRM ring hence
>>>>>>>>>>>> this patch sets msg_queue_len for each mailbox channel to
>>>>>>>>>>>> be same as RING_MAX_REQ_COUNT.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Anup Patel <anup.patel@xxxxxxxxxxxx>
>>>>>>>>>>>> Reviewed-by: Scott Branden <scott.branden@xxxxxxxxxxxx>
>>>>>>>>>>>> ---
>>>>>>>>>>>> drivers/mailbox/bcm-flexrm-mailbox.c | 5 ++++-
>>>>>>>>>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/mailbox/bcm-flexrm-mailbox.c b/drivers/mailbox/bcm-flexrm-mailbox.c
>>>>>>>>>>>> index 9873818..20055a0 100644
>>>>>>>>>>>> --- a/drivers/mailbox/bcm-flexrm-mailbox.c
>>>>>>>>>>>> +++ b/drivers/mailbox/bcm-flexrm-mailbox.c
>>>>>>>>>>>> @@ -1683,8 +1683,11 @@ static int flexrm_mbox_probe(struct platform_device *pdev)
>>>>>>>>>>>> ret = -ENOMEM;
>>>>>>>>>>>> goto fail_free_debugfs_root;
>>>>>>>>>>>> }
>>>>>>>>>>>> - for (index = 0; index < mbox->num_rings; index++)
>>>>>>>>>>>> + for (index = 0; index < mbox->num_rings; index++) {
>>>>>>>>>>>> + mbox->controller.chans[index].msg_queue_len =
>>>>>>>>>>>> + RING_MAX_REQ_COUNT;
>>>>>>>>>>>> mbox->controller.chans[index].con_priv = &mbox->rings[index];
>>>>>>>>>>>> + }
>>>>>>>>>>>>
>>>>>>>>>>> While writing mailbox.c I wasn't unaware that there is the option to
>>>>>>>>>>> choose the queue length at runtime.
>>>>>>>>>>> The idea was to keep the code as simple as possible. I am open to
>>>>>>>>>>> making it a runtime thing, but first, please help me understand how
>>>>>>>>>>> that is useful here.
>>>>>>>>>>>
>>>>>>>>>>> I understand FlexRm has a ring buffer of RING_MAX_REQ_COUNT(1024)
>>>>>>>>>>> elements. Any message submitted to mailbox api can be immediately
>>>>>>>>>>> written onto the ringbuffer if there is some space.
>>>>>>>>>>> Is there any mechanism to report back to a client driver, if its
>>>>>>>>>>> message in ringbuffer failed "to be sent"?
>>>>>>>>>>> If there isn't any, then I think, in flexrm_last_tx_done() you should
>>>>>>>>>>> simply return true if there is some space left in the rung-buffer,
>>>>>>>>>>> false otherwise.
>>>>>>>>>>
>>>>>>>>>> Yes, we have error code in "struct brcm_message" to report back
>>>>>>>>>> errors from send_message. In our mailbox clients, we check
>>>>>>>>>> return value of mbox_send_message() and also the error code
>>>>>>>>>> in "struct brcm_message".
>>>>>>>>>>
>>>>>>>>> I meant after the message has been accepted in the ringbuffer but the
>>>>>>>>> remote failed to receive it.
>>>>>>>>
>>>>>>>> Yes, even this case is handled.
>>>>>>>>
>>>>>>>> In case of IO errors after message has been put in ring buffer, we get
>>>>>>>> completion message with error code and mailbox client drivers will
>>>>>>>> receive back "struct brcm_message" with error set.
>>>>>>>>
>>>>>>>> You can refer flexrm_process_completions() for more details.
>>>>>>>>
>>>>> It doesn't seem to be what I suggest. I see two issues in
>>>>> flexrm_process_completions()
>>>>> 1) It calls mbox_send_message(), which is a big NO for a controller
>>>>> driver. Why should you have one more message stored outside of
>>>>> ringbuffer?
>>>>
>>>> The "last_pending_msg" in each FlexRM ring was added to fit FlexRM
>>>> in Mailbox framework.
>>>>
>>>> We don't have any IRQ for TX done so "txdone_irq" out of the question for
>>>> FlexRM. We only have completions for both success or failures (IO errors).
>>>>
>>>> This means we have to use "txdone_poll" for FlexRM. For "txdone_poll",
>>>> we have to provide last_tx_done() callback. The last_tx_done() callback
>>>> is supposed to return true if last send_data() call succeeded.
>>>>
>>>> To implement last_tx_done() in FlexRM driver, we added "last_pending_msg".
>>>>
>>>> When "last_pending_msg" is NULL it means last call to send_data() succeeded
>>>> and when "last_pending_msg" is != NULL it means last call to send_data()
>>>> did not go through due to lack of space in FlexRM ring.
>>>>
>>> It could be simpler.
>>> Since flexrm_send_data() is essentially about putting the message in
>>> the ring-buffer (and not about _transmission_ failures), the
>>> last_tx_done() should simply return true if requests_ida has not all
>>> ids allocated. False otherwise.
>>
>> It's not that simple because we have two cases in-which
>> send_data() will fail:
>> 1. It run-out of IDs in requests_ida
>> 2. There is no room in BD queue of FlexRM ring. This because each
>> brcm_message can be translated into variable number of descriptors.
>> In fact, using SPU2 crypto client we have one brcm_message translating
>> into 100's of descriptors. All-in-all few messages (< 1024) can also
>> fill-up the BD queue of FlexRM ring.
>>
> OK let me put it abstractly... return false if "there is no space for
> another message in the ringbuffer", true otherwise.

Let say at time T, there was no space in BD queue. Now at
time T+X when last_tx_done() it is possible that BD queue
has space because FlexRM has processed some more
descriptor.

I think last_tx_done() for "txdone_poll" method will require
some information passing from send_data() callback to
last_tx_done() which is last_pending_msg for FlexRM driver.

Anyways, I plan to try "txdone_ack" method so I will
remove last_tx_done() and last_pending_msg both.
What do you think?

>
>
>>>>>
>>>>> 2) It calls mbox_chan_received_data() which is for messages received
>>>>> from the remote. And not the way to report failed _transmission_, for
>>>>> which the api calls back mbox_client.tx_done() . In your client
>>>>> driver please populate mbox_client.tx_done() and see which message is
>>>>> reported "sent fine" when.
>>>>>
>>>>>
>>>>>>>>> There seems no such provision. IIANW, then you should be able to
>>>>>>>>> consider every message as "sent successfully" once it is in the ring
>>>>>>>>> buffer i.e, immediately after mbox_send_message() returns 0.
>>>>>>>>> In that case I would think you don't need more than a couple of
>>>>>>>>> entries out of MBOX_TX_QUEUE_LEN ?
>>>>>>>>
>>>>>>>> What I am trying to suggest is that we can take upto 1024 messages
>>>>>>>> in a FlexRM ring but the MBOX_TX_QUEUE_LEN limits us queuing
>>>>>>>> more messages. This issue manifest easily when multiple CPUs
>>>>>>>> queues to same FlexRM ring (i.e. same mailbox channel).
>>>>>>>>
>>>>>>> OK then, I guess we have to make the queue length a runtime decision.
>>>>>>
>>>>>> Do you agree with approach taken by PATCH5 and PATCH6 to
>>>>>> make queue length runtime?
>>>>>>
>>>>> I agree that we may have to get the queue length from platform, if
>>>>> MBOX_TX_QUEUE_LEN is limiting performance. That will be easier on both
>>>>> of us. However I suspect the right fix for _this_ situation is in
>>>>> flexrm driver. See above.
>>>>
>>>> The current implementation is trying to model FlexRM using "txdone_poll"
>>>> method and that's why we have dependency on MBOX_TX_QUEUE_LEN
>>>>
>>>> I think what we really need is new method for "txdone" to model ring
>>>> manager HW (such as FlexRM). Let's call it "txdone_none".
>>>>
>>>> For "txdone_none", it means there is no "txdone" reporting in HW
>>>> and mbox_send_data() should simply return value returned by
>>>> send_data() callback. The last_tx_done() callback is not required
>>>> for "txdone_none" and MBOX_TX_QUEUE_LEN also has no
>>>> effect on "txdone_none". Both blocking and non-blocking clients
>>>> are treated same for "txdone_none".
>>>>
>>> That is already supported :)
>>
>> If you are referring to "txdone_ack" then this cannot be used here
>> because for "txdone_ack" we have to call mbox_chan_txdon() API
>> after writing descriptors in send_data() callback which will cause
>> dead-lock in tx_tick() called by mbox_chan_txdone().
>>
> Did you read my code snippet below?
>
> It's not mbox_chan_txdone(), but mbox_client_txdone() which is called
> by the client.
>
>>>
>>> In drivers/dma/bcm-sba-raid.c
>>>
>>> sba_send_mbox_request(...)
>>> {
>>> ......
>>> req->msg.error = 0;
>>> ret = mbox_send_message(sba->mchans[mchans_idx], &req->msg);
>>> if (ret < 0) {
>>> dev_err(sba->dev, "send message failed with error %d", ret);
>>> return ret;
>>> }
>>> ret = req->msg.error;
>>> if (ret < 0) {
>>> dev_err(sba->dev, "message error %d", ret);
>>> return ret;
>>> }
>>> .....
>>> }
>>>
>>> Here you _do_ assume that as soon as the mbox_send_message() returns,
>>> the last_tx_done() is true. In other words, this is a case of client
>>> 'knows_txdone'.
>>>
>>> So ideally you should specify cl->knows_txdone = true during
>>> mbox_request_channel() and have ...
>>>
>>> sba_send_mbox_request(...)
>>> {
>>> ret = mbox_send_message(sba->mchans[mchans_idx], &req->msg);
>>> if (ret < 0) {
>>> dev_err(sba->dev, "send message failed with error %d", ret);
>>> return ret;
>>> }
>>>
>>> ret = req->msg.error;
>>>
>>> /* Message successfully placed in the ringbuffer, i.e, done */
>>> mbox_client_txdone(sba->mchans[mchans_idx], ret);
>>>
>>> if (ret < 0) {
>>> dev_err(sba->dev, "message error %d", ret);
>>> return ret;
>>> }
>>>
>>> .....
>>> }
>>>
>>
>> I think we need to improve mailbox.c so that
>> mbox_chan_txdone() can be called from
>> send_data() callback.
>>
> No please. Other clients call mbox_send_message() followed by
> mbox_client_txdone(), and they are right. For example,
> drivers/firmware/tegra/bpmp.c

OK so I got confused between mbox_chan_txdone() and
mbox_client_txdone().

We should do mbox_client_txdone() from mailbox client
when mbox_chan txmethod is ACK.

Regards,
Anup