Re: [PATCH v13 4/5] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches

From: Maulik Shah
Date: Tue Mar 10 2020 - 07:19:41 EST



On 3/10/2020 5:13 AM, Doug Anderson wrote:
> Hi,
>
> On Mon, Mar 9, 2020 at 2:31 AM Maulik Shah <mkshah@xxxxxxxxxxxxxx> wrote:
>> Add changes to invoke rpmh flush() from within cache_lock when the data in
>> cache is dirty.
>>
>> Introduce two new APIs for this. Clients can use rpmh_start_transaction()
>> before any rpmh transaction once done invoke rpmh_end_transaction() which
>> internally invokes rpmh_flush() if the caches has become dirty.
>>
>> Add support to control this with flush_dirty flag.
>>
>> Signed-off-by: Maulik Shah <mkshah@xxxxxxxxxxxxxx>
>> Reviewed-by: Srinivas Rao L <lsrao@xxxxxxxxxxxxxx>
>> ---
>> drivers/soc/qcom/rpmh-internal.h | 4 +++
>> drivers/soc/qcom/rpmh-rsc.c | 6 +++-
>> drivers/soc/qcom/rpmh.c | 64 ++++++++++++++++++++++++++++++++--------
>> include/soc/qcom/rpmh.h | 10 +++++++
>> 4 files changed, 71 insertions(+), 13 deletions(-)
> As mentioned previously but not addressed [3], I believe your series
> breaks things if there are zero ACTIVE TCSs and you're using the
> immediate-flush solution. Specifically any attempt to set something's
> "active" state will clobber the sleep/wake. I believe this is hard to
> fix, especially if you want rpmh_write_async() to work properly and
> need to be robust to the last man going down while rpmh_write_async()
> is running but hasn't finished. My suggestion was to consider it to
> be an error at probe time for now.
>
> Actually, though, I'd be super surprised if the "active == 0" case
> works anyway. Aside from subtle problems of not handling -EAGAIN (see
> another previous message that you didn't respond to [2]), I think
> you'll also get failures because you never enable interrupts in
> RSC_DRV_IRQ_ENABLE for anything other than the ACTIVE_TCS. Thus
> you'll never get interrupts saying when your transactions on the
> borrowed "wake" TCS finish.

No, it shouldnât effect even with "non-OSI-mode + 0 ACTIVE TCS"

i just replied on v9, pasting same on v13 as well.

After taking your suggestion to do rpmh start/end transaction() in v13, rpmh_end_transaction()
invokes rpmh_flush() only for the last client and by this time expecting all of rpmh_write()
and rpmh_write_batch() will be already âfinishedâ as client first waits for them to finish
and then only invokes end.

So driver is good to handle rpmh_write() and rpmh_write_batch() calls.

Regarding rpmh_write_async() call, which is a fire-n-forget request from SW and client driver
may immediately invoke rpmh_end_transaction() after this.

this case is also handled properlyâ
Lets again take an example for understanding this..

1.ÂÂÂ Client invokes rpmh_write_async() to send ACTIVE cmds for targets which has zero ACTIVE TCS

ÂÂÂ Rpmh driver Re-purposes one of SLEEP/WAKE TCS to use as ACTIVE, internally this also sets
ÂÂÂ drv->tcs_in_use to true for respective SLEEP/WAKE TCS.

2.ÂÂÂ Client now without waiting for above to finish, goes ahead and invokes rpmh_end_transaction()
ÂÂÂ which calls rpmh_flush() (in case cache become dirty)

ÂÂÂ Now if re-purposed TCS is still in use in HW (transaction in progress), we still have
ÂÂÂ drv->tcs_in_use set. So the rpmh_rsc_invalidate() (invoked from rpmh_flush()) will keep on
ÂÂÂ returning -EAGAIN until that TCS becomes free to use and then goes ahead to finish its job.ÂÂ


...i will add "suggested-by" you in next revision.


> Speaking of previous emails that you didn't respond to, I think you
> still have these action items:
>
> * Document that rpmh_write(active) and rpmh_write_async(active) also
> updates wake state. [1]
I will update in v14.
>
> * Change is_req_valid() to still return true if (sleep == wake), or
> keep track of "active" and return true if (sleep != wake || wake !=
> active). [1]
Not required, as replied in v10 now only.
> * Document that for batch a write to active doesn't update wake. [1]
I will update in v14.
>
>> diff --git a/drivers/soc/qcom/rpmh-internal.h b/drivers/soc/qcom/rpmh-internal.h
>> index 6eec32b..d36be3d 100644
>> --- a/drivers/soc/qcom/rpmh-internal.h
>> +++ b/drivers/soc/qcom/rpmh-internal.h
>> @@ -70,13 +70,17 @@ struct rpmh_request {
>> *
>> * @cache: the list of cached requests
>> * @cache_lock: synchronize access to the cache data
>> + * @active_clients: count of rpmh transaction in progress
>> * @dirty: was the cache updated since flush
>> + * @flush_dirty: if the dirty cache need immediate flush
>> * @batch_cache: Cache sleep and wake requests sent as batch
>> */
>> struct rpmh_ctrlr {
>> struct list_head cache;
>> spinlock_t cache_lock;
>> + u32 active_clients;
>> bool dirty;
>> + bool flush_dirty;
>> struct list_head batch_cache;
>> };
>>
>> diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c
>> index e278fc1..b6391e1 100644
>> --- a/drivers/soc/qcom/rpmh-rsc.c
>> +++ b/drivers/soc/qcom/rpmh-rsc.c
>> @@ -61,6 +61,8 @@
>> #define CMD_STATUS_ISSUED BIT(8)
>> #define CMD_STATUS_COMPL BIT(16)
>>
>> +#define FLUSH_DIRTY 1
>> +
>> static u32 read_tcs_reg(struct rsc_drv *drv, int reg, int tcs_id, int cmd_id)
>> {
>> return readl_relaxed(drv->tcs_base + reg + RSC_DRV_TCS_OFFSET * tcs_id +
>> @@ -670,13 +672,15 @@ static int rpmh_rsc_probe(struct platform_device *pdev)
>> INIT_LIST_HEAD(&drv->client.cache);
>> INIT_LIST_HEAD(&drv->client.batch_cache);
>>
>> + drv->client.flush_dirty = device_get_match_data(&pdev->dev);
>> +
>> dev_set_drvdata(&pdev->dev, drv);
>>
>> return devm_of_platform_populate(&pdev->dev);
>> }
>>
>> static const struct of_device_id rpmh_drv_match[] = {
>> - { .compatible = "qcom,rpmh-rsc", },
>> + { .compatible = "qcom,rpmh-rsc", .data = (void *)FLUSH_DIRTY },
> Ick. This is just confusing. IMO better to set
> 'drv->client.flush_dirty = true' directly in probe with a comment
> saying that it could be removed if we had OSI.
Done.
i will keep this bit earlier in probe with commet, so later if we detect rsc to be in hierarchy
from [1], we can override this to be 0 within rpmh_probe_power_domain().

[1] https://patchwork.kernel.org/patch/11391229/

>
> ...and while you're at it, why not fire off a separate patch (not in
> your series) adding the stub to 'include/linux/psci.h'. Then when we
> revisit this in a year it'll be there and it'll be super easy to set
> the value properly.

With above approch to set it in probe accordingly PSCI change won't be required.

it will be simple, cleaner and without any resistance from PSCI perspective.

>
>> { }
>> };
>>
>> diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
>> index 5bed8f4..9d40209 100644
>> --- a/drivers/soc/qcom/rpmh.c
>> +++ b/drivers/soc/qcom/rpmh.c
>> @@ -297,12 +297,10 @@ static int flush_batch(struct rpmh_ctrlr *ctrlr)
>> {
>> struct batch_cache_req *req;
>> const struct rpmh_request *rpm_msg;
>> - unsigned long flags;
>> int ret = 0;
>> int i;
>>
>> /* Send Sleep/Wake requests to the controller, expect no response */
>> - spin_lock_irqsave(&ctrlr->cache_lock, flags);
>> list_for_each_entry(req, &ctrlr->batch_cache, list) {
>> for (i = 0; i < req->count; i++) {
>> rpm_msg = req->rpm_msgs + i;
>> @@ -312,7 +310,6 @@ static int flush_batch(struct rpmh_ctrlr *ctrlr)
>> break;
>> }
>> }
>> - spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>
>> return ret;
>> }
>> @@ -433,16 +430,63 @@ static int send_single(struct rpmh_ctrlr *ctrlr, enum rpmh_state state,
>> }
>>
>> /**
>> + * rpmh_start_transaction: Indicates start of rpmh transactions, this
>> + * must be ended by invoking rpmh_end_transaction().
>> + *
>> + * @dev: the device making the request
>> + */
>> +void rpmh_start_transaction(const struct device *dev)
>> +{
>> + struct rpmh_ctrlr *ctrlr = get_rpmh_ctrlr(dev);
>> + unsigned long flags;
>> +
>> + if (!ctrlr->flush_dirty)
>> + return;
>> +
>> + spin_lock_irqsave(&ctrlr->cache_lock, flags);
>> + ctrlr->active_clients++;
> Wouldn't hurt to have something like:
>
> /*
> * Detect likely leak; we shouldn't have 1000
> * people making in-flight changes at the same time.
> */
> WARN_ON(ctrlr->active_clients > 1000)
Not necessary change.
>
>
>> + spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>> +}
>> +EXPORT_SYMBOL(rpmh_start_transaction);
>> +
>> +/**
>> + * rpmh_end_transaction: Indicates end of rpmh transactions. All dirty data
>> + * in cache can be flushed immediately when ctrlr->flush_dirty is set
>> + *
>> + * @dev: the device making the request
>> + *
>> + * Return: 0 on success, error number otherwise.
>> + */
>> +int rpmh_end_transaction(const struct device *dev)
>> +{
>> + struct rpmh_ctrlr *ctrlr = get_rpmh_ctrlr(dev);
>> + unsigned long flags;
>> + int ret = 0;
>> +
>> + if (!ctrlr->flush_dirty)
>> + return ret;
>> +
>> + spin_lock_irqsave(&ctrlr->cache_lock, flags);
> WARN_ON(!active_clients);
Why? when active_clients become zero, we want to finally call rpmh_flush(), i don't see a reason to warn and then flush.

Or do you want to make a check if client really called rpmh_start_transaction() first before calling rpmh_end_transaction() then when we do
ctrlr->active_clients--;

it shouldn't go to negative value at the end. in that case let me know, i will make those changes.

>
>
>> +
>> + ctrlr->active_clients--;
>> + if (ctrlr->dirty && !ctrlr->active_clients)
>> + ret = rpmh_flush(ctrlr);
> As mentioned previously [2], I don't think it's valid to call
> rpmh_flush() with interrupts disabled. Specifically (as of your
> previous patch) rpmh_flush now loops if rpmh_rsc_invalidate() returns
> -EAGAIN. I believe that the caller needs to enable interrupts for a
> little bit before trying again. If the caller doesn't need to enable
> interrupts for a little bit before trying again then why was -EAGAIN
> even returned? tcs_invalidate() could have just looped itself and all
> the code would be much simpler.
I will check and address this.
>
>
>> +
>> + spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL(rpmh_end_transaction);
>> +
>> +/**
>> * rpmh_flush: Flushes the buffered active and sleep sets to TCS
>> *
>> * @ctrlr: controller making request to flush cached data
>> *
>> - * Return: -EBUSY if the controller is busy, probably waiting on a response
>> - * to a RPMH request sent earlier.
>> + * Return: 0 on success, error number otherwise.
>> *
>> - * This function is always called from the sleep code from the last CPU
>> - * that is powering down the entire system. Since no other RPMH API would be
>> - * executing at this time, it is safe to run lockless.
>> + * This function can either be called from sleep code on the last CPU
>> + * (thus no spinlock needed) or with the ctrlr->cache_lock already held.
>> */
>> int rpmh_flush(struct rpmh_ctrlr *ctrlr)
>> {
>> @@ -464,10 +508,6 @@ int rpmh_flush(struct rpmh_ctrlr *ctrlr)
>> if (ret)
>> return ret;
>>
>> - /*
>> - * Nobody else should be calling this function other than system PM,
>> - * hence we can run without locks.
>> - */
>> list_for_each_entry(p, &ctrlr->cache, list) {
>> if (!is_req_valid(p)) {
>> pr_debug("%s: skipping RPMH req: a:%#x s:%#x w:%#x",
>> diff --git a/include/soc/qcom/rpmh.h b/include/soc/qcom/rpmh.h
>> index f9ec353..85e1ab2 100644
>> --- a/include/soc/qcom/rpmh.h
>> +++ b/include/soc/qcom/rpmh.h
>> @@ -22,6 +22,10 @@ int rpmh_write_batch(const struct device *dev, enum rpmh_state state,
>>
>> int rpmh_invalidate(const struct device *dev);
>>
>> +void rpmh_start_transaction(const struct device *dev);
>> +
>> +int rpmh_end_transaction(const struct device *dev);
>> +
>> #else
>>
>> static inline int rpmh_write(const struct device *dev, enum rpmh_state state,
>> @@ -41,6 +45,12 @@ static inline int rpmh_write_batch(const struct device *dev,
>> static inline int rpmh_invalidate(const struct device *dev)
>> { return -ENODEV; }
>>
>> +void rpmh_start_transaction(const struct device *dev)
>> +{ return -ENODEV; }
> Unexpected return from void function.
>
Thanks, done.
>> +
>> +int rpmh_end_transaction(const struct device *dev)
>> +{ return -ENODEV; }
>> +
>> #endif /* CONFIG_QCOM_RPMH */
>>
>> #endif /* __SOC_QCOM_RPMH_H__ */
> [1] https://lore.kernel.org/r/CAD=FV=VzNnRdDN5uPYskJ6kQHq2bAi2ysEqt0=taagdd_qZb-g@xxxxxxxxxxxxxx
> [2] https://lore.kernel.org/r/CAD=FV=UYpO2rSOoF-OdZd3jKfSZGKnpQJPoiE5fzH+u1uafS6g@xxxxxxxxxxxxxx
> [3] https://lore.kernel.org/r/CAD=FV=VNaqwiti+UB8fLgjF5r2CD2xeF_p7qHS-_yXqf+ZDrBg@xxxxxxxxxxxxxx
>
>
>
> -Doug

Thanks,

Maulik

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation