Re: [PATCH v4 05/14] soc: mediatek: mtk-svs: use svs clk control APIs

From: Roger Lu (陸瑞傑)
Date: Sat Feb 11 2023 - 06:34:17 EST


Hi Matthias Sir,

On Mon, 2023-02-06 at 13:09 +0100, Matthias Brugger wrote:
>
> On 06/02/2023 03:01, Roger Lu (陸瑞傑) wrote:
> > Hi Matthias Sir,
> >
> >
> > On Thu, 2023-02-02 at 11:29 +0100, Matthias Brugger wrote:
> > > 你好,
> >
> > I got shock and thought someone used your name to reply. However,
> > your email account helps me clear my mind. Haha.. Nice and warm to see
> > mandarin
> > on patchwork. It's so fresh and exciting :-).
> >
>
> 谢谢。 I'm learning mainland Chinese for a few month now, I also learned that
> you
> use different symbols in Taiwan, which I don't know. 对不起。

Ha. Both symbols are welcome to me. :-)

>
> > >
> > > On 01/02/2023 13:28, Roger Lu (陸瑞傑) wrote:
> > > > Hi Matthias Sir,
> > > >
> > > > On Tue, 2023-01-31 at 14:19 +0100, Matthias Brugger wrote:
> > > > >
> > > > > On 11/01/2023 08:45, Roger Lu wrote:
> > > > > > In MediaTek HW design, svs and thermal both use the same clk source.
> > > > > > It means that svs clk reference count from CCF includes thermal
> > > > > > control
> > > > > > counts. That makes svs driver confuse whether it disabled svs's main
> > > > > > clk
> > > > > > or not from CCF's perspective and lead to turn off their shared clk
> > > > > > unexpectedly. Therefore, we add svs clk control APIs to make sure
> > > > > > svs's
> > > > > > main clk is controlled well by svs driver itself.
> > > > > >
> > > > > > Here is a NG example. Rely on CCF's reference count and cause
> > > > > > problem.
> > > > > >
> > > > > > thermal probe (clk ref = 1)
> > > > > > -> svs probe (clk ref = 2)
> > > > > > -> svs suspend (clk ref = 1)
> > > > > > -> thermal suspend (clk ref = 0)
> > > > > > -> thermal resume (clk ref = 1)
> > > > > > -> svs resume (encounter error, clk ref = 1)
> > > > > > -> svs suspend (clk ref = 0)
> > > > > > -> thermal suspend (Fail here, thermal HW control w/o clk)
> > > > > >
> > > > > > Fixes: a825d72f74a3 ("soc: mediatek: fix missing
> > > > > > clk_disable_unprepare()
> > > > > > on
> > > > > > err in svs_resume()")
> > > > > > Signed-off-by: Roger Lu <roger.lu@xxxxxxxxxxxx>
> > > > >
> > > > > That looks wrong. Although I don't out of my mind, there should be a
> > > > > way
> > > > > to
> > > > > tell
> > > > > the clock framework that this clock is shared between several devices.
> > > > >
> > > > > I wonder if using clk_enable and clk_disable in svs_resume/suspend
> > > > > wouldn't
> > > > > be
> > > > > enough.
> > > >
> > > > Oh yes, Common Clock Framework (CCF) knows the clock shared between
> > > > several
> > > > devices and maintains clock "on/off" by reference count.
> > > >
> > >
> > > The thing is if you use clk_prepare_enable then the clock framework
> > > check's
> > > if
> > > the clock is already prepared, which could happen like you described in
> > > the
> > > svs_resume (encount error) case in the commit message. The question is,
> > > can't
> > > we
> > > just use clk_enable and clk_disable in resume/suspend and only prepare the
> > > clock
> > > in the probe function?
> >
> > We'll think if this can fix the problem. Thanks for the advice very much.
> >
> > >
> > > > We concern how to stop running svs_suspend() when svs clk is already
> > > > disabled by
> > > > svs_resume(). Take an example as below, if we refers to
> > > > __clk_is_enabled()
> > > > result for knowing svs clk status, it will return "true" all the time
> > > > because
> > > > thermal clk is still on. This causes the problem mentioned in commit
> > > > message.
> > > >
> > >
> > > I would expect that the kernel takes care that we can't enter a resume
> > > path
> > > for
> > > a device before the suspend path has finished. Honestly I don't really
> > > understand the problem here. It seems something different then what you
> > > described in the commit message.
> > >
> > > Please help me understand better.
> >
> > I see. This patch title needs to be changed to "avoid turning off svs clk
> > twice
> > unexpectedly" for pointing out the problem precisely. We saw a loophole that
> > svs
> > clk might be turned off in svs_resume() first and in svs_suspend() again
> > without
> > enabling svs clk during these the process. Therefore, we try to fix it by
> > this
> > patch. Below is our thinking process to explain how we got here.
> >
> > 1. (abandoned) We add __clk_is_enabled() check in svs_suspend() to prevent
> > svs
> > clk from being turned off twice when svs_resume() turned off svs clk in the
> > error-handling process. Nonetheless, we met the NG case in the commit
> > message.
> > 2. (current patch) We add svs clk control hint to understand if we need to
> > run
> > svs_suspend() or not if svs_resume() turned off svs clk before.
> >
>
> Did you had a look on the dev_pm_ops? Maybe we can use suspend_late,
> resume_early to make sure there is no race condition. I wonder also if we
> can't
> make sure that this does not happen using device links. Sorry, I can't give
> better guidance on how to use this technologies, but I have the feeling we
> can
> fix this with existing infrastructure.

No, we didn't look on dev_pm_ops and it seems like another way to fix this issue
with __clk_is_enabled() check in svs_suspend(). Thanks for the advice again.
we'll keep looking for any possible answers to this issue.

Sincerely,
Roger Lu.