Re: [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate

From: Maxime Ripard
Date: Tue Jan 26 2016 - 16:25:45 EST

Next message: Laura Abbott: "Re: [RFC][PATCH 1/4] arm64: Drop alloc function from create_mapping"
Previous message: Eric Dumazet: "Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Jan 18, 2016 at 10:40:59AM +0100, Marcus Weseloh wrote:
> Hi,
>
> 2016-01-17 19:51 GMT+01:00 Maxime Ripard <maxime.ripard@xxxxxxxxxxxxxxxxxx>:
> > On Sun, Jan 10, 2016 at 10:11:11PM +0100, Marcus Weseloh wrote:
> >> >> >> - /* Ensure that we have a parent clock fast enough */
> >> >> >> + /*
> >> >> >> + * Ensure that the parent clock is set to twice the max speed
> >> >> >> + * of the spi device (possibly rounded up by the clk driver)
> >> >> >> + */
> >> >> >> mclk_rate = clk_get_rate(sspi->mclk);
> >> >> >> - if (mclk_rate < (2 * tfr->speed_hz)) {
> >> >> >> - clk_set_rate(sspi->mclk, 2 * tfr->speed_hz);
> >> >> >> + if (spi->max_speed_hz != sspi->cur_max_speed ||
> >> >> >> + mclk_rate != sspi->cur_mclk) {
> >> >> >
> >> >> > Do you need to cache the values? As far as I'm aware, you end up doing
> >> >> > the computation all the time anyway.
> >> >>
> >> >> By caching the values we optimize the case when a single SPI slave
> >> >> device (or even multiple slave devices with the same max_speed) are
> >> >> used multiple times in a row. In that case, we're saving two calls:
> >> >> clk_set_rate and clk_get_rate. I was unsure about how expensive the
> >> >> clk_* calls were, so I thought it would be safer use caching. But
> >> >> maybe it's not worth the extra code?
> >> >
> >> > Unless you can point that there's a significant performance
> >> > difference, I'm not sure it's worth it.
> >>
> >> I did actually notice a significant transfer latency when a new mod0
> >> clock frequency is set, probably due to the __delay in
> >> drivers/clk/sunxi/clk-factors.c (line 147). So my feeling is that the
> >> caching is worth it... at least for the case when there are two slave
> >> devices with different transfer speeds accessing the same SPI module.
> >
> > I'm not sure we even need that delay in the first place, at least not
> > for all the clocks using factor.
> >
> > AFAIK, the only case where it's useful is for PLL, and all of them
> > have a bit you can busy-wait on that will let you know when the clock
> > has stabilized.
>
> OK, I didn't know that.

Look for the lock bit in the PLLs section in the datasheet

> Does that mean you would like me to drop the caching from this patch
> and prepare a patch to remove the __delay from clk-factors?

Yeah, you could add a lock bit field to the clk_factors structure,
busy-wait on it. Otherwise, just go on (hence removing the delay for
those).

Be aware that it might conflict with the clk-factors reworking Chen-Yu
posted yesterday, so you might want to synchronise with him.

> >> >> [...]
> >> >> >> - div = mclk_rate / (2 * tfr->speed_hz);
> >> >> >> - if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
> >> >> >> - if (div > 0)
> >> >> >> - div--;
> >> >> >> -
> >> >> >> + div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
> >> >> >
> >> >> > Isn't it exactly the same thing as mclk_rate / (2 * tfr->speed_hz) ?
> >> >>
> >> >> It is quite often, but not in all cases. The plain division rounds to
> >> >> the nearest integer, so it rounds down sometimes. Consider the
> >> >> following case: we have a slow SPI device with a spi-max-frequency of
> >> >> 50kHz. Our clock driver can't find a clock as slow as 100kHz, so it
> >> >> sets mclk to 214,285Hz.
> >> >>
> >> >> Using the old calculation we get: 214,285 / (2 * 50,000) = 2, so div =
> >> >> 1 as the old code subtracts 1 two lines further down
> >> >> The new calculation results in: DIV_ROUND_UP(214,285, 2 * 50,000) =
> >> >> 3, so div = 2 if we add the -1
> >> >
> >> > Except that you have that extra - 1 after your DIV_ROUND_UP
> >> > calculation in the line you add. so you have to remove 1 from that
> >> > line above, and then 1 again when we set the register, which ends up
> >> > being the exact same thing, or am I missing something?
> >>
> >> The -1 after the DIV_ROUND_UP is already the -1 that is needed to set
> >> the register. There's no need for another -1 after that (and there
> >> isn't one in the code).
> >
> > I was probably hallucinating :) My bad.
> >
> > That being said, I still have a hard time figuring out what would not
> > be solved simply by removing the div--, which seems to match what your
> > commit log says.
>
> I'm probably not doing a good job explaining the change. Let me try it
> with a few examples. Consider the following three approaches:
>
> A (original, unpatched version):
> ========================
> div = mclk_rate / (2 * tfr->speed_hz);
> if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
> if (div > 0)
> div--;
> } else {
> ...
> }
>
> B (original version, but with div-- removed):
> =================================
> div = mclk_rate / (2 * tfr->speed_hz);
> if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
> ...
> } else {
> ...
> }
>
> C (version after this patch):
> =====================
> div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
> if (div <= SUN4I_CLK_CTL_CDR2_MASK) {

Ah yes, you're removing the +1 from here too. Sorry for missing
that obvious change....

> ...
> } else {
> ...
> }
>
> And now the following values for mclk, tfr->speed and the resulting
> div and SPI_CLK
>
> tfr->speed_hz = 50,000
> mclk = 214,285
> A: div = 1, SPI_CLK = 53,571(!)
> B: div = 2, SPI_CLK = 35,714
> C: div = 2, SPI_CLK = 35,714
>
> tfr->speed_hz = 50,000
> mclk = 200,000
> A: div = 1, SPI_CLK = 50,000
> B: div = 2, SPI_CLK = 33,333(!)
> C: div = 1, SPI_CLK = 50,000
>
> tfr->speed_hz = 650,000
> mclk = 1,6000,000
> A: div = 11, SPI_CLK = 666,667(!)
> B: div = 12, SPI_CLK = 615,385
> C: div = 12, SPI_CLK = 615,385
>
> tfr->speed_hz = 1,000,000
> mclk = 2,000,000
> A: div = 0, SPI_CLK = 1,000,000
> B: div = 1, SPI_CLK = 500,000(!)
> C: div = 0, SPI_CLK = 1,000,000
>
> I hope that makes it a little bit easier to understand the change. Of
> course, the change only makes sense if you agree that the acutal SPI
> transfer speed should never exceed the requested speed. Which I think
> is the right approach... but maybe you think otherwise?

No, my bad for being so picky about it, while missing the
point... Thanks for your awesome explanation :)

Maxime

--
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

Attachment: signature.asc
Description: Digital signature

Next message: Laura Abbott: "Re: [RFC][PATCH 1/4] arm64: Drop alloc function from create_mapping"
Previous message: Eric Dumazet: "Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]