Re: [PATCH v5 3/4] pwm: add microchip soft ip corePWM driver

From: Conor.Dooley
Date: Mon Jul 11 2022 - 10:33:53 EST


Hey Uwe, took another look at this today.
I'll give you some more info on the sync_update hw when I send
the next version.

On 09/07/2022 17:02, Uwe Kleine-König wrote:
> Hello Conor,
>
> On Fri, Jul 08, 2022 at 03:39:22PM +0100, Conor Dooley wrote:
>> Add a driver that supports the Microchip FPGA "soft" PWM IP core.
>>
>> Signed-off-by: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
>> ---

>> +static int mchp_core_pwm_apply_period(struct pwm_chip *chip, const struct pwm_state *state,
>> + u8 *prescale, u8 *period_steps)
>> +{
>> + struct mchp_core_pwm_chip *mchp_core_pwm = to_mchp_core_pwm(chip);
>> + u64 tmp, clk_rate;
>> + u16 prescale_val, period_steps_val;
>> +
>> + /*
>> + * Calculate the period cycles and prescale values.
>> + * The registers are each 8 bits wide & multiplied to compute the period
>> + * using the formula:
>> + * (clock_period) * (prescale + 1) * (period_steps + 1)
>> + * so the maximum period that can be generated is 0x10000 times the
>> + * period of the input clock.
>> + * However, due to the design of the "hardware", it is not possible to
>> + * attain a 100% duty cycle if the full range of period_steps is used.
>> + * Therefore period_steps is restricted to 0xFE and the maximum multiple
>> + * of the clock period attainable is 0xFF00.
>> + */
>> + clk_rate = clk_get_rate(mchp_core_pwm->clk);
>
> + /* If clk_rate is too big, the following multiplication might overflow */


I expanded this comment slightly to:
/*
* If clk_rate is too big, the following multiplication might overflow.
* However this is implausible, as the fabric of current FPGAs cannot
* provide clocks at a rate high enough.
*/

>> + if (clk_rate >= NSEC_PER_SEC)
>> + return -EINVAL;
>> +
>> + tmp = mul_u64_u64_div_u64(state->period, clk_rate, NSEC_PER_SEC);
>> +
>> + if (tmp >= MCHPCOREPWM_PERIOD_MAX) {
>> + *prescale = MCHPCOREPWM_PRESCALE_MAX - 1;
>> + *period_steps = MCHPCOREPWM_PERIOD_STEPS_MAX - 1;
>> + goto write_registers;
>> + }
>> +
>> + for (prescale_val = 1; prescale_val <= MCHPCOREPWM_PRESCALE_MAX; prescale_val++) {
>> + period_steps_val = div_u64(tmp, prescale_val);
>> + if (period_steps_val > MCHPCOREPWM_PERIOD_STEPS_MAX)
>> + continue;
>> + *period_steps = period_steps_val - 1;
>> + *prescale = prescale_val - 1;
>> + break;
>> + }
>
> OK, so you want to find the smallest prescale_val such that
>
> prescale_val * MCHPCOREPWM_PERIOD_STEPS_MAX >= tmp
>
> . You can calculate that without a loop as:
>
> prescale_val = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX);

Hmm, not quite right. For tmp < MCHPCOREPWM_PERIOD_STEPS_MAX this gives
zero. It would have to be:
*prescale = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX);

In which case, the loop collapses to:

*prescale = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX);
/* PREG_TO_VAL() can produce a value larger than UINT8_MAX */
*period_steps = div_u64(tmp, PREG_TO_VAL((u32) *prescale)) - 1;

and the interim _val variables can just be deleted.


>> +static int mchp_core_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
>> + const struct pwm_state *state)
>> +{
>> + struct mchp_core_pwm_chip *mchp_core_pwm = to_mchp_core_pwm(chip);
>> + struct pwm_state current_state = pwm->state;
>> + bool period_locked;
>> + u64 period;
>> + u16 channel_enabled;
>> + u8 prescale, period_steps;
>> + int ret;
>> +
>> + if (!state->enabled) {
>> + mchp_core_pwm_enable(chip, pwm, false);
>> + return 0;
>> + }
>> +
>> + /*
>> + * If the only thing that has changed is the duty cycle or the polarity,
>> + * we can shortcut the calculations and just compute/apply the new duty
>> + * cycle pos & neg edges
>> + * As all the channels share the same period, do not allow it to be
>> + * changed if any other channels are enabled.
>> + */
>> + spin_lock(&mchp_core_pwm->lock);
>> +
>> + channel_enabled = (((u16)readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(1)) << 8) |
>> + readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(0)));
>> + period_locked = channel_enabled & ~(1 << pwm->hwpwm);
>> +
>> + if ((!current_state.enabled || current_state.period != state->period) && !period_locked) {
>> + ret = mchp_core_pwm_apply_period(chip, state, &prescale, &period_steps);
>> + if (ret) {
>> + spin_unlock(&mchp_core_pwm->lock);
>> + return ret;
>> + }
>> + } else {
>> + prescale = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE);
>> + period_steps = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PERIOD);
>> + }
>> +
>> + /*
>> + * If the period is locked, it may not be possible to use a period less
>> + * than that requested.
>> + */
>> + period = PREG_TO_VAL(period_steps) * PREG_TO_VAL(prescale) * NSEC_PER_SEC;
>
> s/ / /
>
>> + do_div(period, clk_get_rate(mchp_core_pwm->clk));
>> + if (period > state->period) {
>> + spin_unlock(&mchp_core_pwm->lock);
>> + return -EINVAL;
>> + }
>
> I would consider it easier to do the following (in pseudo syntax):
>
>
> prescale, period_steps = calculate_hwperiod(period);
>
> if (period_locked):
> hwprescale = readb_relaxed(PRESCALE)
> hwperiod_steps = readb_relaxed(PERIOD)
>
> if period_steps * prescale < hwperiod_steps * hwprescale:
> return -EINVAL
> else
> prescale, period_steps = hwprescale,
> hwperiod_steps

I think I like this method more than messing around with the clks.
I'll change to something like this for the next version.

> duty_steps = calculate_hwduty(duty, prescale)
> if (duty_steps > period_steps)
> duty_steps = period_steps


>
>> + mchp_core_pwm_apply_duty(chip, pwm, state, prescale, period_steps);
>> +
>> + /*
>> + * Notify the block to update the waveform from the shadow registers.
>> + * The updated values will not appear on the bus until they have been
>> + * applied to the waveform at the beginning of the next period. We must
>> + * write these registers and wait for them to be applied before calling
>> + * enable().
>> + */
>> + if (mchp_core_pwm->sync_update_mask & (1 << pwm->hwpwm)) {
>> + writel_relaxed(1U, mchp_core_pwm->base + MCHPCOREPWM_SYNC_UPD);
>> + usleep_range(state->period, state->period * 2);
>> + }
>> +
>> + spin_unlock(&mchp_core_pwm->lock);
>> +
>> + mchp_core_pwm_enable(chip, pwm, true);
>
> I already asked in the last round: Do you really need to write the
> SYNC_UPD register twice? I would expect that you don't?!
>
> Also the locking looks fishy here. It would be simpler (and maybe even
> more robust, didn't think deeply about it) to assume in
> mchp_core_pwm_enable() that the caller holds the lock. Then you only
> grab the lock once during .apply() and nothing strange can happen in the
> gap.

I got it into my head that enable() could be called by the framework.
I'll simplify the locking here.

> I'd take the lock here to be sure to get a consistent return value.
>
>> + channel_enabled = (((u16)readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(1)) << 8) |
>> + readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(0)));
>
> micro optimisation: You're reading two register values here, but only use
> one. Shadowing the enabled registers in mchp_core_pwm might also be an
> idea.

I'd agree, but more from the perspective of how awful I feel this code
looks.

>
>> + if (channel_enabled & 1 << pwm->hwpwm)
>
> I'm always unsure about the associativity of & and <<, so I would have
> written that as
>
> if (channel_enabled & (1 << pwm->hwpwm))
>
> I just tested that for the umpteens time and your code is fine, so this
> is only for human readers like me.

I'll change it, I'll prob have forgotten the associativity by the time I
look at the driver next.

>
>> + state->enabled = true;
>> + else
>> + state->enabled = false;
>> +
>> + prescale = PREG_TO_VAL(readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE));
>> +
>> + posedge = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_POSEDGE(pwm->hwpwm));
>> + negedge = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_NEGEDGE(pwm->hwpwm));
>> +
>> + duty_steps = abs((s16)posedge - (s16)negedge);
>
> If duty_steps == 0 the returned result is wrong. I suggest to fix that,
> at least mention the problem in a comment.

I think handling it is the way to go.

>
>> + state->duty_cycle = duty_steps * prescale * NSEC_PER_SEC;
>
> Can this overflow?

No, 255 * 256 * 1E9 < 2^64 but ...

>> + prescale = PREG_TO_VAL(readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE));

... can.

>>> What is the typical return value of clk_get_rate(mchp_core_pwm->clk)?
>>
>> It's gonna be less than 600M
>
> An exact value would be interesting, then when I spot a rounding problem
> I could give you a test case to double check.

The maximum depends on speed grade, but no more than 200 MHz.


>>>> You need to round up here. Did you test your driver with PWM_DEBUG on?
>>>> During test please do for a few fixed periods:
>>>>
>>>> for duty_cycle in [0 .. period]:
>>>> pwm_apply(mypwm, {.period = period, .duty_cycle = duty_cycle, .enabled = true, ...})
>>>>
>>>> for duty_cycle in [period .. 0]:
>>>> pwm_apply(mypwm, {.period = period, .duty_cycle = duty_cycle, .enabled = true, ...})
>>>>
>>>> and check there is no output claiming a miscalculation.
>>>
>>> I ran the stuff you gave me last time, doing something similar w/ a
>>> shell loop. Got no reported miscalculations.
>>
>> I'm surprise, I would have expected that my test recipe would find such
>> an issue. Could you follow my arguing about the rounding direction?
>> There always the possibility that I'm wrong, too.
>
> I'll take another look & get back to you.

I *think* I might've spelt "PWM_DEBUG" as "PWM_DEBUF"...

I'll retest!

Thanks,
Conor.