Re: [PATCH v2] serial: 8250_dw: Fix common clocks usage race condition

From: Sergey Semin
Date: Mon Mar 23 2020 - 13:07:33 EST


On Mon, Mar 23, 2020 at 01:52:25PM +0200, Andy Shevchenko wrote:
> On Mon, Mar 23, 2020 at 02:11:49PM +0300, Sergey Semin wrote:
> > On Mon, Mar 23, 2020 at 11:20:51AM +0200, Andy Shevchenko wrote:
> > > On Mon, Mar 23, 2020 at 05:46:09AM +0300, Sergey.Semin@xxxxxxxxxxxxxxxxxxxx wrote:
> > > > From: Serge Semin <Sergey.Semin@xxxxxxxxxxxxxxxxxxxx>
> > >
> > > The question to CLK framework maintainers, is it correct approach in general
> > > for this case?
> >
> > You should have been more specific then, if you wanted to see someone
> > special.
>
> I didn't get your comment here. Since you put the question under a pile of
> words in the commit message, and actually in the changelog, not even in the
> message, I repeated it clearly that clock maintainers can see it.
>
> > > On Wed, Mar 18, 2020 at 05:19:53PM +0200, Andy Shevchenko wrote:
> > >> Also it would be nice to see come clock framework guys' opinions...
> >
> > Who can give a better comments regarding the clk API if not the
> > subsystem maintainers?
>
> You already got one from Maxime.
>
> ...
>
> > > > + /*
> > > > + * Some platforms may provide a reference clock shared between several
> > > > + * devices. In this case before using the serial port first we have to
> > > > + * make sure nothing will change the rate behind our back and second
> > > > + * the tty/serial subsystem knows the actual reference clock rate of
> > > > + * the port.
> > > > + */
> > >
> > > > + if (clk_rate_exclusive_get(d->clk)) {
> > > > + dev_warn(p->dev, "Couldn't lock the clock rate\n");
> > >
> > > So, if this fails, in ->shutdown you will disbalance reference count, or did I
> > > miss something?
> > >
> >
> > Hm, you are right. I didn't fully thought this through. The thing is
> > that according to the clk_rate_exclusive_get() function code currently
> > it never fails. Though this isn't excuse for introducing a prone to future
> > bugs code.
> >
> > Anyway if according to design a function may return an error we must take
> > into account in the code using it. Due to this obligation and seeing we can't
> > easily detect whether clk_rate_exclusive_get() has been failed while the
> > driver is being executed in the shutdown method, the best approach would be
> > to just return an error in startup method in case of the clock rate exclusivity
> > acquisition failure. If you are ok with this, I'll have it fixed in v3
> > patchset.
>
> It needs to be carefully tested on other platforms than yours.
>

Alas, I don't have any other platform to test on. But that can be done by
other kernel users during the -rc stage of the next kernel release.

> > > > + } else if (d->clk) {
> > >
> > > > + p->uartclk = clk_get_rate(d->clk);
> > > > + if (!p->uartclk) {
> > > > + clk_rate_exclusive_put(d->clk);
> > > > + dev_err(p->dev, "Clock rate not defined\n");
> > > > + return -EINVAL;
> > > > + }
> > >
> > > This operations I didn't get. If we have d->clk and suddenly get 0 as a rate
> > > (and note, that we still update uartclk member!), we try to put (why?) the
> > > exclusiveness of rate.
> > >
> >
> > Here is what I had in my mind while implementing this code. If d->clk
> > isn't NULL, then there is a "baudclk" clock handler and we can use it to
> > alter/retrieve the baud clock rate. But the same clock could be used by
> > some other driver and that driver could have changed the rate while we
> > didn't have this tty port started up (opened). In this case that driver
> > could also have the clock exclusively acquired. So instead of trying to
> > set the current p->uartclk rate to the clock, check the return value,
> > if it's an error, try to get the current clock rate, check the return
> > value, and so on, I just get the current baud clock rate and make sure
> > the value is not zero
>
> > (clk_get_rate() returns a zero rate in case of
> > internal errors).
>
> Have you considered !CLK case?
>

Yes. That's the optional clock case. Have a look at how the clock API
works: it accepts a NULL clock handle. This driver already relies on that
when calling clk_prepare_enable()/clk_disable_unprepare().
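
As a rough illustration of the error handling discussed in this thread, here
is a minimal user-space sketch. It is not the driver code and not the real clk
framework: every mock_* name below is hypothetical and only imitates the
behaviour described above (a NULL clock is accepted as a no-op, a zero rate
signals an error, and the exclusive get may fail). The point it shows is that
startup returns an error whenever exclusivity was not acquired, so shutdown
can always do an unconditional put without unbalancing the reference count.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mock clock: a stand-in for struct clk, NOT the kernel implementation. */
struct mock_clk {
	unsigned long rate;	/* 0 models "rate unknown / internal error" */
	int exclusive_count;	/* outstanding exclusive-rate references */
	int fail_exclusive_get;	/* knob to simulate a failing get */
};

/* Assumed semantics: NULL is an optional clock, so the get is a no-op. */
static int mock_clk_rate_exclusive_get(struct mock_clk *clk)
{
	if (!clk)
		return 0;
	if (clk->fail_exclusive_get)
		return -EBUSY;
	clk->exclusive_count++;
	return 0;
}

static void mock_clk_rate_exclusive_put(struct mock_clk *clk)
{
	if (!clk)
		return;
	/* An unbalanced put is exactly the bug pointed out in the review. */
	assert(clk->exclusive_count > 0);
	clk->exclusive_count--;
}

static unsigned long mock_clk_get_rate(struct mock_clk *clk)
{
	return clk ? clk->rate : 0;
}

/* Mock port: only the two fields relevant to the discussion. */
struct mock_port {
	struct mock_clk *clk;
	unsigned int uartclk;
};

static int mock_startup(struct mock_port *p)
{
	int ret;

	/* Fail startup outright, so shutdown never runs without the lock. */
	ret = mock_clk_rate_exclusive_get(p->clk);
	if (ret)
		return ret;

	if (p->clk) {
		p->uartclk = mock_clk_get_rate(p->clk);
		if (!p->uartclk) {
			/* Undo the successful get before bailing out. */
			mock_clk_rate_exclusive_put(p->clk);
			return -EINVAL;
		}
	}

	return 0;
}

static void mock_shutdown(struct mock_port *p)
{
	/* Safe to put unconditionally: startup succeeded, so the get did too. */
	mock_clk_rate_exclusive_put(p->clk);
}
```

With this shape, a port without a clock keeps whatever uartclk it already
had, and every successful mock_startup() is paired with exactly one put in
mock_shutdown().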

> > At the same time dw8250_set_termios() will try to update
> > the baud clock rate anyway (also by the serial core at the point of the port
> > startup), so we don't need such complication in the DW 8250 port startup
> > code.
> >
> > > (and note, that we still update uartclk member!),
> >
> > Yes, if we can't determine the current baud clock rate, then the there is
> > a problem with the clock device, so we don't know at what rate it's
> > currently working. Zero is the most appropriate value to be set in this case.
> >
> > > we try to put (why?) the exclusiveness of rate.
> >
> > Yes, we put the exclusivity and return an error, because this if-branch has
> > been taken only if the exclusivity has been successfully acquired.
>
> So, this means that above code requires elaboration in the comments to explain
> how it supposed to work.
>

That's what I did with the comment: "... second the tty/serial subsystem knows
the actual reference clock rate of the port." If you think that checking a
return value and undoing things in case of an error needs elaboration in a
comment, I'll do it in v3.

-Regards,
Sergey

> --
> With Best Regards,
> Andy Shevchenko
>
>