Re: [PATCH v4 4/5] phy: ti: Add a new SERDES driver for TI's AM654x SoC

From: Peter Rosin
Date: Tue Jun 18 2019 - 03:26:02 EST


Hi,

Sorry for the late reply, $-work interfered...

On 2019-06-13 12:21, Kishon Vijay Abraham I wrote:
> Hi,
>
> On 13/06/19 1:22 PM, Peter Rosin wrote:
>> Hi,
>>
>> On 2019-06-13 06:57, Kishon Vijay Abraham I wrote:
>>> Hi Peter,
>>>
>>> On 13/06/19 4:20 AM, Peter Rosin wrote:
>>>> Hi!
>>>>
>>>> [I know this has already been merged upstream, but I only just
>>>> now noticed the code and went to the archives to find the
>>>> originating mail. I hope I managed to set in-reply-to correctly...]
>>>>
>>>> The mux handling is problematic and does not follow the rules.
>>>> It needs to be fixed, or you may face deadlocks. See below.
>>>>
>>>> On 2019-04-05 11:08, Kishon Vijay Abraham I wrote:
>>>>> Add a new SERDES driver for TI's AM654x SoC which configures
>>>>> the SERDES only for PCIe. Support fo USB3 will be added later.
>>>>>
>>>>> SERDES in am654x has three input clocks (left input, externel reference
>>>>> clock and right input) and two output clocks (left output and right
>>>>> output) in addition to a PLL mux clock which the SERDES uses for Clock
>>>>> Multiplier Unit (CMU refclock).
>>>>>
>>>>> The PLL mux clock can select from one of the three input clocks.
>>>>> The right output can select between left input and external reference
>>>>> clock while the left output can select between the right input and
>>>>> external reference clock.
>>>>>
>>>>> The driver has support to select PLL mux and left/right output mux as
>>>>> specified in device tree.
>>>>>
>>>>> [rogerq@xxxxxx: Fix boot lockup caused by accessing a structure member
>>>>> (hw->init) allocated in stack of probe() and accessed in get_parent]
>>>>> [rogerq@xxxxxx: Fix "Failed to find the parent" warnings]
>>>>> Signed-off-by: Roger Quadros <rogerq@xxxxxx>
>>>>> Signed-off-by: Kishon Vijay Abraham I <kishon@xxxxxx>
>>
>> *snip*
>>
>>>>> +static void serdes_am654_release(struct phy *x)
>>>>> +{
>>>>> + struct serdes_am654 *phy = phy_get_drvdata(x);
>>>>> +
>>>>> + phy->type = PHY_NONE;
>>>>> + phy->busy = false;
>>>>> + mux_control_deselect(phy->control);
>>>>
>>>> Here you unconditionally deselect the mux, and that seems
>>>> dangerous. Are you *sure* that ->release may not be called
>>>> without a successful xlate call?
>>>
>>> Yeah, without a successful xlate(), the consumer will never get a reference to
>>> the PHY and the ->release() is invoked only from phy_put() which needs a
>>> reference to the PHY.
>>
>> Yes, I thought it might be ok, but good that you can confirm it.
>>
>>>> I'm not 100% sure of that, but I have not looked at the phy
>>>> code before today, so it may very well be the case that this
>>>> is safe...
>>>>
>>>>> +}
>>>>> +
>>>>> +struct phy *serdes_am654_xlate(struct device *dev, struct of_phandle_args
>>>>> + *args)
>>>>> +{
>>>>> + struct serdes_am654 *am654_phy;
>>>>> + struct phy *phy;
>>>>> + int ret;
>>>>> +
>>>>> + phy = of_phy_simple_xlate(dev, args);
>>>>> + if (IS_ERR(phy))
>>>>> + return phy;
>>>>> +
>>>>> + am654_phy = phy_get_drvdata(phy);
>>>>> + if (am654_phy->busy)
>>>>> + return ERR_PTR(-EBUSY);
>>>>> +
>>>>> + ret = mux_control_select(am654_phy->control, args->args[1]);
>>>>> + if (ret) {
>>>>> + dev_err(dev, "Failed to select SERDES Lane Function\n");
>>>>> + return ERR_PTR(ret);
>>>>> + }
>>>>
>>>> *However*
>>>>
>>>> Here you select the mux as the last action, good, but, a mux must
>>>> be handled with that same care as a locking primitive, i.e.
>>>> successful selects must be perfectly balanced with deselects. I
>>>> see no guarantee of that here, since there are other failures
>>>> possible after the xlate call. So, being last in the function
>>>> does not really help. If I read the code correctly, the
>>>> phy core may fail if try_module_get fails in phy_get(). If that
>>>> ever happens, a successful call to mux_control_select is simply
>>>> forgotten, and the mux will be locked indefinitely.
>>>
>>> Good catch. While adding ->release() ops which is only invoked from phy_put,
>>> perhaps this was missed. Ideally it should be invoked from other places where
>>> there is a failure after phy_get.
>>>>
>>>> am654_phy->busy will also be set indefinitely, so you will get
>>>> -EBUSY and not a hard deadlock. At least here, but if the now
>>>> locked mux control happens to also control some other muxes
>>>> (probably unlikely, but if), then their consumers will potentially
>>>> deadlock hard. But that's just after a cursory reading, so I may
>>>> completely miss something...
>>>
>>> The ->busy here should prevent two consumers trying to control the same mux.
>>
>> Aha, you do not seem to be aware that one mux controller can
>> have multiple independent consumers (a mux is not like a gpio
>> or a pwm in that aspect). What I'm talking about is a single
>> mux control that controls several parallel muxes, each with its
>> own consumer. In the specific case you are targeting, that may
>> not be possible due to some hardware reason or something, but
>> looking at just this driver *it* cannot know that the mux will
>> be available just because it has a local ->busy flag.
>
> I think this should be okay atleast for my case.
>
> My dt node will looks something like this
> serdes_mux: mux-controller {
> compatible = "mmio-mux";
> #mux-control-cells = <1>;
> mux-reg-masks = <0x4080 0x3>, /* SERDES0 lane select */
> <0x4090 0x3>; /* SERDES1 lane select */
> }
>
> Here SERDES lane0 is muxed between PCIe0 Lane0, ICSS2 SGMII Lane0.
> The consumers of SERDES0 (in this case PCIe and ICSS), will have to invoke
> ->xlate of AM654 SERDES driver to select the mux and ->busy should be able to
> tell whether it is available.

Yes, for a single mux consumer everything is fine.

> Only when multiple drivers try to get devm_mux_control_get and use
> mux_control_select, the problem might occur. For my case, only SERDES is the
> consumer of mux.

For your device tree that is true, indeed, but someone else might extend
that DT with an additional mux consumer or reuse your driver in some
unexpected manner. Arguably this person gets to keep the pieces when it
breaks. And with all likelihood, the register that controls this mux
does not control any other mux, so there is probably very little reason
for someone to extend the DT in this way.

>>
>> For this case, I get the feeling that the mux may be selected for
>> a very long time, right? It is never about selecting the mux,
>> doing something for x milli/microseconds or so and then deselecting
>> the mux. Perhaps the mux will typically sit in the same state for
>> the entire uptime of the machine?
>
> Yes right. It'll be for the entire up time. Well, with dt overlay this could
> change.
>>
>> If you have these very long access patterns, the sharing capability
>> of the mux controls are pretty much useless, and I have
>> contemplating a mux mode to support this case. I.e. where you
>> lock/unlock the mux control once at probe/release (or similar),
>> and then basically instead of a shared mux get an exclusive mux
>> where the consumer is responsible for not making parallel accesses.
>> In other words, just like a gpio or a pwm.
>
> Yes, having an exclusive mux will make sure no one can accidentally modify the mux.
>
>>
>> The problem is that I then need a definition of "long" and "short"
>> accesses, and I split the mux universe in two...
>
> Are you trying to make sure there is no deadlock when two different consumers
> try to control a shared mux?

I'm just noticing that when I added the mux framework, the big motivation
was to serialize multiple (two in my case) consumers since the HW used
the same GPIO lines to control muxes for both an I2C bus and ADC signals.
But, all other users seem to use the mux framework for more longstanding
selects and are just hindered by the locking aspect of select/deselect.
I.e., it seems that most users (you included) would be much better off
with an API that grabs the mux and is then free to just select its desired
position (or deselect it if that is beneficial for some power consumption
reason or something), without the requirement of balanced select/deselect
calls.

So, I'm basically just poking around in how the mux framework is used
and what its users need...

Cheers,
Peter