Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing

From: H. Nikolaus Schaller
Date: Mon Jun 18 2018 - 12:44:00 EST


Hi Tony,

> Am 18.06.2018 um 13:51 schrieb Tony Lindgren <tony@xxxxxxxxxxx>:
>
> * Christ van Willegen <cvwillegen@xxxxxxxxx> [180618 10:01]:
>>> --- a/drivers/pinctrl/core.c
>>> +++ b/drivers/pinctrl/core.c
>>> + if (!strcmp(function, gname))
>>
>>
>> This could never fail? gname guaranteed to never == NULL?

If there are no duplicates and stale entries any more, gname must
be valid.

>
> Thanks I'll add checks to make sure we always have a valid
> name for new functions and groups added.

Well, not having checks is the cheapest debugging tool that does not
even need to be enabled because strcmp(NULL) checks come for free :)

Jokes aside, I now have tested the latest patch and it seems to solve the issues :) :) :)

I can also demonstrate that the duplication has gone:

root@letux:~# dmesg|fgrep generic_add
[ 0.747436] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_hsusb2_pins selector=0
[ 0.750671] pinctrl_generic_add_group: pctldev=ee486f80 name=pinmux_hsusb2_2_pins selector=0
[ 0.751403] pinctrl_generic_add_group: pctldev=ee486e00 name=pinmux_mcbsp1_devconf0_pins selector=0
[ 0.752227] pinctrl_generic_add_group: pctldev=ee486d00 name=pinmux_tv_acbias_devconf1_pins selector=0
[ 0.769104] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_uart1_pins selector=1
[ 0.770874] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_uart2_pins selector=2
[ 0.772003] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_uart3_pins selector=3
[ 1.799316] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_twl4030_pins selector=4
[ 1.808624] pinctrl_generic_add_group: pctldev=ee446000 name=pinmux_twl4030_vpins selector=0
[ 1.978912] pinctrl_generic_add_group: pctldev=ee486f80 name=spi_gpio_pinmux selector=1
[ 2.173522] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_mmc1_pins selector=5
[ 2.207916] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_wlan_irq_pin selector=6
[ 2.383666] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_gpmc_pins selector=7
[ 2.778808] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_wlan_pins selector=8
[ 2.794708] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_bt_pins selector=9
[ 5.405273] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_dss_dpi_pins selector=10
[ 6.051666] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_mcbsp1_pins selector=11
[ 6.136688] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_mcbsp2_pins selector=12
[ 6.222015] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_mcbsp3_pins selector=13
[ 6.261199] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_penirq_pins selector=14
[ 6.288787] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_mcbsp4_pins selector=15
[ 6.336700] pinctrl_generic_add_group: pctldev=ee446180 name=hdq_pins selector=16
[ 6.607788] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_camera_pins selector=17
[ 6.900848] pinctrl_generic_add_group: pctldev=ee446180 name=backlight_pins_pinmux selector=18
[ 7.298217] pinctrl_generic_add_group: pctldev=ee446180 name=modem_pins selector=19
[ 8.134826] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_pps_pins selector=20
root@letux:~#

And I was no longer able to reproduce the strcmp(NULL) issue. So it is either better hidden
or gone.

So code just needs group cleanup on failed probing and fixing the mutex around pinctrl_generic_add_group().

I think we need the mutex because a race still can happen when create_pinctrl() is calling pcs_dt_node_to_map()
and pinctrl_generic_add_group() w/o being locked on pinctrl_maps_mutex.

The race I suspect is that two drivers are trying to insert the same name and may come
both to the conclusion that it does not yet exist. And both insert into the radix tree.

The window of risk is small though... It is in pinctrl_generic_add_group() between calling
pinctrl_generic_group_name_to_selector() and radix_tree_insert() so we probably won't
see it in real hardware tests.

Anyhow, I'll apply the patches as "hacks" to the LetuxOS kernel (4.17.x and 4.18-rc1) so that
more users can test. And we can replace them if they are finally accepted upstream.

BR and thanks for fixing the bug,
Nikolaus