BUG in mmc: core: Disable card detect during shutdown

From: H. Nikolaus Schaller
Date: Mon May 30 2022 - 12:55:58 EST


Hi Ulf,
users did report a strange issue that the OMAP5 based Pyra does not
shutdown if a kernel 5.10.116 is used.

Someone did a bisect and found that reverting

0d66b395210c5084c2b7324945062c1d1f95487a

resp. upstream

66c915d09b942fb3b2b0cb2f56562180901fba17

solves it.

I could now confirm that it also happens with v5.18.0.
But interestingly only on the Pyra handheld device and not
on the omap5evm (which is supported by mainline).

The symptom is:

a) without revert

root@letux:~# poweroff

Broadcast message from root@letux (console) (Sat Jan 1 01:08:25 2000):

The system is going down for system halt NOW!
INIT: Sending processes the TERM signal
root@letux:~# [info] Using makefile-style concurrent boot in runlevel 0.
[....] Stopping cgroup management proxy daemon: cgproxy[....] Stopping cgroup management daemon: cgmanager[....] Stop[ ok bluetooth: /usr/sbin/bluetoothd.
[FAIL] Stopping ISC DHCP server: dhcpd failed!
dhcpcd[3055]: sending signal 15 to pid 2976
dhcpcd[3055]: waiting for pid 2976 to exit
[ ok ] Shutting down ALSA...done.
[ ok ] Asking all remaining processes to terminate...done.
[ ok ] All processes ended within 2 seconds...done.
[ ok [[c[....] Stopping enhanced syslogd: rsyslogd.
[ ok ....] Deconfiguring network interfaces...done.
^[[c[info] Saving the system clock.
[info] Hardware Clock updated to Sat Jan 1 01:08:30 UTC 2000.
[ ok ] Deactivating swap...done.
^[[c[ 77.289332] EXT4-fs (mmcblk0p2): re-mounted. Quota mode: none.
[info] Will now halt.

b) with reverting your patch

root@letux:~# uname -a
Linux letux 5.18.0-letux-lpae+ #9678 SMP PREEMPT Mon May 30 18:02:28 CEST 2022 armv7l GNU/Linux
root@letux:~# poweroff

Broadcast message from root@letux (console) (Sat Jan 1 01:39:15 2000):

The system is going down for system halt NOW!
INIT: Sending processes the TERM signal
root@letux:~# [info] Using makefile-style concurrent boot in runlevel 0.
[FAIL] Stopping cgroup management proxy daemon: cgproxy[....] Stopping ISC DHCP server: dhcpd failed!
[....] Stopping cgroup management daemon: cgmanagerdhcpcd[3100]: sending signal 15 to pid 3013
dhcpcd[3100]: waiting for pid 3013 to exit
[ ok ] Stopping bluetooth: /usr/sbin/bluetoothd.
[ ok ] Shutting down ALSA...done.
[ ok ] Asking all remaining processes to terminate...done.
[ ok ] All processes ended within 3 seconds...done.
[ ok [[c[....] Stopping enhanced syslogd: rsyslogd.
[ ok ....] Deconfiguring network interfaces...done.
^[[c[info] Saving the system clock.
[info] Hardware Clock updated to Sat Jan 1 01:39:21 UTC 2000.
[ ok ] Deactivating swap...done.
^[[c[ 44.563256] EXT4-fs (mmcblk0p2): re-mounted. Quota mode: none.
[info] Will now halt.
[ 46.917534] reboot: Power down


What I suspect is that we have multiple mmc interfaces and have
card detect wired up in the Pyra while it is ignored in the
EVM. Is it possible that __mmc_stop_host() never returns in
.shutdown_pre if card detect is set up (and potentially
shut down earlier)?

Setup of mmc is done in omap5-board-common.dtsi and omap5.dtsi.

Out Pyra has a non-upstream device tree where we use
omap5-board-common.dtsi and overwrite it by e.g.

&mmc4 { /* second (u)SD slot (SDIO capable) */
status = "okay";
vmmc-supply = <&ldo2_reg>;
pinctrl-names = "default";
pinctrl-0 = <&mmc4_pins>;
bus-width = <4>;
cd-gpios = <&gpio3 13 GPIO_ACTIVE_LOW>; /* gpio3_77 */
wp-gpios = <&gpio3 15 GPIO_ACTIVE_HIGH>; /* gpio3_79 */
};

But I have tried to remove the cd-gpois and wp-gpois. Or make the
mmc interface being disabled (but I may not have catched everything
in first place).

Then I added some printk to mmc_stop_host() and __mmc_stop_host().

mmc_stop_host() is not called but __mmc_stop_host() is called 4 times.
There are 4 active MMC interfaces in the Pyra - 3 for (µ)SD slots
and one for an SDIO WLAN module.

Now it looks as if 3 of them are properly teared down (two of them
seem to have host->slot.cd_irq >= 0) but on the fourth call
cancel_delayed_work_sync(&host->detect); does not return. This is
likely the location of the stall why we don't see a "reboot: Power down"

Any ideas?

BR and thanks,
Nikolaus

printk hack:

void __mmc_stop_host(struct mmc_host *host)
{
printk("%s 1\n", __func__);
if (host->slot.cd_irq >= 0) {
printk("%s 2\n", __func__);
mmc_gpio_set_cd_wake(host, false);
printk("%s 3\n", __func__);
disable_irq(host->slot.cd_irq);
printk("%s 4\n", __func__);
}

host->rescan_disable = 1;
printk("%s 5\n", __func__);
cancel_delayed_work_sync(&host->detect);
printk("%s 6\n", __func__);
}

resulting log:

[info] Will now halt.
[ 282.780929] __mmc_stop_host 1
[ 282.784276] __mmc_stop_host 2
[ 282.787735] __mmc_stop_host 3
[ 282.791030] __mmc_stop_host 4
[ 282.794235] __mmc_stop_host 5
[ 282.797369] __mmc_stop_host 6
[ 282.800918] __mmc_stop_host 1
[ 282.804269] __mmc_stop_host 5
[ 282.807541] __mmc_stop_host 6
[ 282.810715] __mmc_stop_host 1
[ 282.813842] __mmc_stop_host 2
[ 282.816984] __mmc_stop_host 3
[ 282.820175] __mmc_stop_host 4
[ 282.823302] __mmc_stop_host 5
[ 282.826449] __mmc_stop_host 6
[ 282.830941] __mmc_stop_host 1
[ 282.834076] __mmc_stop_host 5

--- here should be another __mmc_stop_host 6
--- and reboot: Power down