Re: [PATCH v1] Revert "usb: xhci: Implement xhci_handshake_check_state() helper"
From: Udipto Goswami
Date: Mon May 19 2025 - 14:14:13 EST
On Mon, May 19, 2025 at 6:23 PM Mathias Nyman
<mathias.nyman@xxxxxxxxxxxxxxx> wrote:
>
> On 17.5.2025 7.39, Roy Luo wrote:
> > This reverts commit 6ccb83d6c4972ebe6ae49de5eba051de3638362c.
> >
> > Commit 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state()
> > helper") was introduced to workaround watchdog timeout issues on some
> > platforms, allowing xhci_reset() to bail out early without waiting
> > for the reset to complete.
> >
> > Skipping the xhci handshake during a reset is a dangerous move. The
> > xhci specification explicitly states that certain registers cannot
> > be accessed during reset in section 5.4.1 USB Command Register (USBCMD),
> > Host Controller Reset (HCRST) field:
> > "This bit is cleared to '0' by the Host Controller when the reset
> > process is complete. Software cannot terminate the reset process
> > early by writinga '0' to this bit and shall not write any xHC
> > Operational or Runtime registers until while HCRST is '1'."
> >
> > This behavior causes a regression on SNPS DWC3 USB controller with
> > dual-role capability. When the DWC3 controller exits host mode and
> > removes xhci while a reset is still in progress, and then tries to
> > configure its hardware for device mode, the ongoing reset leads to
> > register access issues; specifically, all register reads returns 0.
> > These issues extend beyond the xhci register space (which is expected
> > during a reset) and affect the entire DWC3 IP block, causing the DWC3
> > device mode to malfunction.
>
> I agree with you and Thinh that waiting for the HCRST bit to clear during
> reset is the right thing to do, especially now when we know skipping it
> causes issues for SNPS DWC3, even if it's only during remove phase.
>
> But reverting this patch will re-introduce the issue originally worked
> around by Udipto Goswami, causing regression.
>
> Best thing to do would be to wait for HCRST to clear for all other platforms
> except the one with the issue.
>
> Udipto Goswami, can you recall the platforms that needed this workaroud?
> and do we have an easy way to detect those?
Hi Mathias,
>From what I recall, we saw this issue coming up on our QCOM mobile
platforms but it was not consistent. It was only reported in long runs
i believe. The most recent instance when I pushed this patch was with
platform SM8650, it was a watchdog timeout issue where xhci_reset() ->
xhci_handshake() polling read timeout upon xhci remove. Unfortunately
I was not able to simulate the scenario for more granular testing and
had validated it with long hours stress testing.
The callstack was like so:
Full call stack on core6:
-000|readl([X19] addr = 0xFFFFFFC03CC08020)
-001|xhci_handshake(inline)
-001|xhci_reset([X19] xhci = 0xFFFFFF8942052250, [X20] timeout_us = 10000000)
-002|xhci_resume([X20] xhci = 0xFFFFFF8942052250, [?] hibernated = ?)
-003|xhci_plat_runtime_resume([locdesc] dev = ?)
-004|pm_generic_runtime_resume([locdesc] dev = ?)
-005|__rpm_callback([X23] cb = 0xFFFFFFE3F09307D8, [X22] dev =
0xFFFFFF890F619C10)
-006|rpm_callback(inline)
-006|rpm_resume([X19] dev = 0xFFFFFF890F619C10,
[NSD:0xFFFFFFC041453AD4] rpmflags = 4)
-007|__pm_runtime_resume([X20] dev = 0xFFFFFF890F619C10, [X19] rpmflags = 4)
-008|pm_runtime_get_sync(inline)
-008|xhci_plat_remove([X20] dev = 0xFFFFFF890F619C00)
-009|platform_remove([X19] _dev = 0xFFFFFF890F619C10)
-010|device_remove(inline)
-010|__device_release_driver(inline)
-010|device_release_driver_internal([X20] dev = 0xFFFFFF890F619C10,
[?] drv = ?, [X19] parent = 0x0)
-011|device_release_driver(inline)
-011|bus_remove_device([X19] dev = 0xFFFFFF890F619C10)
-012|device_del([X20] dev = 0xFFFFFF890F619C10)
-013|platform_device_del(inline)
-013|platform_device_unregister([X19] pdev = 0xFFFFFF890F619C00)
-014|dwc3_host_exit(inline)
-014|__dwc3_set_mode([X20] work = 0xFFFFFF886072F840)
-015|process_one_work([X19] worker = 0xFFFFFF887AEE46C0, [X21] work =
0xFFFFFF886072F840)
-016|worker_thread([X19] __worker = 0xFFFFFF887AEE46C0)
-017|kthread([X22] _create = 0xFFFFFF8937350600)
-018|ret_from_fork(asm)
---|end of frame