xHCI reset timeout during s2idle resume of Raspberry Pi 4 B #1931

Open
lategoodbye opened this issue Jan 2, 2025 · 12 comments

@lategoodbye

lategoodbye commented Jan 2, 2025

Is this the right place for my bug report?
It seems related to the VL805 firmware, so I'm not sure.

Describe the bug
I'm doing some s2idle tests with the Raspberry Pi 4B. Everything seems to work except xHCI (VIA VL805), which times out after the xHCI reset command during the resume phase. Here is the kernel log with some additional log messages:

[47893.190601] PM: Triggering wakeup from IRQ 25
[47893.190622] PM: resume from suspend-to-idle
[47893.190761] brcm-pcie fd500000.pcie: brcm_pcie_resume_noirq
[47893.190767] brcm-pcie fd500000.pcie: brcm_pcie_bridge_sw_init_set_generic
[47893.190871] brcm-pcie fd500000.pcie: brcm_pcie_bridge_sw_init_set_generic
[47893.190876] brcm-pcie fd500000.pcie: brcm_pcie_perst_set_generic
[47893.191086] brcm-pcie fd500000.pcie: brcm_pcie_bridge_sw_init_set_generic
[47893.191311] brcm-pcie fd500000.pcie: brcm_pcie_perst_set_generic
[47893.319207] brcm-pcie fd500000.pcie: clkreq-mode set to default
[47893.321263] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
[47893.346468] PM: noirq resume of devices complete after 155.839 msecs
[47893.346795] PM: early resume of devices complete after 0.290 msecs
[47893.467752] bcmgenet fd580000.ethernet eth0: Link is Down
[47893.494488] raspberrypi-reset soc:firmware:reset: Notify xHCI reset
[47893.642237] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43455-sdio for chip BCM4345/6
[47893.785051] brcmfmac: brcmf_c_process_txcap_blob: no txcap_blob available (err=-2)
[47893.785379] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM4345/6 wl0: Nov  1 2021 00:37:25 version 7.45.241 (1a2f2fa CY) FWID 01-703fd60
[47894.515224] usb usb1: root hub lost power or was reset
[47894.515235] usb usb2: root hub lost power or was reset
[47894.515239] xhci_hcd 0000:01:00.0: Stop HCD
[47894.515245] xhci_hcd 0000:01:00.0: HCD stopped
[47894.515252] xhci_hcd 0000:01:00.0: Reset the HC, CMD: 00000002
[47921.929950] xhci_hcd 0000:01:00.0: xhci_handshake_check_state failed: -110
[47921.930007] xhci_hcd 0000:01:00.0: Failed to reset: -110
[47921.930014] xhci_hcd 0000:01:00.0: PCI post-resume error -110!
[47921.930020] xhci_hcd 0000:01:00.0: HC died; cleaning up
[47921.930034] xhci_hcd 0000:01:00.0: PM: dpm_run_callback(): pci_pm_resume returns -110
[47921.930054] xhci_hcd 0000:01:00.0: PM: failed to resume async: error -110
[47921.930128] PM: resume of devices complete after 28583.092 msecs
[47921.930540] OOM killer enabled.
[47921.930544] Restarting tasks ...
[47921.930586] usb 1-1: USB disconnect, device number 2
[47921.934894] done.
[47921.934924] random: crng reseeded on system resumption
[47921.941215] PM: suspend exit

How can I verify that the VL805 firmware is actually functional after "raspberrypi-reset soc:firmware:reset: Notify xHCI reset"?
Is it possible that stopping the HCD causes this issue?

To reproduce

sudo su
echo enabled > /sys/class/tty/ttyS1/power/wakeup
echo freeze > /sys/power/state
# wait some seconds
# press key on console

Expected behaviour
The xHCI reset command succeeds, as it does during driver probe.

Actual behaviour
The xHCI reset times out during resume, and the heartbeat LED is blocked for the duration of the timeout.

System

VL805 firmware:
000138a1
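The stall described under "Actual behaviour" can be measured from a saved kernel log: the gap between the xhci_hcd "Reset the HC" message and the next xhci_hcd message is about 27 s in the log above. A minimal sketch, assuming the dmesg format shown in this issue (timestamp in square brackets); the function name xhci_reset_stall is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the seconds between the xhci_hcd "Reset the HC"
# message and the next xhci_hcd message in a kernel log (file args or stdin).
# Returns 1 if no reset followed by another xhci_hcd line is found.
xhci_reset_stall() {
    awk '
        /xhci_hcd/ {
            # Strip the timestamp brackets, e.g. [47894.515252] or [   98.083553]
            ts = $0
            sub(/^\[ */, "", ts)
            sub(/].*$/, "", ts)
            if (start != "") {
                printf "%.3f\n", ts - start
                found = 1
                exit
            }
            if (/Reset the HC/)
                start = ts
        }
        END { if (!found) exit 1 }
    ' "$@"
}
```

For example, dmesg | xhci_reset_stall prints 27.415 for the failing resume above and 0.000 for the good case shown in the next comment.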

@lategoodbye
Author

Important note: this issue is only reproducible with Raspberry Pi 4 boards without an EEPROM for the VL805 firmware. The newer boards, which have an EEPROM for the VL805 firmware, are not affected by this issue:

[   96.799927] PM: Triggering wakeup from IRQ 25
[   96.799945] PM: resume from suspend-to-idle
[   96.800057] brcm-pcie fd500000.pcie: brcm_pcie_resume_noirq
[   96.800064] brcm-pcie fd500000.pcie: brcm_pcie_bridge_sw_init_set_generic
[   96.800169] brcm-pcie fd500000.pcie: brcm_pcie_bridge_sw_init_set_generic
[   96.800174] brcm-pcie fd500000.pcie: brcm_pcie_perst_set_generic
[   96.800386] brcm-pcie fd500000.pcie: brcm_pcie_bridge_sw_init_set_generic
[   96.800612] brcm-pcie fd500000.pcie: brcm_pcie_perst_set_generic
[   96.927459] brcm-pcie fd500000.pcie: clkreq-mode set to default
[   96.929518] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
[   96.954725] PM: noirq resume of devices complete after 154.775 msecs
[   96.955053] PM: early resume of devices complete after 0.287 msecs
[   97.072080] bcmgenet fd580000.ethernet eth0: Link is Down
[   97.072374] raspberrypi-reset soc:firmware:reset: Notify xHCI reset
[   97.247036] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43455-sdio for chip BCM4345/6
[   97.390105] brcmfmac: brcmf_c_process_txcap_blob: no txcap_blob available (err=-2)
[   97.390451] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM4345/6 wl0: Nov  1 2021 00:37:25 version 7.45.241 (1a2f2fa CY) FWID 01-703fd60
[   98.083491] usb usb1: root hub lost power or was reset
[   98.083506] usb usb2: root hub lost power or was reset
[   98.083511] xhci_hcd 0000:01:00.0: Stop HCD
[   98.083546] xhci_hcd 0000:01:00.0: HCD stopped
[   98.083553] xhci_hcd 0000:01:00.0: Reset the HC, CMD: 00000002
[   98.083675] xhci_hcd 0000:01:00.0: // Disabling event ring interrupts
[   98.083682] xhci_hcd 0000:01:00.0: cleaning up memory
[   98.083981] xhci_hcd 0000:01:00.0: xhci_stop completed - status = 11
[   98.083987] xhci_hcd 0000:01:00.0: Initialize the xhci_hcd
[   98.084282] xhci_hcd 0000:01:00.0: Start the primary HCD
[   98.084442] xhci_hcd 0000:01:00.0: Start the secondary HCD
[   98.084481] xhci_hcd 0000:01:00.0: xhci_resume: starting usb1 port polling.
[   98.359660] usb 1-1: reset high-speed USB device number 2 using xhci_hcd
[   98.608754] PM: resume of devices complete after 1653.699 msecs
[   98.609184] OOM killer enabled.
[   98.609189] Restarting tasks ... done.
[   98.620417] random: crng reseeded on system resumption
[   98.621329] PM: suspend exit

@andrum993

andrum993 commented Jan 4, 2025

Important note: this issue is only reproducible with Raspberry Pi 4 boards without an EEPROM for the VL805 firmware. The newer boards, which have an EEPROM for the VL805 firmware, are not affected by this issue:

You've got that backwards:

  • Pi 4B rev 1.3 and earlier have two EEPROMs - one for bootloader, one for VL805.
  • Pi 4B from rev 1.4 onwards have a single EEPROM that contains both bootloader and VL805 firmware.

So it sounds like it is actually the newer boards that suffer from this issue.
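On production boards, the PCB revision that this distinction depends on can be read from the Revision field of /proc/cpuinfo (where the kernel exposes it, e.g. on Raspberry Pi OS). The sketch below decodes a new-style revision code, assuming the published layout (bit 23 = new-style flag, bits 4-11 = board type with 0x11 for the Pi 4 Model B, bits 0-3 = PCB revision, shown as 1.x for the 4B), and applies the rev 1.3 / rev 1.4 split described above. The function names are hypothetical, and prototype boards may not follow this split.

```shell
#!/bin/sh
# Hypothetical helper: print the PCB revision ("1.x") of a Pi 4 Model B
# from a new-style revision code such as "c03114".
pi4b_pcb_revision() {
    code=$((0x$1))
    # Bit 23 set means a new-style revision code
    if [ $(( (code >> 23) & 1 )) -ne 1 ]; then
        echo "old-style revision code: $1" >&2
        return 1
    fi
    # Bits 4-11: board type, 0x11 = Raspberry Pi 4 Model B
    if [ $(( (code >> 4) & 0xff )) -ne $((0x11)) ]; then
        echo "not a Raspberry Pi 4 Model B: $1" >&2
        return 1
    fi
    # Bits 0-3: PCB revision
    echo "1.$(( code & 0xf ))"
}

# Classify according to the split described in this comment:
# rev <= 1.3 has a dedicated VL805 EEPROM, rev >= 1.4 a single shared EEPROM.
vl805_eeprom_layout() {
    rev=$(pi4b_pcb_revision "$1") || return 1
    if [ "${rev#1.}" -le 3 ]; then
        echo "dedicated VL805 EEPROM"
    else
        echo "single EEPROM (bootloader + VL805 firmware)"
    fi
}
```

Usage: vl805_eeprom_layout "$(awk '/^Revision/ {print $3}' /proc/cpuinfo)"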

If memory serves, on boards without a separate EEPROM chip for the VL805 firmware, the VPU firmware running on the SoC (BCM2711) is responsible for sending the firmware to the VL805, so I'm guessing that after an xHCI reset the VL805 needs its firmware reloaded. This seems to be confirmed by https://forums.raspberrypi.com/viewtopic.php?t=375483#p2246599 and https://forums.raspberrypi.com/viewtopic.php?t=317494#p1900532.

I suspect you need to make the mailbox call timg236 mentions in https://forums.raspberrypi.com/viewtopic.php?t=375483#p2246599. I think I've found the driver that does this at https://github.com/raspberrypi/linux/blob/rpi-6.6.y/drivers/reset/reset-raspberrypi.c.

@lategoodbye
Author

lategoodbye commented Jan 4, 2025

@andrum993 Thank you for the feedback. My problem is that most of my Raspberry Pi 4 boards are prototype boards. All I can say is that the affected PCB (bad case) has no 8-pin EEPROM fitted near the VL805, while the good-case PCB has one.

You are correct that the VPU is responsible for reloading the VL805 firmware and that the reset-raspberrypi driver triggers this process. As you can see from the traces above, the necessary mailbox call is successfully sent in both cases (bad and good):

[47893.494488] raspberrypi-reset soc:firmware:reset: Notify xHCI reset

I assume the call doesn't confirm that the VL805 firmware has actually been uploaded, given the fixed sleeps in the reset driver. I tried increasing the sleep in reset-raspberrypi, but it didn't help. That's why I asked how I can verify that the VL805 firmware is actually loaded.

@lategoodbye
Author

I found this helpful comment by @timg236

Before s2idle (bad case):

root@raspberrypi:/home/pi# lspci -d 1106:3483 -xxx
01:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01)
00: 06 11 83 34 46 05 10 00 01 30 03 0c 10 00 00 00
10: 04 00 00 f8 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 06 11 83 34
30: 00 00 00 00 80 00 00 00 00 00 00 00 27 01 00 00
40: 00 00 00 00 00 01 00 00 09 10 00 40 04 00 00 00
50: c0 38 01 00 00 00 00 00 00 00 00 00 06 11 83 34

After s2idle (bad case):

root@raspberrypi:/sys/power# lspci -d 1106:3483 -xxx
01:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01)
00: 06 11 83 34 46 05 10 00 01 30 03 0c 10 00 00 00
10: 04 00 00 f8 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 06 11 83 34
30: 00 00 00 00 80 00 00 00 00 00 00 00 27 01 00 00
40: 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 06 11 83 34

So it seems to me that the VL805 firmware is not loaded: the non-zero bytes at offsets 0x48-0x4f and 0x50-0x53 are all zero after s2idle.
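The same before/after comparison can be scripted by reading the VL805 config space directly from sysfs instead of comparing lspci output by eye. A minimal sketch, assuming the device sits at 0000:01:00.0 as in the logs above, and using only the observation from these two dumps (dwords at 0x48 and 0x50 non-zero before s2idle, zero after) as a heuristic; what these registers actually mean is not established in this thread. Reading beyond the first 64 bytes of config space via sysfs requires root, just like lspci -xxx. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the little-endian dword at byte OFFSET (decimal)
# of a PCI config-space file as 8 hex digits, e.g. "000138c0".
# Prints nothing if the 4 bytes cannot be read.
read_cfg_dword() {
    od -An -tx1 -j "$2" -N 4 "$1" | awk 'NF == 4 { print $4 $3 $2 $1 }'
}

# Heuristic check based on the before/after dumps in this issue:
# returns 0 if the dwords at 0x48 and 0x50 are both non-zero,
# 1 if either is zero, 2 if the config space could not be read.
vl805_fw_looks_loaded() {
    cfg=${1:-/sys/bus/pci/devices/0000:01:00.0/config}
    w48=$(read_cfg_dword "$cfg" $((0x48)))
    w50=$(read_cfg_dword "$cfg" $((0x50)))
    echo "0x48: ${w48:-unreadable}  0x50: ${w50:-unreadable}"
    [ -n "$w48" ] && [ -n "$w50" ] || return 2
    [ "$w48" != "00000000" ] && [ "$w50" != "00000000" ]
}
```

Run as root right after resume, e.g. vl805_fw_looks_loaded || echo "VL805 firmware does not appear to be loaded". For the "before" dump above, the dword at 0x50 reads as 000138c0.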

@andrum993

Why do you need this to work on old prototype boards? I suggest you test each of the production board variants and if it works on those, then there isn't a problem.

@lategoodbye
Author

lategoodbye commented Jan 4, 2025

Sorry, but I have been spending a lot of my spare time since July 2024 upstreaming s2idle support for Raspberry Pi boards. There has been very little test feedback so far, and believe me, as an ex kernel maintainer, these weren't trivial issues. I tested it with four Raspberry Pi 4 boards and three of them showed this issue. So why should I buy a new one when it's very likely a software issue?

@timg236

timg236 commented Jan 6, 2025

I don't believe the VL805 chip ROM supports a firmware reload interface. At a minimum you'd have to do a PCIe fundamental reset, and I don't know whether that is guaranteed to fully reset the VL805. Running the VL805 without flash is not well supported by VIA, so I'm not hopeful about this.

@lategoodbye
Author

@timg236 Thanks for the hint. The DT binding and PCIe driver define four possible reset types (perst, rescal, bridge, swinit) and three regulators (vpcie3v3, vpcie3v3aux, vpcie12v), but none of them is defined in the RPi 4 DT.

Does this really represent the actual hardware?
Which of them represents the mentioned "fundamental reset"?

I've seen that the CM4 datasheet lists a PCIe_nRST pin.

Is this pin connected to the VL805 in case of RPi 4?
How is this pin controlled (VPU, ARM, PCIe IP, GPIO)?

@timg236

timg236 commented Jan 6, 2025

I don't know the exact details of perst vs. swinit, but any wake-up code will at least have to go through the sequence in brcm_pcie_setup, enumerate PCIe, and then do the xHCI reset.

https://github.com/raspberrypi/linux/blob/rpi-6.6.y/drivers/pci/controller/pcie-brcmstb.c#L1159

@lategoodbye
Author

Okay, it sounds like I'd better start testing s2idle with a CM4 plus a PCIe device. Once that works flawlessly, I can continue with the RPi 4. Until now I have only tested the CM4 without any PCIe device.

@timg236

timg236 commented Jan 6, 2025

From chatting with others: the VL805 without dedicated SPI flash is a special case because the xHCI firmware has to be reloaded after PCIe is reset. I think the issue is that a fundamental reset doesn't cause the VL805 ROM to fully reset its internal state.

A CM4 with an NVMe device, or even better an xHCI card, would be a good starting point.

@lategoodbye
Author

Here are my current test results:
s2idle on CM4 without any PCIe endpoint = works
s2idle on CM4 with NVMe = works
s2idle on RPi 4 with dedicated VL805 EEPROM = works
s2idle on RPi 4 without dedicated VL805 EEPROM = breaks xHCI
modprobe -r pcie_brcmstb; modprobe pcie_brcmstb = recovers xHCI after breakage
modprobe -r xhci_pci; modprobe xhci_pci = doesn't recover xHCI after breakage
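Until there is a proper fix, the recovery observed in these results (reloading pcie_brcmstb, while reloading xhci_pci does not help) could be wrapped in a small script that only acts when the resume actually failed. A sketch, assuming pcie_brcmstb is built as a module (as in this test setup), the commands run as root, and the failure messages match the bad-case log at the top of this issue. Unloading the host bridge driver tears down every device behind it, so this is a workaround rather than a fix. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel log (file args or stdin)
# contains the xHCI resume failure seen in the bad case of this issue.
xhci_resume_failed() {
    grep -Eq 'xhci_hcd .*(Failed to reset|HC died)' "$@"
}

# Hypothetical workaround: reload the PCIe host bridge driver, which was
# observed to recover xHCI after the failure.
recover_xhci() {
    modprobe -r pcie_brcmstb && modprobe pcie_brcmstb
}
```

For example, run dmesg | xhci_resume_failed && recover_xhci once right after resume. Since dmesg keeps older messages, the check would also match a failure from an earlier resume unless the log has been cleared in between.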

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants