S0ix suspend issues on Novacustom V5xx #9372

Open
marmarek opened this issue Jul 22, 2024 · 8 comments
Labels
affects-4.2 This issue affects Qubes OS 4.2. affects-4.3 This issue affects Qubes OS 4.3. C: power management hardware support needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.

Comments

@marmarek
Member


Qubes OS release

R4.2 / R4.3

Brief summary

Using S0ix results in a broken system after resume.

Steps to reproduce

  1. Enable S0ix according to Support for the S0ix sleep state #6411
  2. Suspend the system

Expected behavior

The system suspends correctly and is fully functional after resume.

Actual behavior

The power LED blinks, but according to /sys/kernel/debug/pmc_core/substate_residencies the system didn't actually suspend. /sys/kernel/debug/pmc_core/substate_requirements also has an empty "status" column next to every requirement.
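
A quick way to compare the residency counters before and after a suspend attempt is a small script like the one below. It is only a sketch: it assumes the file consists of a header line followed by one `<substate> <residency>` pair per line, and the sample snapshots are hypothetical values for illustration, not output from this machine.

```python
def parse_residencies(text):
    """Parse '<name> <value>' lines into a dict, skipping the header line."""
    residencies = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines of the form "<name> <integer>"; this drops the header.
        if len(parts) == 2 and parts[1].isdigit():
            residencies[parts[0]] = int(parts[1])
    return residencies


def entered_any_substate(before, after):
    """True if any residency counter grew between the two snapshots."""
    return any(after.get(name, 0) > value for name, value in before.items())


# Hypothetical snapshots (assumed layout, illustrative values).
SAMPLE_BEFORE = "Substate   Residency\nS0i2.0     0\nS0i3.0     0\n"
SAMPLE_AFTER = "Substate   Residency\nS0i2.0     0\nS0i3.0     0\n"

if __name__ == "__main__":
    before = parse_residencies(SAMPLE_BEFORE)
    after = parse_residencies(SAMPLE_AFTER)
    print("entered an S0ix substate:", entered_any_substate(before, after))
```

On a real system the two snapshots would be read from /sys/kernel/debug/pmc_core/substate_residencies (as root) right before suspending and right after resuming.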

After resume, the wired network is broken. The sys-net logs show:

sys-net logs

[2024-07-22 12:45:30] [  237.842643] e1000e 0000:00:07.0 ens7: NIC Link is Down
[2024-07-22 12:45:30] [  237.863042] Freezing user space processes
[2024-07-22 12:45:30] [  237.864602] Freezing user space processes completed (elapsed 0.001 seconds)
[2024-07-22 12:45:30] [  237.864626] OOM killer disabled.
[2024-07-22 12:45:30] [  237.864637] Freezing remaining freezable tasks
[2024-07-22 12:45:30] [  237.865584] Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
[2024-07-22 12:45:30] [  237.865607] xen:manage: Using suspend/resume for sleep/wakeup
[2024-07-22 12:45:30] [  237.868291] e1000e: EEE TX LPI TIMER: 00000011
[2024-07-22 12:46:34] [  237.935960] xen:grant_table: Grant tables using version 1 layout
[2024-07-22 12:46:34] [  237.983971] iwlwifi 0000:00:06.0: WRT: Invalid buffer destination
[2024-07-22 12:46:34] [  238.141605] iwlwifi 0000:00:06.0: Not valid error log pointer 0x0024B5C0 for RT uCode
[2024-07-22 12:46:34] [  238.141784] iwlwifi 0000:00:06.0: WFPM_UMAC_PD_NOTIFICATION: 0x1f
[2024-07-22 12:46:34] [  238.141818] iwlwifi 0000:00:06.0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
[2024-07-22 12:46:34] [  238.141849] iwlwifi 0000:00:06.0: WFPM_AUTH_KEY_0: 0x80
[2024-07-22 12:46:34] [  238.141874] iwlwifi 0000:00:06.0: CNVI_SCU_SEQ_DATA_DW9: 0x0
[2024-07-22 12:46:34] [  238.142487] iwlwifi 0000:00:06.0: RFIm is deactivated, reason = 4
[2024-07-22 12:46:37] [  240.729199] e1000e 0000:00:07.0 ens7: Failed to disable ULP
[2024-07-22 12:48:46] [  369.728111] e1000e 0000:00:07.0 ens7: Hardware Error
[2024-07-22 12:48:46] [  369.728146] e1000e 0000:00:07.0 ens7: Timesync Tx Control register not set as expected
[2024-07-22 12:48:46] [  369.829179] e1000e 0000:00:07.0: EEE advertisement - unable to acquire PHY
[2024-07-22 12:48:46] [  369.832451] OOM killer enabled.
[2024-07-22 12:48:46] [  369.832458] Restarting tasks ... done.

After resume, sys-net was semi-frozen for some time (over a minute), and the qubes.SuspendPost service failed (due to a vchan timeout). qvm-run --nogui appears to work, but I'm not 100% sure whether that's only because I tried it later.
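
The length of the stall can be read off the log above: the kernel timestamp barely moves across the suspend itself, but then jumps by roughly 129 seconds between "Failed to disable ULP" and "Hardware Error". A minimal sketch for spotting such gaps, assuming the `[wall-clock] [kernel-time] message` line layout shown above (the function name and the threshold are illustrative choices):

```python
import re

# Matches the kernel timestamp in lines like
# "[2024-07-22 12:46:37] [  240.729199] e1000e ...: Failed to disable ULP"
LINE_RE = re.compile(r"^\[[^\]]+\] \[\s*(\d+\.\d+)\] (.*)$")


def find_stalls(lines, threshold=10.0):
    """Return (gap_seconds, previous_message, next_message) for every gap in
    the kernel timestamps that is longer than `threshold` seconds."""
    stalls = []
    prev = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if prev is not None and ts - prev[0] > threshold:
            stalls.append((round(ts - prev[0], 3), prev[1], msg))
        prev = (ts, msg)
    return stalls


# Three consecutive lines copied from the sys-net log above.
LOG = [
    "[2024-07-22 12:46:34] [  238.142487] iwlwifi 0000:00:06.0: RFIm is deactivated, reason = 4",
    "[2024-07-22 12:46:37] [  240.729199] e1000e 0000:00:07.0 ens7: Failed to disable ULP",
    "[2024-07-22 12:48:46] [  369.728111] e1000e 0000:00:07.0 ens7: Hardware Error",
]

if __name__ == "__main__":
    for gap, before, after in find_stalls(LOG):
        print(f"{gap:.3f} s between {before!r} and {after!r}")
```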

Wireless appears to be functional (at least listing available networks works).

sys-usb appears to be functional.

@marmarek marmarek added T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. hardware support affects-4.2 This issue affects Qubes OS 4.2. affects-4.3 This issue affects Qubes OS 4.3. labels Jul 22, 2024
@marmarek
Member Author

Reloading the e1000e module in sys-net does not help.

@marmarek
Member Author

Ugh...

drivers/net/ethernet/intel/e1000e/ich8lan.c:

        /* It is not possible to be certain of the current state of ULP
         * so forcibly disable it.
         */
        hw->dev_spec.ich8lan.ulp_state = e1000_ulp_state_unknown;
        ret_val = e1000_disable_ulp_lpt_lp(hw, true);
        if (ret_val)
                e_warn("Failed to disable ULP\n");
...
/**     
 *  e1000_disable_ulp_lpt_lp - unconfigure Ultra Low Power mode for LynxPoint-LP
 *  @hw: pointer to the HW structure
 *  @force: boolean indicating whether or not to force disabling ULP
 *
 *  Un-configure ULP mode when link is up, the system is transitioned from
 *  Sx or the driver is unloaded.  If on a Manageability Engine (ME) enabled
 *  system, poll for an indication from ME that ULP has been un-configured.
 *  If not on an ME enabled system, un-configure the ULP mode by software.
 *      
 *  During nominal operation, this function is called when link is acquired
 *  to disable ULP mode (force=false); otherwise, for example when unloading
 *  the driver or during Sx->S0 transitions, this is called with force=true
 *  to forcibly disable ULP.
 */     
static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw *hw, bool force)
{       
...
                if (force) {
                        /* Request ME un-configure ULP mode in the PHY */
                        mac_reg = er32(H2ME);
                        mac_reg &= ~E1000_H2ME_ULP;
                        mac_reg |= E1000_H2ME_ENFORCE_SETTINGS;
                        ew32(H2ME, mac_reg);
                }

But ew32(H2ME, ...) actually writes to a register of the LAN device itself, not to a separate device; here, in BAR0:

#define E1000_H2ME              0x05B50 /* Host to ME */
#define E1000_H2ME_START_DPG    0x00000001      /* indicate the ME of DPG */
#define E1000_H2ME_EXIT_DPG     0x00000002      /* indicate the ME exit DPG */
#define E1000_H2ME_ULP          0x00000800      /* ULP Indication Bit */
#define E1000_H2ME_ENFORCE_SETTINGS     0x00001000      /* Enforce Settings */

It's not clear to me how they communicate, but maybe assigning the device to the VM breaks this communication?
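
For reference, the force=true branch quoted above is a plain read-modify-write of one 32-bit register. The sketch below only mirrors the bit arithmetic, using the two constants from the defines above; the function name and the starting value are hypothetical, and it does not touch any hardware.

```python
# Constants copied from the defines quoted above.
E1000_H2ME_ULP = 0x00000800               # ULP Indication Bit
E1000_H2ME_ENFORCE_SETTINGS = 0x00001000  # Enforce Settings


def force_disable_ulp_value(mac_reg):
    """Mirror the force=true branch: clear ULP, set ENFORCE_SETTINGS."""
    mac_reg &= ~E1000_H2ME_ULP
    mac_reg |= E1000_H2ME_ENFORCE_SETTINGS
    return mac_reg & 0xFFFFFFFF  # keep the result a 32-bit register value


if __name__ == "__main__":
    # Hypothetical starting value with only the ULP bit set.
    print(hex(force_disable_ulp_value(0x00000800)))
```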

Or maybe it's a more generic problem. When it happens, I see a mismatch in memory decoding (see Mem+ vs. Mem- in Control, and also [disabled] next to Region 0):

sys-net: lspci -vvs 7.0
00:07.0 Ethernet controller: Intel Corporation Device 550a (rev 20)
	Subsystem: CLEVO/KAPOK Computer Device a743
	Physical Slot: 7
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin D routed to IRQ 47
	Region 0: Memory at f2000000 (32-bit, non-prefetchable) [size=128K]
	Capabilities: [c8] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Kernel modules: e1000e

dom0: lspci -vvs 1f.6
00:1f.6 Ethernet controller: Intel Corporation Device 550a (rev 20)
	DeviceName: Ethernet controller
	Subsystem: CLEVO/KAPOK Computer Device a743
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin D routed to IRQ 21
	Region 0: Memory at b54a0000 (32-bit, non-prefetchable) [disabled] [size=128K]
	Capabilities: [c8] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 00000000fee01458  Data: 0000
	Capabilities: [e0] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: pciback
	Kernel modules: e1000e
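
The mismatch between the two views can also be spotted mechanically. The following is a minimal sketch that compares the `Control:` lines of two `lspci -vv` outputs; the helper names are illustrative, and the two sample strings are the Control lines from the outputs above.

```python
def control_flags(lspci_output):
    """Return {flag_name: enabled} from the 'Control:' line of `lspci -vv`."""
    for line in lspci_output.splitlines():
        line = line.strip()
        if line.startswith("Control:"):
            flags = {}
            for token in line[len("Control:"):].split():
                # Each token is a flag name followed by '+' or '-'.
                flags[token[:-1]] = token.endswith("+")
            return flags
    return {}


def mismatched_flags(guest, host):
    """Names of Control flags whose state differs between the two views."""
    return sorted(name for name in guest if guest[name] != host.get(name))


SYS_NET = ("Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- "
           "ParErr- Stepping- SERR- FastB2B- DisINTx-")
DOM0 = ("Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- "
        "ParErr- Stepping- SERR- FastB2B- DisINTx-")

if __name__ == "__main__":
    print(mismatched_flags(control_flags(SYS_NET), control_flags(DOM0)))
```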

@marmarek
Member Author

Or maybe it's a more generic problem. When it happens, I see a mismatch in memory decoding (see Mem+ vs. Mem- in Control, and also [disabled] next to Region 0):

That's it: re-enabling memory decoding in dom0 makes the device work again. It's worth checking whether #6411 (comment) is the same problem.
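
The comment does not say how memory decoding was re-enabled. One possible way, shown here purely as a sketch, is to set the Memory Space bit (bit 1) of the 16-bit PCI command register at config-space offset 0x04. The function names are hypothetical, and the sysfs path mentioned afterwards is an assumption derived from the dom0 address 00:1f.6.

```python
import struct

PCI_COMMAND_OFFSET = 0x04    # 16-bit command register in PCI config space
PCI_COMMAND_MEMORY = 0x0002  # bit 1: memory space decoding enable


def with_memory_decoding(config):
    """Return a copy of the config-space bytes with the Memory Space bit set."""
    (command,) = struct.unpack_from("<H", config, PCI_COMMAND_OFFSET)
    patched = bytearray(config)
    struct.pack_into("<H", patched, PCI_COMMAND_OFFSET,
                     command | PCI_COMMAND_MEMORY)
    return bytes(patched)


def enable_memory_decoding(config_path):
    """Read-modify-write the command register through a config-space file."""
    # Unbuffered, so that only the two command-register bytes are accessed.
    with open(config_path, "r+b", buffering=0) as f:
        f.seek(PCI_COMMAND_OFFSET)
        (command,) = struct.unpack("<H", f.read(2))
        f.seek(PCI_COMMAND_OFFSET)
        f.write(struct.pack("<H", command | PCI_COMMAND_MEMORY))
```

In dom0 this would presumably be run as root against something like /sys/bus/pci/devices/0000:00:1f.6/config; that path has not been verified on this hardware.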
FYI @HW42

@wessel-novacustom

Is S3 working fine? If so, is any post installation step needed?

@marmarek
Member Author

S3 works fine and should be active by default on the V5xx series; no manual steps are required. I'm keeping this issue open because I would like to make S0ix work too at some point, but that shouldn't affect users.

@wessel-novacustom

S3 works fine and should be active by default on the V5xx series; no manual steps are required. I'm keeping this issue open because I would like to make S0ix work too at some point, but that shouldn't affect users.

I'm positively surprised about that. Great!

@andrewdavidwong andrewdavidwong added needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. and removed diagnosed Technical diagnosis has been performed (see issue comments). labels Sep 23, 2024
@macpijan

macpijan commented Dec 10, 2024

@marmarek Can you please confirm which firmware you used for the testing/certification process? Was it the v0.9.0 release build?

@marmarek
Member Author

marmarek commented Dec 10, 2024 via email
