lantiq: xrx200: switch to the mainline DSA driver #3085

Closed
wants to merge 3 commits into from

Conversation

xdarklight
Contributor

@xdarklight xdarklight commented Jun 7, 2020

Enable the XRX200 PMAC, GSWIP DSA tag and GSWIP DSA drivers in the 5.4
kernel config. Finally switch existing vr9_*.dts{,i} to use the new
Ethernet and switch drivers.

WiP:

  • support for 4.19 has to be dropped first
  • Ethernet TX breaks (can be reproduced in a few seconds by letting iperf3 transmit data from the device to another computer)
  • switch to the upstream versions (from the net/net-next tree) of the Ethernet TX fixes from @hauke
  • clarification about the pre-init script is needed (why is it needed -> should be part of the commit message; if it is not needed it should be removed)
  • include code-review changes suggested by @olek2 -> TODO for @xdarklight
  • needs code-review
  • needs testing on as many devices as possible: BT Home Hub 5A, DM200, Fritz 7430, TD-W8970, TD-W9980 and VGV7510KW22 have passed testing - thanks to everyone who reported back!
  • according to @duud FritzBox 7412 is showing random disconnects (only with this pull request though). since I don't have a 7412 I cannot test it myself so I mentioned @AndyX90 (the original author of the 7412.dts) to help testing
  • submit WiP patches upstream and ask for them to be backported. These should fix: 1) the Fast Ethernet ports on 7360v2 as tested by @jwh (many thanks for your patience!) 2) random disconnects on the 7412 as confirmed by @duud 3) fixed-links for future board support (like 7490)
  • rebase on top of latest master branch and update the testing kernel version (5.10) as well
  • depending on when the patches are included in a stable release: add them as patches to this PR or wait. the patches will land in 5.4.112 and 5.10.30. we should wait until OpenWrt uses these kernel versions
  • @abajk found some issue in the upstream lantiq_xrx200.c driver which may lead to memory corruption
  • wait for the pull requests for Linux 5.4.124 (kernel: bump 5.4 to 5.4.124 #4226) and 5.10.42 (kernel: bump 5.10 to 5.10.42 #4225) to be included in OpenWrt as the memory corruption fix from @abajk is included there
  • @abajk found a bug in the latest fix for a memory corruption issue in the upstream lantiq_xrx200.c
  • waiting for @hauke's https://patchwork.ozlabs.org/project/openwrt/list/?series=249780 series to be included in master
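
Several items in the list above amount to waiting until the kernel in use is at least a given stable release. A minimal sketch of that comparison in Python, assuming plain `major.minor.patch` version strings. The names `MIN_FIXED`, `parse_kernel_version` and `has_fix` are made up for illustration and are not part of OpenWrt; the minimum versions are the 5.4.112 and 5.10.30 quoted in the list.

```python
# Hypothetical helper, not part of OpenWrt: checks whether a kernel version
# is new enough to contain a fix that landed in a given stable release.

# Minimum stable release per kernel series, as quoted in the list above.
MIN_FIXED = {(5, 4): (5, 4, 112), (5, 10): (5, 10, 30)}


def parse_kernel_version(version: str) -> tuple:
    """Turn '5.4.124' into (5, 4, 124); a missing patch level counts as 0."""
    parts = [int(p) for p in version.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_fix(version: str) -> bool:
    """Return True if `version` is at or past the fixed release of its series."""
    parsed = parse_kernel_version(version)
    minimum = MIN_FIXED.get(parsed[:2])
    if minimum is None:
        # Series not covered by the table: make no claim either way.
        raise ValueError(f"no data for kernel series {parsed[0]}.{parsed[1]}")
    # Tuples compare numerically, so 5.4.112 sorts after 5.4.99.
    return parsed >= minimum
```

Comparing integer tuples rather than strings matters here, because a plain string comparison would sort "5.4.99" after "5.4.112".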

@abajk
Contributor

abajk commented Jun 7, 2020

Hi,
I recently tested this DSA driver on BT Home Hub 5A [1].
There are some issues:

  • after some time (mostly 20-30 minutes) TX hangs
  • the master interface should be brought up before the slave interfaces

Next week I can do more tests on BT HH 5A and DWR-966 (xrx330, not yet supported).

[1] https://github.com/olek2/openwrt/commits/lantiq_test

@adschm adschm added the labels target/lantiq (pull request/issue for lantiq target) and work in progress (pull request the author is still working on) Jun 7, 2020
@adschm
Member

adschm commented Jun 7, 2020

Eventually, swconfig should be removed (kernel config symbols and packages as below):

adsc@buildfff:/data/openwrt$ grep -rn "swconfig" target/linux/lantiq/ |sort
target/linux/lantiq/image/ar9.mk:10:    kmod-ltq-deu-ar9 -swconfig
target/linux/lantiq/image/ar9.mk:25:    kmod-ltq-deu-ar9 kmod-usb-dwc2 -swconfig
target/linux/lantiq/image/danube.mk:111:        ltq-adsl-app ppp-mod-pppoa -swconfig
target/linux/lantiq/image/xway_legacy.mk:54:    ltq-adsl-app ppp-mod-pppoa -swconfig
target/linux/lantiq/patches-4.19/0025-NET-MIPS-lantiq-adds-xrx200-net.patch:685:+// swconfig interface
target/linux/lantiq/patches-5.4/0025-NET-MIPS-lantiq-adds-xrx200-legacy.patch:685:+// swconfig interface
target/linux/lantiq/xrx200/target.mk:19:        swconfig
target/linux/lantiq/xway_legacy/target.mk:7:DEFAULT_PACKAGES+=kmod-leds-gpio kmod-gpio-button-hotplug swconfig
target/linux/lantiq/xway/target.mk:7:DEFAULT_PACKAGES+=kmod-leds-gpio kmod-gpio-button-hotplug swconfig

@xdarklight xdarklight force-pushed the lantiq-dsa-20200607 branch from 8e33934 to 89231c4 Compare June 7, 2020 14:04
@xdarklight
Contributor Author

xdarklight commented Jun 7, 2020

Eventually, swconfig should be removed (kernel config symbols and packages as below):

I'll fix the xrx200 subtarget
the "xway" subtarget still needs it as some devices there are using non-mainline switch drivers.

when doing so I'll also update the subject to "lantiq: xrx200: ..." to make it clear what I'm touching exactly

✔️ done

@xdarklight xdarklight force-pushed the lantiq-dsa-20200607 branch from 89231c4 to 9667363 Compare June 7, 2020 14:12
@xdarklight xdarklight changed the title WIP: lantiq: switch to the mainline DSA driver WIP: lantiq: xrx200: switch to the mainline DSA driver Jun 7, 2020
@xdarklight xdarklight force-pushed the lantiq-dsa-20200607 branch from 9667363 to 8fac354 Compare June 7, 2020 14:27
@xdarklight
Contributor Author

xdarklight commented Jun 7, 2020

I recently tested this DSA driver on BT Home Hub 5A [1].

I'm sad to see that we duplicated work 😢

There are some issues:

  • after some time (mostly 20-30 minutes) TX hangs

interesting, I can reproduce this with iperf3 in a few seconds

  • the master interface should be brought up before the slave interfaces

I included your patch for this in this PR - many thanks!

@dwmw2
Contributor

dwmw2 commented Jun 9, 2020

On my HH5a at the moment I can go to the 'br-lan' config in luci and add eth0.1.1 to the list of physical interfaces, which results in VLAN ID 1 actually being sent out the LAN Ethernet ports. Running tcpdump on eth0 itself shows:

09:56:42.142906 1a:62:2c:5d:94:6c > 33:33:ff:00:00:03, ethertype 802.1Q (0x8100), length 94: vlan 1, p 0, ethertype 802.1Q, vlan 1, p 0, ethertype IPv6, fe80::1862:2cff:fe5d:946c > ff02::1:ff00:3: ICMP6, neighbor solicitation, who has 2001:8b0:10b:3::3, length 32

Does this still work after conversion to DSA? Can I add lan1.1 lan2.1 etc. to my br-lan and expect it to work?

(I ask because on the Asus RT-AC85P which already uses DSA, this doesn't seem to be working. I can live with that for the moment, but wouldn't want a regression on my main VDSL router...)
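
The tcpdump line above shows two stacked 802.1Q headers (both with VLAN ID 1) in front of the IPv6 payload. A minimal Python sketch of how such a tag stack can be walked in a raw Ethernet header. The function name `parse_vlan_stack` is hypothetical, the header bytes are rebuilt by hand from the addresses in the capture, and only the 0x8100 TPID seen in the capture is handled.

```python
import struct

ETH_P_8021Q = 0x8100  # TPID of an 802.1Q VLAN tag
ETH_P_IPV6 = 0x86DD


def parse_vlan_stack(frame: bytes):
    """Return ([(vlan_id, priority), ...], inner_ethertype) for an Ethernet frame."""
    offset = 12  # skip destination and source MAC (6 bytes each)
    tags = []
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    while ethertype == ETH_P_8021Q:
        # TCI layout: 3 bits priority, 1 bit DEI, 12 bits VLAN ID,
        # followed by the next EtherType.
        tci, ethertype = struct.unpack_from("!HH", frame, offset)
        tags.append((tci & 0x0FFF, tci >> 13))
        offset += 4
    return tags, ethertype


# Header rebuilt by hand from the capture above (payload omitted).
dst = bytes.fromhex("3333ff000003")
src = bytes.fromhex("1a622c5d946c")
frame = dst + src + struct.pack("!HHHHH", 0x8100, 0x0001, 0x8100, 0x0001, ETH_P_IPV6)

tags, inner = parse_vlan_stack(frame)
```

This only illustrates the on-wire layout produced by an interface name like eth0.1.1; it says nothing about whether the DSA driver accepts such a configuration.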

@xdarklight
Contributor Author

[...] Can I add lan1.1 lan2.1 etc. to my br-lan and expect it [VLANs] to work?

I haven't tried this myself yet - I'll try to test it on Sunday or next week as I don't have access to my HH5A for the rest of the week

@abajk
Contributor

abajk commented Jun 12, 2020

@xdarklight
It is still WiP. It looks like my script didn't work. It is based on the ramips script. The GSWIP driver is loaded before preinit. It fails on this condition [1] and prints this message:
[ 0.662321] gswip 1e108000.switch: dsa switch register failed: -517
I have no idea how to bring up eth0 earlier.

Bootlog:

[    0.641766] libphy: Fixed MDIO Bus: probed
[    0.650927] libphy: lantiq,xrx200-mdio: probed
[    0.662321] gswip 1e108000.switch: dsa switch register failed: -517
[    0.681524] NET: Registered protocol family 10
[    0.688913] Segment Routing with IPv6
[    0.691245] NET: Registered protocol family 17
[    0.696726] 8021q: 802.1Q VLAN Support v1.8
[    0.705404] pcie-xrx200 1d900000.pcie: failed to get the PCIe PHY
[    0.714500] libphy: lantiq,xrx200-mdio: probed
[    0.738397] gswip 1e108000.switch lan3 (uninitialized): PHY [1e108000.switch-mii:00] driver [Intel XWAY PHY11G (PEF 7071/PEF 7072) v1.5 / v1.6]
[    0.754343] gswip 1e108000.switch lan4 (uninitialized): PHY [1e108000.switch-mii:01] driver [Intel XWAY PHY11G (PEF 7071/PEF 7072) v1.5 / v1.6]
[    0.770046] gswip 1e108000.switch lan2 (uninitialized): PHY [1e108000.switch-mii:11] driver [Intel XWAY PHY11G (xRX v1.2 integrated)]
[    0.784937] gswip 1e108000.switch lan1 (uninitialized): PHY [1e108000.switch-mii:13] driver [Intel XWAY PHY11G (xRX v1.2 integrated)]
[    0.799964] gswip 1e108000.switch wan (uninitialized): PHY [1e108000.switch-mii:05] driver [Intel XWAY PHY11G (PEF 7071/PEF 7072) v1.5 / v1.6]
[    0.813358] DSA: tree 0 setup
[    0.814875] gswip 1e108000.switch: probed GSWIP version 21 mod 0
...
[    3.752956] init: - preinit -

Tested on HH5A

[1] https://elixir.bootlin.com/linux/v5.4.46/source/net/dsa/dsa2.c#L669

@abajk
Contributor

abajk commented Jun 12, 2020

interesting, I can reproduce this with iperf3 in a few seconds

Same with iperf. Earlier I tested it as AP and the load was smaller.

@xdarklight
Contributor Author

Does this still work after conversion to DSA? Can I add lan1.1 lan2.1 etc. to my br-lan and expect it to work?

for reference, it seems that this will not work as of yet:
http://lists.infradead.org/pipermail/openwrt-adm/2020-June/001445.html

but it seems to be a general problem, affecting all targets, not just lantiq

@xdarklight
Contributor Author

xdarklight commented Jun 17, 2020

It is still WiP. It looks like my script didn't work. It is based on the ramips script. The GSWIP driver is loaded before preinit.

@olek210 which problem is that preinit script supposed to solve?
please explain in more detail, because I do not understand the underlying problem/what's broken and thus cannot help working on a solution

@hauke
Member

hauke commented Jun 21, 2020

[ 0.662321] gswip 1e108000.switch: dsa switch register failed: -517

This should not cause a problem: -517 means EPROBE_DEFER, and the kernel will try to probe this driver again later. This way everything gets loaded in the correct order.
This is done here intentionally because the GPHY FW should be loaded from user space and that is only available later.
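
To illustrate the explanation above: 517 is defined as EPROBE_DEFER in the kernel's include/linux/errno.h. It is a kernel-internal code that is not meant to reach user space, so Python's `errno` module does not list it. A minimal sketch of decoding such a log line, where the table, the regular expression and the name `decode_kernel_error` are assumptions made for this example:

```python
import errno
import re

# Kernel-internal codes from include/linux/errno.h; only the one discussed
# here is listed. They are not part of the user-space errno range.
KERNEL_INTERNAL_ERRNO = {517: "EPROBE_DEFER"}

# Illustrative pattern for messages that end in ": -<number>".
FAILED_RE = re.compile(r":\s*-(\d+)\s*$")


def decode_kernel_error(line: str):
    """Return the symbolic name of a trailing negative error code, or None."""
    match = FAILED_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    if code in KERNEL_INTERNAL_ERRNO:
        return KERNEL_INTERNAL_ERRNO[code]
    # Fall back to the regular user-space errno names.
    return errno.errorcode.get(code, f"unknown ({code})")


line = "[    0.662321] gswip 1e108000.switch: dsa switch register failed: -517"
name = decode_kernel_error(line)
```

In the boot log quoted earlier, the deferred probe is followed by a second probe attempt that ends with "DSA: tree 0 setup", which matches the retry behaviour described here.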

@abajk
Contributor

abajk commented Jun 22, 2020

@olek210 which problem is that preinit script supposed to solve?
please explain in more detail, because I do not understand the underlying problem/what's broken and thus cannot help working on a solution

This is done here intentionally because the GPHY FW should be loaded from user space and that is only available later.

My mistake, the driver only prints a warning. I didn't know it was intentional.

@xdarklight xdarklight force-pushed the lantiq-dsa-20200607 branch from 8fac354 to 73b0a26 Compare July 4, 2020 20:22
@xdarklight
Contributor Author

PR update:

  • the patch "net: dsa: lantiq_gswip: fix and improve the unsupported interface error" is now upstream in torvalds/linux/4d3da2d8d91f66988a829a18a0ce59945e8ae4fb and was backported to Linux 5.4.49
  • updated the patch "MIPS: lantiq: xway: sysctrl: fix the GPHY clock alias names" to the version that is queued in the linux-mips/mips-fixes tree
  • removed the pre-init patch again after the previous discussion about it
  • rebased on top of latest master

@@ -50,7 +50,11 @@ CONFIG_MTD_UBI_BLOCK=y
# CONFIG_MTD_UBI_FASTMAP is not set
# CONFIG_MTD_UBI_GLUEBI is not set
CONFIG_MTD_UBI_WL_THRESHOLD=4096
CONFIG_NET_DSA=y
Member

Please deactivate CONFIG_SWCONFIG, it is not needed any more.

Member

In addition you also have to deactivate these settings:

# CONFIG_PSB6970_PHY is not set
# CONFIG_RTL8366_SMI is not set
# CONFIG_SWCONFIG is not set

Please also refresh the configuration, this DSA driver also activates PHYLINK.

make kernel_oldconfig CONFIG_TARGET=subtarget
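
A quick way to double-check a config fragment against this request is to parse it and list the symbols that are still enabled. The following Python sketch assumes the usual Kconfig text format (`CONFIG_X=y` for enabled, `# CONFIG_X is not set` for disabled); `parse_kconfig` and `still_enabled` are hypothetical names, not OpenWrt tooling.

```python
import re

# Symbols the review comment above asks to disable.
MUST_BE_DISABLED = ["CONFIG_PSB6970_PHY", "CONFIG_RTL8366_SMI", "CONFIG_SWCONFIG"]

ENABLED_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
DISABLED_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text: str) -> dict:
    """Map each symbol to its value; explicitly disabled symbols map to None."""
    symbols = {}
    for raw in text.splitlines():
        line = raw.strip()
        enabled = ENABLED_RE.match(line)
        if enabled:
            symbols[enabled.group(1)] = enabled.group(2)
            continue
        disabled = DISABLED_RE.match(line)
        if disabled:
            symbols[disabled.group(1)] = None
    return symbols


def still_enabled(text: str, names=MUST_BE_DISABLED) -> list:
    """Return the symbols from `names` that this fragment still enables.

    A symbol that does not appear in the fragment at all is treated as
    not enabled by this fragment.
    """
    symbols = parse_kconfig(text)
    return [name for name in names if symbols.get(name) is not None]
```

For example, `still_enabled(open(path).read())` on a subtarget config would return an empty list once all three symbols are switched off there; `path` stands for whichever config file is being reviewed.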

@xdarklight
Contributor Author

PR update based on review comments from @hauke (thank you!):

  • fixed TP-Link TD-W8970 LAN port mapping
    • I also double-checked the other board.dts again and I think that was the only issue
  • fixed ucidef_set_interface_lan typo
  • moved CONFIG_SWCONFIG to all subtarget configs except xrx200
  • moved CONFIG_PSB6970_PHY, CONFIG_RTL8366RB_PHY and CONFIG_RTL8366_SMI to the xway and xway_legacy subtargets
  • ran make kernel_oldconfig CONFIG_TARGET=subtarget

@xdarklight xdarklight force-pushed the lantiq-dsa-20200607 branch from 73b0a26 to c7bfa72 Compare July 5, 2020 21:08
@@ -0,0 +1,5 @@
set_preinit_iface() {
ifname=eth0
Member

This file should also not be needed after this patch:
https://patchwork.ozlabs.org/project/openwrt/list/?series=249780

Could someone please try this.

@xdarklight xdarklight force-pushed the lantiq-dsa-20200607 branch from f88f785 to 89fe331 Compare June 20, 2021 09:00
@xdarklight
Contributor Author

xdarklight commented Jun 20, 2021

today's update includes:

also as a side-note: it seems that @abajk's bugfixes will be included in Linux 5.4.128 and 5.10.46

@Notupus

Notupus commented Jun 20, 2021

Tested-by: Notupus <notpp46@googlemail.com> # TD-W9980/DM200/FRITZ 7430

@sch-m
Contributor

sch-m commented Jun 21, 2021

Tested-by: Martin Schiller <ms@dev.tdt.de> # tested on TDT VR2020
Tested-by: Martin Schiller <ms@dev.tdt.de> # tested on TP-Link TD-W8980B
Tested-by: Martin Schiller <ms@dev.tdt.de> # tested on ZyXEL P-2812HNU-F1

@kestrel1974
Contributor

Tested-by: Daniel Kestrel <kestrel1974@t-online.de> # tested on Fritzbox 7490
Tested-by: Daniel Kestrel <kestrel1974@t-online.de> # tested on Fritzbox 3490

@jospezial

Tested-by: @jospezial <jospezial@gmx.de> # tested on VGV7510KW22 (o2 Box 6431)
Btw, when I opened https://192.168.1.1/cgi-bin/luci/admin/network/network it made migrating changes to my /etc/config/network.

@xdarklight
@hauke has committed his patches to openwrt master.
So the commits of this PR can be reduced again.

@jospezial

jospezial commented Jun 23, 2021

also as a side-note: it seems that @abajk's bugfixes will be included in Linux 5.4.128 and 5.10.46

They are.

There they are:
#4254
#4281

@xdarklight
Contributor Author

today's update includes:

  • rebased on top of today's master (meaning: @hauke's patches are dropped from this pull request as they're in master now)
  • included the Tested-by from @Notupus, @sch-m, @kestrel1974 and @jospezial - thanks to all of you!

also as a side-note: it seems that @abajk's bugfixes will be included in Linux 5.4.128 and 5.10.46

They are.

great, thanks for checking! I don't have any time to work on this pull request on Friday and Saturday so on Sunday I'll rebase it again - then I hope that this can be merged.

@xdarklight xdarklight force-pushed the lantiq-dsa-20200607 branch from 89fe331 to 8b6f6ef Compare June 23, 2021 20:31
@jospezial

Typo: Update the existing existing vr9_*.dts{,i}

This backports a fix from Aleksander Jan Bajkowski for TX hangs with
threaded NAPI enabled. So far threaded NAPI is disabled by default but
can be enabled with:
  echo 1 > /sys/class/net/eth0/threaded

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
…iver

This backports another fix from Aleksander Jan Bajkowski which is a
follow-up to a previous memory corruption fix.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Enable the XRX200 PMAC, GSWIP DSA tag and GSWIP DSA drivers in the 5.4
kernel config. Update the existing vr9_*.dts{,i} to use the new
Ethernet and switch drivers. Drop the swconfig package from the xrx200
target because swconfig doesn't manage DSA based switches.

The new /etc/config/network format for the DSA driver is not compatible
with the old (swconfig) based one. Show a message during sysupgrade
notifying users about this change and asking them to start with a fresh
config (or forcefully update and then migrate the config manually).

Failsafe mode can now automatically bring up the first lan interface
based on board.json including DSA based setups. Drop
05_set_preinit_iface_lantiq from the xRX200 sub-target as this is not
needed anymore. For now we are keeping it for the ase, xway and
xway_legacy until there's some confirmation that it can be dropped from
there as well.

While here, some boards also receive minor fixups:
- Use LAN1 as LAN1 (according to a photo this port can also be
  configured as WAN) on the Buffalo WBMR-300HPD. This makes it easier to
  read the port mapping because otherwise we would have LAN{2,3,4} and
  WAN (which was the case for the non-DSA version previously).
- vr9_avm_fritz3390.dts: move the "gpio" comment from port 0 and 1 to
  their corresponding PHYs
- vr9_tplink_vr200.dtsi: move the "gpio" comment from port 0 to PHY 0
- vr9_tplink_tdw89x0.dtsi: move the "gpio" comment from port 0 to PHY 0

Acked-by: Aleksander Jan Bajkowski <A.Bajkowski@stud.elka.pw.edu.pl>
Tested-by: Notupus <notpp46@googlemail.com> # TD-W9980/DM200/FRITZ 7430
Tested-by: Martin Schiller <ms@dev.tdt.de> # tested on TDT VR2020
Tested-by: Martin Schiller <ms@dev.tdt.de> # tested on TP-Link TD-W8980B
Tested-by: Martin Schiller <ms@dev.tdt.de> # tested on ZyXEL P-2812HNU-F1
Tested-by: Daniel Kestrel <kestrel1974@t-online.de> # tested on Fritzbox 7490
Tested-by: Daniel Kestrel <kestrel1974@t-online.de> # tested on Fritzbox 3490
Tested-by: @jospezial <jospezial@gmx.de> # tested on VGV7510KW22 (o2 Box 6431)
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
@xdarklight xdarklight force-pushed the lantiq-dsa-20200607 branch from 8b6f6ef to 91cd894 Compare June 24, 2021 18:50
@xdarklight
Contributor Author

Typo: Update the existing existing vr9_*.dts{,i}

thanks, fixed

@hauke
Member

hauke commented Jun 26, 2021

Thank you for the patch, I applied it to master.
Thank you to all of you working on this, finding and fixing bugs and so on.

@ngehrsitz

@hauke Thanks for merging this. Do you think this change will be backported to 21.02?

@xdarklight
Contributor Author

@ngrsdhbw just FYI that this question was also raised on the mailing list, see:

kestrel1974 referenced this pull request in kestrel1974/openwrt Jul 25, 2021
-Overlaying board.bin in /lib/firmware/ath10k/QCA988X/hw2.0 should fix the issue that some devices could not bring up ath10k with the wasp image
@jospezial

jospezial commented Sep 26, 2021

On my VGV7510KW22 the LAN port my PC is connected to sometimes hangs after powering the PC off and on. Maybe some bad voltage impulses or ESD. It happens sporadically, somewhere between once every 3 weeks and twice a week.
The port then does not react to plugging; the LED stays on even without a cable. Nothing related shows up in dmesg or logread.
I am also using the PRs for jumbo frames, DMA and lantiq-deu. I can't remember having that problem before DSA.
Does anybody else see the same?

@jekkos
Contributor

jekkos commented Sep 30, 2021

Upgraded to master with DSA yesterday, after being on a stable pre-DSA build for multiple months. On the first day afterwards Ethernet stopped working with the following message:

lantiq,xrx200-net 1e10b308.eth eth0: tx ring full
