Commit graph

104537 commits

Author SHA1 Message Date
Biju Das
896a818e0e ravb: Add gstrings_stats and gstrings_size to struct ravb_hw_info
The device stats strings for R-Car and RZ/G2L are different.

R-Car provides 30 device stats, whereas RZ/G2L provides only 15. In
addition, RZ/G2L has stats "rx_queue_0_csum_offload_errors" instead of
"rx_queue_0_missed_errors".

Add structure variables gstrings_stats and gstrings_size to struct
ravb_hw_info, so that subsequent SoCs can be added without any code
changes in the ravb_get_strings function.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19 12:05:16 +01:00
Biju Das
25154301fc ravb: Add stats_len to struct ravb_hw_info
R-Car provides 30 device stats, whereas RZ/G2L provides only 15. In
addition, RZ/G2L has stats "rx_queue_0_csum_offload_errors" instead of
"rx_queue_0_missed_errors".

Replace RAVB_STATS_LEN macro with a structure variable stats_len to
struct ravb_hw_info, to support subsequent SoCs without any code changes
to the ravb_get_sset_count function.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19 12:05:15 +01:00
Biju Das
cb01c672c2 ravb: Add max_rx_len to struct ravb_hw_info
The maximum descriptor size that can be specified on the reception side for
R-Car is 2048 bytes, whereas for RZ/G2L it is 8096.

Add the max_rx_len variable to struct ravb_hw_info for allocating different
RX skb buffer sizes for R-Car and RZ/G2L using the netdev_alloc_skb
function.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19 12:05:15 +01:00
Biju Das
68ca3c9232 ravb: Add aligned_tx to struct ravb_hw_info
R-Car Gen2 needs a 4byte aligned address for the transmission buffer,
whereas R-Car Gen3 doesn't have any such restriction.

Add aligned_tx to struct ravb_hw_info to select the driver to choose
between aligned and unaligned tx buffers.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19 12:05:15 +01:00
Biju Das
ebb091461a ravb: Add struct ravb_hw_info to driver data
The DMAC and EMAC blocks of Gigabit Ethernet IP found on RZ/G2L SoC are
similar to the R-Car Ethernet AVB IP. With a few changes in the driver we
can support both IPs.

This patch adds the struct ravb_hw_info to hold hw features, driver data
and function pointers to support both the IPs. It also replaces the driver
data chip type with struct ravb_hw_info by moving chip type to it.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19 12:05:15 +01:00
Biju Das
cb537b2417 ravb: Use unsigned int for num_tx_desc variable in struct ravb_private
The number of TX descriptors per packet is an unsigned value and
the variable for holding this information should be unsigned.

This patch replaces the data type of num_tx_desc variable in struct
ravb_private from 'int' to 'unsigned int'.
This patch also updates the data type of local variables to unsigned int,
where the local variables are evaluated using num_tx_desc.

Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-19 12:05:15 +01:00
Alexey Dobriyan
39f75da7bc isystem: trim/fixup stdarg.h and other headers
Delete/fixup few includes in anticipation of global -isystem compile
option removal.

Note: crypto/aegis128-neon-inner.c keeps <stddef.h> due to redefinition
of uintptr_t error (one definition comes from <stddef.h>, another from
<linux/types.h>).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-08-19 09:02:55 +09:00
Vladimir Oltean
c1930148a3 net: mscc: ocelot: allow forwarding from bridge ports to the tag_8021q CPU port
Currently we are unable to ping a bridge on top of a felix switch which
uses the ocelot-8021q tagger. The packets are dropped on the ingress of
the user port and the 'drop_local' counter increments (the counter which
denotes drops due to no valid destinations).

Dumping the PGID tables, it becomes clear that the PGID_SRC of the user
port is zero, so it has no valid destinations.

But looking at the code, the cpu_fwd_mask (the bit mask of DSA tag_8021q
ports) is clearly missing from the forwarding mask of ports that are
under a bridge. So this has always been broken.

Looking at the version history of the patch, in v7
https://patchwork.kernel.org/project/netdevbpf/patch/20210125220333.1004365-12-olteanv@gmail.com/
the code looked like this:

	/* Standalone ports forward only to DSA tag_8021q CPU ports */
	unsigned long mask = cpu_fwd_mask;

(...)
	} else if (ocelot->bridge_fwd_mask & BIT(port)) {
		mask |= ocelot->bridge_fwd_mask & ~BIT(port);

while in v8 (the merged version)
https://patchwork.kernel.org/project/netdevbpf/patch/20210129010009.3959398-12-olteanv@gmail.com/
it looked like this:

	unsigned long mask;

(...)
	} else if (ocelot->bridge_fwd_mask & BIT(port)) {
		mask = ocelot->bridge_fwd_mask & ~BIT(port);

So the breakage was introduced between v7 and v8 of the patch.

Fixes: e21268efbe ("net: dsa: felix: perform switch setup for tag_8021q")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210817160425.3702809-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-18 15:34:52 -07:00
Amey Narkhede
9bdc81ce44 PCI: Change the type of probe argument in reset functions
Change the type of probe argument in functions which implement reset
methods from int to bool to make the context and intent clear.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20210817180500.1253-10-ameynarkhede03@gmail.com
Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-08-18 17:32:42 -05:00
Jason Wang
19b8ece42c net/mlx4: Use ARRAY_SIZE to get an array's size
The ARRAY_SIZE macro is defined to get an array's size which is
more compact and more formal in linux source. Thus, we can replace
the long sizeof(arr)/sizeof(arr[0]) with the compact ARRAY_SIZE.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210817121106.44189-1-wangborong@cdjrlc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-18 15:16:54 -07:00
Arnd Bergmann
5c785014b6 Qualcomm driver updates for v5.15
This fixes the "shared memory state machine" (SMSM) interrupt logic to
 avoid missing transitions happening while the interrupts are masked.
 
 SM6115 support is added to smd-rpm and rpmpd.
 
 The Qualcomm SCM firmware driver is once again made possible to compile
 and load as a kernel module.
 
 An out-of-bounds error related to the cooling devices of the AOSS driver
 is corrected. The binding is converted to YAML and a generic compatible
 is introduced to reduce the driver churn.
 
 The GENI wrapper gains a helper function used in I2C and SPI for
 switching the serial engine hardware to use the wrapper's DMA-engine.
 
 Lastly it contains a number of cleanups and smaller fixes for rpmhpd,
 socinfo, CPR, mdt_loader and the GENI DT binding.
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAmEa29wbHGJqb3JuLmFu
 ZGVyc3NvbkBsaW5hcm8ub3JnAAoJEAsfOT8Nma3FFnMP/A2Od7JYhdjH7sfc3i3B
 0zws88lT7XTo9KSMpLFrQg1Qzp3ELDMfVhS2CpsekZn1g7s6RGnbQrh2Mac9Yh4z
 +7e4YLhoMxEkdbEVvbVRIX4parFWD/KxUdkyXM8gKZntJzOFl6VY5V8aKi+7IO+/
 CdWHvELDVXMe2LkBd3lKE1AlS2MjkohXpFKgRwkY4r2nVXwqYTdkfJvXdGdhECGr
 ld4ZIvIR6ERLmVpGvTxdU5W0z2xsLTbPYDPSv6mPkWqDYSOyXV3zABV1fSkH28ot
 wIoesHyI3vL4/LNIlHn+tcWj1Ou8hSzxZmxfq7cdKbkfwPLCWE5D8+HEO4kmbFiF
 5Dds+oxvKSFpf/wppK7bUSEd9Q+dKsrFt2mdWy/sYRe1EaEv5sFBgE0rV3+c6ykL
 tptUEFlaB2si7PKSpKje8czHn4Akuc6BwT6xovZZ72K8CNz9D71etSkoNLLXa54d
 bJibw2eNTT1EOACC/FPBO9AS11Icm6wszn/dcaSwaSPGQ6cR3lvAwHqzDFMGHp+x
 L+iojgnZoHykFhQjGuGrI3yTHOpp0MCNxRoN7DlFwm7KLKVHqeqg+xHXtV9sJer8
 iAhY/uepLRxc1oC5Z+Ejx1gABmKycXtzKQ9ecwTclrk66ampWQBlv5+Bxd5w/hux
 ZR96mJPmpk1WKOX3FAgdeaaP
 =88qN
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmEdDA4ACgkQmmx57+YA
 GNmAmw//WfdprWl2SJGfyojhDeeUq6ZeNesMFBabFss7Ar++szhyFSu3bqlyH6WC
 ZgPOZOJakKQq2EGKNm7RuEgeR8sEUs9czetNO8AqVy1szlhqnUKhG0OO+uLqRLQn
 Hvf1fc+uG3xeaFP0/Q/4fuG+GL1fnmXIJWl0nZFUGfWyQ3mxxm5fLq1DM27KAzP1
 eKMqtCh3F0qHMWZTJd084LO2XXyUTVVBbXAQKX/IQmH6+BK3Q2YMToo/919HCnZs
 XfgcdKuB/fOfi1n+PgGNODdTJ/Uy10WkSALZHlzKfr77AUCs9+RWxdkogCmACJBs
 IhNoZ/6D6a6kCIEuaEjHtNsMAVoG0bYpaB9vFLhJgF4wfdCd+DuOXkCy9B7vI4/8
 7/SKArKYrG1sPlhDGFaWZjWEFBCGycDDsHQ4T2ecZ5d3f+Kuimgx7NLYhKRHPI+9
 8QVJwNbIGrNXjwIn6S0AeqDLoXIzMmAbNvuX1lFz0OyEkbgDYPSuUPmKYoKh0VL5
 +aTPYANbKxfF7nIPxfN580yjQGZmJmctyhkqnavEB6HdNySO2/oM5F2uGRVdVKnv
 5HaPLqWZf2PffjAg+bU3O3bBlYRdIpEaKaa1eHFIMTgK7nlcvFESWWtwg+IBKdqY
 JI3Kkg5PP3jxqxybzgFPE298vI1G6/so2A12HBi0dl28XheK+wM=
 =dS58
 -----END PGP SIGNATURE-----

Merge tag 'qcom-drivers-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/drivers

Qualcomm driver updates for v5.15

This fixes the "shared memory state machine" (SMSM) interrupt logic to
avoid missing transitions happening while the interrupts are masked.

SM6115 support is added to smd-rpm and rpmpd.

The Qualcomm SCM firmware driver is once again made possible to compile
and load as a kernel module.

An out-of-bounds error related to the cooling devices of the AOSS driver
is corrected. The binding is converted to YAML and a generic compatible
is introduced to reduce the driver churn.

The GENI wrapper gains a helper function used in I2C and SPI for
switching the serial engine hardware to use the wrapper's DMA-engine.

Lastly it contains a number of cleanups and smaller fixes for rpmhpd,
socinfo, CPR, mdt_loader and the GENI DT binding.

* tag 'qcom-drivers-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  soc: qcom: smsm: Fix missed interrupts if state changes while masked
  soc: qcom: smsm: Implement support for get_irqchip_state
  soc: qcom: mdt_loader: be more informative on errors
  dt-bindings: qcom: geni-se: document iommus
  soc: qcom: smd-rpm: Add SM6115 compatible
  soc: qcom: geni: Add support for gpi dma
  soc: qcom: geni: move GENI_IF_DISABLE_RO to common header
  PM: AVS: qcom-cpr: Use nvmem_cell_read_variable_le_u32()
  drivers: soc: qcom: rpmpd: Add SM6115 RPM Power Domains
  dt-bindings: power: rpmpd: Add SM6115 to rpmpd binding
  dt-bindings: soc: qcom: smd-rpm: Add SM6115 compatible
  soc: qcom: aoss: Fix the out of bound usage of cooling_devs
  firmware: qcom_scm: Allow qcom_scm driver to be loadable as a permenent module
  soc: qcom: socinfo: Don't print anything if nothing found
  soc: qcom: rpmhpd: Use corner in power_off
  soc: qcom: aoss: Add generic compatible
  dt-bindings: soc: qcom: aoss: Convert to YAML
  dt-bindings: soc: qcom: aoss: Add SC8180X and generic compatible
  firmware: qcom_scm: remove a duplicative condition
  firmware: qcom_scm: Mark string array const

Link: https://lore.kernel.org/r/20210816214840.581244-1-bjorn.andersson@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-08-18 15:33:02 +02:00
Pavel Skripkin
a786e3195d net: asix: fix uninit value bugs
Syzbot reported uninit-value in asix_mdio_read(). The problem was in
missing error handling. asix_read_cmd() should initialize passed stack
variable smsr, but it can fail in some cases. Then while condidition
checks possibly uninit smsr variable.

Since smsr is uninitialized stack variable, driver can misbehave,
because smsr will be random in case of asix_read_cmd() failure.
Fix it by adding error handling and just continue the loop instead of
checking uninit value.

Added helper function for checking Host_En bit, since wrong loop was used
in 4 functions and there is no need in copy-pasting code parts.

Cc: Robert Foss <robert.foss@collabora.com>
Fixes: d9fe64e511 ("net: asix: Add in_pm parameter")
Reported-by: syzbot+a631ec9e717fb0423053@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 11:46:52 +01:00
Subbaraya Sundeep
ab44035d30 octeontx2-pf: Allow VLAN priority also in ntuple filters
VLAN TCI is a 16 bit field which includes Priority(3 bits),
CFI(1 bit) and VID(12 bits). Currently ntuple filters support
installing rules to steer packets based on VID only.
This patch extends that support such that filters can
be installed for entire VLAN TCI.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 11:29:11 +01:00
Saravana Kannan
7bd0cef5da net: mdio-mux: Handle -EPROBE_DEFER correctly
When registering mdiobus children, if we get an -EPROBE_DEFER, we shouldn't
ignore it and continue registering the rest of the mdiobus children. This
would permanently prevent the deferring child mdiobus from working instead
of reattempting it in the future. So, if a child mdiobus needs to be
reattempted in the future, defer the entire mdio-mux initialization.

This fixes the issue where PHYs sitting under the mdio-mux aren't
initialized correctly if the PHY's interrupt controller is not yet ready
when the mdio-mux is being probed. Additional context in the link below.

Fixes: 0ca2997d14 ("netdev/of/phy: Add MDIO bus multiplexer support.")
Link: https://lore.kernel.org/lkml/CAGETcx95kHrv8wA-O+-JtfH7H9biJEGJtijuPVN0V5dUKUAB3A@mail.gmail.com/#t
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 10:48:52 +01:00
Saravana Kannan
99d81e9424 net: mdio-mux: Don't ignore memory allocation errors
If we are seeing memory allocation errors, don't try to continue
registering child mdiobus devices. It's unlikely they'll succeed.

Fixes: 342fa19644 ("mdio: mux: make child bus walking more permissive and errors more verbose")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 10:48:52 +01:00
Saravana Kannan
663d946af5 net: mdio-mux: Delete unnecessary devm_kfree
The whole point of devm_* APIs is that you don't have to undo them if you
are returning an error that's going to get propagated out of a probe()
function. So delete unnecessary devm_kfree() call in the error return path.

Fixes: b601616681 ("mdio: mux: Correct mdio_mux_init error path issues")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 10:48:52 +01:00
Vladimir Oltean
994d2cbb08 net: dsa: tag_sja1105: be dsa_loop-safe
Add support for tag_sja1105 running on non-sja1105 DSA ports, by making
sure that every time we dereference dp->priv, we check the switch's
dsa_switch_ops (otherwise we access a struct sja1105_port structure that
is in fact something else).

This adds an unconditional build-time dependency between sja1105 being
built as module => tag_sja1105 must also be built as module. This was
there only for PTP before.

Some sane defaults must also take place when not running on sja1105
hardware. These are:

- sja1105_xmit_tpid: the sja1105 driver uses different VLAN protocols
  depending on VLAN awareness and switch revision (when an encapsulated
  VLAN must be sent). Default to 0x8100.

- sja1105_rcv_meta_state_machine: this aggregates PTP frames with their
  metadata timestamp frames. When running on non-sja1105 hardware, don't
  do that and accept all frames unmodified.

- sja1105_defer_xmit: calls sja1105_port_deferred_xmit in sja1105_main.c
  which writes a management route over SPI. When not running on sja1105
  hardware, bypass the SPI write and send the frame as-is.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 10:33:15 +01:00
Vladimir Oltean
ed5d2937a6 net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse
It seems that of_find_compatible_node has a weird calling convention in
which it calls of_node_put() on the "from" node argument, instead of
leaving that up to the caller. This comes from the fact that
of_find_compatible_node with a non-NULL "from" argument it only supposed
to be used as the iterator function of for_each_compatible_node(). OF
iterator functions call of_node_get on the next OF node and of_node_put()
on the previous one.

When of_find_compatible_node calls of_node_put, it actually never
expects the refcount to drop to zero, because the call is done under the
atomic devtree_lock context, and when the refcount drops to zero it
triggers a kobject and a sysfs file deletion, which assume blocking
context.

So any driver call to of_find_compatible_node is probably buggy because
an unexpected of_node_put() takes place.

What should be done is to use the of_get_compatible_child() function.

Fixes: 5a8f09748e ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX")
Link: https://lore.kernel.org/netdev/20210814010139.kzryimmp4rizlznt@skbuf/
Suggested-by: Frank Rowand <frowand.list@gmail.com>
Suggested-by: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 10:21:01 +01:00
Wang Hai
1b80fec7b0 ixgbe, xsk: clean up the resources in ixgbe_xsk_pool_enable error path
In ixgbe_xsk_pool_enable(), if ixgbe_xsk_wakeup() fails,
We should restore the previous state and clean up the
resources. Add the missing clear af_xdp_zc_qps and unmap dma
to fix this bug.

Fixes: d49e286d35 ("ixgbe: add tracking of AF_XDP zero-copy state for each queue pair")
Fixes: 4a9b32f30f ("ixgbe: fix potential RX buffer starvation for AF_XDP")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20210817203736.3529939-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-17 17:47:52 -07:00
Amey Narkhede
4ec36dfeb1 PCI: Remove reset_fn field from pci_dev
"reset_fn" indicates whether the device supports any reset mechanism.
Remove the use of reset_fn in favor of the reset_methods array that tracks
supported reset mechanisms of a device and their ordering.

The octeon driver incorrectly used reset_fn to detect whether the device
supports FLR or not. Use pcie_reset_flr() to probe whether it supports FLR.

Co-developed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20210817180500.1253-5-ameynarkhede03@gmail.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
2021-08-17 17:44:38 -05:00
Jakub Kicinski
e5e487a2ec wireless-drivers fixes for v5.14
First set of fixes for v5.14 and nothing major this time. New devices
 for iwlwifi and one fix for a compiler warning.
 
 iwlwifi
 
 * support for new devices
 
 mt76
 
 * fix compiler warning about MT_CIPHER_NONE
 -----BEGIN PGP SIGNATURE-----
 
 iQFJBAABCgAzFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmEb7SAVHGt2YWxvQGNv
 ZGVhdXJvcmEub3JnAAoJEG4XJFUm622bDaEH/1TH5IlBXq4dNPGNwgZ0L4sXFlVk
 U7mSbuv2BdH+5edozGwdGl4d6o/vqzcoYSF/dh6d4rpX//zacSsB4LpACW1knlh5
 aJxjP5PdxLty/90JXtpBxL79WfQ9kVQ72ldKAg1Gk9XFk1UOqXSaOaLccNBtFk78
 n97hNwEeKKX1bw//fNLgyxUAlMoIVCaNjtcY9xJpoC5xLHQxM7ixhxqZF7XSeujQ
 z63CRUnT/7gFr4DbOLsZSZVYhCX9v+rz4imIsNbly3e6vLH9Mp2pkyHFfaHKFk6X
 tV/Kkd1Bq6OjQAGSq7mbddi7XqXSd3/1rUZChUy0ZSiyKlly52iqxWhWBBI=
 =YqJK
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-2021-08-17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for v5.14

First set of fixes for v5.14 and nothing major this time. New devices
for iwlwifi and one fix for a compiler warning.

iwlwifi
 * support for new devices

mt76
 * fix compiler warning about MT_CIPHER_NONE

* tag 'wireless-drivers-2021-08-17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers:
  mt76: fix enum type mismatch
  iwlwifi: add new so-jf devices
  iwlwifi: add new SoF with JF devices
  iwlwifi: pnvm: accept multiple HW-type TLVs
====================

Link: https://lore.kernel.org/r/20210817171027.EC1E6C43460@smtp.codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-17 15:08:14 -07:00
Colin Ian King
6e9078a667 i40e: Fix spelling mistake "dissable" -> "disable"
There is a spelling mistake in a dev_info message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-17 10:35:01 -07:00
Stefan Assmann
5ac49f3c27 iavf: use mutexes for locking of critical sections
As follow-up to the discussion with Jakub Kicinski about iavf locking
being insufficient [1] convert iavf to use mutexes instead of bitops.
The locking logic is kept as is, just a drop-in replacement of
enum iavf_critical_section_t with separate mutexes.
The only difference is that the mutexes will be destroyed before the
module is unloaded.

[1] https://lwn.net/ml/netdev/20210316150210.00007249%40intel.com/

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-08-17 09:45:45 -07:00
Dinghao Liu
0a298d1338 net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32
qlcnic_83xx_unlock_flash() is called on all paths after we call
qlcnic_83xx_lock_flash(), except for one error path on failure
of QLCRD32(), which may cause a deadlock. This bug is suggested
by a static analysis tool, please advise.

Fixes: 81d0aeb0a4 ("qlcnic: flash template based firmware reset recovery")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Link: https://lore.kernel.org/r/20210816131405.24024-1-dinghao.liu@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-17 08:27:31 -07:00
Jason Wang
dbcf24d153 virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO
Commit a02e8964ea ("virtio-net: ethtool configurable LRO")
maps LRO to virtio guest offloading features and allows the
administrator to enable and disable those features via ethtool.

This leads to several issues:

- For a device that doesn't support control guest offloads, the "LRO"
  can't be disabled triggering WARN in dev_disable_lro() when turning
  off LRO or when enabling forwarding bridging etc.

- For a device that supports control guest offloads, the guest
  offloads are disabled in cases of bridging, forwarding etc slowing
  down the traffic.

Fix this by using NETIF_F_GRO_HW instead. Though the spec does not
guarantee packets to be re-segmented as the original ones,
we can add that to the spec, possibly with a flag for devices to
differentiate between GRO and LRO.

Further, we never advertised LRO historically before a02e8964ea
("virtio-net: ethtool configurable LRO") and so bridged/forwarded
configs effectively always relied on virtio receive offloads behaving
like GRO - thus even if this breaks any configs it is at least not
a regression.

Fixes: a02e8964ea ("virtio-net: ethtool configurable LRO")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Ivan <ivan@prestigetransportation.com>
Tested-by: Ivan <ivan@prestigetransportation.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:45:09 +01:00
Vidya
aee5122491 octeontx2-af: configure npc for cn10k to allow packets from cpt
On CN10K, the higher bits in the channel number represents the CPT
channel number. Mask out these higher bits in the npc configuration
to allow packets from cpt for parsing.

Signed-off-by: Vidya <vvelumuri@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:06:33 +01:00
Hariprasad Kelam
99b8e5479d octeontx2-af: cn10K: Get NPC counters value
The way SW can identify the number NPC counters supported by silicon
has changed for CN10K. This patch addresses this reading appropriate
registers to find out number of counters available.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:06:33 +01:00
Subbaraya Sundeep
7df5b4b260 octeontx2-af: Allocate low priority entries for PF
If the mcam entry allocation request is from PF
and NOT a priority allocation request then allocate
low priority entries so that PF entries always have
lower priority than its VFs. This is required so
that entries with (base) MCAM match criteria have lower
priority compared to entries with (base + additional)
match criteria. This patch considers only best case
scenario where PF entries are allocated from low
priority zone if low priority zone has free space.
There are worst case scenarios like:
1. VFs allocating hundreds of MCAM entries leading to VFs
using all mid priority zone and low priority zone entries
hence no entries free from low priority zone for PF.
2. All the PFs and VFs in the system allocating and freeing
entries causing fragmentation in MCAM space and all the
entries requested by PF could not fit in low priority
zone for allocation.
This patch do not handle worst case scenarios.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:06:33 +01:00
Sunil Goutham
2da4894327 octeontx2-pf: devlink params support to set mcam entry count
Added support for setting or modifying MCAM entry count at
runtime via devlink params.

commands:
  devlink dev param show
pci/0002:02:00.0:
  name mcam_count type driver-specific
    values:
      cmode runtime value 16

  devlink dev param set pci/0002:02:00.0 name mcam_count
				value 64 cmode runtime

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:06:33 +01:00
Sunil Goutham
2e2a8126ff octeontx2-pf: Unify flow management variables
Variables used for TC flow management like maximum number
of flows, number of flows installed etc are a copy of ntuple
flow management variables. Since both TC and NTUPLE are not
supported at the same time, it's better to unify these with
common variables.

This patch addresses this unification and also does cleanup of
other minor stuff wrt TC.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:06:33 +01:00
Sunil Goutham
cc65fcab88 octeontx2-pf: Sort the allocated MCAM entry indices
Per single mailbox request a maximum of 256 MCAM entries
can be allocated. If more than 256 are being allocated, then
the mcam indices in the final list could get jumbled. Hence
sort the indices.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:06:33 +01:00
Rakesh Babu
3cffaed213 octeontx2-pf: Ntuple filters support for VF netdev
Add packet flow classification support for both LMAC mapped virtual
functions and loopback VFs. This patch adds supports for ntuple
offload feature.

Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:06:33 +01:00
Sunil Goutham
0b3834aeaf octeontx2-pf: Enable NETIF_F_RXALL support for VF driver
Enabled NETIF_F_RXALL support for VF driver.
Also removed MTU range comments which are no longer valid.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:06:32 +01:00
Sunil Goutham
a83bdada06 octeontx2-af: Add debug messages for failures
Added debug messages for various failures during probe.
This will help in quickly identifying the API where the failure
is happening.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:06:32 +01:00
Naveen Mamindlapalli
7278c359e5 octeontx2-af: add proper return codes for AF mailbox handlers
Add appropriate error codes to be used when returning from AF
mailbox handlers due to some error condition.

Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:06:32 +01:00
Subbaraya Sundeep
9cfc580956 octeontx2-af: Modify install flow error codes
When installing a flow using npc_install_flow
mailbox there are number of reasons to reject
the request like caller is not permitted,
invalid channel specified in request, flow
not supported in extraction profile and so on.
Hence define new error codes for npc flows and use
them instead of generic error codes.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 10:06:32 +01:00
David S. Miller
354e1f9d88 mlx5-updates-2021-08-16
The following patchset provides two separate mlx5 updates
 1) Ethtool RSS context and MQPRIO channel mode support:
   1.1) enable mlx5e netdev driver to allow creating Transport Interface RX
        (TIRs) objects on the fly to be used for ethtool RSS contexts and
        TX MQPRIO channel mode
   1.2) Introduce mlx5e_rss object to manage such TIRs.
   1.3) Ethtool support for RSS context
   1.4) Support MQPRIO channel mode
 
 2) Bridge offloads Lag support:
    to allow adding bond net devices to mlx5 bridge
   2.1) Address bridge port by (vport_num, esw_owner_vhca_id) pair
        since vport_num is only unique per eswitch and in lag mode we
        need to manage ports from both eswitches.
   2.2) Allow connectivity between representors of different eswitch
        instances that are attached to same bridge
   2.3) Bridge LAG, Require representors to be in shared FDB mode and
        introduce local and peer ports representors,
        match on paired eswitch metadata in peer FDB entries,
        And finally support addition/deletion and aging of peer flows.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmEa8gwACgkQSD+KveBX
 +j56lAf/esRKDA2kKdtYr7AUNCDmRf/Fj5jBRuASuEWOGgoBaGNZprBw7So5MZdc
 kZpkTnrvFZ5KXsq8nfCJsFkmepHMfJYmTc4VWXpMOyxYlsTwQc9UQ3MBboylMYqL
 23KJKxurFN0kyMIOGciqXaetHgIhf+iQx1osJZd4WGk1soiWX7JiDV6gXXEZu4Ge
 VH261CwOCLwpJ4STTMdjgcGnARStMcr0I3LJm7BWoMVls6FcLXcNslhvcRYYyUiX
 B5UXYrZPLA39CegL/y/jP1vdH6pStb453x1dtjIoJiM+iOIjTaGkYOLF6w2QQWqd
 /8R4MPeAYJ4ENsWmrEbPG+F/5GIN1Q==
 =KWry
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2021-08-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-08-16

The following patchset provides two separate mlx5 updates
1) Ethtool RSS context and MQPRIO channel mode support:
  1.1) enable mlx5e netdev driver to allow creating Transport Interface RX
       (TIRs) objects on the fly to be used for ethtool RSS contexts and
       TX MQPRIO channel mode
  1.2) Introduce mlx5e_rss object to manage such TIRs.
  1.3) Ethtool support for RSS context
  1.4) Support MQPRIO channel mode

2) Bridge offloads Lag support:
   to allow adding bond net devices to mlx5 bridge
  2.1) Address bridge port by (vport_num, esw_owner_vhca_id) pair
       since vport_num is only unique per eswitch and in lag mode we
       need to manage ports from both eswitches.
  2.2) Allow connectivity between representors of different eswitch
       instances that are attached to same bridge
  2.3) Bridge LAG, Require representors to be in shared FDB mode and
       introduce local and peer ports representors,
       match on paired eswitch metadata in peer FDB entries,
       And finally support addition/deletion and aging of peer flows.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17 09:51:19 +01:00
Lahav Schlesinger
09e856d54b vrf: Reset skb conntrack connection on VRF rcv
To fix the "reverse-NAT" for replies.

When a packet is sent over a VRF, the POST_ROUTING hooks are called
twice: Once from the VRF interface, and once from the "actual"
interface the packet will be sent from:
1) First SNAT: l3mdev_l3_out() -> vrf_l3_out() -> .. -> vrf_output_direct()
     This causes the POST_ROUTING hooks to run.
2) Second SNAT: 'ip_output()' calls POST_ROUTING hooks again.

Similarly for replies, first ip_rcv() calls PRE_ROUTING hooks, and
second vrf_l3_rcv() calls them again.

As an example, consider the following SNAT rule:
> iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j SNAT --to-source 2.2.2.2 -o vrf_1

In this case sending over a VRF will create 2 conntrack entries.
The first is from the VRF interface, which performs the IP SNAT.
The second will run the SNAT, but since the "expected reply" will remain
the same, conntrack randomizes the source port of the packet:
e..g With a socket bound to 1.1.1.1:10000, sending to 3.3.3.3:53, the conntrack
rules are:
udp      17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=0 bytes=0 mark=0 use=1
udp      17 29 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1

i.e. First SNAT IP from 1.1.1.1 --> 2.2.2.2, and second the src port is
SNAT-ed from 10000 --> 61033.

But when a reply is sent (3.3.3.3:53 -> 2.2.2.2:61033) only the later
conntrack entry is matched:
udp      17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=1 bytes=49 mark=0 use=1
udp      17 28 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1

And a "port 61033 unreachable" ICMP packet is sent back.

The issue is that when PRE_ROUTING hooks are called from vrf_l3_rcv(),
the skb already has a conntrack flow attached to it, which means
nf_conntrack_in() will not resolve the flow again.

This means only the dest port is "reverse-NATed" (61033 -> 10000) but
the dest IP remains 2.2.2.2, and since the socket is bound to 1.1.1.1 it's
not received.
This can be verified by logging the 4-tuple of the packet in '__udp4_lib_rcv()'.

The fix is then to reset the flow when skb is received on a VRF, to let
conntrack resolve the flow again (which now will hit the earlier flow).

To reproduce: (Without the fix "Got pkt_to_nat_port" will not be printed by
  running 'bash ./repro'):
  $ cat run_in_A1.py
  import logging
  logging.getLogger("scapy.runtime").setLevel(logging.ERROR)
  from scapy.all import *
  import argparse

  def get_packet_to_send(udp_dst_port, msg_name):
      return Ether(src='11:22:33:44:55:66', dst=iface_mac)/ \
          IP(src='3.3.3.3', dst='2.2.2.2')/ \
          UDP(sport=53, dport=udp_dst_port)/ \
          Raw(f'{msg_name}\x0012345678901234567890')

  parser = argparse.ArgumentParser()
  parser.add_argument('-iface_mac', dest="iface_mac", type=str, required=True,
                      help="From run_in_A3.py")
  parser.add_argument('-socket_port', dest="socket_port", type=str,
                      required=True, help="From run_in_A3.py")
  parser.add_argument('-v1_mac', dest="v1_mac", type=str, required=True,
                      help="From script")

  args, _ = parser.parse_known_args()
  iface_mac = args.iface_mac
  socket_port = int(args.socket_port)
  v1_mac = args.v1_mac

  print(f'Source port before NAT: {socket_port}')

  while True:
      pkts = sniff(iface='_v0', store=True, count=1, timeout=10)
      if 0 == len(pkts):
          print('Something failed, rerun the script :(', flush=True)
          break
      pkt = pkts[0]
      if not pkt.haslayer('UDP'):
          continue

      pkt_sport = pkt.getlayer('UDP').sport
      print(f'Source port after NAT: {pkt_sport}', flush=True)

      pkt_to_send = get_packet_to_send(pkt_sport, 'pkt_to_nat_port')
      sendp(pkt_to_send, '_v0', verbose=False) # Will not be received

      pkt_to_send = get_packet_to_send(socket_port, 'pkt_to_socket_port')
      sendp(pkt_to_send, '_v0', verbose=False)
      break

  $ cat run_in_A2.py
  import socket
  import netifaces

  print(f"{netifaces.ifaddresses('e00000')[netifaces.AF_LINK][0]['addr']}",
        flush=True)
  s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
               str('vrf_1' + '\0').encode('utf-8'))
  s.connect(('3.3.3.3', 53))
  print(f'{s. getsockname()[1]}', flush=True)
  s.settimeout(5)

  while True:
      try:
          # Periodically send in order to keep the conntrack entry alive.
          s.send(b'a'*40)
          resp = s.recvfrom(1024)
          msg_name = resp[0].decode('utf-8').split('\0')[0]
          print(f"Got {msg_name}", flush=True)
      except Exception as e:
          pass

  $ cat repro.sh
  ip netns del A1 2> /dev/null
  ip netns del A2 2> /dev/null
  ip netns add A1
  ip netns add A2

  ip -n A1 link add _v0 type veth peer name _v1 netns A2
  ip -n A1 link set _v0 up

  ip -n A2 link add e00000 type bond
  ip -n A2 link add lo0 type dummy
  ip -n A2 link add vrf_1 type vrf table 10001
  ip -n A2 link set vrf_1 up
  ip -n A2 link set e00000 master vrf_1

  ip -n A2 addr add 1.1.1.1/24 dev e00000
  ip -n A2 link set e00000 up
  ip -n A2 link set _v1 master e00000
  ip -n A2 link set _v1 up
  ip -n A2 link set lo0 up
  ip -n A2 addr add 2.2.2.2/32 dev lo0

  ip -n A2 neigh add 1.1.1.10 lladdr 77:77:77:77:77:77 dev e00000
  ip -n A2 route add 3.3.3.3/32 via 1.1.1.10 dev e00000 table 10001

  ip netns exec A2 iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j \
	SNAT --to-source 2.2.2.2 -o vrf_1

  sleep 5
  ip netns exec A2 python3 run_in_A2.py > x &
  XPID=$!
  sleep 5

  IFACE_MAC=`sed -n 1p x`
  SOCKET_PORT=`sed -n 2p x`
  V1_MAC=`ip -n A2 link show _v1 | sed -n 2p | awk '{print $2'}`
  ip netns exec A1 python3 run_in_A1.py -iface_mac ${IFACE_MAC} -socket_port \
          ${SOCKET_PORT} -v1_mac ${SOCKET_PORT}
  sleep 5

  kill -9 $XPID
  wait $XPID 2> /dev/null
  ip netns del A1
  ip netns del A2
  tail x -n 2
  rm x
  set +x

Fixes: 73e20b761a ("net: vrf: Add support for PREROUTING rules on vrf device")
Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210815120002.2787653-1-lschlesinger@drivenets.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-16 16:37:01 -07:00
Vlad Buslov
ff9b752146 net/mlx5: Bridge, support LAG
Allow adding bond net devices to mlx5 bridge with following changes:

- Modify bridge representor code to obtain uplink represetor that belongs
to eswitch that is registered for notification. Require representor to be
in shared FDB mode. If representor is the lag master, then consider its
port as local, otherwise treat it as peer.

- Use devcom to match on paired eswitch metadata in peer FDB entries. This
is necessary for shared FDB LAG to function since packets are always
received on active eswitch instance as opposed to parent eswitch of port.

- Support for deleting peer flows when receiving
SWITCHDEV_FDB_DEL_TO_BRIDGE notification was implemented in one of previous
patches in series. Now also implement support for handling
SWITCHDEV_FDB_ADD_TO_BRIDGE which can be generated on peer by bridge update
workqueue task in LAG configuration. Refresh the flow 'lastuse' timestamp
to current jiffies when receiving such notification on eswitch that manages
the local FDB entry. This allows peer entries to prevent ageing of the FDB.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:32 -07:00
Vlad Buslov
c358ea1741 net/mlx5: Bridge, allow merged eswitch connectivity
Allow connectivity between representors of different eswitch instances that
are attached to same bridge when merged_eswitch capability is enabled. Add
ports of peer eswitch to bridge instance and mark them with
MLX5_ESW_BRIDGE_PORT_FLAG_PEER. Mark FDBs offloaded on peer ports with
MLX5_ESW_BRIDGE_FLAG_PEER flag. Such FDBs can only be aged out on their
local eswitch instance, which then sends SWITCHDEV_FDB_DEL_TO_BRIDGE event.
Listen to the event on mlx5 bridge implementation and delete peer FDBs in
event handler.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:31 -07:00
Vlad Buslov
bf3d56d8f5 net/mlx5: Bridge, extract FDB delete notification to function
SWITCHDEV_FDB_DEL_TO_BRIDGE notification is generated in multiple places in
bridge code. Following patch in series changes the condition for the
notification. Extract the notification into dedicated helper function
mlx5_esw_bridge_fdb_del_notify() to only modify it in single place in the
future changes.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:31 -07:00
Vlad Buslov
3ee6233e61 net/mlx5: Bridge, identify port by vport_num+esw_owner_vhca_id pair
Following patches in series allow traffic between vports of different
eswitch instances, which requires addressing bridge port by
vport_num+esw_owner_vhca_id pair since vport_num is only unique
per-eswitch. As a preparation, extend struct mlx5_esw_bridge_port with
'esw_owner_vhca_id' field and use it as part of key for
mlx5_esw_bridge->vports xarray.

With this change we can't rely on switchdev_handle_port_obj_add() helper to
get mlx5 representor from stacked device because we need specifically
representor from parent eswitch that registered the callback to obtain
correct esw_owner_vhca_id. The helper doesn't allow passing additional
parameters to predicate function and doesn't provide access to the notifier
block to obtain eswitch through br_offloads. Implement custom helpers to
obtain mlx5 representor and use them in
mlx5_esw_bridge_port_obj_{add|del|attr_set}() implementations.

Remove direct pointer to parent bridge from struct mlx5_vport as it is no
longer needed.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:30 -07:00
Vlad Buslov
a514d17350 net/mlx5: Bridge, obtain core device from eswitch instead of priv
Following patches in series will pass bond device to bridge, which means
the code can't assume the device is mlx5 representor. Moreover, the core
device can be easily obtained from eswitch instance, so there is no reason
for more complex code that obtains struct mlx5_priv from net_device in
order to use its mdev. Refactor the code to use esw->dev instead of
priv->mdev.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:30 -07:00
Vlad Buslov
4de20e9a12 net/mlx5: Bridge, release bridge in same function where it is taken
Refactor mlx5_esw_bridge_vport_link() to release the bridge instance if
mlx5_esw_bridge_vport_init() returned an error instead of relying on it to
release the bridge. This improves the design because object instance is
taken and released in same layer and simplifies following patches that add
more logic to mlx5_esw_bridge_vport_link().

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:30 -07:00
Tariq Toukan
ec60c4581b net/mlx5e: Support MQPRIO channel mode
Add support for MQPRIO channel mode, in which a partition to TCs
is defined over the channels. We allow partitions with contiguous
queue indices, with no holes within. We do not allow modification
to the num of channels while this MQPRIO mode is active.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:29 -07:00
Tariq Toukan
21ecfcb83a net/mlx5e: Handle errors of netdev_set_num_tc()
Add handling for failures in netdev_set_num_tc().
Let mlx5e_netdev_set_tcs return an int.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:29 -07:00
Tariq Toukan
e2aeac448f net/mlx5e: Maintain MQPRIO mode parameter
This is in preparation for supporting MQPRIO CHANNEL mode in
downstream patch, in addition to DCB mode that's supported today.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:29 -07:00
Tariq Toukan
86d747a3f9 net/mlx5e: Abstract MQPRIO params
Abstract the MQPRIO params into a struct.
Use a getter for DCB mode num_tcs.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:28 -07:00
Tariq Toukan
248d3b4c9a net/mlx5e: Support flow classification into RSS contexts
Extend the existing flow classification support, to steer
flows not only directly to a receive ring, but also into
the new RSS contexts.

Create needed TIR objects on demand, and hold reference
on the RSS context.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:28 -07:00
Tariq Toukan
f01cc58c18 net/mlx5e: Support multiple RSS contexts
Add support to multiple RSS contexts. Resources of the non-default
RSS contexts are allocated and created on demand. Each RSS context
can be controlled and configured separately, via the implemented
ethtool ops. Here we limit the num of total contexts to 16.

We do not enforce any kind of new limitation over the indirection table
content. More specifically, two separate contexts can be configured to
fully or partially point to the same set of receive rings.

The default RSS context (index 0) is created with its full set of TIRs.
All other contexts are created with an empty set, then TIRs are added
upon first usage when steering rules are added.
We use a reference counting mechanism to make sure an RSS context is
not removed before the rules pointing to it.

Block ethtool set_channels operations when multiple RSS contexts exist,
as currently the kernel doesn't protect against inconsistent channels
configs that break non-default RSS contexts.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16 16:17:28 -07:00