Commit graph

903632 commits

Author SHA1 Message Date
David S. Miller
764e55824d Merge branch 'net-stmmac-Use-readl_poll_timeout-to-simplify-the-code'
Dejin Zheng says:

====================
net: stmmac: Use readl_poll_timeout() to simplify the code

This patch sets just for replace the open-coded loop to the
readl_poll_timeout() helper macro for simplify the code in
stmmac driver.

v2 -> v3:
	- return whatever error code by readl_poll_timeout() returned.
v1 -> v2:
	- no changed. I am a newbie and sent this patch a month
	  ago (February 6th). So far, I have not received any comments or
	  suggestion. I think it may be lost somewhere in the world, so
	  resend it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 02:10:10 -07:00
Dejin Zheng
45d0da498e net: stmmac: use readl_poll_timeout() function in dwmac4_dma_reset()
The dwmac4_dma_reset() function use an open coded of readl_poll_timeout().
Replace the open coded handling with the proper function.

Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 02:10:09 -07:00
Dejin Zheng
ff8ed73786 net: stmmac: use readl_poll_timeout() function in init_systime()
The init_systime() function use an open coded of readl_poll_timeout().
Replace the open coded handling with the proper function.

Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 02:10:09 -07:00
YueHaibing
a1dd3875fd chcr: remove set but not used variable 'status'
drivers/crypto/chelsio/chcr_ktls.c: In function chcr_ktls_cpl_set_tcb_rpl:
drivers/crypto/chelsio/chcr_ktls.c:662:11: warning:
 variable status set but not used [-Wunused-but-set-variable]

commit 8a30923e15 ("cxgb4/chcr: Save tx keys and handle HW response")
involved this unused variable, remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 02:02:04 -07:00
Era Mayflower
48ef50fa86 macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)
Netlink support of extended packet number cipher suites,
allows adding and updating XPN macsec interfaces.

Added support in:
    * Creating interfaces with GCM-AES-XPN-128 and GCM-AES-XPN-256 suites.
    * Setting and getting 64bit packet numbers with of SAs.
    * Setting (only on SA creation) and getting ssci of SAs.
    * Setting salt when installing a SAK.

Added 2 cipher suite identifiers according to 802.1AE-2018 table 14-1:
    * MACSEC_CIPHER_ID_GCM_AES_XPN_128
    * MACSEC_CIPHER_ID_GCM_AES_XPN_256

In addition, added 2 new netlink attribute types:
    * MACSEC_SA_ATTR_SSCI
    * MACSEC_SA_ATTR_SALT

Depends on: macsec: Support XPN frame handling - IEEE 802.1AEbw.

Signed-off-by: Era Mayflower <mayflowerera@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 01:42:31 -07:00
Era Mayflower
a21ecf0e03 macsec: Support XPN frame handling - IEEE 802.1AEbw
Support extended packet number cipher suites (802.1AEbw) frames handling.
This does not include the needed netlink patches.

    * Added xpn boolean field to `struct macsec_secy`.
    * Added ssci field to `struct_macsec_tx_sa` (802.1AE figure 10-5).
    * Added ssci field to `struct_macsec_rx_sa` (802.1AE figure 10-5).
    * Added salt field to `struct macsec_key` (802.1AE 10.7 NOTE 1).
    * Created pn_t type for easy access to lower and upper halves.
    * Created salt_t type for easy access to the "ssci" and "pn" parts.
    * Created `macsec_fill_iv_xpn` function to create IV in XPN mode.
    * Support in PN recovery and preliminary replay check in XPN mode.

In addition, according to IEEE 802.1AEbw figure 10-5, the PN of incoming
frame can be 0 when XPN cipher suite is used, so fixed the function
`macsec_validate_skb` to fail on PN=0 only if XPN is off.

Signed-off-by: Era Mayflower <mayflowerera@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-16 01:42:31 -07:00
Sukumar Ghorai
905d7b1311 Bluetooth: btusb: print Intel fw build version in power-on boot
To determine the build version of Bluetooth firmware to ensure reported
issue related to a particular release. This is very helpful for every fw
downloaded to BT controller and issue reported from field test.

Signed-off-by: Amit K Bag <amit.k.bag@intel.com>
Signed-off-by: Sukumar Ghorai <sukumar.ghorai@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-03-16 08:57:40 +01:00
David S. Miller
65b7a2c8e3 Merge branch 'net-dsa-improve-serdes-integration'
Russell King says:

====================
net: dsa: improve serdes integration

Depends on "net: mii clause 37 helpers".

Andrew Lunn mentioned that the Serdes PCS found in Marvell DSA switches
does not automatically update the switch MACs with the link parameters.
Currently, the DSA code implements a work-around for this.

This series improves the Serdes integration, making use of the recent
phylink changes to support split MAC/PCS setups.  One noticable
improvement for userspace is that ethtool can now report the link
partner's advertisement.

This repost has no changes compared to the previous posting; however,
the regression Andrew had found which exists even without this patch
set has now been fixed by Andrew and merged into the net-next tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:11:13 -07:00
Russell King
5d5b231da7 net: dsa: mv88e6xxx: use PHY_DETECT in mac_link_up/mac_link_down
Use the status of the PHY_DETECT bit to determine whether we need to
force the MAC settings in mac_link_up() and mac_link_down().

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:11:12 -07:00
Russell King
dc745ece3b net: dsa: mv88e6xxx: remove port_link_state functions
The port_link_state method is only used by mv88e6xxx_port_setup_mac(),
which is now only called during port setup, rather than also being
called via phylink's mac_config method.

Remove this now unnecessary optimisation, which allows us to remove the
port_link_state methods as well.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:11:12 -07:00
Russell King
f365c6f723 net: dsa: mv88e6xxx: combine port_set_speed and port_set_duplex
Setting the speed independently of duplex makes little sense; the two
parameters result from negotiation or fixed setup, and may have inter-
dependencies. Moreover, they are always controlled via the same
register - having them split means we have to read-modify-write this
register twice.

Combine the two operations into a single port_set_speed_duplex()
operation. Not only is this more efficient, it reduces the size of the
code as well.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:11:12 -07:00
Russell King
7e0e624312 net: dsa: mv88e6xxx: fix Serdes link changes
phylink_mac_change() is supposed to be called with a 'false' argument
if the link has gone down since it was last reported up; this is to
ensure that link events along with renegotiation events are always
correctly reported to userspace.

Read the BMSR once when we have an interrupt, and report the link
latched status to phylink via phylink_mac_change().  phylink will deal
automatically with re-reading the link state once it has processed the
link-down event.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:11:12 -07:00
Russell King
a5a6858b79 net: dsa: mv88e6xxx: extend phylink to Serdes PHYs
Extend the mv88e6xxx phylink implementation down to Serdes PHYs, which
handle the PCS layer of such links.

- Implement phylink PCS link state reading, so that we can provide
  ethtool with the linkmodes and link speed in the expected manner.
  Note: this will only be called for in-band negotiation, which is
  only supported by the serdes interfaces.
- Implement phylink PCS configuration, so that the in-band AN and
  advertisement can be configured.
- Implement phylink PCS negotiation restart, so that the in-band AN
  can be restarted.
- Implement phylink PCS link up, so that when operating out-of-band,
  the Serdes can be configured for the appropriate fixed speed mode.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:11:12 -07:00
Russell King
64d47d50be net: dsa: mv88e6xxx: configure interface settings in mac_config
Only configure the interface settings in mac_config(), leaving the
speed and duplex settings to mac_link_up to deal with.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:11:12 -07:00
Russell King
4c8b7350a6 net: dsa: mv88e6xxx: use BMCR definitions for serdes control register
The SGMII/1000base-X serdes register set is a clause 22 register set
offset at 0x2000 in the PHYXS device. Rather than inventing our own
defintions, use those that already exist, and name the register
MV88E6390_SGMII_BMCR.  Also remove the unused MV88E6390_SGMII_STATUS
definitions.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:11:12 -07:00
Russell King
87615c96e7 net: dsa: warn if phylink_mac_link_state returns error
Issue a warning to the kernel log if phylink_mac_link_state() returns
an error. This should not occur, but let's make it visible.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:11:12 -07:00
David S. Miller
3c04d3570b Merge branch 'net-mii-clause-37-helpers'
Russell King says:

====================
net: mii clause 37 helpers

This is a re-post of two patches that are common to two series that
I've sent in recent weeks; I'm re-posting them separately in the hope
that they can be merged.  No changes from either of the previous
postings.

These patches:

1. convert the existing (unused) mii_lpa_to_ethtool_lpa_x() function
   to a linkmode variant.

2. add a helper for clause 37 advertisements, supporting both the
   1000baseX and defacto 2500baseX variants. Note that ethtool does
   not support half duplex for either of these, and we make no effort
   to do so.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:10:14 -07:00
Russell King
a9f28eba6e net: mii: add linkmode_adv_to_mii_adv_x()
Add a helper to convert a linkmode advertisement to a clause 37
advertisement value for 1000base-x and 2500base-x.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:10:14 -07:00
Russell King
f655418785 net: mii: convert mii_lpa_to_ethtool_lpa_x() to linkmode variant
Add a LPA to linkmode decoder for 1000BASE-X protocols; this decoder
only provides the modify semantics similar to other such decoders.
This replaces the unused mii_lpa_to_ethtool_lpa_x() helper.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:10:14 -07:00
Florian Westphal
d0febd81ae netfilter: conntrack: re-visit sysctls in unprivileged namespaces
since commit b884fa4617 ("netfilter: conntrack: unify sysctl handling")
conntrack no longer exposes most of its sysctls (e.g. tcp timeouts
settings) to network namespaces that are not owned by the initial user
namespace.

This patch exposes all sysctls even if the namespace is unpriviliged.

compared to a 4.19 kernel, the newly visible and writeable sysctls are:
  net.netfilter.nf_conntrack_acct
  net.netfilter.nf_conntrack_timestamp
  .. to allow to enable accouting and timestamp extensions.

  net.netfilter.nf_conntrack_events
  .. to turn off conntrack event notifications.

  net.netfilter.nf_conntrack_checksum
  .. to disable checksum validation.

  net.netfilter.nf_conntrack_log_invalid
  .. to enable logging of packets deemed invalid by conntrack.

newly visible sysctls that are only exported as read-only:

  net.netfilter.nf_conntrack_count
  .. current number of conntrack entries living in this netns.

  net.netfilter.nf_conntrack_max
  .. global upperlimit (maximum size of the table).

  net.netfilter.nf_conntrack_buckets
  .. size of the conntrack table (hash buckets).

  net.netfilter.nf_conntrack_expect_max
  .. maximum number of permitted expectations in this netns.

  net.netfilter.nf_conntrack_helper
  .. conntrack helper auto assignment.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:27:51 +01:00
Pablo Neira Ayuso
339706bc21 netfilter: nft_lookup: update element stateful expression
If the set element comes with an stateful expression, update it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:27:50 +01:00
Pablo Neira Ayuso
76adfafeca netfilter: nf_tables: add nft_set_elem_update_expr() helper function
This helper function runs the eval path of the stateful expression
of an existing set element.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:27:49 +01:00
Pablo Neira Ayuso
4094445229 netfilter: nf_tables: add elements with stateful expressions
Update nft_add_set_elem() to handle the NFTA_SET_ELEM_EXPR netlink
attribute. This patch allows users to to add elements with stateful
expressions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:27:49 +01:00
Pablo Neira Ayuso
795a6d6b42 netfilter: nf_tables: statify nft_expr_init()
Not exposed anymore to modules, statify this function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:27:48 +01:00
Pablo Neira Ayuso
a7fc936804 netfilter: nf_tables: add nft_set_elem_expr_alloc()
Add helper function to create stateful expression.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:27:47 +01:00
Stefano Brivio
eb16933aa5 nft_set_pipapo: Prepare for single ranged field usage
A few adjustments in nft_pipapo_init() are needed to allow usage of
this set back-end for a single, ranged field.

Provide a convenient NFT_PIPAPO_MIN_FIELDS definition that currently
makes sure that the rbtree back-end is selected instead, for sets
with a single field.

This finally allows a fair comparison with rbtree sets, by defining
NFT_PIPAPO_MIN_FIELDS as 0 and skipping rbtree back-end initialisation:

 ---------------.--------------------------.-------------------------.
 AMD Epyc 7402  |      baselines, Mpps     |   Mpps, % over rbtree   |
  1 thread      |__________________________|_________________________|
  3.35GHz       |        |        |        |            |            |
  768KiB L1D$   | netdev |  hash  | rbtree |            |   pipapo   |
 ---------------|  hook  |   no   | single |   pipapo   |single field|
 type   entries |  drop  | ranges | field  |single field|    AVX2    |
 ---------------|--------|--------|--------|------------|------------|
 net,port       |        |        |        |            |            |
          1000  |   19.0 |   10.4 |    3.8 | 6.0   +58% | 9.6  +153% |
 ---------------|--------|--------|--------|------------|------------|
 port,net       |        |        |        |            |            |
           100  |   18.8 |   10.3 |    5.8 | 9.1   +57% |11.6  +100% |
 ---------------|--------|--------|--------|------------|------------|
 net6,port      |        |        |        |            |            |
          1000  |   16.4 |    7.6 |    1.8 | 2.8   +55% | 6.5  +261% |
 ---------------|--------|--------|--------|------------|------------|
 port,proto     |        |        |        |     [1]    |    [1]     |
         30000  |   19.6 |   11.6 |    3.9 | 0.9   -77% | 2.7   -31% |
 ---------------|--------|--------|--------|------------|------------|
 port,proto     |        |        |        |            |            |
         10000  |   19.6 |   11.6 |    4.4 | 2.1   -52% | 5.6   +27% |
 ---------------|--------|--------|--------|------------|------------|
 port,proto     |        |        |        |            |            |
 4 threads 10000|   77.9 |   45.1 |   17.4 | 8.3   -52% |22.4   +29% |
 ---------------|--------|--------|--------|------------|------------|
 net6,port,mac  |        |        |        |            |            |
            10  |   16.5 |    5.4 |    4.3 | 4.5    +5% | 8.2   +91% |
 ---------------|--------|--------|--------|------------|------------|
 net6,port,mac, |        |        |        |            |            |
 proto    1000  |   16.5 |    5.7 |    1.9 | 2.8   +47% | 6.6  +247% |
 ---------------|--------|--------|--------|------------|------------|
 net,mac        |        |        |        |            |            |
          1000  |   19.0 |    8.4 |    3.9 | 6.0   +54% | 9.9  +154% |
 ---------------'--------'--------'--------'------------'------------'
 [1] Causes switch of lookup table buckets for 'port' to 4-bit groups

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:27:46 +01:00
Stefano Brivio
7400b06396 nft_set_pipapo: Introduce AVX2-based lookup implementation
If the AVX2 set is available, we can exploit the repetitive
characteristic of this algorithm to provide a fast, vectorised
version by using 256-bit wide AVX2 operations for bucket loads and
bitwise intersections.

In most cases, this implementation consistently outperforms rbtree
set instances despite the fact they are configured to use a given,
single, ranged data type out of the ones used for performance
measurements by the nft_concat_range.sh kselftest.

That script, injecting packets directly on the ingoing device path
with pktgen, reports, averaged over five runs on a single AMD Epyc
7402 thread (3.35GHz, 768 KiB L1D$, 12 MiB L2$), the figures below.
CONFIG_RETPOLINE was not set here.

Note that this is not a fair comparison over hash and rbtree set
types: non-ranged entries (used to have a reference for hash types)
would be matched faster than this, and matching on a single field
only (which is the case for rbtree) is also significantly faster.

However, it's not possible at the moment to choose this set type
for non-ranged entries, and the current implementation also needs
a few minor adjustments in order to match on less than two fields.

 ---------------.-----------------------------------.------------.
 AMD Epyc 7402  |          baselines, Mpps          | this patch |
  1 thread      |___________________________________|____________|
  3.35GHz       |        |        |        |        |            |
  768KiB L1D$   | netdev |  hash  | rbtree |        |            |
 ---------------|  hook  |   no   | single |        |   pipapo   |
 type   entries |  drop  | ranges | field  | pipapo |    AVX2    |
 ---------------|--------|--------|--------|--------|------------|
 net,port       |        |        |        |        |            |
          1000  |   19.0 |   10.4 |    3.8 |    4.0 | 7.5   +87% |
 ---------------|--------|--------|--------|--------|------------|
 port,net       |        |        |        |        |            |
           100  |   18.8 |   10.3 |    5.8 |    6.3 | 8.1   +29% |
 ---------------|--------|--------|--------|--------|------------|
 net6,port      |        |        |        |        |            |
          1000  |   16.4 |    7.6 |    1.8 |    2.1 | 4.8  +128% |
 ---------------|--------|--------|--------|--------|------------|
 port,proto     |        |        |        |        |            |
         30000  |   19.6 |   11.6 |    3.9 |    0.5 | 2.6  +420% |
 ---------------|--------|--------|--------|--------|------------|
 net6,port,mac  |        |        |        |        |            |
            10  |   16.5 |    5.4 |    4.3 |    3.4 | 4.7   +38% |
 ---------------|--------|--------|--------|--------|------------|
 net6,port,mac, |        |        |        |        |            |
 proto    1000  |   16.5 |    5.7 |    1.9 |    1.4 | 3.6   +26% |
 ---------------|--------|--------|--------|--------|------------|
 net,mac        |        |        |        |        |            |
          1000  |   19.0 |    8.4 |    3.9 |    2.5 | 6.4  +156% |
 ---------------'--------'--------'--------'--------'------------'

A similar strategy could be easily reused to implement specialised
versions for other SIMD sets, and I plan to post at least a NEON
version at a later time.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:27:45 +01:00
Stefano Brivio
8683f4b995 nft_set_pipapo: Prepare for vectorised implementation: helpers
Move most macros and helpers to a header file, so that they can be
conveniently used by related implementations.

No functional changes are intended here.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:27:44 +01:00
Stefano Brivio
bf3e583923 nft_set_pipapo: Prepare for vectorised implementation: alignment
SIMD vector extension sets require stricter alignment than native
instruction sets to operate efficiently (AVX, NEON) or for some
instructions to work at all (AltiVec).

Provide facilities to define arbitrary alignment for lookup tables
and scratch maps. By defining byte alignment with NFT_PIPAPO_ALIGN,
lt_aligned and scratch_aligned pointers become available.

Additional headroom is allocated, and pointers to the possibly
unaligned, originally allocated areas are kept so that they can
be freed.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:27:43 +01:00
Stefano Brivio
4051f43116 nft_set_pipapo: Add support for 8-bit lookup groups and dynamic switch
While grouping matching bits in groups of four saves memory compared
to the more natural choice of 8-bit words (lookup table size is one
eighth), it comes at a performance cost, as the number of lookup
comparisons is doubled, and those also needs bitshifts and masking.

Introduce support for 8-bit lookup groups, together with a mapping
mechanism to dynamically switch, based on defined per-table size
thresholds and hysteresis, between 8-bit and 4-bit groups, as tables
grow and shrink. Empty sets start with 8-bit groups, and per-field
tables are converted to 4-bit groups if they get too big.

An alternative approach would have been to swap per-set lookup
operation functions as needed, but this doesn't allow for different
group sizes in the same set, which looks desirable if some fields
need significantly more matching data compared to others due to
heavier impact of ranges (e.g. a big number of subnets with
relatively simple port specifications).

Allowing different group sizes for the same lookup functions implies
the need for further conditional clauses, whose cost, however,
appears to be negligible in tests.

The matching rate figures below were obtained for x86_64 running
the nft_concat_range.sh "performance" cases, averaged over five
runs, on a single thread of an AMD Epyc 7402 CPU, and for aarch64
on a single thread of a BCM2711 (Raspberry Pi 4 Model B 4GB),
clocked at a stable 2147MHz frequency:

---------------.-----------------------------------.------------.
AMD Epyc 7402  |          baselines, Mpps          | this patch |
 1 thread      |___________________________________|____________|
 3.35GHz       |        |        |        |        |            |
 768KiB L1D$   | netdev |  hash  | rbtree |        |            |
---------------|  hook  |   no   | single | pipapo |   pipapo   |
type   entries |  drop  | ranges | field  | 4 bits | bit switch |
---------------|--------|--------|--------|--------|------------|
net,port       |        |        |        |        |            |
         1000  |   19.0 |   10.4 |    3.8 |    2.8 | 4.0   +43% |
---------------|--------|--------|--------|--------|------------|
port,net       |        |        |        |        |            |
          100  |   18.8 |   10.3 |    5.8 |    5.5 | 6.3   +14% |
---------------|--------|--------|--------|--------|------------|
net6,port      |        |        |        |        |            |
         1000  |   16.4 |    7.6 |    1.8 |    1.3 | 2.1   +61% |
---------------|--------|--------|--------|--------|------------|
port,proto     |        |        |        |        |     [1]    |
        30000  |   19.6 |   11.6 |    3.9 |    0.3 | 0.5   +66% |
---------------|--------|--------|--------|--------|------------|
net6,port,mac  |        |        |        |        |            |
           10  |   16.5 |    5.4 |    4.3 |    2.6 | 3.4   +31% |
---------------|--------|--------|--------|--------|------------|
net6,port,mac, |        |        |        |        |            |
proto    1000  |   16.5 |    5.7 |    1.9 |    1.0 | 1.4   +40% |
---------------|--------|--------|--------|--------|------------|
net,mac        |        |        |        |        |            |
         1000  |   19.0 |    8.4 |    3.9 |    1.7 | 2.5   +47% |
---------------'--------'--------'--------'--------'------------'
[1] Causes switch of lookup table buckets for 'port', not 'proto',
    to 4-bit groups

 ---------------.-----------------------------------.------------.
 BCM2711        |          baselines, Mpps          | this patch |
  1 thread      |___________________________________|____________|
  2147MHz       |        |        |        |        |            |
  32KiB L1D$    | netdev |  hash  | rbtree |        |            |
 ---------------|  hook  |   no   | single | pipapo |   pipapo   |
 type   entries |  drop  | ranges | field  | 4 bits | bit switch |
 ---------------|--------|--------|--------|--------|------------|
 net,port       |        |        |        |        |            |
          1000  |   1.63 |   1.37 |   0.87 |   0.61 | 0.70  +17% |
 ---------------|--------|--------|--------|--------|------------|
 port,net       |        |        |        |        |            |
           100  |   1.64 |   1.36 |   1.02 |   0.78 | 0.81   +4% |
 ---------------|--------|--------|--------|--------|------------|
 net6,port      |        |        |        |        |            |
          1000  |   1.56 |   1.27 |   0.65 |   0.34 | 0.50  +47% |
 ---------------|--------|--------|--------|--------|------------|
 port,proto [2] |        |        |        |        |            |
         10000  |   1.68 |   1.43 |   0.84 |   0.30 | 0.40  +13% |
 ---------------|--------|--------|--------|--------|------------|
 net6,port,mac  |        |        |        |        |            |
            10  |   1.56 |   1.14 |   1.02 |   0.62 | 0.66   +6% |
 ---------------|--------|--------|--------|--------|------------|
 net6,port,mac, |        |        |        |        |            |
 proto    1000  |   1.56 |   1.12 |   0.64 |   0.27 | 0.40  +48% |
 ---------------|--------|--------|--------|--------|------------|
 net,mac        |        |        |        |        |            |
          1000  |   1.63 |   1.26 |   0.87 |   0.41 | 0.53  +29% |
 ---------------'--------'--------'--------'--------'------------'
[2] Using 10000 entries instead of 30000 as it would take way too
    long for the test script to generate all of them

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:27:43 +01:00
Stefano Brivio
e807b13cb3 nft_set_pipapo: Generalise group size for buckets
Get rid of all hardcoded assumptions that buckets in lookup tables
correspond to four-bit groups, and replace them with appropriate
calculations based on a variable group size, now stored in struct
field.

The group size could now be in principle any divisor of eight. Note,
though, that lookup and get functions need an implementation
intimately depending on the group size, and the only supported size
there, currently, is four bits, which is also the initial and only
used size at the moment.

While at it, drop 'groups' from struct nft_pipapo: it was never used.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:27:42 +01:00
wenxu
88bf6e4114 netfilter: flowtable: add tunnel encap/decap action offload support
This patch add tunnel encap decap action offload in the flowtable
offload.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:27:23 +01:00
wenxu
cfab6dbd0e netfilter: flowtable: add tunnel match offload support
This patch support both ipv4 and ipv6 tunnel_id, tunnel_src and
tunnel_dst match for flowtable offload

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:26:17 +01:00
wenxu
b5140a36da netfilter: flowtable: add indr block setup support
Add etfilter flowtable support indr-block setup. It makes flowtable offload
vlan and tunnel device.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:22:50 +01:00
wenxu
4679877921 netfilter: flowtable: add nf_flow_table_block_offload_init()
Add nf_flow_table_block_offload_init prepare for the indr block
offload patch

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:22:32 +01:00
Dan Carpenter
f628c27d85 netfilter: xt_IDLETIMER: clean up some indenting
These lines were indented wrong so Smatch complained.
net/netfilter/xt_IDLETIMER.c:81 idletimer_tg_show() warn: inconsistent indenting

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:20:17 +01:00
Jeremy Sowden
049dee95f8 netfilter: bitwise: use more descriptive variable-names.
Name the mask and xor data variables, "mask" and "xor," instead of "d1"
and "d2."

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:20:16 +01:00
Gustavo A. R. Silva
6daf141401 netfilter: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

Lastly, fix checkpatch.pl warning
WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
in net/bridge/netfilter/ebtables.c

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:20:16 +01:00
Chen Wandun
eb9d7af3b7 netfilter: nft_set_pipapo: make the symbol 'nft_pipapo_get' static
Fix the following sparse warning:

net/netfilter/nft_set_pipapo.c:739:6: warning: symbol 'nft_pipapo_get' was not declared. Should it be static?

Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:20:16 +01:00
Li RongQing
9325f070f7 netfilter: cleanup unused macro
TEMPLATE_NULLS_VAL is not used after commit 0838aa7fcf
("netfilter: fix netns dependencies with conntrack templates")

PFX is not used after commit 8bee4bad03 ("netfilter: xt
extensions: use pr_<level>")

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:20:16 +01:00
Florian Westphal
24d19826fc netfilter: nf_tables: make all set structs const
They do not need to be writeable anymore.

v2: remove left-over __read_mostly annotation in set_pipapo.c (Stefano)

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:20:16 +01:00
Florian Westphal
e32a4dc651 netfilter: nf_tables: make sets built-in
Placing nftables set support in an extra module is pointless:

1. nf_tables needs dynamic registeration interface for sake of one module
2. nft heavily relies on sets, e.g. even simple rule like
   "nft ... tcp dport { 80, 443 }" will not work with _SETS=n.

IOW, either nftables isn't used or both nf_tables and nf_tables_set
modules are needed anyway.

With extra module:
 307K net/netfilter/nf_tables.ko
  79K net/netfilter/nf_tables_set.ko

   text  data  bss     dec filename
 146416  3072  545  150033 nf_tables.ko
  35496  1817    0   37313 nf_tables_set.ko

This patch:
 373K net/netfilter/nf_tables.ko

 178563  4049  545  183157 nf_tables.ko

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:20:16 +01:00
Xin Long
925d844696 netfilter: nft_tunnel: add support for geneve opts
Like vxlan and erspan opts, geneve opts should also be supported in
nft_tunnel. The difference is geneve RFC (draft-ietf-nvo3-geneve-14)
allows a geneve packet to carry multiple geneve opts. So with this
patch, nftables/libnftnl would do:

  # nft add table ip filter
  # nft add chain ip filter input { type filter hook input priority 0 \; }
  # nft add tunnel filter geneve_02 { type geneve\; id 2\; \
    ip saddr 192.168.1.1\; ip daddr 192.168.1.2\; \
    sport 9000\; dport 9001\; dscp 1234\; ttl 64\; flags 1\; \
    opts \"1:1:34567890,2:2:12121212,3:3:1212121234567890\"\; }
  # nft list tunnels table filter
    table ip filter {
    	tunnel geneve_02 {
    		id 2
    		ip saddr 192.168.1.1
    		ip daddr 192.168.1.2
    		sport 9000
    		dport 9001
    		tos 18
    		ttl 64
    		flags 1
    		geneve opts 1:1:34567890,2:2:12121212,3:3:1212121234567890
    	}
    }

v1->v2:
  - no changes, just post it separately.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:20:16 +01:00
Manoj Basapathi
68983a354a netfilter: xtables: Add snapshot of hardidletimer target
This is a snapshot of hardidletimer netfilter target.

This patch implements a hardidletimer Xtables target that can be
used to identify when interfaces have been idle for a certain period
of time.

Timers are identified by labels and are created when a rule is set
with a new label. The rules also take a timeout value (in seconds) as
an option. If more than one rule uses the same timer label, the timer
will be restarted whenever any of the rules get a hit.

One entry for each timer is created in sysfs. This attribute contains
the timer remaining for the timer to expire. The attributes are
located under the xt_idletimer class:

/sys/class/xt_idletimer/timers/<label>

When the timer expires, the target module sends a sysfs notification
to the userspace, which can then decide what to do (eg. disconnect to
save power)

Compared to IDLETIMER, HARDIDLETIMER can send notifications when
CPU is in suspend too, to notify the timer expiry.

v1->v2: Moved all functionality into IDLETIMER module to avoid
code duplication per comment from Florian.

Signed-off-by: Manoj Basapathi <manojbm@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:20:16 +01:00
Paul Blakey
c3c831b0a2 netfilter: flowtable: Use nf_flow_offload_tuple for stats as well
This patch doesn't change any functionality.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15 15:20:15 +01:00
Alexander Bersenev
5d0ab06b63 cdc_ncm: Fix the build warning
The ndp32->wLength is two bytes long, so replace cpu_to_le32 with cpu_to_le16.

Fixes: 0fa81b304a ("cdc_ncm: Implement the 32-bit version of NCM Transfer Block")
Signed-off-by: Alexander Bersenev <bay@hackerdom.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 00:41:29 -07:00
David S. Miller
a79c838fb0 Merge branch 'mptcp-simplify-mptcp_accept'
Paolo Abeni says:

====================
mptcp: simplify mptcp_accept()

Currently we allocate the MPTCP master socket at accept time.

The above makes mptcp_accept() quite complex, and requires checks is several
places for NULL MPTCP master socket.

These series simplify the MPTCP accept implementation, moving the master socket
allocation at syn-ack time, so that we drop unneeded checks with the follow-up
patch.

v1 -> v2:
- rebased on top of 2398e3991b
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 00:19:03 -07:00
Paolo Abeni
dc093db5cc mptcp: drop unneeded checks
After the previous patch subflow->conn is always != NULL and
is never changed. We can drop a bunch of now unneeded checks.

v1 -> v2:
 - rebased on top of commit 2398e3991b ("mptcp: always
   include dack if possible.")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 00:19:03 -07:00
Paolo Abeni
58b0991962 mptcp: create msk early
This change moves the mptcp socket allocation from mptcp_accept() to
subflow_syn_recv_sock(), so that subflow->conn is now always set
for the non fallback scenario.

It allows cleaning up a bit mptcp_accept() reducing the additional
locking and will allow fourther cleanup in the next patch.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 00:19:03 -07:00
Dejin Zheng
7a1d0e61f1 net: stmmac: platform: convert to devm_platform_ioremap_resource
Use devm_platform_ioremap_resource() to simplify code, which
contains platform_get_resource and devm_ioremap_resource.

Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 00:17:41 -07:00