Commit graph

81373 commits

Author SHA1 Message Date
Russell King (Oracle)
0d22d4b626 net: phylink: add pcs_validate() method
Add a hook for PCS to validate the link parameters. This avoids MAC
drivers having to have knowledge of their PCS in their validate()
method, thereby allowing several MAC drivers to be simplfied.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16 10:37:13 +00:00
Russell King (Oracle)
d1e86325af net: phylink: add mac_select_pcs() method to phylink_mac_ops
mac_select_pcs() allows us to have an explicit point to query which
PCS the MAC wishes to use for a particular PHY interface mode, thereby
allowing us to add support to validate the link settings with the PCS.

Phylink will also use this to select the PCS to be used during a major
configuration event without the MAC driver needing to call
phylink_set_pcs().

Note that if mac_select_pcs() is present, the supported_interfaces
bitmap must be filled in; this avoids mac_select_pcs() being called
with PHY_INTERFACE_MODE_NA when we want to get support for all
interface types. Phylink will return an error in phylink_create()
unless this condition is satisfied.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16 10:37:13 +00:00
David S. Miller
823f7a5497 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
mlx5-next branch 2021-12-15

Hi Dave, Jakub, Jason

This pulls mlx5-next branch into net-next and rdma branches.
All patches already reviewed on both rdma and netdev mailing lists.

Please pull and let me know if there's any problem.

1) Add multiple FDB steering priorities [1]
2) Introduce HW bits needed to configure MAC list size of VF/SF.
   Required for ("net/mlx5: Memory optimizations") upcoming series [2].

[1] https://lore.kernel.org/netdev/20211201193621.9129-1-saeed@kernel.org/
[2] https://lore.kernel.org/lkml/20211208141722.13646-1-shayd@nvidia.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16 10:22:20 +00:00
Shay Drory
685b1afd79 net/mlx5: Introduce log_max_current_uc_list_wr_supported bit
Downstream patch will use this bit in order to know whether the device
supports changing of max_uc_list.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-15 10:21:50 -08:00
Eric Dumazet
f1d9268e06 net: add net device refcount tracker to struct packet_type
Most notable changes are in af_packet, tipc ones are trivial.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jon Maloy <jmaloy@redhat.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15 15:07:04 +00:00
Eric Dumazet
9280ac2e6f net: dev_replace_track() cleanup
Use existing helpers (netdev_tracker_free()
and netdev_tracker_alloc()) to remove ifdefery.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20211214151515.312535-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-14 18:46:05 -08:00
Vladimir Oltean
c8a2a011cd net: dsa: sja1105: fix broken connection with the sja1110 tagger
The driver was incorrectly converted assuming that "sja1105" is the only
tagger supported by this driver. This results in SJA1110 switches
failing to probe:

sja1105 spi1.0: Unable to connect to tag protocol "sja1110": -EPROTONOSUPPORT
sja1105: probe of spi1.2 failed with error -93

Add DSA_TAG_PROTO_SJA1110 to the list of supported taggers by the
sja1105 driver. The sja1105_tagger_data structure format is common for
the two tagging protocols.

Fixes: c79e84866d ("net: dsa: tag_sja1105: convert to tagger-owned data")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14 12:45:16 +00:00
Maor Gottlieb
22c3f2f56b net/mlx5: Separate FDB namespace
This patch doesn't add an additional namespaces, but just separates the
naming to be used by each FDB user, bypass and kernel.
Downstream patches will actually split this up and allow to have more
than single priority for the bypass users.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-13 16:02:59 -08:00
Sebastian Andrzej Siewior
3c118547f8 u64_stats: Disable preemption on 32bit UP+SMP PREEMPT_RT during updates.
On PREEMPT_RT the seqcount_t for synchronisation is required on 32bit
architectures even on UP because the softirq (and the threaded IRQ handler) can
be preempted.

With the seqcount_t for synchronisation, a reader with higher priority can
preempt the writer and then spin endlessly in read_seqcount_begin() while the
writer can't make progress.

To avoid such a lock up on PREEMPT_RT the writer must disable preemption during
the update. There is no need to disable interrupts because no writer is using
this API in hard-IRQ context on PREEMPT_RT.

Disable preemption on 32bit-RT within the u64_stats write section.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-13 12:42:08 +00:00
Vladimir Oltean
950a419d9d net: dsa: tag_sja1105: split sja1105_tagger_data into private and public sections
The sja1105 driver messes with the tagging protocol's state when PTP RX
timestamping is enabled/disabled. This is fundamentally necessary
because the tagger needs to know what to do when it receives a PTP
packet. If RX timestamping is enabled, then a metadata follow-up frame
is expected, and this holds the (partial) timestamp. So the tagger plays
hide-and-seek with the network stack until it also gets the metadata
frame, and then presents a single packet, the timestamped PTP packet.
But when RX timestamping isn't enabled, there is no metadata frame
expected, so the hide-and-seek game must be turned off and the packet
must be delivered right away to the network stack.

Considering this, we create a pseudo isolation by devising two tagger
methods callable by the switch: one to get the RX timestamping state,
and one to set it. Since we can't export symbols between the tagger and
the switch driver, these methods are exposed through function pointers.

After this change, the public portion of the sja1105_tagger_data
contains only function pointers.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12 12:51:34 +00:00
Vladimir Oltean
fcbf979a5b Revert "net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver"
This reverts commit 6d709cadfd.

The above change was done to avoid calling symbols exported by the
switch driver from the tagging protocol driver.

With the tagger-owned storage model, we have a new option on our hands,
and that is for the switch driver to provide a data consumer handler in
the form of a function pointer inside the ->connect_tag_protocol()
method. Having a function pointer avoids the problems of the exported
symbols approach.

By creating a handler for metadata frames holding TX timestamps on
SJA1110, we are able to eliminate an skb queue from the tagger data, and
replace it with a simple, and stateless, function pointer. This skb
queue is now handled exclusively by sja1105_ptp.c, which makes the code
easier to follow, as it used to be before the reverted patch.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12 12:51:34 +00:00
Vladimir Oltean
c79e84866d net: dsa: tag_sja1105: convert to tagger-owned data
Currently, struct sja1105_tagger_data is a part of struct
sja1105_private, and is used by the sja1105 driver to populate dp->priv.

With the movement towards tagger-owned storage, the sja1105 driver
should not be the owner of this memory.

This change implements the connection between the sja1105 switch driver
and its tagging protocol, which means that sja1105_tagger_data no longer
stays in dp->priv but in ds->tagger_data, and that the sja1105 driver
now only populates the sja1105_port_deferred_xmit callback pointer.
The kthread worker is now the responsibility of the tagger.

The sja1105 driver also alters the tagger's state some more, especially
with regard to the PTP RX timestamping state. This will be fixed up a
bit in further changes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12 12:51:33 +00:00
Vladimir Oltean
22ee9f8e40 net: dsa: sja1105: move ts_id from sja1105_tagger_data
The TX timestamp ID is incremented by the SJA1110 PTP timestamping
callback (->port_tx_timestamp) for every packet, when cloning it.
It isn't used by the tagger at all, even though it sits inside the
struct sja1105_tagger_data.

Also, serialization to this structure is currently done through
tagger_data->meta_lock, which is a cheap hack because the meta_lock
isn't used for anything else on SJA1110 (sja1105_rcv_meta_state_machine
isn't called).

This change moves ts_id from sja1105_tagger_data to sja1105_private and
introduces a dedicated spinlock for it, also in sja1105_private.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12 12:51:33 +00:00
Vladimir Oltean
bfcf142522 net: dsa: sja1105: make dp->priv point directly to sja1105_tagger_data
The design of the sja1105 tagger dp->priv is that each port has a
separate struct sja1105_port, and the sp->data pointer points to a
common struct sja1105_tagger_data.

We have removed all per-port members accessible by the tagger, and now
only struct sja1105_tagger_data remains. Make dp->priv point directly to
this.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12 12:51:33 +00:00
Vladimir Oltean
6f6770ab1c net: dsa: sja1105: remove hwts_tx_en from tagger data
This tagger property is in fact not used at all by the tagger, only by
the switch driver. Therefore it makes sense to be moved to
sja1105_private.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12 12:51:33 +00:00
Vladimir Oltean
d38049bbe7 net: dsa: sja1105: bring deferred xmit implementation in line with ocelot-8021q
When the ocelot-8021q driver was converted to deferred xmit as part of
commit 8d5f7954b7 ("net: dsa: felix: break at first CPU port during
init and teardown"), the deferred implementation was deliberately made
subtly different from what sja1105 has.

The implementation differences lied on the following observations:

- There might be a race between these two lines in tag_sja1105.c:

       skb_queue_tail(&sp->xmit_queue, skb_get(skb));
       kthread_queue_work(sp->xmit_worker, &sp->xmit_work);

  and the skb dequeue logic in sja1105_port_deferred_xmit(). For
  example, the xmit_work might be already queued, however the work item
  has just finished walking through the skb queue. Because we don't
  check the return code from kthread_queue_work, we don't do anything if
  the work item is already queued.

  However, nobody will take that skb and send it, at least until the
  next timestampable skb is sent. This creates additional (and
  avoidable) TX timestamping latency.

  To close that race, what the ocelot-8021q driver does is it doesn't
  keep a single work item per port, and a skb timestamping queue, but
  rather dynamically allocates a work item per packet.

- It is also unnecessary to have more than one kthread that does the
  work. So delete the per-port kthread allocations and replace them with
  a single kthread which is global to the switch.

This change brings the two implementations in line by applying those
observations to the sja1105 driver as well.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12 12:51:33 +00:00
Vladimir Oltean
35d9768021 net: dsa: tag_ocelot: convert to tagger-owned data
The felix driver makes very light use of dp->priv, and the tagger is
effectively stateless. dp->priv is practically only needed to set up a
callback to perform deferred xmit of PTP and STP packets using the
ocelot-8021q tagging protocol (the main ocelot tagging protocol makes no
use of dp->priv, although this driver sets up dp->priv irrespective of
actual tagging protocol in use).

struct felix_port (what used to be pointed to by dp->priv) is removed
and replaced with a two-sided structure. The public side of this
structure, visible to the switch driver, is ocelot_8021q_tagger_data.
The private side is ocelot_8021q_tagger_private, and the latter
structure physically encapsulates the former. The public half of the
tagger data structure can be accessed through a helper of the same name
(ocelot_8021q_tagger_data) which also sanity-checks the protocol
currently in use by the switch. The public/private split was requested
by Andrew Lunn.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12 12:51:33 +00:00
Jakub Kicinski
be3158290d Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:

====================
bpf-next 2021-12-10 v2

We've added 115 non-merge commits during the last 26 day(s) which contain
a total of 182 files changed, 5747 insertions(+), 2564 deletions(-).

The main changes are:

1) Various samples fixes, from Alexander Lobakin.

2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov.

3) A batch of new unified APIs for libbpf, logging improvements, version
   querying, etc. Also a batch of old deprecations for old APIs and various
   bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko.

4) BPF documentation reorganization and improvements, from Christoph Hellwig
   and Dave Tucker.

5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in
   libbpf, from Hengqi Chen.

6) Verifier log fixes, from Hou Tao.

7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong.

8) Extend branch record capturing to all platforms that support it,
   from Kajol Jain.

9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi.

10) bpftool doc-generating script improvements, from Quentin Monnet.

11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet.

12) Deprecation warning fix for perf/bpf_counter, from Song Liu.

13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf,
    from Tiezhu Yang.

14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song.

15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe
    Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun,
    and others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits)
  libbpf: Add "bool skipped" to struct bpf_map
  libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition
  bpftool: Switch bpf_object__load_xattr() to bpf_object__load()
  selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr()
  selftests/bpf: Add test for libbpf's custom log_buf behavior
  selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load()
  libbpf: Deprecate bpf_object__load_xattr()
  libbpf: Add per-program log buffer setter and getter
  libbpf: Preserve kernel error code and remove kprobe prog type guessing
  libbpf: Improve logging around BPF program loading
  libbpf: Allow passing user log setting through bpf_object_open_opts
  libbpf: Allow passing preallocated log_buf when loading BTF into kernel
  libbpf: Add OPTS-based bpf_btf_load() API
  libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
  samples/bpf: Remove unneeded variable
  bpf: Remove redundant assignment to pointer t
  selftests/bpf: Fix a compilation warning
  perf/bpf_counter: Use bpf_map_create instead of bpf_create_map
  samples: bpf: Fix 'unknown warning group' build warning on Clang
  samples: bpf: Fix xdp_sample_user.o linking with Clang
  ...
====================

Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10 15:56:13 -08:00
Eric Dumazet
04a931e58d net: add netns refcount tracker to struct seq_net_private
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10 06:38:26 -08:00
Eric Dumazet
9ba74e6c9e net: add networking namespace refcount tracker
We have 100+ syzbot reports about netns being dismantled too soon,
still unresolved as of today.

We think a missing get_net() or an extra put_net() is the root cause.

In order to find the bug(s), and be able to spot future ones,
this patch adds CONFIG_NET_NS_REFCNT_TRACKER and new helpers
to precisely pair all put_net() with corresponding get_net().

To use these helpers, each data structure owning a refcount
should also use a "netns_tracker" to pair the get and put.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10 06:38:26 -08:00
Jakub Kicinski
3150a73366 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-09 13:23:02 -08:00
Kees Cook
1a2fb220ed skbuff: Extract list pointers to silence compiler warnings
Under both -Warray-bounds and the object_size sanitizer, the compiler is
upset about accessing prev/next of sk_buff when the object it thinks it
is coming from is sk_buff_head. The warning is a false positive due to
the compiler taking a conservative approach, opting to warn at casting
time rather than access time.

However, in support of enabling -Warray-bounds globally (which has
found many real bugs), arrange things for sk_buff so that the compiler
can unambiguously see that there is no intention to access anything
except prev/next.  Introduce and cast to a separate struct sk_buff_list,
which contains _only_ the first two fields, silencing the warnings:

In file included from ./include/net/net_namespace.h:39,
                 from ./include/linux/netdevice.h:37,
                 from net/core/netpoll.c:17:
net/core/netpoll.c: In function 'refill_skbs':
./include/linux/skbuff.h:2086:9: warning: array subscript 'struct sk_buff[0]' is partly outside array bounds of 'struct sk_buff_head[1]' [-Warray-bounds]
 2086 |         __skb_insert(newsk, next->prev, next, list);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/core/netpoll.c:49:28: note: while referencing 'skb_pool'
   49 | static struct sk_buff_head skb_pool;
      |                            ^~~~~~~~

This change results in no executable instruction differences.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20211207062758.2324338-1-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-09 11:54:24 -08:00
Linus Torvalds
ded746bfc9 Networking fixes for 5.16-rc5, including fixes from bpf, can and netfilter.
Current release - regressions:
 
  - bpf, sockmap: re-evaluate proto ops when psock is removed from sockmap
 
 Current release - new code bugs:
 
  - bpf: fix bpf_check_mod_kfunc_call for built-in modules
 
  - ice: fixes for TC classifier offloads
 
  - vrf: don't run conntrack on vrf with !dflt qdisc
 
 Previous releases - regressions:
 
  - bpf: fix the off-by-two error in range markings
 
  - seg6: fix the iif in the IPv6 socket control block
 
  - devlink: fix netns refcount leak in devlink_nl_cmd_reload()
 
  - dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
 
  - dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports
 
 Previous releases - always broken:
 
  - ethtool: do not perform operations on net devices being unregistered
 
  - udp: use datalen to cap max gso segments
 
  - ice: fix races in stats collection
 
  - fec: only clear interrupt of handling queue in fec_enet_rx_queue()
 
  - m_can: pci: fix incorrect reference clock rate
 
  - m_can: disable and ignore ELO interrupt
 
  - mvpp2: fix XDP rx queues registering
 
 Misc:
 
  - treewide: add missing includes masked by cgroup -> bpf.h dependency
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmGyN1AACgkQMUZtbf5S
 IrtgMA/8D0qk3c75ts0hCzGXwdNdEBs+e7u1bJVPqdyU8x/ZLAp2c0EKB/7IWuxA
 CtsnbanPcmibqvQJDI1hZEBdafi43BmF5VuFSIxYC4EM/1vLoRprurXlIwL2YWki
 aWi//tyOIGBl6/ClzJ9Vm51HTJQwDmdv8GRnKAbsC1eOTM3pmmcg+6TLbDhycFEQ
 F9kkDCvyB9kWIH645QyJRH+Y5qQOvneCyQCPkkyjTgEADzV5i7YgtRol6J3QIbw3
 umPHSckCBTjMacYcCLsbhQaF2gTMgPV1basNLPMjCquJVrItE0ZaeX3MiD6nBFae
 yY5+Wt5KAZDzjERhneX8AINHoRPA/tNIahC1+ytTmsTA8Hj230FHE5hH1ajWiJ9+
 GSTBCBqjtZXce3r2Efxfzy0Kb9JwL3vDi7LS2eKQLv0zBLfYp2ry9Sp9qe4NhPkb
 OYrxws9kl5GOPvrFB5BWI9XBINciC9yC3PjIsz1noi0vD8/Hi9dPwXeAYh36fXU3
 rwRg9uAt6tvFCpwbuQ9T2rsMST0miur2cDYd8qkJtuJ7zFvc+suMXwBZyI29nF2D
 uyymIC2XStHJfAjUkFsGVUSXF5FhML9OQsqmisdQ8KdH26jMnDeMjIWJM7UWK+zY
 E/fqWT8UyS3mXWqaggid4ZbotipCwA0gxiDHuqqUGTM+dbKrzmk=
 =F6rS
 -----END PGP SIGNATURE-----

Merge tag 'net-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, can and netfilter.

  Current release - regressions:

   - bpf, sockmap: re-evaluate proto ops when psock is removed from
     sockmap

  Current release - new code bugs:

   - bpf: fix bpf_check_mod_kfunc_call for built-in modules

   - ice: fixes for TC classifier offloads

   - vrf: don't run conntrack on vrf with !dflt qdisc

  Previous releases - regressions:

   - bpf: fix the off-by-two error in range markings

   - seg6: fix the iif in the IPv6 socket control block

   - devlink: fix netns refcount leak in devlink_nl_cmd_reload()

   - dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"

   - dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports

  Previous releases - always broken:

   - ethtool: do not perform operations on net devices being
     unregistered

   - udp: use datalen to cap max gso segments

   - ice: fix races in stats collection

   - fec: only clear interrupt of handling queue in fec_enet_rx_queue()

   - m_can: pci: fix incorrect reference clock rate

   - m_can: disable and ignore ELO interrupt

   - mvpp2: fix XDP rx queues registering

  Misc:

   - treewide: add missing includes masked by cgroup -> bpf.h
     dependency"

* tag 'net-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (82 commits)
  net: dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports
  net: wwan: iosm: fixes unable to send AT command during mbim tx
  net: wwan: iosm: fixes net interface nonfunctional after fw flash
  net: wwan: iosm: fixes unnecessary doorbell send
  net: dsa: felix: Fix memory leak in felix_setup_mmio_filtering
  MAINTAINERS: s390/net: remove myself as maintainer
  net/sched: fq_pie: prevent dismantle issue
  net: mana: Fix memory leak in mana_hwc_create_wq
  seg6: fix the iif in the IPv6 socket control block
  nfp: Fix memory leak in nfp_cpp_area_cache_add()
  nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done
  nfc: fix segfault in nfc_genl_dump_devices_done
  udp: using datalen to cap max gso segments
  net: dsa: mv88e6xxx: error handling for serdes_power functions
  can: kvaser_usb: get CAN clock frequency from device
  can: kvaser_pciefd: kvaser_pciefd_rx_error_frame(): increase correct stats->{rx,tx}_errors counter
  net: mvpp2: fix XDP rx queues registering
  vmxnet3: fix minimum vectors alloc issue
  net, neigh: clear whole pneigh_entry at alloc time
  net: dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
  ...
2021-12-09 11:26:44 -08:00
Russell King (Oracle)
001f4261fe net: phylink: use legacy_pre_march2020
Use the legacy flag to indicate whether we should operate in legacy
mode. This allows us to stop using the presence of a PCS as an
indicator to the age of the phylink user, and make PCS presence
optional.

Legacy mode involves:
1) calling mac_config() whenever the link comes up
2) calling mac_config() whenever the inband advertisement changes,
   possibly followed by a call to mac_an_restart()
3) making use of mac_an_restart()
4) making use of mac_pcs_get_state()

All the above functionality was moved to a seperate "PCS" block of
operations in March 2020.

Update the documents to indicate that the differences that this flag
makes.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-09 11:21:03 -08:00
Russell King (Oracle)
3e5b1fecce net: phylink: add legacy_pre_march2020 indicator
Add a boolean to phylink_config to indicate whether a driver has not
been updated for the changes in commit 7cceb599d1 ("net: phylink:
avoid mac_config calls"), and thus are reliant on the old behaviour.

We were currently keying the phylink behaviour on the presence of a
PCS, but this is sub-optimal for modern drivers that may not have a
PCS.

This commit merely introduces the new flag, but does not add any use,
since we need all legacy drivers to set this flag before it can be
used. Once these legacy drivers have been updated, we can remove this
flag.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-09 11:21:03 -08:00
Linus Torvalds
03090cc76e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:

 - fixes for various drivers which assume that a HID device is on USB
   transport, but that might not necessarily be the case, as the device
   can be faked by uhid. (Greg, Benjamin Tissoires)

 - fix for spurious wakeups on certain Lenovo notebooks (Thomas
   Weißschuh)

 - a few other device-specific quirks

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: Ignore battery for Elan touchscreen on Asus UX550VE
  HID: intel-ish-hid: ipc: only enable IRQ wakeup when requested
  HID: google: add eel USB id
  HID: add USB_HID dependancy to hid-prodikeys
  HID: add USB_HID dependancy to hid-chicony
  HID: bigbenff: prevent null pointer dereference
  HID: sony: fix error path in probe
  HID: add USB_HID dependancy on some USB HID drivers
  HID: check for valid USB device for many HID drivers
  HID: wacom: fix problems when device is not a valid USB device
  HID: add hid_is_usb() function to make it simpler for USB detection
  HID: quirks: Add quirk for the Microsoft Surface 3 type-cover
2021-12-09 11:08:19 -08:00
Sergey Ryazanov
283e6f5a81 net: wwan: make debugfs optional
Debugfs interface is optional for the regular modem use. Some distros
and users will want to disable this feature for security or kernel
size reasons. So add a configuration option that allows to completely
disable the debugfs interface of the WWAN devices.

A primary considered use case for this option was embedded firmwares.
For example, in OpenWrt, you can not completely disable debugfs, as a
lot of wireless stuff can only be configured and monitored with the
debugfs knobs. At the same time, reducing the size of a kernel and
modules is an essential task in the world of embedded software.
Disabling the WWAN and IOSM debugfs interfaces allows us to save 50K
(x86-64 build) of space for module storage. Not much, but already
considerable when you only have 16MB of storage.

So it is hard to just disable whole debugfs. Users need some fine
grained set of options to control which debugfs interface is important
and should be available and which is not.

The new configuration symbol is enabled by default and is hidden under
the EXPERT option. So a regular user would not be bothered by another
one configuration question. While an embedded distro maintainer will be
able to a little more reduce the final image size.

Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Acked-by: M Chetan Kumar <m.chetan.kumar@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08 17:58:59 -08:00
Jakub Kicinski
a43a072021 linux-can-next-for-5.17-20211208
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmGwqbcTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCpyVqK+u3vqSGtB/sF8wFDyY50dRG8WQLaeeFIFIZetHni
 NY6+9ch5xWM6aFDP3ecmGXqZtNaHgeAJVChmyALKs4Sa+5L2F66bGwHHuKu7uG0y
 42tz4XSI8B3mZgtVTJ7lKyeu4BismzhJm+iG6I/x3CXo482AxEB2UP53nYssv1vA
 8jfjH9gx95sQDXs0WLY2QDvkWS55BgZzpqqcT0K+uqeT+5Ufa2z3oLuCdQbSejNR
 3v3kizMp6mBL55z/UbEiR2G7bqZbOtzmAMdhd7g69fKnrav67qnXtyUQpGcOSMWB
 Y6/xUuwq8JoYVMQfpnzJ1KebW+uwpIyIGL/e8GOLlkVioWWfMqLz5bv0
 =7p/y
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-5.17-20211208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
can-next 2021-12-08

The first patch is by Vincent Mailhol and replaces the custom CAN
units with generic one form linux/units.h.

The next 3 patches are by Evgeny Boger and add Allwinner R40 support
to the sun4i CAN driver.

Andy Shevchenko contributes 4 patches to the hi311x CAN driver,
consisting of cleanups and converting the driver to the device
property API.

* tag 'linux-can-next-for-5.17-20211208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: hi311x: hi3110_can_probe(): convert to use dev_err_probe()
  can: hi311x: hi3110_can_probe(): make use of device property API
  can: hi311x: hi3110_can_probe(): try to get crystal clock rate from property
  can: hi311x: hi3110_can_probe(): use devm_clk_get_optional() to get the input clock
  ARM: dts: sun8i: r40: add node for CAN controller
  can: sun4i_can: add support for R40 CAN controller
  dt-bindings: net: can: add support for Allwinner R40 CAN controller
  can: bittiming: replace CAN units with the generic ones from linux/units.h
====================

Link: https://lore.kernel.org/r/20211208125055.223141-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08 17:06:58 -08:00
Jakub Kicinski
6efcdadc15 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
bpf 2021-12-08

We've added 12 non-merge commits during the last 22 day(s) which contain
a total of 29 files changed, 659 insertions(+), 80 deletions(-).

The main changes are:

1) Fix an off-by-two error in packet range markings and also add a batch of
   new tests for coverage of these corner cases, from Maxim Mikityanskiy.

2) Fix a compilation issue on MIPS JIT for R10000 CPUs, from Johan Almbladh.

3) Fix two functional regressions and a build warning related to BTF kfunc
   for modules, from Kumar Kartikeya Dwivedi.

4) Fix outdated code and docs regarding BPF's migrate_disable() use on non-
   PREEMPT_RT kernels, from Sebastian Andrzej Siewior.

5) Add missing includes in order to be able to detangle cgroup vs bpf header
   dependencies, from Jakub Kicinski.

6) Fix regression in BPF sockmap tests caused by missing detachment of progs
   from sockets when they are removed from the map, from John Fastabend.

7) Fix a missing "no previous prototype" warning in x86 JIT caused by BPF
   dispatcher, from Björn Töpel.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add selftests to cover packet access corner cases
  bpf: Fix the off-by-two error in range markings
  treewide: Add missing includes masked by cgroup -> bpf dependency
  tools/resolve_btfids: Skip unresolved symbol warning for empty BTF sets
  bpf: Fix bpf_check_mod_kfunc_call for built-in modules
  bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL
  mips, bpf: Fix reference to non-existing Kconfig symbol
  bpf: Make sure bpf_disable_instrumentation() is safe vs preemption.
  Documentation/locking/locktypes: Update migrate_disable() bits.
  bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap
  bpf, sockmap: Attach map progs to psock early for feature probes
  bpf, x86: Fix "no previous prototype" warning
====================

Link: https://lore.kernel.org/r/20211208155125.11826-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08 16:06:44 -08:00
Vladimir Oltean
d3eed0e57d net: dsa: keep the bridge_dev and bridge_num as part of the same structure
The main desire behind this is to provide coherent bridge information to
the fast path without locking.

For example, right now we set dp->bridge_dev and dp->bridge_num from
separate code paths, it is theoretically possible for a packet
transmission to read these two port properties consecutively and find a
bridge number which does not correspond with the bridge device.

Another desire is to start passing more complex bridge information to
dsa_switch_ops functions. For example, with FDB isolation, it is
expected that drivers will need to be passed the bridge which requested
an FDB/MDB entry to be offloaded, and along with that bridge_dev, the
associated bridge_num should be passed too, in case the driver might
want to implement an isolation scheme based on that number.

We already pass the {bridge_dev, bridge_num} pair to the TX forwarding
offload switch API, however we'd like to remove that and squash it into
the basic bridge join/leave API. So that means we need to pass this
pair to the bridge join/leave API.

During dsa_port_bridge_leave, first we unset dp->bridge_dev, then we
call the driver's .port_bridge_leave with what used to be our
dp->bridge_dev, but provided as an argument.

When bridge_dev and bridge_num get folded into a single structure, we
need to preserve this behavior in dsa_port_bridge_leave: we need a copy
of what used to be in dp->bridge.

Switch drivers check bridge membership by comparing dp->bridge_dev with
the provided bridge_dev, but now, if we provide the struct dsa_bridge as
a pointer, they cannot keep comparing dp->bridge to the provided
pointer, since this only points to an on-stack copy. To make this
obvious and prevent driver writers from forgetting and doing stupid
things, in this new API, the struct dsa_bridge is provided as a full
structure (not very large, contains an int and a pointer) instead of a
pointer. An explicit comparison function needs to be used to determine
bridge membership: dsa_port_offloads_bridge().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08 14:31:16 -08:00
Vladimir Oltean
3f9bb0301d net: dsa: make dp->bridge_num one-based
I have seen too many bugs already due to the fact that we must encode an
invalid dp->bridge_num as a negative value, because the natural tendency
is to check that invalid value using (!dp->bridge_num). Latest example
can be seen in commit 1bec0f0506 ("net: dsa: fix bridge_num not
getting cleared after ports leaving the bridge").

Convert the existing users to assume that dp->bridge_num == 0 is the
encoding for invalid, and valid bridge numbers start from 1.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08 14:31:14 -08:00
Vincent Mailhol
330c6d3bfa can: bittiming: replace CAN units with the generic ones from linux/units.h
In [1], we introduced a set of units in linux/can/bittiming.h. Since
then, generic SI prefixes were added to linux/units.h in [2]. Those
new prefixes can perfectly replace CAN specific ones.

This patch replaces all occurrences of the CAN units with their
corresponding prefix (from linux/units) and the unit (as a comment)
according to below table.

 CAN units	SI metric prefix (from linux/units) + unit (as a comment)
 ------------------------------------------------------------------------
 CAN_KBPS	KILO /* BPS */
 CAN_MBPS	MEGA /* BPS */
 CAM_MHZ	MEGA /* Hz */

The definition are then removed from linux/can/bittiming.h

[1] commit 1d7750760b ("can: bittiming: add CAN_KBPS, CAN_MBPS and
CAN_MHZ macros")

[2] commit 26471d4a6c ("units: Add SI metric prefix definitions")

Link: https://lore.kernel.org/all/20211124014536.782550-1-mailhol.vincent@wanadoo.fr
Suggested-by: Jimmy Assarsson <extja@kvaser.com>
Suggested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-08 10:12:58 +01:00
Yanteng Si
a97770cc40 net: phy: Remove unnecessary indentation in the comments of phy_device
Fix warning as:

linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:543: WARNING: Unexpected indentation.
linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:544: WARNING: Block quote ends without a blank line; unexpected unindent.
linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:546: WARNING: Unexpected indentation.

Suggested-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:26:06 -08:00
Jakub Kicinski
150791442e wireless-drivers-next patches for v5.17
First set of patches for v5.17. The biggest change is the iwlmei
 driver for Intel's AMT devices. Also now WCN6855 support in ath11k
 should be usable.
 
 Major changes:
 
 ath10k
 
 * fetch (pre-)calibration data via nvmem subsystem
 
 ath11k
 
 * enable 802.11 power save mode in station mode for qca6390 and wcn6855
 
 * trace log support
 
 * proper board file detection for WCN6855 based on PCI ids
 
 * BSS color change support
 
 rtw88
 
 * add debugfs file to force lowest basic rate
 
 * add quirk to disable PCI ASPM on HP 250 G7 Notebook PC
 
 mwifiex
 
 * add quirk to disable deep sleep with certain hardware revision in
   Surface Book 2 devices
 
 iwlwifi
 
 * add iwlmei driver for co-operating with Intel's Active Management
   Technology (AMT) devices
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmGvclcRHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZtK6wgAoOoT83JKMreXlXXhVegqlJbC3HyElF5r
 DJlpDDJkJUT9airol2nd0yFfP+5WFyQPrt5shsQmqz43U4jlgfFpFXZIjQufK+gn
 VAGvVGfsanRXEFlsVgFZeSZvAEyEyNSggxADC03Ky0xtiCGc89r2o71jD3HA/ZzO
 1X8gbKH7YLWj4G/GQkrKsvIAwzoZXT7nwQSdW73M8QVzk4OSNhLBLdiqKYq0yViM
 7Ea2Vj27hiyk/RXNUZHy+bKa8vKN5sA91VHJ836aPZBQ4OLotGzF3AgHfgIhIpdr
 hI4BVJbngpjQho1EkCnZZuISPes0YQWJB5hK5xpL98yuIL4wyJRfeQ==
 =I8Dj
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-next-2021-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for v5.17

First set of patches for v5.17. The biggest change is the iwlmei
driver for Intel's AMT devices. Also now WCN6855 support in ath11k
should be usable.

Major changes:

ath10k
 * fetch (pre-)calibration data via nvmem subsystem

ath11k
 * enable 802.11 power save mode in station mode for qca6390 and wcn6855
 * trace log support
 * proper board file detection for WCN6855 based on PCI ids
 * BSS color change support

rtw88
 * add debugfs file to force lowest basic rate
 * add quirk to disable PCI ASPM on HP 250 G7 Notebook PC

mwifiex
 * add quirk to disable deep sleep with certain hardware revision in
  Surface Book 2 devices

iwlwifi
 * add iwlmei driver for co-operating with Intel's Active Management
   Technology (AMT) devices

* tag 'wireless-drivers-next-2021-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next: (87 commits)
  iwlwifi: mei: fix linking when tracing is not enabled
  rtlwifi: rtl8192de: Style clean-ups
  mwl8k: Use named struct for memcpy() region
  intersil: Use struct_group() for memcpy() region
  libertas_tf: Use struct_group() for memcpy() region
  libertas: Use struct_group() for memcpy() region
  wlcore: no need to initialise statics to false
  rsi: Fix out-of-bounds read in rsi_read_pkt()
  rsi: Fix use-after-free in rsi_rx_done_handler()
  brcmfmac: Configure keep-alive packet on suspend
  wilc1000: remove '-Wunused-but-set-variable' warning in chip_wakeup()
  iwlwifi: mvm: read the rfkill state and feed it to iwlmei
  iwlwifi: mvm: add vendor commands needed for iwlmei
  iwlwifi: integrate with iwlmei
  iwlwifi: mei: add debugfs hooks
  iwlwifi: mei: add the driver to allow cooperation with CSME
  mei: bus: add client dma interface
  mwifiex: Ignore BTCOEX events from the 88W8897 firmware
  mwifiex: Ensure the version string from the firmware is 0-terminated
  mwifiex: Add quirk to disable deep sleep with certain hardware revision
  ...
====================

Link: https://lore.kernel.org/r/20211207144211.A9949C341C1@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:01:18 -08:00
Eric Dumazet
f12bf6f3f9 net: watchdog: add net device refcount tracker
Add a netdevice_tracker inside struct net_device, to track
the self reference when a device has an active watchdog timer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:58 -08:00
Eric Dumazet
19c9ebf6ed vlan: add net device refcount tracker
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:58 -08:00
Eric Dumazet
08f0b22d73 net: eql: add net device refcount tracker
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:58 -08:00
Maxim Galaganov
6fadaa5658 tcp: expose __tcp_sock_set_cork and __tcp_sock_set_nodelay
Expose __tcp_sock_set_cork() and __tcp_sock_set_nodelay() for use in
MPTCP setsockopt code -- namely for syncing MPTCP socket options with
subflows inside sync_socket_options() while already holding the subflow
socket lock.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Maxim Galaganov <max@internet.ru>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 11:36:30 -08:00
Eric Dumazet
45cac67545 net: fix recent csum changes
Vladimir reported csum issues after my recent change in skb_postpull_rcsum()

Issue here is the following:

initial skb->csum is the csum of

[part to be pulled][rest of packet]

Old code:
 skb->csum = csum_sub(skb->csum, csum_partial(pull, pull_length, 0));

New code:
 skb->csum = ~csum_partial(pull, pull_length, ~skb->csum);

This is broken if the csum of [pulled part]
happens to be equal to skb->csum, because end
result of skb->csum is 0 in new code, instead
of being 0xffffffff

David Laight suggested to use

skb->csum = -csum_partial(pull, pull_length, -skb->csum);

I based my patches on existing code present in include/net/seg6.h,
update_csum_diff4() and update_csum_diff16() which might need
a similar fix.

I guess that my tests, mostly pulling 40 bytes of IPv6 header
were not providing enough entropy to hit this bug.

v2: added wsum_negate() to make sparse happy.

Fixes: 29c3002644 ("net: optimize skb_postpull_rcsum()")
Fixes: 0bd28476f6 ("gro: optimize skb_gro_postpull_rcsum()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Cc: David Lebrun <dlebrun@google.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20211204045356.3659278-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:26:46 -08:00
Eric Dumazet
5fa5ae6058 netpoll: add net device refcount tracker to struct netpoll
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:06:02 -08:00
Eric Dumazet
42120a8643 ipmr, ip6mr: add net device refcount tracker to struct vif_device
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:06:02 -08:00
Eric Dumazet
63f13937cb net: linkwatch: add net device refcount tracker
Add a netdevice_tracker inside struct net_device, to track
the self reference when a device is in lweventlist.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:05:44 -08:00
Eric Dumazet
c04438f58d ipv4: add net device refcount tracker to struct in_device
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:05:11 -08:00
Eric Dumazet
9038c32000 net: dst: add net device refcount tracking to dst_entry
We want to track all dev_hold()/dev_put() to ease leak hunting.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:05:10 -08:00
Eric Dumazet
0b688f24b7 net: add net device refcount tracker to struct netdev_queue
This will help debugging pesky netdev reference leaks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:05:10 -08:00
Eric Dumazet
80e8921b2b net: add net device refcount tracker to struct netdev_rx_queue
This helps debugging net device refcount leaks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:05:10 -08:00
Eric Dumazet
4d92b95ff2 net: add net device refcount tracker infrastructure
net device are refcounted. Over the years we had numerous bugs
caused by imbalanced dev_hold() and dev_put() calls.

The general idea is to be able to precisely pair each decrement with
a corresponding prior increment. Both share a cookie, basically
a pointer to private data storing stack traces.

This patch adds dev_hold_track() and dev_put_track().

To use these helpers, each data structure owning a refcount
should also use a "netdevice_tracker" to pair the hold and put.

netdevice_tracker dev_tracker;
...
dev_hold_track(dev, &dev_tracker, GFP_ATOMIC);
...
dev_put_track(dev, &dev_tracker);

Whenever a leak happens, we will get precise stack traces
of the point dev_hold_track() happened, at device dismantle phase.

We will also get a stack trace if too many dev_put_track() for the same
netdevice_tracker are attempted.

This is guarded by CONFIG_NET_DEV_REFCNT_TRACKER option.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:05:07 -08:00
Eric Dumazet
4e66934eaa lib: add reference counting tracking infrastructure
It can be hard to track where references are taken and released.

In networking, we have annoying issues at device or netns dismantles,
and we had various proposals to ease root causing them.

This patch adds new infrastructure pairing refcount increases
and decreases. This will self document code, because programmers
will have to associate increments/decrements.

This is controled by CONFIG_REF_TRACKER which can be selected
by users of this feature.

This adds both cpu and memory costs, and thus should probably be
used with care.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06 16:04:44 -08:00
Linus Torvalds
b806bec538 regulator: Documentation fix for v5.16
A fix for bitrot in the documentation for protection interrupts that
 crept in as the code was revised during review.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmGuLzQACgkQJNaLcl1U
 h9CooQf/R65eO6RU0FqfEOP81kMHicfucJF4zcFiEJepGF8Do5ULx7eZpD+KdyLw
 zH2xPVqXlOCX9kEFgd/evYftZG3W9UMFBXUWVwNSl29hqWUkX+VuLZOK2LTpQzNO
 uH+UrYM1z4wg/6j8UrOVeROlQ28fvNwimNLWoNOy4yXJIiXLZpM04oA+2kNxyfmC
 6gG0XP3c/6zGQkqX+qKu9R3Ni5qZzqYESvgJ5d4aCZgTgU6SseDVxeA6awmbPcSf
 35d9T0tX3QNBRQT37WS0Rsdv2aVwcAuHiXwaBNm5hpcy1NrCzRKRxvIj3lQLUHmV
 5/ERi03h62tyNESyd2ik/Wve6I1Igg==
 =gA31
 -----END PGP SIGNATURE-----

Merge tag 'regulator-fix-v5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
 "Documentation fix for v5.17.

  A fix for bitrot in the documentation for protection interrupts that
  crept in as the code was revised during review"

* tag 'regulator-fix-v5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: Update protection IRQ helper docs
2021-12-06 10:14:12 -08:00
Linus Torvalds
1d213767dc - Properly init uclamp_flags of a runqueue, on first enqueuing
- Fix preempt= callback return values
 
 - Correct utime/stime resource usage reporting on nohz_full to return
 the proper times instead of shorter ones
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmGspTcACgkQEsHwGGHe
 VUqHhBAAhEd9DMoJwREKDCDMqc3pttNYpTpSVo1K6oBTsOh7mwEilPdlmsTl239V
 jRocVJST+/JmJ424j7t0Sp42tREMKNlbyf+ddvr0oUwi0mLUnN6J83NU4WK4Jisf
 gyXFIkeMR+/W6/LO7gDdq/+rlRDtJcllwHoOm1yyiy5Zc0qDrcy6CjgP5/9hEsh6
 xvRvPOXbeZZVA+a+n+G9xGN836aBe1VptoABbdAlOSTiOvAVkS95UCb9rfPTvMtq
 /71jjZMmhTxGUhg5oLpgvfRRZE608X6b2RCTcAPKa5mfMpN5YMQLcD9G0f8XZjkq
 iOO/+arE6XQJlTzhAEsGxkSXaVweYxRHHP1yAlWYlWV/xGhoaAyq/tXE1KusAnng
 16/eTbrPb1eawpI6p1AAScCQuF/TlYZCMqjbFVhViXM5Rkd6jrii9vz/JnkdokGR
 3TH0n4WAJkdZeg18WS3B0eIt6zDTvxbR9g5ap2/10xYnYHMNdHXGH8A+5Grw9/Ln
 Qsv0V43OjdUK2tVuIHYblx1X9dOlLdpTEg9FCfjiZTQVor1pTwcbG62qNMozanlf
 lQqI6f63E0jugHqhrqrfBvl4lUuoajN5SvXfBNFDIzxwWBGSdr+hJQXstUatfSZZ
 MdmJX+Dk5cAk4CpQQ1ofPvYkS3Ade0vxaL4H++KHYtRvpPvxCXA=
 =XQFF
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Properly init uclamp_flags of a runqueue, on first enqueuing

 - Fix preempt= callback return values

 - Correct utime/stime resource usage reporting on nohz_full to return
   the proper times instead of shorter ones

* tag 'sched_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/uclamp: Fix rq->uclamp_max not set on first enqueue
  preempt/dynamic: Fix setup_preempt_mode() return value
  sched/cputime: Fix getrusage(RUSAGE_THREAD) with nohz_full
2021-12-05 08:53:31 -08:00