Commit graph

9009 commits

Author SHA1 Message Date
Jakub Kicinski
24a790da0a mlx5 subfunction support
Parav Pandit Says:
 =================
 
 This patchset introduces support for mlx5 subfunction (SF).
 
 A subfunction is a lightweight function that has a parent PCI function on
 which it is deployed. mlx5 subfunction has its own function capabilities
 and its own resources. This means a subfunction has its own dedicated
 queues(txq, rxq, cq, eq). These queues are neither shared nor stolen from
 the parent PCI function.
 
 When subfunction is RDMA capable, it has its own QP1, GID table and rdma
 resources neither shared nor stolen from the parent PCI function.
 
 A subfunction has dedicated window in PCI BAR space that is not shared
 with the other subfunctions or parent PCI function. This ensures that all
 class devices of the subfunction accesses only assigned PCI BAR space.
 
 A Subfunction supports eswitch representation through which it supports tc
 offloads. User must configure eswitch to send/receive packets from/to
 subfunction port.
 
 Subfunctions share PCI level resources such as PCI MSI-X IRQs with
 their other subfunctions and/or with its parent PCI function.
 
 Patch summary:
 --------------
 Patch 1 to 4 prepares devlink
 patch 5 to 7 mlx5 adds SF device support
 Patch 8 to 11 mlx5 adds SF devlink port support
 Patch 12 and 14 adds documentation
 
 Patch-1 prepares code to handle multiple port function attributes
 Patch-2 introduces devlink pcisf port flavour similar to pcipf and pcivf
 Patch-3 adds port add and delete driver callbacks
 Patch-4 adds port function state get and set callbacks
 Patch-5 mlx5 vhca event notifier support to distribute subfunction
         state change notification
 Patch-6 adds SF auxiliary device
 Patch-7 adds SF auxiliary driver
 Patch-8 prepares eswitch to handler SF vport
 Patch-9 adds eswitch helpers to add/remove SF vport
 Patch-10 implements devlink port add/del callbacks
 Patch-11 implements devlink port function get/set callbacks
 Patch-12 to 14 adds documentation
 Patch-12 added mlx5 port function documentation
 Patch-13 adds subfunction documentation
 Patch-14 adds mlx5 subfunction documentation
 
 Subfunction support is discussed in detail in RFC [1] and [2].
 RFC [1] and extension [2] describes requirements, design and proposed
 plumbing using devlink, auxiliary bus and sysfs for systemd/udev
 support. Functionality of this patchset is best explained using real
 examples further below.
 
 overview:
 --------
 A subfunction can be created and deleted by a user using devlink port
 add/delete interface.
 
 A subfunction can be configured using devlink port function attribute
 before its activated.
 
 When a subfunction is activated, it results in an auxiliary device on
 the host PCI device where it is deployed. A driver binds to the
 auxiliary device that further creates supported class devices.
 
 example subfunction usage sequence:
 -----------------------------------
 Change device to switchdev mode:
 $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
 
 Add a devlink port of subfunction flavour:
 $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
 
 Configure mac address of the port function:
 $ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88
 
 Now activate the function:
 $ devlink port function set ens2f0npf0sf88 state active
 
 Now use the auxiliary device and class devices:
 $ devlink dev show
 pci/0000:06:00.0
 auxiliary/mlx5_core.sf.4
 
 $ ip link show
 127: ens2f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 24:8a:07:b3:d1:12 brd ff:ff:ff:ff:ff:ff
     altname enp6s0f0np0
 129: p0sf88: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 00:00:00:00:88:88 brd ff:ff:ff:ff:ff:ff
 
 $ rdma dev show
 43: rdmap6s0f0: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d112 sys_image_guid 248a:0703:00b3:d112
 44: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112
 
 After use inactivate the function:
 $ devlink port function set ens2f0npf0sf88 state inactive
 
 Now delete the subfunction port:
 $ devlink port del ens2f0npf0sf88
 
 [1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
 [2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2
 
 =================
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmALKDwACgkQSD+KveBX
 +j7qjQf6A1moPhhIlXROCzaJUjlAj2U291LWBveU+I6na6fjYjAAWHYwfv0YKQpo
 Qb0NRt+9abgEpGidc4hOwIJKhK+vlWrQuehRt83aAfAwaN3OEeGuNllniWo821Hj
 sNiJfSC/DslOlQSxKLsAs3Fduy/sV3GN9Zv7hEwOFgEr5QvB2c6H1XiypVP2Ecsd
 ZXC3SuEWxIoRtfXEkTkJne9LNoiDChlvT1FR/z75h8HUBdAOjzBTQzBbM+8M4Msw
 8aKUPya3FMRAPWsOgPhkpU0xTtH2Mi7MC9TlwiWmrK4Q3uvesIav8pVf7r3GNAZA
 sipIZ4gP0M5SiCaZa8rIBpTXBHxmvg==
 =jEG4
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2021-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 subfunction support

Parav Pandit says:

This patchset introduces support for mlx5 subfunction (SF).

A subfunction is a lightweight function that has a parent PCI function on
which it is deployed. mlx5 subfunction has its own function capabilities
and its own resources. This means a subfunction has its own dedicated
queues(txq, rxq, cq, eq). These queues are neither shared nor stolen from
the parent PCI function.

When subfunction is RDMA capable, it has its own QP1, GID table and rdma
resources neither shared nor stolen from the parent PCI function.

A subfunction has dedicated window in PCI BAR space that is not shared
with the other subfunctions or parent PCI function. This ensures that all
class devices of the subfunction accesses only assigned PCI BAR space.

A Subfunction supports eswitch representation through which it supports tc
offloads. User must configure eswitch to send/receive packets from/to
subfunction port.

Subfunctions share PCI level resources such as PCI MSI-X IRQs with
their other subfunctions and/or with its parent PCI function.

Subfunction support is discussed in detail in RFC [1] and [2].
RFC [1] and extension [2] describes requirements, design and proposed
plumbing using devlink, auxiliary bus and sysfs for systemd/udev
support. Functionality of this patchset is best explained using real
examples further below.

overview:
--------
A subfunction can be created and deleted by a user using devlink port
add/delete interface.

A subfunction can be configured using devlink port function attribute
before its activated.

When a subfunction is activated, it results in an auxiliary device on
the host PCI device where it is deployed. A driver binds to the
auxiliary device that further creates supported class devices.

example subfunction usage sequence:
-----------------------------------
Change device to switchdev mode:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

Add a devlink port of subfunction flavour:
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88

Configure mac address of the port function:
$ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88

Now activate the function:
$ devlink port function set ens2f0npf0sf88 state active

Now use the auxiliary device and class devices:
$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4

$ ip link show
127: ens2f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:b3:d1:12 brd ff:ff:ff:ff:ff:ff
    altname enp6s0f0np0
129: p0sf88: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:88:88 brd ff:ff:ff:ff:ff:ff

$ rdma dev show
43: rdmap6s0f0: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d112 sys_image_guid 248a:0703:00b3:d112
44: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

After use inactivate the function:
$ devlink port function set ens2f0npf0sf88 state inactive

Now delete the subfunction port:
$ devlink port del ens2f0npf0sf88

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
[2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2

=================

* tag 'mlx5-updates-2021-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Add devlink subfunction port documentation
  devlink: Extend devlink port documentation for subfunctions
  devlink: Add devlink port documentation
  net/mlx5: SF, Port function state change support
  net/mlx5: SF, Add port add delete functionality
  net/mlx5: E-switch, Add eswitch helpers for SF vport
  net/mlx5: E-switch, Prepare eswitch to handle SF vport
  net/mlx5: SF, Add auxiliary device driver
  net/mlx5: SF, Add auxiliary device support
  net/mlx5: Introduce vhca state event notifier
  devlink: Support get and set state of port function
  devlink: Support add and delete devlink port
  devlink: Introduce PCI SF port flavour and port attribute
  devlink: Prepare code to fill multiple port function attributes
====================

Link: https://lore.kernel.org/r/20210122193658.282884-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-28 16:57:19 -08:00
Linus Torvalds
909b447dcc Networking fixes for 5.11-rc6, including fixes from can, xfrm, wireless,
wireless-drivers and netfilter trees. Nothing scary, Intel WiFi-related
 fixes seemed most notable to the users.
 
 Current release - regressions:
 
  - dsa: microchip: ksz8795: fix KSZ8794 port map again to program
                             the CPU port correctly
 
 Current release - new code bugs:
 
  - iwlwifi: pcie: reschedule in long-running memory reads
 
 Previous releases - regressions:
 
  - iwlwifi: dbg: don't try to overwrite read-only FW data
 
  - iwlwifi: provide gso_type to GSO packets
 
  - octeontx2: make sure the buffer is 128 byte aligned
 
  - tcp: make TCP_USER_TIMEOUT accurate for zero window probes
 
  - xfrm: fix wraparound in xfrm_policy_addr_delta()
 
  - xfrm: fix oops in xfrm_replay_advance_bmp due to a race between CPUs
          in presence of packet reorder
 
  - tcp: fix TLP timer not set when CA_STATE changes from DISORDER
         to OPEN
 
  - wext: fix NULL-ptr-dereference with cfg80211's lack of commit()
 
 Previous releases - always broken:
 
  - igc: fix link speed advertising
 
  - stmmac: configure EHL PSE0 GbE and PSE1 GbE to 32 bits DMA addressing
 
  - team: protect features update by RCU to avoid deadlock
 
  - xfrm: fix disable_xfrm sysctl when used on xfrm interfaces themselves
 
  - fec: fix temporary RMII clock reset on link up
 
  - can: dev: prevent potential information leak in can_fill_info()
 
 Misc:
 
  - mrp: fix bad packing of MRP test packet structures
 
  - uapi: fix big endian definition of ipv6_rpl_sr_hdr
 
  - add David Ahern to IPv4/IPv6 maintainers
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmATRs4ACgkQMUZtbf5S
 IrtOfQ//Vmn1WprrwLPf6/uOuBN0RAKHC+64IRIw2ahDuiB1QQV0c3ALRd42Xp8n
 qnoDMB/mUWdF/KjjJEKvwYyBuwBeQWLcpgTXi1HvvhxM13PVHjvyIp6hTAYYj+m4
 KyWWzQZwezz0zKQ3wXFdZV4JuefXEgXvMx65o8nk+TsutHn6WK/E6ZnWTexoZ0pa
 5Lab149mtoCdSpT3gr2x1aTqd9KYWaxfarYOUD1GY58BQyDFl4wj10MV3oE7xWPj
 /MKnSBvPx52ajbb+rUVhfFjBN1BmEjdze7cBMncJc5H+0X38R23ZaAlP3gecGaac
 hZ5C2wnSSvRR8KIvSEwbCArlpuyU+exacZXZ0vS6sfgqISKqoPv8erWvpxtLil3v
 YfwZVNPYG9RBwbnDVw1gLQIFn3lUqLhIPnJ8J2Ue6KUm7ur4fO566RjyPU3gkPdp
 5Zj3Eh7hsB2EqOy4RdwnoI0QboWmlq9+wT11HCXPFyJ077JzVU0FzMSvJr4dgVSI
 3D3ckmw+RSej4ib6G4xjpq1tPCFzdf9zlFoUPomRFTKgfJFaky5pEb/22C3bztp1
 43fsv3PiwlQtoYP3pfQsRj+r6DikYwDL7A3lskWohIZXviY2wErKWViUcIXr5ULE
 BxYQq0NYMl4TgDkn525U9EFwVgJAvPAedhYxF7VKn3eHNODqWBo=
 =dwFD
 -----END PGP SIGNATURE-----

Merge tag 'net-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes including fixes from can, xfrm, wireless,
  wireless-drivers and netfilter trees. Nothing scary, Intel
  WiFi-related fixes seemed most notable to the users.

  Current release - regressions:

   - dsa: microchip: ksz8795: fix KSZ8794 port map again to program the
     CPU port correctly

  Current release - new code bugs:

   - iwlwifi: pcie: reschedule in long-running memory reads

  Previous releases - regressions:

   - iwlwifi: dbg: don't try to overwrite read-only FW data

   - iwlwifi: provide gso_type to GSO packets

   - octeontx2: make sure the buffer is 128 byte aligned

   - tcp: make TCP_USER_TIMEOUT accurate for zero window probes

   - xfrm: fix wraparound in xfrm_policy_addr_delta()

   - xfrm: fix oops in xfrm_replay_advance_bmp due to a race between
     CPUs in presence of packet reorder

   - tcp: fix TLP timer not set when CA_STATE changes from DISORDER to
     OPEN

   - wext: fix NULL-ptr-dereference with cfg80211's lack of commit()

  Previous releases - always broken:

   - igc: fix link speed advertising

   - stmmac: configure EHL PSE0 GbE and PSE1 GbE to 32 bits DMA
     addressing

   - team: protect features update by RCU to avoid deadlock

   - xfrm: fix disable_xfrm sysctl when used on xfrm interfaces
     themselves

   - fec: fix temporary RMII clock reset on link up

   - can: dev: prevent potential information leak in can_fill_info()

  Misc:

   - mrp: fix bad packing of MRP test packet structures

   - uapi: fix big endian definition of ipv6_rpl_sr_hdr

   - add David Ahern to IPv4/IPv6 maintainers"

* tag 'net-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits)
  rxrpc: Fix memory leak in rxrpc_lookup_local
  mlxsw: spectrum_span: Do not overwrite policer configuration
  selftests: forwarding: Specify interface when invoking mausezahn
  stmmac: intel: Configure EHL PSE0 GbE and PSE1 GbE to 32 bits DMA addressing
  net: usb: cdc_ether: added support for Thales Cinterion PLSx3 modem family.
  ibmvnic: Ensure that CRQ entry read are correctly ordered
  MAINTAINERS: add missing header for bonding
  net: decnet: fix netdev refcount leaking on error path
  net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP
  can: dev: prevent potential information leak in can_fill_info()
  net: fec: Fix temporary RMII clock reset on link up
  net: lapb: Add locking to the lapb module
  team: protect features update by RCU to avoid deadlock
  MAINTAINERS: add David Ahern to IPv4/IPv6 maintainers
  net/mlx5: CT: Fix incorrect removal of tuple_nat_node from nat rhashtable
  net/mlx5e: Revert parameters on errors when changing MTU and LRO state without reset
  net/mlx5e: Revert parameters on errors when changing trust state without reset
  net/mlx5e: Correctly handle changing the number of queues when the interface is down
  net/mlx5e: Fix CT rule + encap slow path offload and deletion
  net/mlx5e: Disable hw-tc-offload when MLX5_CLS_ACT config is disabled
  ...
2021-01-28 15:24:43 -08:00
Linus Torvalds
fc856f1df7 media fixes for v5.11-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmASjOAACgkQCF8+vY7k
 4RULWBAAhW4Pf4IrLqb6VLWs7IALGmcslBmMK0OVa/vyZNXPOOF8HYgl4ls6O62j
 qS0Wbwkp50M9d62fEOmvhsflYgdm6Eofpku0BM+pZKqqG7caI38C4xZyBkUcW11s
 c/Y/8CpOpYVfjZ/NMmfhgBAiCG+7ByZQB9FWGhpNkpxaQwfLnJD/+8jDl7CBLCv2
 +p1ydBhTE7PmqnNYWKK167Y54BkHQ/PsSnwv+y9sM62qFGG+qh8xucc1ZU9ubZNf
 IJi0Zo8fPoxiOZXMKvnDzA+miUeYLYdbSXUPtYrsDqtSzvvdbJ0wXxDUjIGnIvg0
 8J7M8RpeZIjoVHua0kbZtJlQnDpX23I4EcE9YFa3JAkKUxMA1brBf2k7UdnO0+2G
 VL1Xp2iw9C43lBy2qm0V7mJeG0tMXOrJu6xVkXlRY4NdOx2WceHACVNtGhNRGjjm
 4t5FM1locj8f1e2B7f0Q2fYUI6b4jfFmI+cPzPhKB/78ons97EAcfVcKveaSuRrc
 uB47xFxTIhyI/djtQ2/wGfJe2jvclpZ6gN/mNXqyjdvo6Gqsc7zJt8rBpTezVFA5
 u3oP0TUIa7D0MzlJWuHYMCJ6tH7ToKbr0IXBJ6Kl5u116QCWNOm/PC5+hTEEl9eX
 wXl4tLVKF+odEvkdN6lS/AU9zoT0fAvLg9LKP5rWBrY7rhuxmYk=
 =i9ft
 -----END PGP SIGNATURE-----

Merge tag 'media/v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - a V4L2 core regression at videobuf2 when checking for single-plane
   dmabuf

 - a change at uAPI header v4l2-subdev.h, fixing a breakage as BIT()
   macro is not available in userspace

 - fix some regressions at RC core due to the usage of microseconds
   everywhere on it

 - a fix for a race condition at RC core

 - a rename on a newly-introduced kAPI symbol (v4l2_get_link_rate),
   currently used only by a single driver

 - Regression fixes for rcar-vin, cedrus, ite-cir, hantro, css, venus,
   and cec drivers.

* tag 'media/v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: hantro: Fix reset_raw_fmt initialization
  media: cec: add stm32 driver
  media: cedrus: Fix H264 decoding
  media: v4l2-subdev.h: BIT() is not available in userspace
  media: Revert "media: videobuf2: Fix length check for single plane dmabuf queueing"
  media: rc: ite-cir: fix min_timeout calculation
  media: venus: core: Fix platform driver shutdown
  media: rc: fix timeout handling after switch to microsecond durations
  media: v4l: common: Fix naming of v4l2_get_link_rate
  media: rcar-vin: fix return, use ret instead of zero
  media: ccs: Get static data version minor correctly
  media: ccs-pll: Fix link frequency for C-PHY
  media: rc: ensure that uevent can be read directly after rc device register
2021-01-28 09:18:05 -08:00
Heiko Stuebner
ef357e02b6 media: rockchip: rkisp1: extend uapi array sizes
Later variants of the rkisp1 block use more entries in some arrays:

RKISP1_CIF_ISP_AE_MEAN_MAX                 25 -> 81
RKISP1_CIF_ISP_HIST_BIN_N_MAX              16 -> 32
RKISP1_CIF_ISP_GAMMA_OUT_MAX_SAMPLES       17 -> 34
RKISP1_CIF_ISP_HISTOGRAM_WEIGHT_GRIDS_SIZE 25 -> 81

and we can still extend the uapi during the 5.11-rc cycle, so do that
now to be on the safe side.

V10 and V11 only need the smaller sizes, while V12 and V13 needed
the larger sizes.

When adding the bigger sizes make sure, values filled from hardware
values and transmitted to userspace don't leak kernel data by zeroing
them beforehand.

Signed-off-by: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Reviewed-by: Ezequiel Garcia <ezequiel@collabora.com>
Acked-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-01-28 11:31:43 +01:00
Heiko Stuebner
fc672d806b media: rockchip: rkisp1: carry ip version information
The IP block evolved from its rk3288/rk3399 base and the vendor
designates them with a numerical version. rk3399 for example
is designated V10 probably meaning V1.0.

There doesn't seem to be an actual version register we could read that
information from, so allow the match_data to carry that information
for future differentiation.

Also carry that information in the hw_revision field of the media-
controller API, so that userspace also has access to that.

The added versions are:
- V10: at least rk3288 + rk3399
- V11: seemingly unused as of now, but probably appeared in some soc
- V12: at least rk3326 + px30
- V13: at least rk1808

[fix checkpatch warning don't use multiple blank lines]

Signed-off-by: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Reviewed-by: Ezequiel Garcia <ezequiel@collabora.com>
Acked-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-01-28 11:31:08 +01:00
Heiko Stuebner
66d81de7ea media: rockchip: rkisp1: reduce number of histogram grid elements in uapi
The uapi right now specifies an array size of 28 but the actual number
of elements is only 25 with the last 3 being unused.

Reduce the array size to the correct number of elements and change
the params code to iterate the array 25 times.

Signed-off-by: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Acked-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-01-28 11:30:39 +01:00
Dafna Hirschfeld
31f190e0cc media: rkisp1: uapi: change hist_bins array type from __u16 to __u32
Each entry in the array is a 20 bits value composed of 16 bits unsigned
integer and 4 bits fractional part. So the type should change to __u32.
In addition add a documentation of how the measurements are done.

Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Acked-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-01-28 11:29:23 +01:00
Wolfram Sang
2e7f3db5d8 Linux 5.11-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmAOFRIeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG4lcH/jFAG+AXR6T2K3dG
 adJmdAWQKdDj71Ruoyefx5GSbD0B+rM1UcxOQgmZem2127RJgfDBBVYyjSwh7H1F
 LRJdN6+NKReL2OW/HMnufvpVMvWICTWHInyvsqKqp+Yo4GC5z46OtsmzSayYgTaP
 v2G9auC/rSUlzmQux9EAbTKDeNGMum3dt2rfsmnIDymBSHoPbizXF93/2hoo9jwx
 BZQCq0nYBCa9pEdsmgvqKkb/Y5+CSFoe02qdUZFIuwh6qd/XPfeDPSu4O3z4J9g6
 x8G+f7doeU0nzsAP+K0lG7ulAygxRHm5Bn2rwVdlYfihAuPNyF+Ua3duxBvjOBEY
 7w/UQ64=
 =ZWS9
 -----END PGP SIGNATURE-----

Merge tag 'v5.11-rc5' into i2c/for-5.12

Linux 5.11-rc5
2021-01-28 10:03:25 +01:00
Nikolay Aleksandrov
2dba407f99 net: bridge: multicast: make tracked EHT hosts limit configurable
Add two new port attributes which make EHT hosts limit configurable and
export the current number of tracked EHT hosts:
 - IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT: configure/retrieve current limit
 - IFLA_BRPORT_MCAST_EHT_HOSTS_CNT: current number of tracked hosts
Setting IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT to 0 is currently not allowed.

Note that we have to increase RTNL_SLAVE_MAX_TYPE to 38 minimum, I've
increased it to 40 to have space for two more future entries.

v2: move br_multicast_eht_set_hosts_limit() to br_multicast_eht.c,
    no functional change

Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-27 17:40:35 -08:00
Ioana Ciornei
2cf1e703f0 bus: fsl-mc: add fsl-mc userspace support
Adding userspace support for the MC (Management Complex) means exporting
an ioctl capable device file representing the root resource container.

This new functionality in the fsl-mc bus driver intends to provide
userspace applications an interface to interact with the MC firmware.

Commands that are composed in userspace are sent to the MC firmware
through the FSL_MC_SEND_MC_COMMAND ioctl.  By default the implicit MC
I/O portal is used for this operation, but if the implicit one is busy,
a dynamic portal is allocated and then freed upon execution.

The command received through the ioctl interface is checked against a
known whitelist of accepted MC commands. Commands that attempt a change
in hardware configuration will need CAP_NET_ADMIN, while commands used
in debugging do not need it.

Acked-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20210114170752.2927915-4-ciorneiioana@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 15:13:52 +01:00
Ioana Ciornei
8544717cda bus: fsl-mc: move fsl_mc_command struct in a uapi header
Define "struct fsl_mc_command" as a structure that can cross the
user/kernel boundary.

Acked-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20210114170752.2927915-2-ciorneiioana@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 15:13:52 +01:00
Praveen Chaudhary
6b2e04bc24 net: allow user to set metric on default route learned via Router Advertisement
For IPv4, default route is learned via DHCPv4 and user is allowed to change
metric using config etc/network/interfaces. But for IPv6, default route can
be learned via RA, for which, currently a fixed metric value 1024 is used.

Ideally, user should be able to configure metric on default route for IPv6
similar to IPv4. This patch adds sysctl for the same.

Logs:

For IPv4:

Config in etc/network/interfaces:
auto eth0
iface eth0 inet dhcp
    metric 4261413864

IPv4 Kernel Route Table:
$ ip route list
default via 172.21.47.1 dev eth0 metric 4261413864

FRR Table, if a static route is configured:
[In real scenario, it is useful to prefer BGP learned default route over DHCPv4 default route.]
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       > - selected route, * - FIB route

S>* 0.0.0.0/0 [20/0] is directly connected, eth0, 00:00:03
K   0.0.0.0/0 [254/1000] via 172.21.47.1, eth0, 6d08h51m

i.e. User can prefer Default Router learned via Routing Protocol in IPv4.
Similar behavior is not possible for IPv6, without this fix.

After fix [for IPv6]:
sudo sysctl -w net.ipv6.conf.eth0.net.ipv6.conf.eth0.ra_defrtr_metric=1996489705

IP monitor: [When IPv6 RA is received]
default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705  pref high

Kernel IPv6 routing table
$ ip -6 route list
default via fe80::be16:65ff:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 21sec hoplimit 64 pref high

FRR Table, if a static route is configured:
[In real scenario, it is useful to prefer BGP learned default route over IPv6 RA default route.]
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       > - selected route, * - FIB route

S>* ::/0 [20/0] is directly connected, eth0, 00:00:06
K   ::/0 [119/1001] via fe80::xx16:xxxx:feb3:ce8e, eth0, 6d07h43m

If the metric is changed later, the effect will be seen only when next IPv6
RA is received, because the default route must be fully controlled by RA msg.
Below metric is changed from 1996489705 to 1996489704.

$ sudo sysctl -w net.ipv6.conf.eth0.ra_defrtr_metric=1996489704
net.ipv6.conf.eth0.ra_defrtr_metric = 1996489704

IP monitor:
[On next IPv6 RA msg, Kernel deletes prev route and installs new route with updated metric]

Deleted default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 3sec hoplimit 64 pref high
default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489704 pref high

Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com>
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210125214430.24079-1-pchaudhary@linkedin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26 18:39:45 -08:00
Hans Verkuil
a53e3c189c media: v4l2-subdev.h: BIT() is not available in userspace
The BIT macro is not available in userspace, so replace BIT(0) by
0x00000001.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 6446ec6cbf ("media: v4l2-subdev: add VIDIOC_SUBDEV_QUERYCAP ioctl")
Cc: <stable@vger.kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-01-26 19:14:33 +01:00
Justin Iurman
07d46d93c9 uapi: fix big endian definition of ipv6_rpl_sr_hdr
Following RFC 6554 [1], the current order of fields is wrong for big
endian definition. Indeed, here is how the header looks like:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next Header  |  Hdr Ext Len  | Routing Type  | Segments Left |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CmprI | CmprE |  Pad  |               Reserved                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This patch reorders fields so that big endian definition is now correct.

  [1] https://tools.ietf.org/html/rfc6554#section-3

Fixes: cfa933d938 ("include: uapi: linux: add rpl sr header definition")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-25 15:14:16 -08:00
Scott Branden
8822276264 bcm-vk: add bcm_vk UAPI
Add user space api for bcm-vk driver.

Provide ioctl api to load images and issue reset command to card.
FW status registers in PCI BAR space also defined as part
of API so that user space is able to interpret these memory locations
as needed via direct PCIe access.

Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Link: https://lore.kernel.org/r/20210120175827.14820-2-scott.branden@broadcom.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-25 18:44:44 +01:00
Chuck Lever
9cde9360d1 NFSD: Update the SETATTR3args decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:25 -05:00
Greg Kroah-Hartman
0f8b29faba Merge 5.11-rc5 into tty-next
We need the fixes in here and this resolves a merge issue in
drivers/tty/tty_io.c

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-25 11:19:46 +01:00
Christian Brauner
9caccd4154
fs: introduce MOUNT_ATTR_IDMAP
Introduce a new mount bind mount property to allow idmapping mounts. The
MOUNT_ATTR_IDMAP flag can be set via the new mount_setattr() syscall
together with a file descriptor referring to a user namespace.

The user namespace referenced by the namespace file descriptor will be
attached to the bind mount. All interactions with the filesystem going
through that mount will be mapped according to the mapping specified in
the user namespace attached to it.

Using user namespaces to mark mounts means we can reuse all the existing
infrastructure in the kernel that already exists to handle idmappings
and can also use this for permission checking to allow unprivileged user
to create idmapped mounts in the future.

Idmapping a mount is decoupled from the caller's user and mount
namespace. This means idmapped mounts can be created in the initial
user namespace which is an important use-case for systemd-homed,
portable usb-sticks between systems, sharing data between the initial
user namespace and unprivileged containers, and other use-cases that
have been brought up. For example, assume a home directory where all
files are owned by uid and gid 1000 and the home directory is brought to
a new laptop where the user has id 12345. The system administrator can
simply create a mount of this home directory with a mapping of
1000:12345:1 and other mappings to indicate the ids should be kept.
(With this it is e.g. also possible to create idmapped mounts on the
host with an identity mapping 1:1:100000 where the root user is not
mapped. A user with root access that e.g. has been pivot rooted into
such a mount on the host will be not be able to execute, read, write, or
create files as root.)

Given that mapping a mount is decoupled from the caller's user namespace
a sufficiently privileged process such as a container manager can set up
an idmapped mount for the container and the container can simply pivot
root to it. There's no need for the container to do anything. The mount
will appear correctly mapped independent of the user namespace the
container uses. This means we don't need to mark a mount as idmappable.

In order to create an idmapped mount the caller must currently be
privileged in the user namespace of the superblock the mount belongs to.
Once a mount has been idmapped we don't allow it to change its mapping.
This keeps permission checking and life-cycle management simple. Users
wanting to change the idmapped can always create a new detached mount
with a different idmapping.

Link: https://lore.kernel.org/r/20210121131959.646623-36-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Mauricio Vásquez Bernal <mauricio@kinvolk.io>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:43:45 +01:00
Christian Brauner
2a1867219c
fs: add mount_setattr()
This implements the missing mount_setattr() syscall. While the new mount
api allows to change the properties of a superblock there is currently
no way to change the properties of a mount or a mount tree using file
descriptors which the new mount api is based on. In addition the old
mount api has the restriction that mount options cannot be applied
recursively. This hasn't changed since changing mount options on a
per-mount basis was implemented in [1] and has been a frequent request
not just for convenience but also for security reasons. The legacy
mount syscall is unable to accommodate this behavior without introducing
a whole new set of flags because MS_REC | MS_REMOUNT | MS_BIND |
MS_RDONLY | MS_NOEXEC | [...] only apply the mount option to the topmost
mount. Changing MS_REC to apply to the whole mount tree would mean
introducing a significant uapi change and would likely cause significant
regressions.

The new mount_setattr() syscall allows to recursively clear and set
mount options in one shot. Multiple calls to change mount options
requesting the same changes are idempotent:

int mount_setattr(int dfd, const char *path, unsigned flags,
                  struct mount_attr *uattr, size_t usize);

Flags to modify path resolution behavior are specified in the @flags
argument. Currently, AT_EMPTY_PATH, AT_RECURSIVE, AT_SYMLINK_NOFOLLOW,
and AT_NO_AUTOMOUNT are supported. If useful, additional lookup flags to
restrict path resolution as introduced with openat2() might be supported
in the future.

The mount_setattr() syscall can be expected to grow over time and is
designed with extensibility in mind. It follows the extensible syscall
pattern we have used with other syscalls such as openat2(), clone3(),
sched_{set,get}attr(), and others.
The set of mount options is passed in the uapi struct mount_attr which
currently has the following layout:

struct mount_attr {
	__u64 attr_set;
	__u64 attr_clr;
	__u64 propagation;
	__u64 userns_fd;
};

The @attr_set and @attr_clr members are used to clear and set mount
options. This way a user can e.g. request that a set of flags is to be
raised such as turning mounts readonly by raising MOUNT_ATTR_RDONLY in
@attr_set while at the same time requesting that another set of flags is
to be lowered such as removing noexec from a mount tree by specifying
MOUNT_ATTR_NOEXEC in @attr_clr.

Note, since the MOUNT_ATTR_<atime> values are an enum starting from 0,
not a bitmap, users wanting to transition to a different atime setting
cannot simply specify the atime setting in @attr_set, but must also
specify MOUNT_ATTR__ATIME in the @attr_clr field. So we ensure that
MOUNT_ATTR__ATIME can't be partially set in @attr_clr and that @attr_set
can't have any atime bits set if MOUNT_ATTR__ATIME isn't set in
@attr_clr.

The @propagation field lets callers specify the propagation type of a
mount tree. Propagation is a single property that has four different
settings and as such is not really a flag argument but an enum.
Specifically, it would be unclear what setting and clearing propagation
settings in combination would amount to. The legacy mount() syscall thus
forbids the combination of multiple propagation settings too. The goal
is to keep the semantics of mount propagation somewhat simple as they
are overly complex as it is.

The @userns_fd field lets user specify a user namespace whose idmapping
becomes the idmapping of the mount. This is implemented and explained in
detail in the next patch.

[1]: commit 2e4b7fcd92 ("[PATCH] r/o bind mounts: honor mount writer counts at remount")

Link: https://lore.kernel.org/r/20210121131959.646623-35-christian.brauner@ubuntu.com
Cc: David Howells <dhowells@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-api@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:42:45 +01:00
Rasmus Villemoes
6781939054 net: mrp: move struct definitions out of uapi
None of these are actually used in the kernel/userspace interface -
there's a userspace component of implementing MRP, and userspace will
need to construct certain frames to put on the wire, but there's no
reason the kernel should provide the relevant definitions in a UAPI
header.

In fact, some of those definitions were broken until previous commit,
so only keep the few that are actually referenced in the kernel code,
and move them to the br_private_mrp.h header.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-23 12:38:42 -08:00
Rasmus Villemoes
dc090de854 net: mrp: fix definitions of MRP test packets
Wireshark says that the MRP test packets cannot be decoded - and the
reason for that is that there's a two-byte hole filled with garbage
between the "transitions" and "timestamp" members.

So Wireshark decodes the two garbage bytes and the top two bytes of
the timestamp written by the kernel as the timestamp value (which thus
fluctuates wildly), and interprets the lower two bytes of the
timestamp as a new (type, length) pair, which is of course broken.

Even though this makes the timestamp field in the struct unaligned, it
actually makes it end up on a 32 bit boundary in the frame as mandated
by the standard, since it is preceded by a two byte TLV header.

The struct definitions live under include/uapi/, but they are not
really part of any kernel<->userspace API/ABI, so fixing the
definitions by adding the packed attribute should not cause any
compatibility issues.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-23 12:34:20 -08:00
Maxim Mikityanskiy
d03b195b5a sch_htb: Hierarchical QoS hardware offload
HTB doesn't scale well because of contention on a single lock, and it
also consumes CPU. This patch adds support for offloading HTB to
hardware that supports hierarchical rate limiting.

In the offload mode, HTB passes control commands to the driver using
ndo_setup_tc. The driver has to replicate the whole hierarchy of classes
and their settings (rate, ceil) in the NIC. Every modification of the
HTB tree caused by the admin results in ndo_setup_tc being called.

After this setup, the HTB algorithm is done completely in the NIC. An SQ
(send queue) is created for every leaf class and attached to the
hierarchy, so that the NIC can calculate and obey aggregated rate
limits, too. In the future, it can be changed, so that multiple SQs will
back a single leaf class.

ndo_select_queue is responsible for selecting the right queue that
serves the traffic class of each packet.

The data path works as follows: a packet is classified by clsact, the
driver selects a hardware queue according to its class, and the packet
is enqueued into this queue's qdisc.

This solution addresses two main problems of scaling HTB:

1. Contention by flow classification. Currently the filters are attached
to the HTB instance as follows:

    # tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80
    classid 1:10

It's possible to move classification to clsact egress hook, which is
thread-safe and lock-free:

    # tc filter add dev eth0 egress protocol ip flower dst_port 80
    action skbedit priority 1:10

This way classification still happens in software, but the lock
contention is eliminated, and it happens before selecting the TX queue,
allowing the driver to translate the class to the corresponding hardware
queue in ndo_select_queue.

Note that this is already compatible with non-offloaded HTB and doesn't
require changes to the kernel nor iproute2.

2. Contention by handling packets. HTB is not multi-queue, it attaches
to a whole net device, and handling of all packets takes the same lock.
When HTB is offloaded, it registers itself as a multi-queue qdisc,
similarly to mq: HTB is attached to the netdev, and each queue has its
own qdisc.

Some features of HTB may be not supported by some particular hardware,
for example, the maximum number of classes may be limited, the
granularity of rate and ceil parameters may be different, etc. - so, the
offload is not enabled by default, a new parameter is used to enable it:

    # tc qdisc replace dev eth0 root handle 1: htb offload

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22 20:41:29 -08:00
Arjun Roy
7eeba1706e tcp: Add receive timestamp support for receive zerocopy.
tcp_recvmsg() uses the CMSG mechanism to receive control information
like packet receive timestamps. This patch adds CMSG fields to
struct tcp_zerocopy_receive, and provides receive timestamps
if available to the user.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22 20:05:56 -08:00
Yousuk Seung
e7ed11ee94 tcp: add TTL to SCM_TIMESTAMPING_OPT_STATS
This patch adds TCP_NLA_TTL to SCM_TIMESTAMPING_OPT_STATS that exports
the time-to-live or hop limit of the latest incoming packet with
SCM_TSTAMP_ACK. The value exported may not be from the packet that acks
the sequence when incoming packets are aggregated. Exporting the
time-to-live or hop limit value of incoming packets helps to estimate
the hop count of the path of the flow that may change over time.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20210120204155.552275-1-ysseung@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22 18:20:52 -08:00
Parav Pandit
a556dded9c devlink: Support get and set state of port function
devlink port function can be in active or inactive state.
Allow users to get and set port function's state.

When the port function it activated, its operational state may change
after a while when the device is created and driver binds to it.
Similarly on deactivation flow.

To clearly describe the state of the port function and its device's
operational state in the host system, define state and opstate
attributes.

Example of a PCI SF port which supports a port function:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "external": false,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-22 11:32:08 -08:00
Parav Pandit
b8288837ef devlink: Introduce PCI SF port flavour and port attribute
A PCI sub-function (SF) represents a portion of the device similar
to PCI VF.

In an eswitch, PCI SF may have port which is normally represented
using a representor netdevice.
To have better visibility of eswitch port, its association with SF,
and its representor netdevice, introduce a PCI SF port flavour.

When devlink port flavour is PCI SF, fill up PCI SF attributes of the
port.

Extend port name creation using PCI PF and SF number scheme on best
effort basis, so that vendor drivers can skip defining their own
scheme.
This is done as cApfNSfM, where A, N and M are controller, PCI PF and
PCI SF number respectively.
This is similar to existing naming for PCI PF and PCI VF ports.

An example view of a PCI SF port:

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state active opstate attached

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-01-22 11:32:07 -08:00
Wolfram Sang
21500aa840 i2c: uapi: add macro to describe support for all SMBus transfers
Some I2C bus master drivers which support I2C_M_RECV_LEN do not set
the functionality bits of the now supported SMBus transfers. Add a
convenience macro to make this very simple.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-01-22 09:59:00 +01:00
Wolfram Sang
1713d66cae i2c: remove licence boilerplate from i2c-dev UAPI header
Remove boilerplate because we now have the SPDX header.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-01-22 09:58:57 +01:00
Wolfram Sang
deb0d3351b i2c: remove licence boilerplate from main UAPI header
Remove boilerplate because we now have the SPDX header.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-01-22 09:58:53 +01:00
Wolfram Sang
bfb3939c51 i2c: refactor documentation of struct i2c_msg
The information about 'i2c_msg' was spread between kdoc and comments.
Move all the explanations to kdoc and duplicate only the requirements
for the flags in the comments. Also, add some redundancy and fix some
typos while here.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-01-22 09:58:50 +01:00
wenxu
7baf2429a1 net/sched: cls_flower add CT_FLAGS_INVALID flag support
This patch add the TCA_FLOWER_KEY_CT_FLAGS_INVALID flag to
match the ct_state with invalid for conntrack.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/1611045110-682-1-git-send-email-wenxu@ucloud.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20 21:09:44 -08:00
Jarod Wilson
7b8fc0103b bonding: add a vlan+srcmac tx hashing option
This comes from an end-user request, where they're running multiple VMs on
hosts with bonded interfaces connected to some interest switch topologies,
where 802.3ad isn't an option. They're currently running a proprietary
solution that effectively achieves load-balancing of VMs and bandwidth
utilization improvements with a similar form of transmission algorithm.

Basically, each VM has it's own vlan, so it always sends its traffic out
the same interface, unless that interface fails. Traffic gets split
between the interfaces, maintaining a consistent path, with failover still
available if an interface goes down.

Unlike bond_eth_hash(), this hash function is using the full source MAC
address instead of just the last byte, as there are so few components to
the hash, and in the no-vlan case, we would be returning just the last
byte of the source MAC as the hash value. It's entirely possible to have
two NICs in a bond with the same last byte of their MAC, but not the same
MAC, so this adjustment should guarantee distinct hashes in all cases.

This has been rudimetarily tested to provide similar results to the
proprietary solution it is aiming to replace. A patch for iproute2 is also
posted, to properly support the new mode there as well.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Thomas Davis <tadavis@lbl.gov>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Link: https://lore.kernel.org/r/20210119010927.1191922-1-jarod@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19 19:30:32 -08:00
Thinh Nguyen
f2fc9ff28d usb: ch9: Add USB 3.2 SSP attributes
In preparation for USB 3.2 dual-lane support, add sublink speed
attribute macros and enum usb_ssp_rate. A USB device that operates in
SuperSpeed Plus may operate at different speed and lane count. These
additional macros and enum values help specifying that.

Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/ae9293ebd63a29f2a2035054753534d9eb123d74.1610592135.git.Thinh.Nguyen@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-18 18:38:47 +01:00
Alexandre Belloni
7ae41220ef rtc: introduce features bitfield
Introduce a bitfield to allow the drivers to announce the available
features for an RTC.

The main use case would be to better handle alarms, that could be present
or not or have a minute resolution or may need a correct week day to be set.

Use the newly introduced RTC_FEATURE_ALARM bit to then test whether alarms
are available instead of relying on the presence of ops->set_alarm.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210110231752.1418816-2-alexandre.belloni@bootlin.com
2021-01-16 23:19:26 +01:00
Pravin B Shelar
9ab7e76aef GTP: add support for flow based tunneling API
Following patch add support for flow based tunneling API
to send and recv GTP tunnel packet over tunnel metadata API.
This would allow this device integration with OVS or eBPF using
flow based tunneling APIs.

Signed-off-by: Pravin B Shelar <pbshelar@fb.com>
Link: https://lore.kernel.org/r/20210110070021.26822-1-pbshelar@fb.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:31:49 -08:00
Jakub Kicinski
2d9116be76 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-01-16

1) Extend atomic operations to the BPF instruction set along with x86-64 JIT support,
   that is, atomic{,64}_{xchg,cmpxchg,fetch_{add,and,or,xor}}, from Brendan Jackman.

2) Add support for using kernel module global variables (__ksym externs in BPF
   programs) retrieved via module's BTF, from Andrii Nakryiko.

3) Generalize BPF stackmap's buildid retrieval and add support to have buildid
   stored in mmap2 event for perf, from Jiri Olsa.

4) Various fixes for cross-building BPF sefltests out-of-tree which then will
   unblock wider automated testing on ARM hardware, from Jean-Philippe Brucker.

5) Allow to retrieve SOL_SOCKET opts from sock_addr progs, from Daniel Borkmann.

6) Clean up driver's XDP buffer init and split into two helpers to init per-
   descriptor and non-changing fields during processing, from Lorenzo Bianconi.

7) Minor misc improvements to libbpf & bpftool, from Ian Rogers.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (41 commits)
  perf: Add build id data in mmap2 event
  bpf: Add size arg to build_id_parse function
  bpf: Move stack_map_get_build_id into lib
  bpf: Document new atomic instructions
  bpf: Add tests for new BPF atomic operations
  bpf: Add bitwise atomic instructions
  bpf: Pull out a macro for interpreting atomic ALU operations
  bpf: Add instructions for atomic_[cmp]xchg
  bpf: Add BPF_FETCH field / create atomic_fetch_add instruction
  bpf: Move BPF_STX reserved field check into BPF_STX verifier code
  bpf: Rename BPF_XADD and prepare to encode other atomics in .imm
  bpf: x86: Factor out a lookup table for some ALU opcodes
  bpf: x86: Factor out emission of REX byte
  bpf: x86: Factor out emission of ModR/M for *(reg + off)
  tools/bpftool: Add -Wall when building BPF programs
  bpf, libbpf: Avoid unused function warning on bpf_tail_call_static
  selftests/bpf: Install btf_dump test cases
  selftests/bpf: Fix installation of urandom_read
  selftests/bpf: Move generated test files to $(TEST_GEN_FILES)
  selftests/bpf: Fix out-of-tree build
  ...
====================

Link: https://lore.kernel.org/r/20210116012922.17823-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 17:57:26 -08:00
Uwe Kleine-König
429b29aef7 tty: serial: Drop unused efm32 serial driver
Support for this machine was just removed, so drop the now unused UART
driver, too.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20210115155130.185010-7-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-15 17:14:49 +01:00
Jiri Olsa
88a16a1309 perf: Add build id data in mmap2 event
Adding support to carry build id data in mmap2 event.

The build id data replaces maj/min/ino/ino_generation
fields, which are also used to identify map's binary,
so it's ok to replace them with build id data:

  union {
          struct {
                  u32       maj;
                  u32       min;
                  u64       ino;
                  u64       ino_generation;
          };
          struct {
                  u8        build_id_size;
                  u8        __reserved_1;
                  u16       __reserved_2;
                  u8        build_id[20];
          };
  };

Replaced maj/min/ino/ino_generation fields give us size
of 24 bytes. We use 20 bytes for build id data, 1 byte
for size and rest is unused.

There's new misc bit for mmap2 to signal there's build
id data in it:

  #define PERF_RECORD_MISC_MMAP_BUILD_ID   (1 << 14)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210114134044.1418404-4-jolsa@kernel.org
2021-01-14 19:29:58 -08:00
Jakub Kicinski
1d9f03c0a1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14 18:34:50 -08:00
Brendan Jackman
5ffa25502b bpf: Add instructions for atomic_[cmp]xchg
This adds two atomic opcodes, both of which include the BPF_FETCH
flag. XCHG without the BPF_FETCH flag would naturally encode
atomic_set. This is not supported because it would be of limited
value to userspace (it doesn't imply any barriers). CMPXCHG without
BPF_FETCH woulud be an atomic compare-and-write. We don't have such
an operation in the kernel so it isn't provided to BPF either.

There are two significant design decisions made for the CMPXCHG
instruction:

 - To solve the issue that this operation fundamentally has 3
   operands, but we only have two register fields. Therefore the
   operand we compare against (the kernel's API calls it 'old') is
   hard-coded to be R0. x86 has similar design (and A64 doesn't
   have this problem).

   A potential alternative might be to encode the other operand's
   register number in the immediate field.

 - The kernel's atomic_cmpxchg returns the old value, while the C11
   userspace APIs return a boolean indicating the comparison
   result. Which should BPF do? A64 returns the old value. x86 returns
   the old value in the hard-coded register (and also sets a
   flag). That means return-old-value is easier to JIT, so that's
   what we use.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-8-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Brendan Jackman
5ca419f286 bpf: Add BPF_FETCH field / create atomic_fetch_add instruction
The BPF_FETCH field can be set in bpf_insn.imm, for BPF_ATOMIC
instructions, in order to have the previous value of the
atomically-modified memory location loaded into the src register
after an atomic op is carried out.

Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-7-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Brendan Jackman
91c960b005 bpf: Rename BPF_XADD and prepare to encode other atomics in .imm
A subsequent patch will add additional atomic operations. These new
operations will use the same opcode field as the existing XADD, with
the immediate discriminating different operations.

In preparation, rename the instruction mode BPF_ATOMIC and start
calling the zero immediate BPF_ADD.

This is possible (doesn't break existing valid BPF progs) because the
immediate field is currently reserved MBZ and BPF_ADD is zero.

All uses are removed from the tree but the BPF_XADD definition is
kept around to avoid breaking builds for people including kernel
headers.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Björn Töpel <bjorn.topel@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-5-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Dikshita Agarwal
6bde70da98 media: v4l2-ctrl: Add base layer priority id control.
This control indicates the priority id to be applied
to base layer.

[hverkuil: renumbered V4L2_CID_MPEG_VIDEO_BASELAYER_PRIORITY_ID]

Signed-off-by: Dikshita Agarwal <dikshita@codeaurora.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-01-14 13:54:00 +01:00
Dikshita Agarwal
4ca134ee98 media: v4l2-ctrl: Add layer wise bitrate controls for h264
Adds bitrate control for all coding layers for h264
same as hevc.

Signed-off-by: Dikshita Agarwal <dikshita@codeaurora.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-01-14 13:51:55 +01:00
Dikshita Agarwal
99d0cbe4be media: v4l2-ctrl: Add frame-specific min/max qp controls for hevc
- Adds min/max qp controls for B frame for h264.
- Adds min/max qp controls for I/P/B frames for hevc similar to h264.
- Update valid range of min/max qp for hevc to accommodate 10 bit.

Signed-off-by: Dikshita Agarwal <dikshita@codeaurora.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-01-14 13:51:18 +01:00
Mark Brown
e4aad9998e Linux 5.11-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl/7gQoeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG2UUH/0Yxo20aF/bv+580
 tP/Itt5R7nN5xXNgmpZ1kVOaAjcn1qg2lf5az0vHjjwUaWGvWyXdHmT7d6BObuFe
 S7dHUpIsAswfMmSL2Vfnj1brsRGHJqBVP82QJT/XXH5sgY4hjXFCQbAlJ/H0vUR1
 6TawXUrugbgQQxVNjQyfFPxHBvyx1VJ0dGL1aM/XsLLEXUITTDALrVg6T0wyQQ0B
 DBlSKMQxCViDUf0NjzAuXWKMmqnmoQnyRJseb//r9TgCIyDF+lIc1tzLJ0iFGKQl
 XMCvTXT+Owvc2M7a5eV44a18VtvXsmOKDXVKmHqqPcqRrJGX+zXqhsnUJNVEiVrS
 WbZ+Ef8=
 =oNTQ
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAl//NHgACgkQJNaLcl1U
 h9BJXAf/TetLqgmpxge0NZJ0zt9y0dCtczA/ZyBY4rOXJp9losrD8swpOW+GnhdX
 MoT3UExH81RDdpqS9FQ0QHgY0fy1lSUf7GJWAa4TZmL+Ep/90robaTqiWbcK7Urc
 YcVrO0C0ypbFWqfaaEXiDrhi+GYJjs+rH26zIVLWVZwvrJka9OC2gURKb2kML+Uc
 5PvmEQm+Z4dY9etQ4nHKJCin+9FmSAxx+wQxp/6V6A0dUrm7fDgoqR8NQbWG18AM
 EhX4fg9YmlRT0oT7WZx5paI3bJ+mH/6ix/qxsZKY2MSShaKxaIkuTH8zJT7XtauT
 fPkaiziWDQDi3mu4/qlFBv0iQOx/0A==
 =QVPN
 -----END PGP SIGNATURE-----

Merge v5.11-rc3
2021-01-13 17:57:11 +00:00
Brendan Jackman
c6458e72f6 bpf: Clarify return value of probe str helpers
When the buffer is too small to contain the input string, these helpers
return the length of the buffer, not the length of the original string.
This tries to make the docs totally clear about that, since "the length
of the [copied ]string" could also refer to the length of the input.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112123422.2011234-1-jackmanb@google.com
2021-01-12 21:38:34 +01:00
Sakari Ailus
7c0ed600f0 media: v4l: uapi: ccs: Add CCS controls for shading correction
Add V4L2 controls for controlling CCS lens shading correction as well as
conveying its capabilities.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-01-12 17:31:14 +01:00
Sakari Ailus
a75210a62b media: v4l: uapi: ccs: Add controls for CCS alternative analogue gain
Add two new controls for alternative analogue gain some CCS compliant
camera sensors support.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-01-12 17:28:11 +01:00
Sakari Ailus
a8a2d75b08 media: v4l: uapi: ccs: Add controls for analogue gain constants
Add V4L2 controls for analogue gain constants required to control
analogue gain. The values are device specific and thus need to be obtained
from the driver.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-01-12 17:27:09 +01:00