Commit graph

982081 commits

Author SHA1 Message Date
Ioana Ciornei
d2624e70a2 dpaa2-eth: select XGMAC_MDIO for MDIO bus support
Explicitly enable the FSL_XGMAC_MDIO Kconfig option in order to have
MDIO access to internal and external PHYs.

Fixes: 7194792308 ("dpaa2-eth: add MAC/PHY support through phylink")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20201119145106.712761-1-ciorneiioana@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 15:21:54 -08:00
Jakub Kicinski
3cd336c517 Merge branch 'mlxsw-add-support-for-nexthop-objects'
Ido Schimmel says:

====================
mlxsw: Add support for nexthop objects

This patch set adds support for nexthop objects in mlxsw. Nexthop
objects are treated as another front-end for programming nexthops, in
addition to the existing IPv4 and IPv6 front-ends.

Patch #1 registers a listener to the nexthop notification chain and
parses the nexthop information into the existing mlxsw data structures
that are already used by the IPv4 and IPv6 front-ends. Blackhole
nexthops are currently rejected. Support will be added in a follow-up
patch set.

Patch #2 extends mlxsw to resolve its internal nexthop objects from the
nexthop identifier encoded in the FIB info of the notified routes.

Patch #3 finally removes the limitation of rejecting routes that use
nexthop objects.

Patch #4 adds a selftest.

Patches #5-#8 add generic forwarding selftests that can be used with
veth pairs or physical loopbacks.
====================

Link: https://lore.kernel.org/r/20201119130848.407918-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 15:20:23 -08:00
Ido Schimmel
e035146d65 selftests: forwarding: Add multipath tunneling nexthop test
Add a nexthop objects version of gre_multipath.sh. Unlike the original
test, it also tests IPv6 overlay which is not possible with the legacy
nexthop implementation. See commit 9a2ad36238 ("selftests: forwarding:
gre_multipath: Drop IPv6 tests") for more info.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 15:20:21 -08:00
Ido Schimmel
e96fa54bbd selftests: forwarding: Add device-only nexthop test
In a similar fashion to router_multipath.sh and its nexthop objects
version router_mpath_nh.sh, create a nexthop objects version of
router.sh.

It reuses the same topology, but uses device-only nexthop objects
instead of legacy ones.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 15:20:20 -08:00
Ido Schimmel
3600f29ad1 selftests: forwarding: Test IPv4 routes with IPv6 link-local nexthops
In addition to IPv4 multipath tests with IPv4 nexthops, also test IPv4
multipath with nexthops that use IPv6 link-local addresses.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 15:20:20 -08:00
Ido Schimmel
ffb721515b selftests: forwarding: Do not configure nexthop objects twice
routing_nh_obj() is used to configure the nexthop objects employed by
the test, but it is called twice resulting in "RTNETLINK answers: File
exists" messages.

Remove the first call, so that the function is only called after
setup_wait(), when all the interfaces are up and ready.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 15:20:20 -08:00
Ido Schimmel
20ac8f8690 selftests: mlxsw: Add nexthop objects configuration tests
Test that unsupported nexthop objects are rejected and that offload
indication is correctly set on: nexthop objects, nexthop group objects
and routes associated these objects.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 15:20:20 -08:00
Ido Schimmel
cdd6cfc54c mlxsw: spectrum_router: Allow programming routes with nexthop objects
Now that the driver supports nexthop objects, the check is no longer
necessary. Remove it.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 15:20:20 -08:00
Ido Schimmel
c25db3a77f mlxsw: spectrum_router: Enable resolution of nexthop groups from nexthop objects
If the FIB info (i.e, 'struct fib_info', 'struct fib6_info') uses a
nexthop object, then use the object's identifier to resolve the nexthop
group.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 15:20:20 -08:00
Ido Schimmel
2a014b200b mlxsw: spectrum_router: Add support for nexthop objects
Register a listener to the nexthop notification chain and parse notified
nexthop objects into the existing mlxsw nexthop data structures.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 15:20:20 -08:00
Christian Eggers
30abc9cd9c net: dsa: avoid potential use-after-free error
If dsa_switch_ops::port_txtstamp() returns false, clone will be freed
immediately. Shouldn't store a pointer to freed memory.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201119110906.25558-1-ceggers@arri.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 15:02:50 -08:00
Stafford Horne
d8398bf840 openrisc: add local64.h to fix blk-iocost build
As of 5.10 OpenRISC allyesconfig builds fail with the following error.

    $ make ARCH=openrisc CROSS_COMPILE=or1k-elf- block/blk-iocost.o
      CALL    scripts/checksyscalls.sh
      CALL    scripts/atomic/check-atomics.sh
      CC      block/blk-iocost.o
    block/blk-iocost.c:183:10: fatal error: asm/local64.h: No such file or directory
      183 | #include <asm/local64.h>
	  |          ^~~~~~~~~~~~~~~
    compilation terminated.

The new include of local64.h was added in commit 5e124f7432
("blk-iocost: use local[64]_t for percpu stat") by Tejun.

Adding the generic version of local64.h to OpenRISC fixes the build
issue.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Stafford Horne <shorne@gmail.com>
2020-11-21 07:13:07 +09:00
Zhenzhong Duan
1f40704bb0 PCI: Move pci_match_device() ahead of new_id_store()
Move pci_match_device() and its dependencies (pci_match_id() and
pci_device_id_any) ahead of new_id_store().

This is preparation work for calling pci_match_device() in new_id_store().
No functional changes.

[bhelgaas: update function comments]
Link: https://lore.kernel.org/r/20201117054409.3428-2-zhenzhong.duan@gmail.com
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-11-20 15:51:35 -06:00
Rob Herring
3e95dfb315 dt-bindings: Don't error out on yamllint and dt-doc-validate errors
A broken schema file now causes make to exit and 'make -k' no longer works
now that dt-doc-validate is called from a single make rule.

As yamllint is optional, we shouldn't stop on yamllint errors either. Also,
it seems some old versions of yamllint don't work.

Signed-off-by: Rob Herring <robh@kernel.org>
2020-11-20 15:37:55 -06:00
Jakub Kicinski
2ed03e5a84 Merge branch 'netdevsim-add-ethtool-coalesce-and-ring-settings'
Antonio Cardace says:

====================
netdevsim: add ethtool coalesce and ring settings

Output of ethtool-ring.sh and ethtool-coalesce.sh selftests:

  # ./ethtool-ring.sh
  PASSED all 4 checks
  # ./ethtool-coalesce.sh
  PASSED all 22 checks
  # ./ethtool-pause.sh
  PASSED all 7 checks
====================

Link: https://lore.kernel.org/r/20201118204522.5660-1-acardace@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 12:54:06 -08:00
Antonio Cardace
fbb7a1f813 selftests: add ring and coalesce selftests
Add scripts to test ring and coalesce settings
of netdevsim.

Signed-off-by: Antonio Cardace <acardace@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 12:51:54 -08:00
Antonio Cardace
9e48ee80ac selftests: refactor get_netdev_name function
As pointed out by Michal Kubecek, getting the name
with the previous approach was racy, it's better
and easier to get the name of the device with this
patch's approach.

Essentialy the function doesn't need to exist
anymore as it's a simple 'ls' command.

Signed-off-by: Antonio Cardace <acardace@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 12:51:54 -08:00
Antonio Cardace
fbb8531e58 selftests: extract common functions in ethtool-common.sh
Factor out some useful functions so that they can be reused
by other ethtool-netdevsim scripts.

Signed-off-by: Antonio Cardace <acardace@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 12:51:54 -08:00
Antonio Cardace
a7fc6db099 netdevsim: support ethtool ring and coalesce settings
Add ethtool ring and coalesce settings support for testing.

Signed-off-by: Antonio Cardace <acardace@redhat.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 12:51:54 -08:00
Antonio Cardace
77f9591b21 netdevsim: move ethtool pause params in separate struct
This will help the refactoring in the next commit
when coalesce and ring settings are added.

Signed-off-by: Antonio Cardace <acardace@redhat.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 12:51:54 -08:00
Antonio Cardace
4ae21993f0 ethtool: add ETHTOOL_COALESCE_ALL_PARAMS define
This bitmask represents all existing coalesce parameters.

Signed-off-by: Antonio Cardace <acardace@redhat.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 12:51:53 -08:00
Jack Wang
d024f27de1 RDMA/ipoib: Distribute cq completion vector better
Currently ipoib choose cq completion vector based on port number, when HCA
only have one port, all the interface recv queue completion are bind to cq
completion vector 0.

To better distribute the load, use same method as __ib_alloc_cq_any to
choose completion vector, with the change, each interface now use
different completion vectors.

Link: https://lore.kernel.org/r/20201013074342.15867-1-jinpu.wang@cloud.ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-20 16:18:59 -04:00
Randy Dunlap
fc9840fbef net: stream: fix TCP references when INET is not enabled
Fix build of net/core/stream.o when CONFIG_INET is not enabled.
Fixes these build errors (sample):

ld: net/core/stream.o: in function `sk_stream_write_space':
(.text+0x27e): undefined reference to `tcp_stream_memory_free'
ld: (.text+0x29c): undefined reference to `tcp_stream_memory_free'
ld: (.text+0x2ab): undefined reference to `tcp_stream_memory_free'
ld: net/core/stream.o: in function `sk_stream_wait_memory':
(.text+0x5a1): undefined reference to `tcp_stream_memory_free'
ld: (.text+0x5bf): undefined reference to `tcp_stream_memory_free'

Fixes: 1c5f2ced13 ("tcp: avoid indirect call to tcp_stream_memory_free()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20201118194438.674-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 12:13:55 -08:00
Linus Torvalds
4fd84bc969 block-5.10-2020-11-20
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+4C/oQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnlQD/40VMzxToudShz6bHyYYaJnA9TDEklQ/aUM
 rW5dyUthc7jap7bv5Yy6hRcoqXCjy77+3hBZ536q9XyX+jFZ/q34XqwHBOJ4tBwZ
 teBACT+sLxB3n41B/AecSvcG40GfINHyIONIeWc4JQmB5IqX4eX3PMIp+rsN+6yx
 IILnkZg9XULZ/IS48GBo8lgxaJHTU0qcOa20YZ3F9yY+/EsROwcy4ClyM+VUcoXe
 3jmXpYm/MucH4YIFPv0BhQI2TVQYzo2VK11YF6JgcDAJk15NNRjglCGesUwSfreZ
 45KdeXmReS5zKeJ9kSna1ifAwT7C7ag2EjHXUaDIhnqSKoClCuevv8cXb+0sk88X
 T3fifR0JKCrVcn2z8QTAVUJU9mq7+fTsj7HdwV/yuhonpiu61p1NlgThLQmamA9N
 Cm2yam/VdND6w7i5ZbR4pxDo7YAWg48WRZ5YOYPcHsXhqv6sxdrgxyX0ufK8CJaz
 ZDxcFKpG8LTjpKFhz/A5fDp06CdYWngCjE4+3ATFhNTo04pSNJ/3warPR+4n0Lb8
 KEMe+lgrwX9nQY65XPFFO5S/K1upzOmFpJQ6R6+NYJUS94IE/5pNUwSCa/kX07x4
 nniUUq6WkOR276K427iZVChUp2yc4bGTMJrZHd4Rs9OqD6IF7qhJM2G1YGrgPup6
 8pb9tB/5Tw==
 =v5pS
 -----END PGP SIGNATURE-----

Merge tag 'block-5.10-2020-11-20' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request from Christoph:
      - Doorbell Buffer freeing fix (Minwoo Im)
      - CSE log leak fix (Keith Busch)

 - blk-cgroup hd_struct leak fix (Christoph)

 - Flush request state fix (Ming)

 - dasd NULL deref fix (Stefan)

* tag 'block-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
  s390/dasd: fix null pointer dereference for ERP requests
  blk-cgroup: fix a hd_struct leak in blkcg_fill_root_iostats
  nvme: fix memory leak freeing command effects
  nvme: directly cache command effects log
  nvme: free sq/cq dbbuf pointers when dbbuf set fails
  block: mark flush request as IDLE when it is really finished
2020-11-20 12:03:40 -08:00
Linus Torvalds
fa5fca78bb io_uring-5.10-2020-11-20
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+4DAwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgphdOD/9xOEnYPuekvVH9G9nyNd//Q9fPArG2+j6V
 /MCnze07GNtDt7z15oR+T07hKXmf+Ejh4nu3JJ6MUNfe/47hhJqHSxRHU6+PJCjk
 hPrsaTsDedxxLEDiLmvhXnUPzfVzJtefxVAAaKikWOb3SBqLdh7xTFSlor1HbRBl
 Zk4d343cjBDYfvSSt/zMWDzwwvramdz7rJnnPMKXITu64ITL5314vuK2YVZmBOet
 YujSah7J8FL1jKhiG1Iw5rayd2Q3smnHWIEQ+lvW6WiTvMJMLOxif2xNF4/VEZs1
 CBGJUQt42LI6QGEzRBHohcefZFuPGoxnduSzHCOIhh7d6+k+y9mZfsPGohr3g9Ov
 NotXpVonnA7GbRqzo1+IfBRve7iRONdZ3/LBwyRmqav4I4jX68wXBNH5IDpVR0Sn
 c31avxa/ZL7iLIBx32enp0/r3mqNTQotEleSLUdyJQXAZTyG2INRhjLLXTqSQ5BX
 oVp0fZzKCwsr6HCPZpXZ/f2G7dhzuF0ghoceC02GsOVooni22gdVnQj+AWNus398
 e+wcimT4MX6AHNFxO2aUtJow0KWWZRzC1p5Mxu/9W3YiMtJiC0YOGePfSqiTqX0g
 Uk0H5dOAgBUQrAsusf7bKr0K6W25yEk/JipxhWqi0rC71x42mLTsCT1wxSCvLwqs
 WxhdtVKroQ==
 =7PAe
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Mostly regression or stable fodder:

   - Disallow async path resolution of /proc/self

   - Tighten constraints for segmented async buffered reads

   - Fix double completion for a retry error case

   - Fix for fixed file life times (Pavel)"

* tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
  io_uring: order refnode recycling
  io_uring: get an active ref_node from files_data
  io_uring: don't double complete failed reissue request
  mm: never attempt async page lock if we've transferred data already
  io_uring: handle -EOPNOTSUPP on path resolution
  proc: don't allow async path resolution of /proc/self components
2020-11-20 11:47:22 -08:00
Kees Cook
7ef95e3dbc Merge branch 'for-linus/seccomp' into for-next/seccomp 2020-11-20 11:39:39 -08:00
Jann Horn
fab686eb03 seccomp: Remove bogus __user annotations
Buffers that are passed to read_actions_logged() and write_actions_logged()
are in kernel memory; the sysctl core takes care of copying from/to
userspace.

Fixes: 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
Reviewed-by: Tyler Hicks <code@tyhicks.com>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20201120170545.1419332-1-jannh@google.com
2020-11-20 11:39:21 -08:00
Song Liu
91b2db27d3 bpf: Simplify task_file_seq_get_next()
Simplify task_file_seq_get_next() by removing two in/out arguments: task
and fstruct. Use info->task and info->files instead.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201120002833.2481110-1-songliubraving@fb.com
2020-11-20 20:36:34 +01:00
Kevin Hilman
5bc0d7561a Amlogic fixes for v5.10-rc1
- misc DT only fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEe4dGDhaSf6n1v/EMWTcYmtP7xmUFAl+XNX4ACgkQWTcYmtP7
 xmVoHw/9FkmZXDbAfr8vZzD+4OGAmMABwzTDzbn2/1eBBXi4yKJLYnX7IwDYpdiG
 qL7sHBjauCCJXAuDvnmPyTaXBVmHlyrwnsjs3zDGCJ93GzlzewCmJCsam3KCw9E7
 CFKJEYZmijcBLHbm98Ia98dUj2NeYnmO7kZP402C/gKHZusC2A6q9PPLVsTJ3asS
 hRZqN2DfSGytkWHBIlwaEg8j67nWKwY2dqXVF3nx5+PXakS4KbNWeVXPy2/2xE1Q
 W18AsHvDp/3S1/PcxCPiIZeDkOgOdS7eBQruFFm4oZtAMGRrIhg6YH6qIuHkuRqY
 qXc7ztoZaLmulsCgdSUqPX0RDztQLrXUuvP8xnbY/NmMtAhxpmAOpgEvqEEiBOq8
 Z1t1RQetrIO3ktohqq+UrSRgyzcoGQgy+ghFH6g+Wd6gqixc9/tdFSvKtGNf9CsB
 YajmrsBWHmZKeuEBASgNUgBVIrMM2uAtcOwGbzfuGLDqkA077MJW/iLfNTt7t6Lx
 zE5hmp06NV/P5Uiv9BU/sbSfEwUVPou+XFZ/0IZ4G6pLgB8wkL53gZm3QoghQBcZ
 9tT0I/RA1s2hOVYevXbSzAKOquD9FWTSHXGiy58sL7zP9murxbUkeA/2nOmR0ZM5
 iD+G9+rXJKdNl+k4Il51Xbm1UliuCln/VT9BZj6A3cv6Wk71dUo=
 =Z+jV
 -----END PGP SIGNATURE-----

Merge tag 'amlogic-fixes' into v5.11/dt64

Amlogic fixes for v5.10-rc1
- misc DT only fixes
2020-11-20 11:34:10 -08:00
Gustavo A. R. Silva
3371c6f9f4
ASoC: codecs: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of just
letting the code fall through, and also add fallthrough pseudo-keywords
in places where the code is intended to fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/d17b4d8300dbb6aff0d055b06b487c96ca264757.1605896059.git.gustavoars@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2020-11-20 19:28:34 +00:00
YiFei Zhu
0d8315dddd seccomp/cache: Report cache data through /proc/pid/seccomp_cache
Currently the kernel does not provide an infrastructure to translate
architecture numbers to a human-readable name. Translating syscall
numbers to syscall names is possible through FTRACE_SYSCALL
infrastructure but it does not provide support for compat syscalls.

This will create a file for each PID as /proc/pid/seccomp_cache.
The file will be empty when no seccomp filters are loaded, or be
in the format of:
<arch name> <decimal syscall number> <ALLOW | FILTER>
where ALLOW means the cache is guaranteed to allow the syscall,
and filter means the cache will pass the syscall to the BPF filter.

For the docker default profile on x86_64 it looks like:
x86_64 0 ALLOW
x86_64 1 ALLOW
x86_64 2 ALLOW
x86_64 3 ALLOW
[...]
x86_64 132 ALLOW
x86_64 133 ALLOW
x86_64 134 FILTER
x86_64 135 FILTER
x86_64 136 FILTER
x86_64 137 ALLOW
x86_64 138 ALLOW
x86_64 139 FILTER
x86_64 140 ALLOW
x86_64 141 ALLOW
[...]

This file is guarded by CONFIG_SECCOMP_CACHE_DEBUG with a default
of N because I think certain users of seccomp might not want the
application to know which syscalls are definitely usable. For
the same reason, it is also guarded by CAP_SYS_ADMIN.

Suggested-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/lkml/CAG48ez3Ofqp4crXGksLmZY6=fGrF_tWyUCg7PBkAetvbbOPeOA@mail.gmail.com/
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/94e663fa53136f5a11f432c661794d1ee7060779.1605101222.git.yifeifz2@illinois.edu
2020-11-20 11:16:35 -08:00
YiFei Zhu
445247b023 xtensa: Enable seccomp architecture tracking
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for xtensa.

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/79669648ba167d668ea6ffb4884250abcd5ed254.1605101222.git.yifeifz2@illinois.edu
2020-11-20 11:16:35 -08:00
YiFei Zhu
4c18bc054b sh: Enable seccomp architecture tracking
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for sh.

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/61ae084cd4783b9b50860d9dedb4a348cf1b7b6f.1605101222.git.yifeifz2@illinois.edu
2020-11-20 11:16:35 -08:00
YiFei Zhu
c09058eda2 s390: Enable seccomp architecture tracking
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for s390.

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/a381b10aa2c5b1e583642f3cd46ced842d9d4ce5.1605101222.git.yifeifz2@illinois.edu
2020-11-20 11:16:35 -08:00
YiFei Zhu
673a11a7e4 riscv: Enable seccomp architecture tracking
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for riscv.

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/58ef925d00505cbb77478fa6bd2b48ab2d902460.1605101222.git.yifeifz2@illinois.edu
2020-11-20 11:16:35 -08:00
YiFei Zhu
e7bcb4622d powerpc: Enable seccomp architecture tracking
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for powerpc.

__LITTLE_ENDIAN__ is used here instead of CONFIG_CPU_LITTLE_ENDIAN
to keep it consistent with asm/syscall.h.

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/0b64925362671cdaa26d01bfe50b3ba5e164adfd.1605101222.git.yifeifz2@illinois.edu
2020-11-20 11:16:35 -08:00
YiFei Zhu
6aa7923c87 parisc: Enable seccomp architecture tracking
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for parisc.

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/9bb86c546eda753adf5270425e7353202dbce87c.1605101222.git.yifeifz2@illinois.edu
2020-11-20 11:16:34 -08:00
YiFei Zhu
6e9ae6f988 csky: Enable seccomp architecture tracking
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for csky.

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/f9219026d4803b22f3e57e3768b4e42e004ef236.1605101222.git.yifeifz2@illinois.edu
2020-11-20 11:16:34 -08:00
Kees Cook
424c9102fa arm: Enable seccomp architecture tracking
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for arm.

Signed-off-by: Kees Cook <keescook@chromium.org>
2020-11-20 11:16:34 -08:00
Kees Cook
ffde703470 arm64: Enable seccomp architecture tracking
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for arm64.

Signed-off-by: Kees Cook <keescook@chromium.org>
2020-11-20 11:16:34 -08:00
Kees Cook
192cf32243 selftests/seccomp: Compare bitmap vs filter overhead
As part of the seccomp benchmarking, include the expectations with
regard to the timing behavior of the constant action bitmaps, and report
inconsistencies better.

Example output with constant action bitmaps on x86:

$ sudo ./seccomp_benchmark 100000000
Current BPF sysctl settings:
net.core.bpf_jit_enable = 1
net.core.bpf_jit_harden = 0
Benchmarking 200000000 syscalls...
129.359381409 - 0.008724424 = 129350656985 (129.4s)
getpid native: 646 ns
264.385890006 - 129.360453229 = 135025436777 (135.0s)
getpid RET_ALLOW 1 filter (bitmap): 675 ns
399.400511893 - 264.387045901 = 135013465992 (135.0s)
getpid RET_ALLOW 2 filters (bitmap): 675 ns
545.872866260 - 399.401718327 = 146471147933 (146.5s)
getpid RET_ALLOW 3 filters (full): 732 ns
696.337101319 - 545.874097681 = 150463003638 (150.5s)
getpid RET_ALLOW 4 filters (full): 752 ns
Estimated total seccomp overhead for 1 bitmapped filter: 29 ns
Estimated total seccomp overhead for 2 bitmapped filters: 29 ns
Estimated total seccomp overhead for 3 full filters: 86 ns
Estimated total seccomp overhead for 4 full filters: 106 ns
Estimated seccomp entry overhead: 29 ns
Estimated seccomp per-filter overhead (last 2 diff): 20 ns
Estimated seccomp per-filter overhead (filters / 4): 19 ns
Expectations:
	native ≤ 1 bitmap (646 ≤ 675): ✔️
	native ≤ 1 filter (646 ≤ 732): ✔️
	per-filter (last 2 diff) ≈ per-filter (filters / 4) (20 ≈ 19): ✔️
	1 bitmapped ≈ 2 bitmapped (29 ≈ 29): ✔️
	entry ≈ 1 bitmapped (29 ≈ 29): ✔️
	entry ≈ 2 bitmapped (29 ≈ 29): ✔️
	native + entry + (per filter * 4) ≈ 4 filters total (755 ≈ 752): ✔️

[YiFei: Changed commit message to show stats for this patch series]
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/1b61df3db85c5f7f1b9202722c45e7b39df73ef2.1602431034.git.yifeifz2@illinois.edu
2020-11-20 11:16:34 -08:00
Kees Cook
25db91209a x86: Enable seccomp architecture tracking
Provide seccomp internals with the details to calculate which syscall
table the running kernel is expecting to deal with. This allows for
efficient architecture pinning and paves the way for constant-action
bitmaps.

Co-developed-by: YiFei Zhu <yifeifz2@illinois.edu>
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/da58c3733d95c4f2115dd94225dfbe2573ba4d87.1602431034.git.yifeifz2@illinois.edu
2020-11-20 11:16:34 -08:00
YiFei Zhu
8e01b51a31 seccomp/cache: Add "emulator" to check if filter is constant allow
SECCOMP_CACHE will only operate on syscalls that do not access
any syscall arguments or instruction pointer. To facilitate
this we need a static analyser to know whether a filter will
return allow regardless of syscall arguments for a given
architecture number / syscall number pair. This is implemented
here with a pseudo-emulator, and stored in a per-filter bitmap.

In order to build this bitmap at filter attach time, each filter is
emulated for every syscall (under each possible architecture), and
checked for any accesses of struct seccomp_data that are not the "arch"
nor "nr" (syscall) members. If only "arch" and "nr" are examined, and
the program returns allow, then we can be sure that the filter must
return allow independent from syscall arguments.

Nearly all seccomp filters are built from these cBPF instructions:

BPF_LD  | BPF_W    | BPF_ABS
BPF_JMP | BPF_JEQ  | BPF_K
BPF_JMP | BPF_JGE  | BPF_K
BPF_JMP | BPF_JGT  | BPF_K
BPF_JMP | BPF_JSET | BPF_K
BPF_JMP | BPF_JA
BPF_RET | BPF_K
BPF_ALU | BPF_AND  | BPF_K

Each of these instructions are emulated. Any weirdness or loading
from a syscall argument will cause the emulator to bail.

The emulation is also halted if it reaches a return. In that case,
if it returns an SECCOMP_RET_ALLOW, the syscall is marked as good.

Emulator structure and comments are from Kees [1] and Jann [2].

Emulation is done at attach time. If a filter depends on more
filters, and if the dependee does not guarantee to allow the
syscall, then we skip the emulation of this syscall.

[1] https://lore.kernel.org/lkml/20200923232923.3142503-5-keescook@chromium.org/
[2] https://lore.kernel.org/lkml/CAG48ez1p=dR_2ikKq=xVxkoGg0fYpTBpkhJSv1w-6BG=76PAvw@mail.gmail.com/

Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
Reviewed-by: Jann Horn <jannh@google.com>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/71c7be2db5ee08905f41c3be5c1ad6e2601ce88f.1602431034.git.yifeifz2@illinois.edu
2020-11-20 11:16:34 -08:00
YiFei Zhu
f9d480b6ff seccomp/cache: Lookup syscall allowlist bitmap for fast path
The overhead of running Seccomp filters has been part of some past
discussions [1][2][3]. Oftentimes, the filters have a large number
of instructions that check syscall numbers one by one and jump based
on that. Some users chain BPF filters which further enlarge the
overhead. A recent work [6] comprehensively measures the Seccomp
overhead and shows that the overhead is non-negligible and has a
non-trivial impact on application performance.

We observed some common filters, such as docker's [4] or
systemd's [5], will make most decisions based only on the syscall
numbers, and as past discussions considered, a bitmap where each bit
represents a syscall makes most sense for these filters.

The fast (common) path for seccomp should be that the filter permits
the syscall to pass through, and failing seccomp is expected to be
an exceptional case; it is not expected for userspace to call a
denylisted syscall over and over.

When it can be concluded that an allow must occur for the given
architecture and syscall pair (this determination is introduced in
the next commit), seccomp will immediately allow the syscall,
bypassing further BPF execution.

Each architecture number has its own bitmap. The architecture
number in seccomp_data is checked against the defined architecture
number constant before proceeding to test the bit against the
bitmap with the syscall number as the index of the bit in the
bitmap, and if the bit is set, seccomp returns allow. The bitmaps
are all clear in this patch and will be initialized in the next
commit.

When only one architecture exists, the check against architecture
number is skipped, suggested by Kees Cook [7].

[1] https://lore.kernel.org/linux-security-module/c22a6c3cefc2412cad00ae14c1371711@huawei.com/T/
[2] https://lore.kernel.org/lkml/202005181120.971232B7B@keescook/T/
[3] https://github.com/seccomp/libseccomp/issues/116
[4] ae0ef82b90/profiles/seccomp/default.json
[5] 6743a1caf4/src/shared/seccomp-util.c (L270)
[6] Draco: Architectural and Operating System Support for System Call Security
    https://tianyin.github.io/pub/draco.pdf, MICRO-53, Oct. 2020
[7] https://lore.kernel.org/bpf/202010091614.8BB0EB64@keescook/

Co-developed-by: Dimitrios Skarlatos <dskarlat@cs.cmu.edu>
Signed-off-by: Dimitrios Skarlatos <dskarlat@cs.cmu.edu>
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/10f91a367ec4fcdea7fc3f086de3f5f13a4a7436.1602431034.git.yifeifz2@illinois.edu
2020-11-20 11:16:34 -08:00
Colin Ian King
7648398017 octeontx2-af: Fix access of iter->entry after iter object has been kfree'd
The call to pc_delete_flow can kfree the iter object, so the following
dev_err message that accesses iter->entry can accessmemory that has
just been kfree'd.  Fix this by adding a temporary variable 'entry'
that has a copy of iter->entry and also use this when indexing into
the array mcam->entry2target_pffunc[]. Also print the unsigned value
using the %u format specifier rather than %d.

Addresses-Coverity: ("Read from pointer after free")
Fixes: 55307fcb92 ("octeontx2-af: Add mbox messages to install and delete MCAM rules")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201118143803.463297-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 11:07:57 -08:00
Colin Ian King
dd6028a3cb octeontx2-af: Fix return of uninitialized variable err
Currently the variable err may be uninitialized if several of the if
statements are not executed in function nix_tx_vtag_decfg and a garbage
value in err is returned.  Fix this by initialized ret at the start of
the function.

Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: 9a946def26 ("octeontx2-af: Modify nix_vtag_cfg mailbox to support TX VTAG entries")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201118132502.461098-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 11:06:10 -08:00
Colin Ian King
583b273dea octeontx2-pf: Fix unintentional sign extension issue
The shifting of the u16 result from ntohs(proto) by 16 bits to the
left will be promoted to a 32 bit signed int and then sign-extended
to a u64.  In the event that the top bit of the return from ntohs(proto)
is set then all then all the upper 32 bits of a 64 bit long end up as
also being set because of the sign-extension. Fix this by casting to
a u64 long before the shift.

Addresses-Coverity: ("Unintended sign extension")
Fixes: f0c2982aaf ("octeontx2-pf: Add support for SR-IOV management function")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201118130520.460365-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 11:05:16 -08:00
Raju Rangoju
bff453921a cxgb4: fix the panic caused by non smac rewrite
SMT entry is allocated only when loopback Source MAC
rewriting is requested. Accessing SMT entry for non
smac rewrite cases results in kernel panic.

Fix the panic caused by non smac rewrite

Fixes: 937d842058 ("cxgb4: set up filter action after rewrites")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Link: https://lore.kernel.org/r/20201118143213.13319-1-rajur@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 11:03:37 -08:00
Kees Cook
4c222f31fb selftests/seccomp: sh: Fix register names
It looks like the seccomp selftests was never actually built for sh.
This fixes it, though I don't have an environment to do a runtime test
of it yet.

Fixes: 0bb605c2c7 ("sh: Add SECCOMP_FILTER")
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/lkml/a36d7b48-6598-1642-e403-0c77a86f416d@physik.fu-berlin.de
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-11-20 11:03:08 -08:00
Kees Cook
f5098e34dd selftests/seccomp: powerpc: Fix typo in macro variable name
A typo sneaked into the powerpc selftest. Fix the name so it builds again.

Fixes: 46138329fa ("selftests/seccomp: powerpc: Fix seccomp return value testing")
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/lkml/87y2ix2895.fsf@mpe.ellerman.id.au
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-11-20 11:02:28 -08:00