Commit graph

75139 commits

Author SHA1 Message Date
Linus Walleij
36eccdb58f gpio updates for v5.10 - part 1
- automatically drive GPHY leds in gpio-stp-xway
 - refactor ->{get, set}_multiple() in gpio-aggregator
 - add support for a new model in rcar-gpio DT bindings
 - simplify several GPIO drivers with dev_err_probe()
 - disable Direct KBD interrupts in gpio-tc35894
 - use DEFINE_SEQ_ATTRIBUTE() in GPIO chardev to shrink code
 - switch to using a simpler IDA API in gpiolib
 - make devprop_gpiochip_set_names() more generic by using device properties
   instead of using fwnode helpers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAl9jSlsACgkQEacuoBRx
 13K6cBAA2Drc9AwFpGzb+yE7bvKmxtbCbD8RZBQNg4Ljy56yQOQhNnQSE91YIeGj
 MlSNbn89nKGlkUje2qggJn6+ZE1InKI70d67fK4z/MiMDv31YLPN78Qei9kbTXgF
 dNiV6wnabavAJrpNPtNOU8SLotn9C3zVwTyXOuMqfGL6TqAWf79Xk2QHuV1l/Apz
 CozfTI/81p1eeuF3yMRz+ubGS/2Zpo862QmgA9CMWb439z7LNbQJ1vjFKMqvkW0P
 ey9ELjV/y5PFTA9fWYIukOjBqpsWyihdIW2K3R/7yR8OFqREUnvBzJZJLpghCkzO
 vpivVFlZu7eyMpdUU4ZH7ussFVX+GqYbV8Q42shabClQH7qJAfnO3nhPptWWWd/v
 CmDOEvsfg2CpXpWJmwxUxoUJEOto2NO3HOBM41XYKQhSs3pw1HheN2wxVOWWOrTa
 kJfJnyaWYT1qkolRuXMdLkauZHOpmAP3o+o+OtInd+NeM9lyaCjhdtQ11vBDk1Oc
 3jLskr29KyTIa9lNPLT+pcQWuF77dhtzTPUIeG1+BzWA/FBpiR2c0M2tmUdRGIKb
 IEPF0vkihiLy5rz/j1YYgADLDc6uxhEbiP/WKcbKySgc4kCQCHkgFQwa/DyMAROC
 ye7Yof9m6Yx1KMvioP0QpP/LU9mfooSoa2JU2iHGH8df+VJ1NFQ=
 =RbA2
 -----END PGP SIGNATURE-----

Merge tag 'gpio-updates-for-v5.10-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into devel

gpio updates for v5.10 - part 1

- automatically drive GPHY leds in gpio-stp-xway
- refactor ->{get, set}_multiple() in gpio-aggregator
- add support for a new model in rcar-gpio DT bindings
- simplify several GPIO drivers with dev_err_probe()
- disable Direct KBD interrupts in gpio-tc35894
- use DEFINE_SEQ_ATTRIBUTE() in GPIO chardev to shrink code
- switch to using a simpler IDA API in gpiolib
- make devprop_gpiochip_set_names() more generic by using device properties
  instead of using fwnode helpers
2020-09-21 23:39:50 +02:00
Matthew Rosato
12856e7acd PCI/IOV: Mark VFs as not implementing PCI_COMMAND_MEMORY
For VFs, the Memory Space Enable bit in the Command Register is
hard-wired to 0.

Add a new bit to signify devices where the Command Register Memory
Space Enable bit does not control the device's response to MMIO
accesses.

Fixes: abafbc551f ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21 14:42:11 -06:00
Nuno Sá
e817316174 iio: adis. Drop adis_burst struct
As there are no users anymore of this structure, it can be safely
removed.

Signed-off-by: Nuno Sá <nuno.sa@analog.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20200917155223.218500-5-nuno.sa@analog.com
2020-09-21 20:01:45 +01:00
Nuno Sá
36e322ec5d iio: adis: Move burst mode into adis_data
Add burst mode variables in the per device specific data structure. As
some drivers support multiple devices with different burst sizes it
makes sense this data to be in `adis_data`. While moving the variables,
there are two main differences:

1. The `en`variable is dropped. If a device supports burst mode, it will
just use it as it will has better performance for almost all real use
cases.
2. Replace `extra_len` by `burst_len`. Users should now explicitly
define the length of the burst buffer as it is typically constant. This
also allows to remove the following line from the library:

```
/* All but the timestamp channel */
burst_length = (indio_dev->num_channels - 1) * sizeof(u16);
```

The library should not assume that a timestamp channel is defined.
Moreover, most parts also include some diagnostic data, crc, etc.. in
the burst buffer that needed to be included in an `extra_len` variable
which is not that nice. On top of this, some devices already start to
have some 32bit size channels ...

This patch is also a move to completely drop the `struct adis_burst`
from the library.

Signed-off-by: Nuno Sá <nuno.sa@analog.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20200917155223.218500-2-nuno.sa@analog.com
2020-09-21 20:01:45 +01:00
Jonathan Cameron
cd7798cbd2 iio: Add __printf() attributes to various allocation functions
A partial set of these was added to IIO a long time back.
This fills in some gaps in coverage highlighted by building
with W=1

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20200913132115.800131-3-jic23@kernel.org
2020-09-21 18:54:18 +01:00
Michał Mirosław
4c9db39361
regulator: unexport regulator_lock/unlock()
regulator_lock/unlock() was used only to guard
regulator_notifier_call_chain(). As no users remain, make the functions
internal.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Link: https://lore.kernel.org/r/d3381aabd2632aff5e7b839d55868bec6e85c811.1600550732.git.mirq-linux@rere.qmqm.pl
Signed-off-by: Mark Brown <broonie@kernel.org>
2020-09-21 17:43:47 +01:00
Greg Kroah-Hartman
79d924e92f Merge ba31128384dfd ("Merge tag 'libnvdimm-fixes-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm") into tty-next
We need the dax build fix in here so that our builds do not keep
breaking.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-21 18:26:09 +02:00
Greg Kroah-Hartman
55be22adf1 Merge a31128384d ("Merge tag 'libnvdimm-fixes-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm") into usb-next
We need the dax build fix in here as well, so that the build does not
break.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-21 18:23:56 +02:00
Trond Myklebust
c754e137f5 pNFS/flexfiles: Be consistent about mirror index types
A mirror index is always of type u32.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 12:06:27 -04:00
Matthew Wilcox (Oracle)
81ee8e52a7 iomap: Change calling convention for zeroing
Pass the full length to iomap_zero() and dax_iomap_zero(), and have
them return how many bytes they actually handled.  This is preparatory
work for handling THP, although it looks like DAX could actually take
advantage of it if there's a larger contiguous area.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-21 08:59:27 -07:00
Matthew Wilcox (Oracle)
24addd848a fs: Introduce i_blocks_per_page
This helper is useful for both THPs and for supporting block size larger
than page size.  Convert all users that I could find (we have a few
different ways of writing this idiom, and I may have missed some).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2020-09-21 08:59:26 -07:00
Alexander A. Klimov
0bdd4cea12 Replace HTTP links with HTTPS ones: NFS, SUNRPC, and LOCKD clients
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
          If both the HTTP and HTTPS versions
          return 200 OK and serve the same content:
            Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:10 -04:00
Randy Dunlap
1138ce1cf6 sunrpc: fix duplicated word in <linux/sunrpc/cache.h>
Change "time time" to "time expiry_time" to match the field name.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:10 -04:00
Jan Kara
88b67edd72 dax: Fix compilation for CONFIG_DAX && !CONFIG_FS_DAX
dax_supported() is defined whenever CONFIG_DAX is enabled. So dummy
implementation should be defined only in !CONFIG_DAX case, not in
!CONFIG_FS_DAX case.

Fixes: e2ec512825 ("dm: Call proper helper to determine dax support")
Cc: <stable@vger.kernel.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-09-21 06:53:09 -07:00
Tian Tao
b1d4dc15b2 i2c: Switch to using the new API kobj_to_dev()
Switch to using the new API kobj_to_dev().

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2020-09-21 11:45:43 +02:00
Marc Kleine-Budde
728fc9ff73 can: rx-offload: can_rx_offload_add_manual(): add new initialization function
This patch adds a new initialization function:
can_rx_offload_add_manual()

It should be used to add support rx-offload to a driver, if the callback
mechanism should not be used. Use e.g. can_rx_offload_queue_sorted() to queue
skbs into rx-offload.

Link: https://lore.kernel.org/r/20200915223527.1417033-33-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-21 10:13:19 +02:00
Marc Kleine-Budde
80a71815d8 can: dev: can_put_echo_skb(): propagate error in case of errors
The function can_put_echo_skb() can fail for several reasons. It may
fail due to OOM, but when it fails it's usually due to locking problems
in the driver.

In order to help developing and debugging of new drivers propagate error
value in case of errors.

Link: https://lore.kernel.org/r/20200915223527.1417033-12-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-21 10:13:16 +02:00
Marc Kleine-Budde
49347755a8 can: include: fix spelling mistakes
This patch fixes spelling erros found by "codespell" in the
include/linux/can subtree.

Link: https://lore.kernel.org/r/20200915223527.1417033-4-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-09-21 10:13:15 +02:00
Greg Kroah-Hartman
33f16b25a0 Merge 5.9.0-rc6 into tty-next
We need the tty/serial fixes in here and this resolves a merge issue in
the 8250 driver.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-21 09:23:55 +02:00
Greg Kroah-Hartman
629b911153 Merge 5.0-rc6 into usb-next
We want the USB fixes in here, and this resolves a merge issue in the
uas driver.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-21 09:06:42 +02:00
Vladimir Oltean
bbed0bbddd net: dsa: tag_8021q: add VLANs to the master interface too
The whole purpose of tag_8021q is to send VLAN-tagged traffic to the
CPU, from which the driver can decode the source port and switch id.

Currently this only works if the VLAN filtering on the master is
disabled. Change that by explicitly adding code to tag_8021q.c to add
the VLANs corresponding to the tags to the filter of the master
interface.

Because we now need to call vlan_vid_add, then we also need to hold the
RTNL mutex. Propagate that requirement to the callers of dsa_8021q_setup
and modify the existing call sites as appropriate. Note that one call
path, sja1105_best_effort_vlan_filtering_set -> sja1105_vlan_filtering
-> sja1105_setup_8021q_tagging, was already holding this lock.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-20 19:01:34 -07:00
Linus Torvalds
3d491679b8 * Fix lockdep's detection of "USED" <- "IN-NMI" inversions, from Peter
Zijlstra.
 
 * Make percpu-rwsem operations on the semaphore's ->read_count IRQ-safe
   because it can be used in an IRQ context, from Hou Tao.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl9nsLQACgkQEsHwGGHe
 VUo0+g/7B9JzDtRSgchT095VcD8w+YcZTyiJM58q9I9OZMxi1zdJZPyoQZ2xZjnG
 aczDJN5H6P6OcBm949EUCHhmDEDfoZpC7Y5FEHe9dJitPmC7rRilGJuz4Im8td3N
 DsLhpFe8KUSqRVyygjvjM393md8tw+m4Jq+syjWri4/1wj1Fs4jHdhKWgz6b2cup
 JXbfjgVOkHVOTloMHnmgdHOPvkh60/LoG8r/5gzLbD8Z/FTD3BSOCTgw+8L8EB+b
 yiWWuEcR5LDjy7sNY2xVhTB+nkHJXe4o+HVufZoAzd1j7FDLfDVSNaZHJVp2m7Gp
 W47xRrzIKIJl6DUp5E4TGi9aZvkO/h5kqSOZ4MhfaeEqw9DLW/KxbherJjzVY3bb
 Nt74N+N+WOq6riTVx5wJkDRmtT5RaeW1kKJUaSeGMl3fxCken5CKE1WyBbae0GiU
 kq3tCn4t0OKCWhuiixFNg6RxuXokjfzXDiHr9aUt3x5musc9P+5YUIdOMsaHwoAH
 0YrAZoRCPyE2ABUgKC8hoTc3Y+TIwji7c0B9S2GoHRTm1LZTRURlLOUQyJ4oU9Ez
 Vt5TILwre9lX9tTW7BlxlOCYkgRQn78vLBSOy1CEUTsORbdYnv/kCy5S5yHEILpT
 f3E345wzdd8zrXGQfX2jsVgNAx0F+81Fb9H9FMJSCFgjIya+SRM=
 =70QQ
 -----END PGP SIGNATURE-----

Merge tag 'locking_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Borislav Petkov:
 "Two fixes from the locking/urgent pile:

   - Fix lockdep's detection of "USED" <- "IN-NMI" inversions (Peter
     Zijlstra)

   - Make percpu-rwsem operations on the semaphore's ->read_count
     IRQ-safe because it can be used in an IRQ context (Hou Tao)"

* tag 'locking_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count
  locking/lockdep: Fix "USED" <- "IN-NMI" inversions
2020-09-20 15:25:33 -07:00
Linus Torvalds
4a123dbaf3 libnvdimm fixes for 5.9-rc6
- Fix an original bug in device-mapper table reference counting when
   interrogating dax capability in the component device. This bug was
   hidden by the following bug.
 
 - Fix device-mapper to use the proper helper (dax_supported() instead of
   the leaf helper generic_fsdax_supported()) to determine dax operation
   of a stacked block device configuration. The original implementation
   is only valid for one level of dax-capable block device stacking. This
   bug was discovered while fixing the below regression.
 
 - Fix an infinite recursion regression introduced by broken attempts to
   quiet the generic_fsdax_supported() path and make it bail out before
   logging "dax capability not found" errors.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEf41QbsdZzFdA8EfZHtKRamZ9iAIFAl9noTYACgkQHtKRamZ9
 iAJuNBAAkuj1C5287OfKLmJGnnPy+MPRaQEkFczsIwmno7qdv7IoUxFnbsd1vnlK
 5RlpsmFNu+NDGQ74cZZHFT8pSaOc/6sgchcmGNBdbAaV6uerHlsojMb6YmLASogF
 0WIi+0m8i5/zXXFUalAcWGfxNRYwVbfH/wdiu7+ZNvbvQ577nOHU4jgMX/nUbvQq
 dXF6T4BOFqSfXO0xPenCWP27SDBRWkztXQMdWh+mcayiIvXf/l9/Ir1jx0hpM5zb
 Bdllpn/BMcvqzKjGAbhGuRgEPQHWaRoaCSg0Evb60e3lNxh75JL6Wj1xRmkTt3HG
 B4YJ0Jkm0vENxi8dUvolX3s6nd/rdbtgzKQZbPh/+xVZQBzLm07VUMNKgD+mFGuQ
 OHj9njHuRGk+rMgjJFD5TBUkNZwK9s0y0iB+aeIo9WPcBwFZAir3NOEJ8kXoo6vZ
 LuvukOHmf8+7agPEZuhVymruP+Sc3c03SIHk8HJfn9DWUtD9IXUuvb0BIho2h0nE
 9PLYt95bQF4yxYP0N2BYxpVxcoByXcbwDzBCUIOnIbu11hLuQ/ZKfV4CP8cDj646
 ZTHA7rArJs5NyIoRBGs6rtkc2fSt1uG2nm65zNLvx0KV/rqhYFWX26ZX7u0WFrFa
 yhtx1hVk5/RRdU3p9e595rZDjXHK0YbK8XAMI2Dp702zvVgykN4=
 =dSAf
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-fixes-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
 "A handful of fixes to address a string of mistakes in the mechanism
  for device-mapper to determine if its component devices are dax
  capable.

   - Fix an original bug in device-mapper table reference counting when
     interrogating dax capability in the component device. This bug was
     hidden by the following bug.

   - Fix device-mapper to use the proper helper (dax_supported() instead
     of the leaf helper generic_fsdax_supported()) to determine dax
     operation of a stacked block device configuration. The original
     implementation is only valid for one level of dax-capable block
     device stacking. This bug was discovered while fixing the below
     regression.

   - Fix an infinite recursion regression introduced by broken attempts
     to quiet the generic_fsdax_supported() path and make it bail out
     before logging "dax capability not found" errors"

* tag 'libnvdimm-fixes-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Fix stack overflow when mounting fsdax pmem device
  dm: Call proper helper to determine dax support
  dm/dax: Fix table reference counts
2020-09-20 15:01:57 -07:00
Linus Torvalds
f44f3f83d8 TTY/Serial/fbcon fixes for 5.9-rc6
Here are some small TTY/Serial and one more fbcon fix for 5.9-rc6
 
 They include:
 	- serial core locking regression fixes
 	- new device ids for 8250_pci driver
 	- fbcon fix for syzbot found issue
 
 All have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX2dYjA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynbqQCgvYtuyYtnmQcLCwq/SM3gRF1NlMoAn3MXqtea
 3cRoC8/q0tOp2SxgPnaY
 =YRtI
 -----END PGP SIGNATURE-----

Merge tag 'tty-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial/fbcon fixes from Greg KH:
 "Here are some small tty/serial and one more fbcon fix.

  They include:

   - serial core locking regression fixes

   - new device ids for 8250_pci driver

   - fbcon fix for syzbot found issue

  All have been in linux-next with no reported issues"

* tag 'tty-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  fbcon: Fix user font detection test at fbcon_resize().
  serial: 8250_pci: Add Realtek 816a and 816b
  serial: core: fix console port-lock regression
  serial: core: fix port-lock initialisation
2020-09-20 10:46:26 -07:00
Jan Kara
e2ec512825 dm: Call proper helper to determine dax support
DM was calling generic_fsdax_supported() to determine whether a device
referenced in the DM table supports DAX. However this is a helper for "leaf" device drivers so that
they don't have to duplicate common generic checks. High level code
should call dax_supported() helper which that calls into appropriate
helper for the particular device. This problem manifested itself as
kernel messages:

dm-3: error: dax access failed (-95)

when lvm2-testsuite run in cases where a DM device was stacked on top of
another DM device.

Fixes: 7bf7eac8d6 ("dax: Arrange for dax_supported check to span multiple devices")
Cc: <stable@vger.kernel.org>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-09-20 08:55:09 -07:00
Tobias Klauser
4773ef33fc stackleak: let stack_erasing_sysctl take a kernel pointer buffer
Commit 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
signature of stack_erasing_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/stackleak.c:31:50: warning: incorrect type in argument 3 (different address spaces)
kernel/stackleak.c:31:50:    expected void *
kernel/stackleak.c:31:50:    got void [noderef] __user *buffer

Fixes: 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200907093253.13656-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19 13:13:39 -07:00
Tobias Klauser
7bb82ac30c ftrace: let ftrace_enable_sysctl take a kernel pointer buffer
Commit 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
kernel/trace/ftrace.c:7544:43:    expected void *
kernel/trace/ftrace.c:7544:43:    got void [noderef] __user *buffer

Fixes: 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19 13:13:39 -07:00
Yangbo Lu
6d23d831e9 ptp_qoriq: support FIPER3
The FIPER3 (fixed interval period pulse generator) is supported on
DPAA2 and ENETC network controller hardware. This patch is to support
it in ptp_qoriq driver.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18 17:49:20 -07:00
Mahesh Bandewar
3753d97790 net: fix build without CONFIG_SYSCTL definition
Earlier commit 316cdaa115 ("net: add option to not create fall-back
tunnels in root-ns as well") removed the CONFIG_SYSCTL to enable the
kernel-commandline to work. However, this variable gets defined only
when CONFIG_SYSCTL option is selected.

With this change the behavior would default to creating fall-back
tunnels in all namespaces when CONFIG_SYSCTL is not selected and
the kernel commandline option will be ignored.

Fixes: 316cdaa115 ("net: add option to not create fall-back tunnels in root-ns as well")
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18 14:14:05 -07:00
Al Viro
6d1349c769 [PATCH] reduce boilerplate in fsid handling
Get rid of boilerplate in most of ->statfs()
instances...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-18 16:45:50 -04:00
Linus Torvalds
69828c475d - Allow CPUs affected by erratum 1418040 to come online late (previously
we only fixed the other case - CPUs not affected by the erratum coming
   up late).
 
 - Fix branch offset in BPF JIT.
 
 - Defer the stolen time initialisation to the CPU online time from the
   CPU starting time to avoid a (sleep-able) memory allocation in an
   atomic context.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl9k278ACgkQa9axLQDI
 XvHBsQ//Y5QpZifzLKM4b2Zcs2/lk67UPd/8rt3p8+WP+T+lQwIxWUxhQmctiYGI
 iNkLsCVWD3S27eCHWsAKK9h7tlaqnbmQdQuJbFMUF8IjQhx+D+vt14CkYzDnajcS
 Ob4uqZKCN2WAkMOaL9SEFEXWZh0v5n83PjlorrIz0MzZlNHbAxR53TXHoPcM6rnF
 xfVwYuk5+FaHKa9YfAIUAtaV5ncTyF6YIk5wqXW/hH1jvfjdF3CpQEog3Oamg8Ik
 BcH1MKzc2/7k/E916ppPke90ys6w7HdLGm7NaR5cbDwAStxYDqASiAplgsms28HJ
 8BbC0RY3FGCRnq40z/2DBBRGy2uP4iifw6gXWyUZnViO+ulbWRmcqvlpii43JDm4
 2tLqq225gBgsbKaiLAbnnVpv+VAkodG0MDUy5J4BLK+dxv3Z3UAj0qo3q9+4x/87
 Q07GTsAa/e0m0tPjVSwtmikQuJJc2uu7jk7loOLNsO8NfrU2/SJF8mEf0e6IP0JN
 ZceoReLvWArERkSQqjk4jSijqo+SfUTNQenjt+F5hciBB3dD0fC3Rgslbm96cmUw
 oSAvrIfKpNWbdZ3LUAQlPItkDKXb5aXkh1WIQ4gvt7OoCBVVCpwLLjqshZ39TaHA
 EafBjNqv4FvC030VOA3o/DibNJUY1baOOE8Cj1C9/ghA4p1pX1w=
 =dCba
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Allow CPUs affected by erratum 1418040 to come online late
   (previously we only fixed the other case - CPUs not affected by the
   erratum coming up late).

 - Fix branch offset in BPF JIT.

 - Defer the stolen time initialisation to the CPU online time from the
   CPU starting time to avoid a (sleep-able) memory allocation in an
   atomic context.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: paravirt: Initialize steal time when cpu is online
  arm64: bpf: Fix branch offset in JIT
  arm64: Allow CPUs unffected by ARM erratum 1418040 to come in late
2020-09-18 11:55:43 -07:00
Linus Torvalds
794a9965ee Power management updates for 5.9-rc6
- Add support for the Lakefield chip to the RAPL power capping
    driver (Ricardo Neri).
 
  - Modify the ACPI processor idle driver to prevent it from triggering
    RCU-lockdep complaints which has started to happen after recent
    changes in that area (Peter Zijlstra).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl9k1mwSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxuAcQAIY8XeVHRUD37n6LwRbyia3XNDXo0WxD
 NWtRaSK6cRBCQVyHhILXwivJSLAsk0iVAMpuTzifb6AN9smnA7ZwhZhWEzwcKzXO
 fBNuUPcyqiJqQYekUdSxMDmSyXVYCmRKHcdOfI6R0Yl028prx87U/h4ukWjyRLOZ
 C/Ix7wTWxWvwcU0oo1FTjBZyFxTmMHzJXT9E9U4O69yR8yIWS67LCM30yAo8OQjx
 hxyg4z5hJG+Z4z1SOJmH9luZ9egBjopU6PvZBhbGP1+DNdstKyeaOHnbDv+iNz5g
 sofbQ28dWC1PykRy8YAnTz1VrjfYeiQUHTS3aci/v6pqbLMxrqKEpZ5K+815rpw3
 KKPaD4WSXN/bn03yv1OxUJG/2vwrY9sVnqwVSgOqm7c0aTfmD9f0Y8VO0xk1bgUy
 cwR3HzKb3Ig5xxQ7ultiEBV8oUKAu+SiG4gUGDmVxUR110N5jEZH9kqBw6nuAvNW
 BYUVMyH0KRVb2J2r3y8kRla9gLv/MvExDoKZc3NQ6VmmKDXVBwGRPn3LpTVZHlLj
 L3m2qQpTD+H+/ydGjxML7YNgW+07u7rjAxI6Qr2d2OgXyTFPXmKzI/wqWwuYI686
 IUTRSxVv5O+WA2rNjuFBGLFE4GnkkBROrUWfpgse5ZeTX61ZG9fX4mN325yba9Hk
 /MSuLChk+gZu
 =gBKv
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These add a new CPU ID to the RAPL power capping driver and prevent
  the ACPI processor idle driver from triggering RCU-lockdep complaints.

  Specifics:

   - Add support for the Lakefield chip to the RAPL power capping driver
     (Ricardo Neri).

   - Modify the ACPI processor idle driver to prevent it from triggering
     RCU-lockdep complaints which has started to happen after recent
     changes in that area (Peter Zijlstra)"

* tag 'pm-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: processor: Take over RCU-idle for C3-BM idle
  cpuidle: Allow cpuidle drivers to take over RCU-idle
  ACPI: processor: Use CPUIDLE_FLAG_TLB_FLUSHED
  ACPI: processor: Use CPUIDLE_FLAG_TIMER_STOP
  powercap: RAPL: Add support for Lakefield
2020-09-18 11:43:21 -07:00
Masami Hiramatsu
82d083ab60 kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot
Since kprobe_event= cmdline option allows user to put kprobes on the
functions in initmem, kprobe has to make such probes gone after boot.
Currently the probes on the init functions in modules will be handled
by module callback, but the kernel init text isn't handled.
Without this, kprobes may access non-exist text area to disable or
remove it.

Link: https://lkml.kernel.org/r/159972810544.428528.1839307531600646955.stgit@devnote2

Fixes: 970988e19e ("tracing/kprobe: Add kprobe_event= boot parameter")
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18 14:27:24 -04:00
Tobias Klauser
54fa9ba564 ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer
Commit 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
kernel/trace/ftrace.c:7544:43:    expected void *
kernel/trace/ftrace.c:7544:43:    got void [noderef] __user *buffer

Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch

Fixes: 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18 13:15:56 -04:00
Valentin Schneider
15e5d5b45b arch_topology, arm, arm64: define arch_scale_freq_invariant()
arch_scale_freq_invariant() is used by schedutil to determine whether
the scheduler's load-tracking signals are frequency invariant. Its
definition is overridable, though by default it is hardcoded to 'true'
if arch_scale_freq_capacity() is defined ('false' otherwise).

This behaviour is not overridden on arm, arm64 and other users of the
generic arch topology driver, which is somewhat precarious:
arch_scale_freq_capacity() will always be defined, yet not all cpufreq
drivers are guaranteed to drive the frequency invariance scale factor
setting. In other words, the load-tracking signals may very well *not*
be frequency invariant.

Now that cpufreq can be queried on whether the current driver is driving
the Frequency Invariance (FI) scale setting, the current situation can
be improved. This combines the query of whether cpufreq supports the
setting of the frequency scale factor, with whether all online CPUs are
counter-based FI enabled.

While cpufreq FI enablement applies at system level, for all CPUs,
counter-based FI support could also be used for only a subset of CPUs to
set the invariance scale factor. Therefore, if cpufreq-based FI support
is present, we consider the system to be invariant. If missing, we
require all online CPUs to be counter-based FI enabled in order for the
full system to be considered invariant.

If the system ends up not being invariant, a new condition is needed in
the counter initialization code that disables all scale factor setting
based on counters.

Precedence of counters over cpufreq use is not important here. The
invariant status is only given to the system if all CPUs have at least
one method of setting the frequency scale factor.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-18 19:11:20 +02:00
Valentin Schneider
ecddc3a0d5 arch_topology, cpufreq: constify arch_* cpumasks
The passed cpumask arguments to arch_set_freq_scale() and
arch_freq_counters_available() are only iterated over, so reflect this
in the prototype. This also allows to pass system cpumasks like
cpu_online_mask without getting a warning.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-18 19:11:04 +02:00
Ionela Voinescu
874f635310 cpufreq: report whether cpufreq supports Frequency Invariance (FI)
Now that the update of the FI scale factor is done in cpufreq core for
selected functions - target(), target_index() and fast_switch(),
we can provide feedback to the task scheduler and architecture code
on whether cpufreq supports FI.

For this purpose provide an external function to expose whether the
cpufreq drivers support FI, by using a static key.

The logic behind the enablement of cpufreq-based invariance is as
follows:
 - cpufreq-based invariance is disabled by default
 - cpufreq-based invariance is enabled if any of the callbacks
   above is implemented while the unsupported setpolicy() is not

The cpufreq_supports_freq_invariance() function only returns whether
cpufreq is instrumented with the arch_set_freq_scale() calls that
result in support for frequency invariance. Due to the lack of knowledge
on whether the implementation of arch_set_freq_scale() actually results
in the setting of a scale factor based on cpufreq information, it is up
to the architecture code to ensure the setting and provision of the
scale factor to the scheduler.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-18 19:10:56 +02:00
Jason Gunthorpe
a1255fff5d Merge branch 'mlx_sw_owner_v2' into rdma.git for-next
Leon Romanovsky says:

====================
This series from Alex extends software steering interface to support
devices with extra capability "sw_owner_2" which will replace existing
"sw_owner".
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_sw_owner_v2:
  RDMA/mlx5: Expose TIR and QP ICM address for sw_owner_v2 devices
  RDMA/mlx5: Allow DM allocation for sw_owner_v2 enabled devices
  RDMA/mlx5: Add sw_owner_v2 bit capability
2020-09-18 10:31:45 -03:00
Jason Gunthorpe
5dee5872f8 Merge branch 'mlx5_active_speed' into rdma.git for-next
Leon Romanovsky says:

====================
IBTA declares speed as 16 bits, but kernel stores it in u8. This series
fixes in-kernel declaration while keeping external interface intact.
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_active_speed':
  RDMA: Fix link active_speed size
  RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
  net/mlx5: Refactor query port speed functions
2020-09-18 10:31:45 -03:00
Trond Myklebust
b9df46d08a pNFS/flexfiles: Be consistent about mirror index types
A mirror index is always of type u32.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-09-18 09:25:33 -04:00
Mark Brown
264c03a245 stacktrace: Remove reliable argument from arch_stack_walk() callback
Currently the callback passed to arch_stack_walk() has an argument called
reliable passed to it to indicate if the stack entry is reliable, a comment
says that this is used by some printk() consumers. However in the current
kernel none of the arch_stack_walk() implementations ever set this flag to
true and the only callback implementation we have is in the generic
stacktrace code which ignores the flag. It therefore appears that this
flag is redundant so we can simplify and clarify things by removing it.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lore.kernel.org/r/20200914153409.25097-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2020-09-18 14:24:16 +01:00
Bard Liao
9026118f20 soundwire: Add generic bandwidth allocation algorithm
This algorithm computes bus parameters like clock frequency, frame
shape and port transport parameters based on active stream(s) running
on the bus.

Developers can also implement their own .compute_params() callback for
specific resource management algorithm, and set if before calling
sdw_add_bus_master()

Credits: this patch is based on an earlier internal contribution by
Vinod Koul, Sanyog Kale, Shreyas Nc and Hardik Shah. All hard-coded
values were removed from the initial contribution to use BIOS
information instead.

Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20200908131520.5712-1-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-09-18 17:48:51 +05:30
Tomi Valkeinen
92ffad62a6 New phy attributes slated for kernel release v5.10
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAl9iCBUACgkQfBQHDyUj
 g0cN0A//WQRF5QACKkHB/xmqQ5UAW/BsZnMc38snDDs4LPEFuetA3H56t5F0eFJ/
 saFtxxKHHu+072Q4ZFFGFrVbSGC5YhZMhYgZVC8EUfDoBgZ0pf223BECqur+8meB
 /EUJhHwkn3enAuCVaX8zcnf8Yjnizss+fq3TZZiAxT+0Ve3pVNNhV6YDehKPi9/g
 HiDL4y0r51lJ/JnMiJcP1zbd0qzq8qyCRJWVB2vZCu6eD/4ZtkoRD3k5Ru99CXVk
 +/w/CwzEw6zZo98b7p7jG9lp/gCugYcJUIfJPrXYkg5aKfB1nY9KM9Yyt/pL/gTX
 Mlu2BmJw2YQnrQeJNJjSaawh+L9dankSSV7ddpt8D1twOp6Af6aWJw3KGNOoEFJo
 eFLbAra2RbIQRP6Vf/XqQKKcjTaAnrLhMxxU989Egp19J8lZ/HqjVcYJbjSEMR7x
 dXzjXMk2t9YBASe10q2Rlnvaems8455m6+QotTCiSQqQExWVMDTj0QYb7+J9Cth7
 3LQ300np9a1BzcKGAZgPbNS3OmZjVZxjFiV1XB0fWjxovPQBCWf2Zh9PgVtCSL5M
 LajqzJge0k+E2T9LbgjoT9ePcPIk3Mta1sE2PDaCysr9P0TqL1NbLueDjOMZ42/W
 rFcCVw9ZdTnbqC2MO45dc/UnwgGePUwSVYZ0pIM/q0axNmsmyR8=
 =lYZv
 -----END PGP SIGNATURE-----

Merge tag 'phy-attrs-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy into 5.10/dp-pull

New phy attributes slated for kernel release v5.10. These are needed by
the Cadence MHDP driver.
2020-09-18 15:14:35 +03:00
Thomas Pedersen
37050e3ab0 ieee80211: redefine S1G bits with GENMASK
The S1G capability fields were defined by ORing BITS()
together, and expecting a custom macro to use the _SHIFT
definitions. Use the Linux kernel GENMASK for the
definitions now, and FIELD_{GET,PREP} to access the fields
in the future.

Take the chance to rename eg. S1G_CAPAB_B0 to the more
compact S1G_CAP0.

Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com>
Link: https://lore.kernel.org/r/20200908190323.15814-2-thomas@adapt-ip.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-09-18 12:27:27 +02:00
Georgi Djakov
628fdbcf9d Merge branch 'icc-syncstate' into icc-next
* icc-syncstate:
  interconnect: Add get_bw() callback
  interconnect: Add sync state support
  interconnect: qcom: Use icc_sync_state

Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
2020-09-18 09:13:40 +03:00
Georgi Djakov
b1d681d8d3 interconnect: Add sync state support
The bootloaders often do some initial configuration of the interconnects
in the system and we want to keep this configuration until all consumers
have probed and expressed their bandwidth needs. This is because we don't
want to change the configuration by starting to disable unused paths until
every user had a chance to request the amount of bandwidth it needs.

To accomplish this we will implement an interconnect specific sync_state
callback which will synchronize (aggregate and set) the current bandwidth
settings when all consumers have been probed.

Link: https://lore.kernel.org/r/20200825170152.6434-3-georgi.djakov@linaro.org
Reviewed-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
2020-09-18 08:56:52 +03:00
Georgi Djakov
cc80d10d6f interconnect: Add get_bw() callback
The interconnect controller hardware may support querying the current
bandwidth settings, so add a callback for providers to implement this
functionality if supported.

Link: https://lore.kernel.org/r/20200825170152.6434-2-georgi.djakov@linaro.org
Reviewed-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
2020-09-18 08:53:37 +03:00
Alexei Starovoitov
09b28d76ea bpf: Add abnormal return checks.
LD_[ABS|IND] instructions may return from the function early. bpf_tail_call
pseudo instruction is either fallthrough or return. Allow them in the
subprograms only when subprograms are BTF annotated and have scalar return
types. Allow ld_abs and tail_call in the main program even if it calls into
subprograms. In the past that was not ok to do for ld_abs, since it was JITed
with special exit sequence. Since bpf_gen_ld_abs() was introduced the ld_abs
looks like normal exit insn from JIT point of view, so it's safe to allow them
in the main program.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17 19:56:07 -07:00
Maciej Fijalkowski
ebf7d1f508 bpf, x64: rework pro/epilogue and tailcall handling in JIT
This commit serves two things:
1) it optimizes BPF prologue/epilogue generation
2) it makes possible to have tailcalls within BPF subprogram

Both points are related to each other since without 1), 2) could not be
achieved.

In [1], Alexei says:
"The prologue will look like:
nop5
xor eax,eax  // two new bytes if bpf_tail_call() is used in this
             // function
push rbp
mov rbp, rsp
sub rsp, rounded_stack_depth
push rax // zero init tail_call counter
variable number of push rbx,r13,r14,r15

Then bpf_tail_call will pop variable number rbx,..
and final 'pop rax'
Then 'add rsp, size_of_current_stack_frame'
jmp to next function and skip over 'nop5; xor eax,eax; push rpb; mov
rbp, rsp'

This way new function will set its own stack size and will init tail
call
counter with whatever value the parent had.

If next function doesn't use bpf_tail_call it won't have 'xor eax,eax'.
Instead it would need to have 'nop2' in there."

Implement that suggestion.

Since the layout of stack is changed, tail call counter handling can not
rely anymore on popping it to rbx just like it have been handled for
constant prologue case and later overwrite of rbx with actual value of
rbx pushed to stack. Therefore, let's use one of the register (%rcx) that
is considered to be volatile/caller-saved and pop the value of tail call
counter in there in the epilogue.

Drop the BUILD_BUG_ON in emit_prologue and in
emit_bpf_tail_call_indirect where instruction layout is not constant
anymore.

Introduce new poke target, 'tailcall_bypass' to poke descriptor that is
dedicated for skipping the register pops and stack unwind that are
generated right before the actual jump to target program.
For case when the target program is not present, BPF program will skip
the pop instructions and nop5 dedicated for jmpq $target. An example of
such state when only R6 of callee saved registers is used by program:

ffffffffc0513aa1:       e9 0e 00 00 00          jmpq   0xffffffffc0513ab4
ffffffffc0513aa6:       5b                      pop    %rbx
ffffffffc0513aa7:       58                      pop    %rax
ffffffffc0513aa8:       48 81 c4 00 00 00 00    add    $0x0,%rsp
ffffffffc0513aaf:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
ffffffffc0513ab4:       48 89 df                mov    %rbx,%rdi

When target program is inserted, the jump that was there to skip
pops/nop5 will become the nop5, so CPU will go over pops and do the
actual tailcall.

One might ask why there simply can not be pushes after the nop5?
In the following example snippet:

ffffffffc037030c:       48 89 fb                mov    %rdi,%rbx
(...)
ffffffffc0370332:       5b                      pop    %rbx
ffffffffc0370333:       58                      pop    %rax
ffffffffc0370334:       48 81 c4 00 00 00 00    add    $0x0,%rsp
ffffffffc037033b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
ffffffffc0370340:       48 81 ec 00 00 00 00    sub    $0x0,%rsp
ffffffffc0370347:       50                      push   %rax
ffffffffc0370348:       53                      push   %rbx
ffffffffc0370349:       48 89 df                mov    %rbx,%rdi
ffffffffc037034c:       e8 f7 21 00 00          callq  0xffffffffc0372548

There is the bpf2bpf call (at ffffffffc037034c) right after the tailcall
and jump target is not present. ctx is in %rbx register and BPF
subprogram that we will call into on ffffffffc037034c is relying on it,
e.g. it will pick ctx from there. Such code layout is therefore broken
as we would overwrite the content of %rbx with the value that was pushed
on the prologue. That is the reason for the 'bypass' approach.

Special care needs to be taken during the install/update/remove of
tailcall target. In case when target program is not present, the CPU
must not execute the pop instructions that precede the tailcall.

To address that, the following states can be defined:
A nop, unwind, nop
B nop, unwind, tail
C skip, unwind, nop
D skip, unwind, tail

A is forbidden (lead to incorrectness). The state transitions between
tailcall install/update/remove will work as follows:

First install tail call f: C->D->B(f)
 * poke the tailcall, after that get rid of the skip
Update tail call f to f': B(f)->B(f')
 * poke the tailcall (poke->tailcall_target) and do NOT touch the
   poke->tailcall_bypass
Remove tail call: B(f')->C(f')
 * poke->tailcall_bypass is poked back to jump, then we wait the RCU
   grace period so that other programs will finish its execution and
   after that we are safe to remove the poke->tailcall_target
Install new tail call (f''): C(f')->D(f'')->B(f'').
 * same as first step

This way CPU can never be exposed to "unwind, tail" state.

Last but not least, when tailcalls get mixed with bpf2bpf calls, it
would be possible to encounter the endless loop due to clearing the
tailcall counter if for example we would use the tailcall3-like from BPF
selftests program that would be subprogram-based, meaning the tailcall
would be present within the BPF subprogram.

This test, broken down to particular steps, would do:
entry -> set tailcall counter to 0, bump it by 1, tailcall to func0
func0 -> call subprog_tail
(we are NOT skipping the first 11 bytes of prologue and this subprogram
has a tailcall, therefore we clear the counter...)
subprog -> do the same thing as entry

and then loop forever.

To address this, the idea is to go through the call chain of bpf2bpf progs
and look for a tailcall presence throughout whole chain. If we saw a single
tail call then each node in this call chain needs to be marked as a subprog
that can reach the tailcall. We would later feed the JIT with this info
and:
- set eax to 0 only when tailcall is reachable and this is the entry prog
- if tailcall is reachable but there's no tailcall in insns of currently
  JITed prog then push rax anyway, so that it will be possible to
  propagate further down the call chain
- finally if tailcall is reachable, then we need to precede the 'call'
  insn with mov rax, [rbp - (stack_depth + 8)]

Tail call related cases from test_verifier kselftest are also working
fine. Sample BPF programs that utilize tail calls (sockex3, tracex5)
work properly as well.

[1]: https://lore.kernel.org/bpf/20200517043227.2gpq22ifoq37ogst@ast-mbp.dhcp.thefacebook.com/

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17 19:55:30 -07:00
Maciej Fijalkowski
7f6e4312e1 bpf: Limit caller's stack depth 256 for subprogs with tailcalls
Protect against potential stack overflow that might happen when bpf2bpf
calls get combined with tailcalls. Limit the caller's stack depth for
such case down to 256 so that the worst case scenario would result in 8k
stack size (32 which is tailcall limit * 256 = 8k).

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17 19:19:20 -07:00