Commit graph

74002 commits

Author SHA1 Message Date
Linus Torvalds
a5c6a1f0fe Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:

 - a small collection of remaining API conversion patches (all acked)
   which allow to finally remove the deprecated API

 - some documentation fixes and a MAINTAINERS addition

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: Add robert and myself as qcom i2c cci maintainers
  i2c: smbus: Fix spelling mistake in the comments
  Documentation/i2c: SMBus start signal is S not A
  i2c: remove deprecated i2c_new_device API
  Documentation: media: convert to use i2c_new_client_device()
  video: backlight: tosa_lcd: convert to use i2c_new_client_device()
  x86/platform/intel-mid: convert to use i2c_new_client_device()
  drm: encoder_slave: use new I2C API
  drm: encoder_slave: fix refcouting error for modules
2020-06-20 19:18:27 -07:00
Andy Shevchenko
b3337eb248 gpiolib: Introduce for_each_requested_gpio_in_range() macro
Introduce for_each_requested_gpio_in_range() macro which helps
to iterate over requested GPIO in a range. There are already
potential users of it, which are going to be converted
by the following patches.

For most of them for_each_requested_gpio() shortcut has been added.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20200615150545.87964-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2020-06-20 23:13:26 +02:00
Linus Torvalds
8b6ddd10d6 A few fixes and small cleanups for tracing:
- Have recordmcount work with > 64K sections (to support LTO)
  - kprobe RCU fixes
  - Correct a kprobe critical section with missing mutex
  - Remove redundant arch_disarm_kprobe() call
  - Fix lockup when kretprobe triggers within kprobe_flush_task()
  - Fix memory leak in fetch_op_data operations
  - Fix sleep in atomic in ftrace trace array sample code
  - Free up memory on failure in sample trace array code
  - Fix incorrect reporting of function_graph fields in format file
  - Fix quote within quote parsing in bootconfig
  - Fix return value of bootconfig tool
  - Add testcases for bootconfig tool
  - Fix maybe uninitialized warning in ftrace pid file code
  - Remove unused variable in tracing_iter_reset()
  - Fix some typos
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXu1jrRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qoCMAP91nOccE3X+Nvc3zET3isDWnl1tWJxk
 icsBgN/JwBRuTAD/dnWTHIWM2/5lTiagvyVsmINdJHP6JLr8T7dpN9tlxAQ=
 =Cuo7
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Have recordmcount work with > 64K sections (to support LTO)

 - kprobe RCU fixes

 - Correct a kprobe critical section with missing mutex

 - Remove redundant arch_disarm_kprobe() call

 - Fix lockup when kretprobe triggers within kprobe_flush_task()

 - Fix memory leak in fetch_op_data operations

 - Fix sleep in atomic in ftrace trace array sample code

 - Free up memory on failure in sample trace array code

 - Fix incorrect reporting of function_graph fields in format file

 - Fix quote within quote parsing in bootconfig

 - Fix return value of bootconfig tool

 - Add testcases for bootconfig tool

 - Fix maybe uninitialized warning in ftrace pid file code

 - Remove unused variable in tracing_iter_reset()

 - Fix some typos

* tag 'trace-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Fix maybe-uninitialized compiler warning
  tools/bootconfig: Add testcase for show-command and quotes test
  tools/bootconfig: Fix to return 0 if succeeded to show the bootconfig
  tools/bootconfig: Fix to use correct quotes for value
  proc/bootconfig: Fix to use correct quotes for value
  tracing: Remove unused event variable in tracing_iter_reset
  tracing/probe: Fix memleak in fetch_op_data operations
  trace: Fix typo in allocate_ftrace_ops()'s comment
  tracing: Make ftrace packed events have align of 1
  sample-trace-array: Remove trace_array 'sample-instance'
  sample-trace-array: Fix sleeping function called from invalid context
  kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  kprobes: Remove redundant arch_disarm_kprobe() call
  kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex
  kprobes: Use non RCU traversal APIs on kprobe_tables if possible
  kprobes: Suppress the suspicious RCU warning on kprobes
  recordmcount: support >64k sections
2020-06-20 13:17:47 -07:00
Lars-Peter Clausen
f11d59d87b iio: Move attach/detach of the poll func to the core
All devices using a triggered buffer need to attach and detach the trigger
to the device in order to properly work. Instead of doing this in each and
every driver by hand move this into the core.

At this point in time, all drivers should have been resolved to
attach/detach the poll-function in the same order.

This patch removes all explicit calls of iio_triggered_buffer_postenable()
& iio_triggered_buffer_predisable() in all drivers, since the core handles
now the pollfunc attach/detach.

The more peculiar change is for the 'at91-sama5d2_adc' driver, since it's
not immediately obvious that removing the hooks doesn't break anything.
Eugen was able to test on at91-sama5d2-adc driver, sama5d2-xplained board.
All seems to be fine.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Tested-by: Eugen Hristev <eugen.hristev@microchip.com> #for at91-sama5d2-adc
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2020-06-20 17:34:44 +01:00
Christophe Leroy
481e980a7c mm: Allow arches to provide ptep_get()
Since commit 9e343b467c ("READ_ONCE: Enforce atomicity for
{READ,WRITE}_ONCE() memory accesses") it is not possible anymore to
use READ_ONCE() to access complex page table entries like the one
defined for powerpc 8xx with 16k size pages.

Define a ptep_get() helper that architectures can override instead
of performing a READ_ONCE() on the page table entry pointer.

Fixes: 9e343b467c ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/087fa12b6e920e32315136b998aa834f99242695.1592225558.git.christophe.leroy@csgroup.eu
2020-06-20 22:14:53 +10:00
Andrea Parri (Microsoft)
775f43facf Drivers: hv: vmbus: Remove the lock field from the vmbus_channel struct
The spinlock is (now) *not used to protect test-and-set accesses
to attributes of the structure or sc_list operations.

There is, AFAICT, a distinct lack of {WRITE,READ}_ONCE()s in the
handling of channel->state, but the changes below do not seem to
make things "worse".  ;-)

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20200617164642.37393-9-parri.andrea@gmail.com
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-06-20 09:16:19 +00:00
Russell King
389a338999 net: phy: read MMD ID from all present MMDs
Expand the device_ids[] array to allow all MMD IDs to be read rather
than just the first 8 MMDs, but only read the ID if the MDIO_STAT2
register reports that a device really is present here for these new
devices to maintain compatibility with our current behaviour.  Note
that only a limited number of devices have MDIO_STAT2.

88X3310 PHY vendor MMDs do are marked as present in the
devices_in_package, but do not contain IEE 802.3 compatible register
sets in their lower space.  This avoids reading incorrect values as MMD
identifiers.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-19 20:17:15 -07:00
Russell King
320ed3bf90 net: phy: split devices_in_package
We have two competing requirements for the devices_in_package field.
We want to use it as a bit array indicating which MMDs are present, but
we also want to know if the Clause 22 registers are present.

Since "devices in package" is a term used in the 802.3 specification,
keep this as the as-specified values read from the PHY, and introduce
a new member "mmds_present" to indicate which MMDs are actually
present in the PHY, derived from the "devices in package" value.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-19 20:17:15 -07:00
Russell King
c746053d27 net: phy: add support for probing MMDs >= 8 for devices-in-package
Add support for probing MMDs above 7 for a valid devices-in-package
specifier, but only probe the vendor MMDs for this if they also report
that there the device is present in status register 2.  This avoids
issues where the MMD is implemented, but does not provide IEEE 802.3
compliant registers (such as the MV88X3310 PHY.)

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-19 20:17:15 -07:00
Eric Dumazet
cc7a21b6fb ipv6: icmp6: avoid indirect call for icmpv6_send()
If IPv6 is builtin, we do not need an expensive indirect call
to reach icmp6_send().

v2: put inline keyword before the type to avoid sparse warnings.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-19 13:41:59 -07:00
Mauro Carvalho Chehab
8e2a46a40f docs: move remaining stuff under Documentation/*.txt to Documentation/staging
There are several files that I was unable to find a proper place
for them, and 3 ones that are still in plain old text format.

Let's place those stuff behind the carpet, as we'd like to keep the
root directory clean.

We can later discuss and move those into better places.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/11bd0d75e65a874f7c276a0aeab0fe13f3376f5f.1592203650.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-06-19 14:17:05 -06:00
Linus Torvalds
d2b1c81f5f block-5.8-2020-06-19
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl7s0SAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpp+YEACVqFvsfzxKCqa61IzyuOaPfnj9awyP+MY2
 7V6y9sDDHL8sp6aPDbHvqFnqz0O7E+7nHVZD2rf2qc6tKKMvJYNO/BFZSXPvWTZV
 KQ4cBChf/LDwqAKOnI4ZhmF5UcSyyob1yMy4uJ+U0gQiXXrRMbwJ3N1K24a9dr4c
 epkzGavR0Q+PJ9BbUgjACjbRdT+vrP4bOu0cuyCGkIpD9eCerKJ6mFaUAj0FDthD
 bg4BJj+c8Ij6LO0V++Wga6OxccmL43KeP0ky8B3x07PfAl+tDWqsbHSlU2YPtdcq
 5nKgMMTW16mVnZeO2/W0JB7tn89VubsmyvIFcm2KNeeRqSnEZyW9HI8n4kq994Ju
 xMH24lgbsU4trNeYkgOmzPoJJZ+LShkn+rnldyI1U/fhpEYub7DqfVySuT7ti9in
 uFpQdeRUmPsdw92F3+o6h8OYAflpcQQ7CblkzxPEeV4OyzOZasb+S9tMNPe59KBh
 0MtHv9IfzgtDihR6HuXifitXaP+GtH4x3D2z0dzEdooHKHC/+P3WycS5daG+3WKQ
 xV5lJruvpTuxhXKLFAH0wRrxnVlB0VUvhQ21T3WgHrwF0btbdmQMHFc83XOxBIB4
 jHWJMHGc4xp1ZdpWFBC8Cj79OmJh1w/ao8+/cf8SUoTB0LzFce1B8LvwnxgpcpUk
 VjIOrl7zhQ==
 =LeLd
 -----END PGP SIGNATURE-----

Merge tag 'block-5.8-2020-06-19' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Use import_uuid() where appropriate (Andy)

 - bcache fixes (Coly, Mauricio, Zhiqiang)

 - blktrace sparse warnings fix (Jan)

 - blktrace concurrent setup fix (Luis)

 - blkdev_get use-after-free fix (Jason)

 - Ensure all blk-mq maps are updated (Weiping)

 - Loop invalidate bdev fix (Zheng)

* tag 'block-5.8-2020-06-19' of git://git.kernel.dk/linux-block:
  block: make function 'kill_bdev' static
  loop: replace kill_bdev with invalidate_bdev
  partitions/ldm: Replace uuid_copy() with import_uuid() where it makes sense
  block: update hctx map when use multiple maps
  blktrace: Avoid sparse warnings when assigning q->blk_trace
  blktrace: break out of blktrace setup on concurrent calls
  block: Fix use-after-free in blkdev_get()
  trace/events/block.h: drop kernel-doc for dropped function parameter
  blk-mq: Remove redundant 'return' statement
  bcache: pr_info() format clean up in bcache_device_init()
  bcache: use delayed kworker fo asynchronous devices registration
  bcache: check and adjust logical block size for backing devices
  bcache: fix potential deadlock problem in btree_gc_coalesce
2020-06-19 13:11:26 -07:00
Linus Torvalds
592be758f1 libata-5.8-2020-06-19
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl7s0McQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvbNEAC1oSIlpp0v1ahvt8EDVPGL+0P4YfscrSf0
 5KGvt0vCRfzgluSbDZgVG1sW5uG87ymMSFegwr/OL/AVsyf54z5DKRGEQjh41Kw0
 uFVq/5XWUg4B3OV8m8v3idS4LmSLTqqhGxc1Yf+NUeI6Gf+tyWUvyw2+fpihwtzd
 XdPcz+oxpUbJtOt0O3umua8HW8ncnWm5wd+DwvDBPec4hSz/G1nfrwpHSxj0JJcr
 zfbjZLPZZy9XZnlxxpOA9XgANYNklM/Kf3xXqk+3cVWKhE2oiIpYlR+3hqs398YX
 BgEIzHsacf/8b6WMRmRIsyCoKfiGA5ec5kTKp8typmP3GFBKbYTCVLgHT3K/iIV6
 BYaar0PBPeQhRYWlGQVZvoEwuF+c64BXzNLQB96YTCRJvhsUhmjdYQCYQNLcb99C
 zbpwJ3wukWLs1LJx8lgEbRIG3hZGDpO5k0prYuqkfX4Py6lzdGKJP29QStjCoydx
 NMr+V3tsf3PD+xASGavak/cQESctGVzEl9VBsQOZj8seB1RjzP7wzBRMuHXkcYxo
 v7MMM/6o4y8uPei2Ch69FSNZ2w9XsuE7YfTqTeE5VMHYh9pnYtlMNbESWpLoasXK
 Akxvt3+JNShNYv1ecbhn75HASioTP0i9OmClOrCSoP/Zf7ot5rxsTIE4cyRNLKDh
 JES0sjUZYQ==
 =iH5/
 -----END PGP SIGNATURE-----

Merge tag 'libata-5.8-2020-06-19' of git://git.kernel.dk/linux-block

Pull libata fixes from Jens Axboe:
 "A few minor changes that should go into this release"

* tag 'libata-5.8-2020-06-19' of git://git.kernel.dk/linux-block:
  libata: Use per port sync for detach
  ata/libata: Fix usage of page address by page_address in ata_scsi_mode_select_xlat function
  sata_rcar: handle pm_runtime_get_sync failure cases
2020-06-19 13:09:40 -07:00
Linus Torvalds
672f9255a7 An important follow-up for replica reads support that went into -rc1
and two target_copy() fixups.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl7ssCITHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi1cVB/9V1BPWKdPKiRWaOgYJSd0qk3izSZQp
 4U+foILpQi0fo23H5PAINrYEcURFjNxfFF7P9esQQ4i3NhbA/b7/tLu6sSsxq5HQ
 FoUgYPj5WXKiJ+pF2JayP6lpxGDdUHjWYFBR28P9g9otOepRBkJl91ZuU7Hp9rSp
 usMSl1+3zJ+HModREk4VmNUgEDW/8DW2EDXXETzoLdhgUXhG+6KYC2qrExrBR7L3
 k1V9+zPgZK+qykim0p453eCQbYlO74SKv1/Q7FPqYGGkohyyHaYMqw7/xItbqMPa
 5lACBuobQ4YiDfhiNcF15Mr1gZjexYbAcAx799bnbv3z/yTZi9IFszhK
 =MFh3
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.8-rc2' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "An important follow-up for replica reads support that went into -rc1
  and two target_copy() fixups"

* tag 'ceph-for-5.8-rc2' of git://github.com/ceph/ceph-client:
  libceph: don't omit used_replica in target_copy()
  libceph: don't omit recovery_deletes in target_copy()
  libceph: move away from global osd_req_flags
2020-06-19 12:25:04 -07:00
Linus Torvalds
98b769942c Add flex-array size helper
-----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl7sJLcWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJux3D/9RCaL24L5lNMZrfWo6lHhXUain
 EdUUdnzKB2uschPXcFPm7olaAGrwh/nlbJc51uKWrdEaLbM2xausC6T+tg5K/fJc
 1o/Rhe74GhuMlZqcJwKG90pM/11lHQsoYavNl3SZLiDwgIIXxOzdP/ZMN3K+Ug75
 W/5eVhwdU5A87tCXU50UwcAWvLw2kBQzS7HDa4CWuiV4cACNBaTtiUomJ/yCFE5V
 gZW139HM/XGFKMNxLvICT00ZznePDcvYbXlXdRNGKKC8O8t0Si6Xb7TLMUhcBNIQ
 Npvz2g+hofhiFlVDMOpOeAxRrkb7gdYaOqTHk9/YcxarsHkJA5EN7HQnjMnNScNf
 5HIFnP5IOrcnjA009qM+tvFCMJo3FglBSr3pLCcnUoFwnut7sY8+z67JaeVRp2Hc
 3SOcXWamh4HJW+YsDR7bAgBkUPq0FmMY6oGVqIqwTWQJUwGHS4RljFl43HqFn7Fm
 Bggk1OBjm4Kifj0EwujBdLDNuaaNMb2bmSFr4L6CJcQBU7gOyT87TuSOHR5oI7U7
 ywQDr9sQIB70wmQ27z9OqZwEnTIV0LzlD0ZfrhkNS4bcK42livS0qv3hx8YnPaTg
 a0leNmHB6qwkIxoFdCMY2E6ZmpCPUAwPk8NqTPrPmwfMoDbSlMW/6ENjYSyiFB8r
 nbpx2KU0JjsRlYfkhA==
 =94sc
 -----END PGP SIGNATURE-----

Merge tag 'overflow-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull flex-array size helper from Kees Cook:
 "During the treewide clean-ups of zero-length "flexible arrays", the
  struct_size() helper was heavily used, but it was noticed that many
  times it would have been nice to have an additional helper to get the
  size of just the flexible array itself.

  This need appears to be even more common when cleaning up the 1-byte
  array "flexible arrays", so Gustavo implemented it.

  I'd love to get this landed early so it can be used during the v5.9
  dev cycle to ease the 1-byte array cleanups."

* tag 'overflow-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  overflow.h: Add flex_array_size() helper
2020-06-19 11:45:03 -07:00
Prashant Malani
e32b16c313 platform/chrome: cros_ec: Update mux state bits
Sync the EC_CMD_USB_PD_MUX_INFO mux state bit fields with the Chrome EC
code base. The newly added bit fields will be used for cros-ec-typec mux
control.

Signed-off-by: Prashant Malani <pmalani@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
2020-06-19 19:16:18 +02:00
Andrea Parri (Microsoft)
458d090fba Drivers: hv: vmbus: Remove the numa_node field from the vmbus_channel struct
The field is read only in numa_node_show() and it is already stored twice
(after a call to cpu_to_node()) in target_cpu_store() and init_vp_index();
there is no need to "cache" its value in the channel data structure.

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20200617164642.37393-3-parri.andrea@gmail.com
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-06-19 15:38:15 +00:00
Andrea Parri (Microsoft)
5bf7468211 Drivers: hv: vmbus: Remove the target_vp field from the vmbus_channel struct
The field is read only in __vmbus_open() and it is already stored twice
(after a call to hv_cpu_number_to_vp_number()) in target_cpu_store() and
init_vp_index(); there is no need to "cache" its value in the channel
data structure.

Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20200617164642.37393-2-parri.andrea@gmail.com
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-06-19 15:38:10 +00:00
Wolfram Sang
390fd0475a i2c: remove deprecated i2c_new_device API
All in-tree users have been converted to the new i2c_new_client_device
function, so remove this deprecated one.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2020-06-19 09:20:28 +02:00
Greg Kroah-Hartman
91c7eaa686 USB: rename USB quirk to USB_QUIRK_ENDPOINT_IGNORE
The USB core has a quirk flag to ignore specific endpoints, so rename it
to be more obvious what this quirk does.

Cc: Johan Hovold <johan@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Richard Dodd <richard.o.dodd@gmail.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Jonathan Cox <jonathan@jdcox.net>
Cc: Bastien Nocera <hadess@hadess.net>
Cc: "Thiébaud Weksteen" <tweek@google.com>
Cc: Nishad Kamdar <nishadkamdar@gmail.com>
Link: https://lore.kernel.org/r/20200618094300.1887727-2-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-19 08:58:37 +02:00
Taehee Yoo
fb7861d14c net: core: reduce recursion limit value
In the current code, ->ndo_start_xmit() can be executed recursively only
10 times because of stack memory.
But, in the case of the vxlan, 10 recursion limit value results in
a stack overflow.
In the current code, the nested interface is limited by 8 depth.
There is no critical reason that the recursion limitation value should
be 10.
So, it would be good to be the same value with the limitation value of
nesting interface depth.

Test commands:
    ip link add vxlan10 type vxlan vni 10 dstport 4789 srcport 4789 4789
    ip link set vxlan10 up
    ip a a 192.168.10.1/24 dev vxlan10
    ip n a 192.168.10.2 dev vxlan10 lladdr fc:22:33:44:55:66 nud permanent

    for i in {9..0}
    do
        let A=$i+1
	ip link add vxlan$i type vxlan vni $i dstport 4789 srcport 4789 4789
	ip link set vxlan$i up
	ip a a 192.168.$i.1/24 dev vxlan$i
	ip n a 192.168.$i.2 dev vxlan$i lladdr fc:22:33:44:55:66 nud permanent
	bridge fdb add fc:22:33:44:55:66 dev vxlan$A dst 192.168.$i.2 self
    done
    hping3 192.168.10.2 -2 -d 60000

Splat looks like:
[  103.814237][ T1127] =============================================================================
[  103.871955][ T1127] BUG kmalloc-2k (Tainted: G    B            ): Padding overwritten. 0x00000000897a2e4f-0x000
[  103.873187][ T1127] -----------------------------------------------------------------------------
[  103.873187][ T1127]
[  103.874252][ T1127] INFO: Slab 0x000000005cccc724 objects=5 used=5 fp=0x0000000000000000 flags=0x10000000001020
[  103.881323][ T1127] CPU: 3 PID: 1127 Comm: hping3 Tainted: G    B             5.7.0+ #575
[  103.882131][ T1127] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  103.883006][ T1127] Call Trace:
[  103.883324][ T1127]  dump_stack+0x96/0xdb
[  103.883716][ T1127]  slab_err+0xad/0xd0
[  103.884106][ T1127]  ? _raw_spin_unlock+0x1f/0x30
[  103.884620][ T1127]  ? get_partial_node.isra.78+0x140/0x360
[  103.885214][ T1127]  slab_pad_check.part.53+0xf7/0x160
[  103.885769][ T1127]  ? pskb_expand_head+0x110/0xe10
[  103.886316][ T1127]  check_slab+0x97/0xb0
[  103.886763][ T1127]  alloc_debug_processing+0x84/0x1a0
[  103.887308][ T1127]  ___slab_alloc+0x5a5/0x630
[  103.887765][ T1127]  ? pskb_expand_head+0x110/0xe10
[  103.888265][ T1127]  ? lock_downgrade+0x730/0x730
[  103.888762][ T1127]  ? pskb_expand_head+0x110/0xe10
[  103.889244][ T1127]  ? __slab_alloc+0x3e/0x80
[  103.889675][ T1127]  __slab_alloc+0x3e/0x80
[  103.890108][ T1127]  __kmalloc_node_track_caller+0xc7/0x420
[ ... ]

Fixes: 11a766ce91 ("net: Increase xmit RECURSION_LIMIT to 10.")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-18 20:12:33 -07:00
Linus Torvalds
5e857ce6ea Merge branch 'hch' (maccess patches from Christoph Hellwig)
Merge non-faulting memory access cleanups from Christoph Hellwig:
 "Andrew and I decided to drop the patches implementing your suggested
  rename of the probe_kernel_* and probe_user_* helpers from -mm as
  there were way to many conflicts.

  After -rc1 might be a good time for this as all the conflicts are
  resolved now"

This also adds a type safety checking patch on top of the renaming
series to make the subtle behavioral difference between 'get_user()' and
'get_kernel_nofault()' less potentially dangerous and surprising.

* emailed patches from Christoph Hellwig <hch@lst.de>:
  maccess: make get_kernel_nofault() check for minimal type compatibility
  maccess: rename probe_kernel_address to get_kernel_nofault
  maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault
  maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault
2020-06-18 12:35:51 -07:00
Linus Torvalds
0c389d89ab maccess: make get_kernel_nofault() check for minimal type compatibility
Now that we've renamed probe_kernel_address() to get_kernel_nofault()
and made it look and behave more in line with get_user(), some of the
subtle type behavior differences end up being more obvious and possibly
dangerous.

When you do

        get_user(val, user_ptr);

the type of the access comes from the "user_ptr" part, and the above
basically acts as

        val = *user_ptr;

by design (except, of course, for the fact that the actual dereference
is done with a user access).

Note how in the above case, the type of the end result comes from the
pointer argument, and then the value is cast to the type of 'val' as
part of the assignment.

So the type of the pointer is ultimately the more important type both
for the access itself.

But 'get_kernel_nofault()' may now _look_ similar, but it behaves very
differently.  When you do

        get_kernel_nofault(val, kernel_ptr);

it behaves like

        val = *(typeof(val) *)kernel_ptr;

except, of course, for the fact that the actual dereference is done with
exception handling so that a faulting access is suppressed and returned
as the error code.

But note how different the casting behavior of the two superficially
similar accesses are: one does the actual access in the size of the type
the pointer points to, while the other does the access in the size of
the target, and ignores the pointer type entirely.

Actually changing get_kernel_nofault() to act like get_user() is almost
certainly the right thing to do eventually, but in the meantime this
patch adds logit to at least verify that the pointer type is compatible
with the type of the result.

In many cases, this involves just casting the pointer to 'void *' to
make it obvious that the type of the pointer is not the important part.
It's not how 'get_user()' acts, but at least the behavioral difference
is now obvious and explicit.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-18 12:10:37 -07:00
Christoph Hellwig
25f12ae45f maccess: rename probe_kernel_address to get_kernel_nofault
Better describe what this helper does, and match the naming of
copy_from_kernel_nofault.

Also switch the argument order around, so that it acts and looks
like get_user().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-18 11:14:40 -07:00
Luc Van Oostenryck
670d0a4b10 sparse: use identifiers to define address spaces
Currently, address spaces in warnings are displayed as '<asn:X>' with
'X' being the address space's arbitrary number.

But since sparse v0.6.0-rc1 (late December 2018), sparse allows you to
define the address spaces using an identifier instead of a number.  This
identifier is then directly used in the warnings.

So, use the identifiers '__user', '__iomem', '__percpu' & '__rcu' for
the corresponding address spaces.  The default address space, __kernel,
being not displayed in warnings, stays defined as '0'.

With this change, warnings that used to be displayed as:

	cast removes address space '<asn:1>' of expression
	... void [noderef] <asn:2> *

will now be displayed as:

	cast removes address space '__user' of expression
	... void [noderef] __iomem *

This also moves the __kernel annotation to be the first one, since it is
quite different from the others because it's the default one, and so:

 - it's never displayed

 - it's normally not needed, nor in type annotations, nor in cast
   between address spaces. The only time it's needed is when it's
   combined with a typeof to express "the same type as this one but
   without the address space"

 - it can't be defined with a name, '0' must be used.

So, it seemed strange to me to have it in the middle of the other
ones.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-18 10:05:23 -07:00
Zheng Bin
3373a3461a block: make function 'kill_bdev' static
kill_bdev does not have any external user, so make it static.

Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-18 09:24:35 -06:00
Kai-Heng Feng
b5292111de libata: Use per port sync for detach
Commit 130f4caf14 ("libata: Ensure ata_port probe has completed before
detach") may cause system freeze during suspend.

Using async_synchronize_full() in PM callbacks is wrong, since async
callbacks that are already scheduled may wait for not-yet-scheduled
callbacks, causes a circular dependency.

Instead of using big hammer like async_synchronize_full(), use async
cookie to make sure port probe are synced, without affecting other
scheduled PM callbacks.

Fixes: 130f4caf14 ("libata: Ensure ata_port probe has completed before detach")
Suggested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: John Garry <john.garry@huawei.com>
BugLink: https://bugs.launchpad.net/bugs/1867983
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-18 09:21:40 -06:00
Leon Romanovsky
ab183d460d RDMA/mlx5: Add missed RST2INIT and INIT2INIT steps during ECE handshake
Missed steps during ECE handshake left userspace application with less
options for the ECE handshake. Pass ECE options in the additional
transitions.

Fixes: 50aec2c313 ("RDMA/mlx5: Return ECE data after modify QP")
Link: https://lore.kernel.org/r/20200616104536.2426384-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-18 09:52:29 -03:00
Neil Armstrong
6c27219e34 mfd: Add support for the Khadas System control Microcontroller
This Microcontroller is present on the Khadas VIM1, VIM2, VIM3 and Edge
boards.

It has multiple boot control features like password check, power-on
options, power-off control and system FAN control on recent boards.

This implements a very basic MFD driver with the fan control and User
NVMEM cells.

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-06-18 11:23:16 +01:00
Charles Keepax
7f8a137f73 mfd: madera: Remove unused forward declaration of madera_codec_pdata
This forward declaration is redundant since the header including the
full data structure is included.

Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-06-18 11:23:15 +01:00
Benjamin Gaignard
e0bcc58d87 mfd: stm32: Add defines to be used for clkevent purpose
Add defines to be able to enable/clear irq and configure one shot mode.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@st.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2020-06-18 11:19:42 +01:00
Kurt Kanzenbach
f097eb38f7 timekeeping: Fix kerneldoc system_device_crosststamp & al
Make kernel doc comments actually work and fix the syncronized typo.

[ tglx: Added the missing /** and fixed up formatting ]

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200609081726.5657-1-kurt@linutronix.de
2020-06-18 11:37:03 +02:00
Saravana Kannan
7d34ca3854 driver core: Add device_is_dependent() to linux/device.h
DT implementation of fw_devlink needs this function to detect cycles. So
make it available.

Signed-off-by: Saravana Kannan <saravanak@google.com>
Tested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
2020-06-17 16:10:56 -06:00
Guru Das Srinagesh
a9d887dc1c pwm: Convert period and duty cycle to u64
Because period and duty cycle are defined as ints with units of
nanoseconds, the maximum time duration that can be set is limited to
~2.147 seconds. Change their definitions to u64 in the structs of the
PWM framework so that higher durations may be set.

Also use the right format specifiers in debug prints in both core.c,
pwm-stm32-lp.c as well as video/fbdev/ssd1307fb.c.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Guru Das Srinagesh <gurus@codeaurora.org>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2020-06-17 20:42:11 +02:00
Christoph Hellwig
c0ee37e85e maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault
Better describe what these functions do.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-17 10:57:41 -07:00
Christoph Hellwig
fe557319aa maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault
Better describe what these functions do.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-17 10:57:41 -07:00
Herbert Xu
c935cd62d3 lockdep: Split header file into lockdep and lockdep_types
There is a header file inclusion loop between asm-generic/bug.h
and linux/kernel.h.  This causes potential compile failurs depending
on the which file is included first.  One way of breaking this loop
is to stop spinlock_types.h from including lockdep.h.  This patch
splits lockdep.h into two files for this purpose.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/E1jlSJz-0003hE-8g@fornost.hmeau.com
2020-06-17 18:33:13 +02:00
Guenter Roeck
2accfa6905 cpu/speculation: Add prototype for cpu_show_srbds()
0-day is not happy that there is no prototype for cpu_show_srbds():

  drivers/base/cpu.c:565:16: error: no previous prototype for 'cpu_show_srbds'

Fixes: 7e5b3c267d ("x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200617141410.93338-1-linux@roeck-us.net
2020-06-17 17:28:08 +02:00
Dave Jiang
47ec7f09bc dmaengine: cookie bypass for out of order completion
The cookie tracking in dmaengine expects all submissions completed in
order. Some DMA devices like Intel DSA can complete submissions out of
order, especially if configured with a work queue sharing multiple DMA
engines. Add a status DMA_OUT_OF_ORDER that tx_status can be returned for
those DMA devices. The user should use callbacks to track the completion
rather than the DMA cookie. This would address the issue of dmatest
complaining that descriptors are "busy" when the cookie count goes
backwards due to out of order completion. Add DMA_COMPLETION_NO_ORDER
DMA capability to allow the driver to flag the device's ability to complete
operations out of order.

Reported-by: Swathi Kovvuri <swathi.kovvuri@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Swathi Kovvuri <swathi.kovvuri@intel.com>
Link: https://lore.kernel.org/r/158939557151.20335.12404113976045569870.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-06-17 19:45:14 +05:30
Ard Biesheuvel
2a55280a36 efi/libstub: arm: Print CPU boot mode and MMU state at boot
On 32-bit ARM, we may boot at HYP mode, or with the MMU and caches off
(or both), even though the EFI spec does not actually support this.
While booting at HYP mode is something we might tolerate, fiddling
with the caches is a more serious issue, as disabling the caches is
tricky to do safely from C code, and running without the Dcache makes
it impossible to support unaligned memory accesses, which is another
explicit requirement imposed by the EFI spec.

So take note of the CPU mode and MMU state in the EFI stub diagnostic
output so that we can easily diagnose any issues that may arise from
this. E.g.,

  EFI stub: Entering in SVC mode with MMU enabled

Also, capture the CPSR and SCTLR system register values at EFI stub
entry, and after ExitBootServices() returns, and check whether the
MMU and Dcache were disabled at any point. If this is the case, a
diagnostic message like the following will be emitted:

  efi: [Firmware Bug]: EFI stub was entered with MMU and Dcache disabled, please fix your firmware!
  efi: CPSR at EFI stub entry        : 0x600001d3
  efi: SCTLR at EFI stub entry       : 0x00c51838
  efi: CPSR after ExitBootServices() : 0x600001d3
  efi: SCTLR after ExitBootServices(): 0x00c50838

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Leif Lindholm <leif@nuviainc.com>
2020-06-17 15:29:11 +02:00
Christoph Hellwig
26749b3201 dma-direct: mark __dma_direct_alloc_pages static
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-17 09:29:37 +02:00
Gustavo A. R. Silva
b19d57d0f3 overflow.h: Add flex_array_size() helper
Add flex_array_size() helper for the calculation of the size, in bytes,
of a flexible array member contained within an enclosing structure.

Example of usage:

struct something {
	size_t count;
	struct foo items[];
};

struct something *instance;

instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
instance->count = count;
memcpy(instance->items, src, flex_array_size(instance, items, instance->count));

The helper returns SIZE_MAX on overflow instead of wrapping around.

Additionally replaces parameter "n" with "count" in struct_size() helper
for greater clarity and unification.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20200609012233.GA3371@embeddedor
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-06-16 20:45:08 -07:00
Jiri Olsa
9b38cc704e kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
My test was also able to trigger lockdep output:

 ============================================
 WARNING: possible recursive locking detected
 5.6.0-rc6+ #6 Not tainted
 --------------------------------------------
 sched-messaging/2767 is trying to acquire lock:
 ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0

 but task is already holding lock:
 ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(kretprobe_table_locks[i].lock));
   lock(&(kretprobe_table_locks[i].lock));

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 1 lock held by sched-messaging/2767:
  #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

 stack backtrace:
 CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
 Call Trace:
  dump_stack+0x96/0xe0
  __lock_acquire.cold.57+0x173/0x2b7
  ? native_queued_spin_lock_slowpath+0x42b/0x9e0
  ? lockdep_hardirqs_on+0x590/0x590
  ? __lock_acquire+0xf63/0x4030
  lock_acquire+0x15a/0x3d0
  ? kretprobe_hash_lock+0x52/0xa0
  _raw_spin_lock_irqsave+0x36/0x70
  ? kretprobe_hash_lock+0x52/0xa0
  kretprobe_hash_lock+0x52/0xa0
  trampoline_handler+0xf8/0x940
  ? kprobe_fault_handler+0x380/0x380
  ? find_held_lock+0x3a/0x1c0
  kretprobe_trampoline+0x25/0x50
  ? lock_acquired+0x392/0xbc0
  ? _raw_spin_lock_irqsave+0x50/0x70
  ? __get_valid_kprobe+0x1f0/0x1f0
  ? _raw_spin_unlock_irqrestore+0x3b/0x40
  ? finish_task_switch+0x4b9/0x6d0
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70

The code within the kretprobe handler checks for probe reentrancy,
so we won't trigger any _raw_spin_lock_irqsave probe in there.

The problem is in outside kprobe_flush_task, where we call:

  kprobe_flush_task
    kretprobe_table_lock
      raw_spin_lock_irqsave
        _raw_spin_lock_irqsave

where _raw_spin_lock_irqsave triggers the kretprobe and installs
kretprobe_trampoline handler on _raw_spin_lock_irqsave return.

The kretprobe_trampoline handler is then executed with already
locked kretprobe_table_locks, and first thing it does is to
lock kretprobe_table_locks ;-) the whole lockup path like:

  kprobe_flush_task
    kretprobe_table_lock
      raw_spin_lock_irqsave
        _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed

        ---> kretprobe_table_locks locked

        kretprobe_trampoline
          trampoline_handler
            kretprobe_hash_lock(current, &head, &flags);  <--- deadlock

Adding kprobe_busy_begin/end helpers that mark code with fake
probe installed to prevent triggering of another kprobe within
this code.

Using these helpers in kprobe_flush_task, so the probe recursion
protection check is hit and the probe is never set to prevent
above lockup.

Link: http://lkml.kernel.org/r/158927059835.27680.7011202830041561604.stgit@devnote2

Fixes: ef53d9c5e4 ("kprobes: improve kretprobe scalability with hashed locking")
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Gustavo A . R . Silva" <gustavoars@kernel.org>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16 21:21:01 -04:00
Linus Torvalds
ffbc93768e flexible-array member conversion patches for 5.8-rc2
Hi Linus,
 
 Please, pull the following patches that replace zero-length arrays with
 flexible-array members.
 
 Notice that all of these patches have been baking in linux-next for
 two development cycles now.
 
 There is a regular need in the kernel to provide a way to declare having a
 dynamically sized set of trailing elements in a structure. Kernel code should
 always use “flexible array members”[1] for these cases. The older style of
 one-element or zero-length arrays should no longer be used[2].
 
 C99 introduced “flexible array members”, which lacks a numeric size for the
 array declaration entirely:
 
 struct something {
         size_t count;
         struct foo items[];
 };
 
 This is the way the kernel expects dynamically sized trailing elements to be
 declared. It allows the compiler to generate errors when the flexible array
 does not occur last in the structure, which helps to prevent some kind of
 undefined behavior[3] bugs from being inadvertently introduced to the codebase.
 It also allows the compiler to correctly analyze array sizes (via sizeof(),
 CONFIG_FORTIFY_SOURCE, and CONFIG_UBSAN_BOUNDS). For instance, there is no
 mechanism that warns us that the following application of the sizeof() operator
 to a zero-length array always results in zero:
 
 struct something {
         size_t count;
         struct foo items[0];
 };
 
 struct something *instance;
 
 instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
 instance->count = count;
 
 size = sizeof(instance->items) * instance->count;
 memcpy(instance->items, source, size);
 
 At the last line of code above, size turns out to be zero, when one might have
 thought it represents the total size in bytes of the dynamic memory recently
 allocated for the trailing array items. Here are a couple examples of this
 issue[4][5]. Instead, flexible array members have incomplete type, and so the
 sizeof() operator may not be applied[6], so any misuse of such operators will
 be immediately noticed at build time.
 
 The cleanest and least error-prone way to implement this is through the use of
 a flexible array member:
 
 struct something {
         size_t count;
         struct foo items[];
 };
 
 struct something *instance;
 
 instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
 instance->count = count;
 
 size = sizeof(instance->items[0]) * instance->count;
 memcpy(instance->items, source, size);
 
 Thanks
 --
 Gustavo
 
 [1] https://en.wikipedia.org/wiki/Flexible_array_member
 [2] https://github.com/KSPP/linux/issues/21
 [3] https://git.kernel.org/linus/76497732932f15e7323dc805e8ea8dc11bb587cf
 [4] https://git.kernel.org/linus/f2cd32a443da694ac4e28fbf4ac6f9d5cc63a539
 [5] https://git.kernel.org/linus/ab91c2a89f86be2898cee208d492816ec238b2cf
 [6] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAl7oSmYACgkQRwW0y0cG
 2zGEiw/9FiH3MBwMlPVJPcneY1wCH/N6ZSf+kr7SJiVwV/YbBe9EWuaKZ0D4vAWm
 kTACkOfsZ1me1OKz9wNrOxn0zezTMFQK2PLPgzKIPuK0Hg8MW1EU63RIRsnr0bPc
 b90wZwyBQtLbGRC3/9yAACKwFZe/SeYoV5rr8uylffA35HZW3SZbTex6XnGCF9Q5
 UYwnz7vNg+9VH1GRQeB5jlqL7mAoRzJ49I/TL3zJr04Mn+xC+vVBS7XwipDd03p+
 foC6/KmGhlCO9HMPASReGrOYNPydDAMKLNPdIfUlcTKHWsoTjGOcW/dzfT4rUu6n
 nKr5rIqJ4FdlIvXZL5P5w7Uhkwbd3mus5G0HBk+V/cUScckCpBou+yuGzjxXSitQ
 o0qPsGjWr3v+gxRWHj8YO/9MhKKKW0Iy+QmAC9+uLnbfJdbUwYbLIXbsOKnokCA8
 jkDEr64F5hFTKtajIK4VToJK1CsM3D9dwTub27lwZysHn3RYSQdcyN+9OiZgdzpc
 GlI6QoaqKR9AT4b/eBmqlQAKgA07zSQ5RsIjRm6hN3d7u/77x2kyrreo+trJyVY2
 F17uEOzfTqZyxtkPayE8DVjTtbByoCuBR0Vm1oMAFxjyqZQY5daalB0DKd1mdYqi
 khIXqNAuYqHOb898fEuzidjV38hxZ9y8SAym3P7WnYl+Hxz+8Jo=
 =8HUQ
 -----END PGP SIGNATURE-----

Merge tag 'flex-array-conversions-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull flexible-array member conversions from Gustavo A. R. Silva:
 "Replace zero-length arrays with flexible-array members.

  Notice that all of these patches have been baking in linux-next for
  two development cycles now.

  There is a regular need in the kernel to provide a way to declare
  having a dynamically sized set of trailing elements in a structure.
  Kernel code should always use “flexible array members”[1] for these
  cases. The older style of one-element or zero-length arrays should no
  longer be used[2].

  C99 introduced “flexible array members”, which lacks a numeric size
  for the array declaration entirely:

        struct something {
                size_t count;
                struct foo items[];
        };

  This is the way the kernel expects dynamically sized trailing elements
  to be declared. It allows the compiler to generate errors when the
  flexible array does not occur last in the structure, which helps to
  prevent some kind of undefined behavior[3] bugs from being
  inadvertently introduced to the codebase.

  It also allows the compiler to correctly analyze array sizes (via
  sizeof(), CONFIG_FORTIFY_SOURCE, and CONFIG_UBSAN_BOUNDS). For
  instance, there is no mechanism that warns us that the following
  application of the sizeof() operator to a zero-length array always
  results in zero:

        struct something {
                size_t count;
                struct foo items[0];
        };

        struct something *instance;

        instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
        instance->count = count;

        size = sizeof(instance->items) * instance->count;
        memcpy(instance->items, source, size);

  At the last line of code above, size turns out to be zero, when one
  might have thought it represents the total size in bytes of the
  dynamic memory recently allocated for the trailing array items. Here
  are a couple examples of this issue[4][5].

  Instead, flexible array members have incomplete type, and so the
  sizeof() operator may not be applied[6], so any misuse of such
  operators will be immediately noticed at build time.

  The cleanest and least error-prone way to implement this is through
  the use of a flexible array member:

        struct something {
                size_t count;
                struct foo items[];
        };

        struct something *instance;

        instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
        instance->count = count;

        size = sizeof(instance->items[0]) * instance->count;
        memcpy(instance->items, source, size);

  instead"

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
[4] commit f2cd32a443 ("rndis_wlan: Remove logically dead code")
[5] commit ab91c2a89f ("tpm: eventlog: Replace zero-length array with flexible-array member")
[6] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html

* tag 'flex-array-conversions-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (41 commits)
  w1: Replace zero-length array with flexible-array
  tracing/probe: Replace zero-length array with flexible-array
  soc: ti: Replace zero-length array with flexible-array
  tifm: Replace zero-length array with flexible-array
  dmaengine: tegra-apb: Replace zero-length array with flexible-array
  stm class: Replace zero-length array with flexible-array
  Squashfs: Replace zero-length array with flexible-array
  ASoC: SOF: Replace zero-length array with flexible-array
  ima: Replace zero-length array with flexible-array
  sctp: Replace zero-length array with flexible-array
  phy: samsung: Replace zero-length array with flexible-array
  RxRPC: Replace zero-length array with flexible-array
  rapidio: Replace zero-length array with flexible-array
  media: pwc: Replace zero-length array with flexible-array
  firmware: pcdp: Replace zero-length array with flexible-array
  oprofile: Replace zero-length array with flexible-array
  block: Replace zero-length array with flexible-array
  tools/testing/nvdimm: Replace zero-length array with flexible-array
  libata: Replace zero-length array with flexible-array
  kprobes: Replace zero-length array with flexible-array
  ...
2020-06-16 17:23:57 -07:00
Christian Brauner
60997c3d45
close_range: add CLOSE_RANGE_UNSHARE
One of the use-cases of close_range() is to drop file descriptors just before
execve(). This would usually be expressed in the sequence:

unshare(CLONE_FILES);
close_range(3, ~0U);

as pointed out by Linus it might be desirable to have this be a part of
close_range() itself under a new flag CLOSE_RANGE_UNSHARE.

This expands {dup,unshare)_fd() to take a max_fds argument that indicates the
maximum number of file descriptors to copy from the old struct files. When the
user requests that all file descriptors are supposed to be closed via
close_range(min, max) then we can cap via unshare_fd(min) and hence don't need
to do any of the heavy fput() work for everything above min.

The patch makes it so that if CLOSE_RANGE_UNSHARE is requested and we do in
fact currently share our file descriptor table we create a new private copy.
We then close all fds in the requested range and finally after we're done we
install the new fd table.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-17 00:07:38 +02:00
Christian Brauner
278a5fbaed
open: add close_range()
This adds the close_range() syscall. It allows to efficiently close a range
of file descriptors up to all file descriptors of a calling task.

I was contacted by FreeBSD as they wanted to have the same close_range()
syscall as we proposed here. We've coordinated this and in the meantime, Kyle
was fast enough to merge close_range() into FreeBSD already in April:
https://reviews.freebsd.org/D21627
https://svnweb.freebsd.org/base?view=revision&revision=359836
and the current plan is to backport close_range() to FreeBSD 12.2 (cf. [2])
once its merged in Linux too. Python is in the process of switching to
close_range() on FreeBSD and they are waiting on us to merge this to switch on
Linux as well: https://bugs.python.org/issue38061

The syscall came up in a recent discussion around the new mount API and
making new file descriptor types cloexec by default. During this
discussion, Al suggested the close_range() syscall (cf. [1]). Note, a
syscall in this manner has been requested by various people over time.

First, it helps to close all file descriptors of an exec()ing task. This
can be done safely via (quoting Al's example from [1] verbatim):

        /* that exec is sensitive */
        unshare(CLONE_FILES);
        /* we don't want anything past stderr here */
        close_range(3, ~0U);
        execve(....);

The code snippet above is one way of working around the problem that file
descriptors are not cloexec by default. This is aggravated by the fact that
we can't just switch them over without massively regressing userspace. For
a whole class of programs having an in-kernel method of closing all file
descriptors is very helpful (e.g. demons, service managers, programming
language standard libraries, container managers etc.).
(Please note, unshare(CLONE_FILES) should only be needed if the calling
task is multi-threaded and shares the file descriptor table with another
thread in which case two threads could race with one thread allocating file
descriptors and the other one closing them via close_range(). For the
general case close_range() before the execve() is sufficient.)

Second, it allows userspace to avoid implementing closing all file
descriptors by parsing through /proc/<pid>/fd/* and calling close() on each
file descriptor. From looking at various large(ish) userspace code bases
this or similar patterns are very common in:
- service managers (cf. [4])
- libcs (cf. [6])
- container runtimes (cf. [5])
- programming language runtimes/standard libraries
  - Python (cf. [2])
  - Rust (cf. [7], [8])
As Dmitry pointed out there's even a long-standing glibc bug about missing
kernel support for this task (cf. [3]).
In addition, the syscall will also work for tasks that do not have procfs
mounted and on kernels that do not have procfs support compiled in. In such
situations the only way to make sure that all file descriptors are closed
is to call close() on each file descriptor up to UINT_MAX or RLIMIT_NOFILE,
OPEN_MAX trickery (cf. comment [8] on Rust).

The performance is striking. For good measure, comparing the following
simple close_all_fds() userspace implementation that is essentially just
glibc's version in [6]:

static int close_all_fds(void)
{
        int dir_fd;
        DIR *dir;
        struct dirent *direntp;

        dir = opendir("/proc/self/fd");
        if (!dir)
                return -1;
        dir_fd = dirfd(dir);
        while ((direntp = readdir(dir))) {
                int fd;
                if (strcmp(direntp->d_name, ".") == 0)
                        continue;
                if (strcmp(direntp->d_name, "..") == 0)
                        continue;
                fd = atoi(direntp->d_name);
                if (fd == dir_fd || fd == 0 || fd == 1 || fd == 2)
                        continue;
                close(fd);
        }
        closedir(dir);
        return 0;
}

to close_range() yields:
1. closing 4 open files:
   - close_all_fds(): ~280 us
   - close_range():    ~24 us

2. closing 1000 open files:
   - close_all_fds(): ~5000 us
   - close_range():   ~800 us

close_range() is designed to allow for some flexibility. Specifically, it
does not simply always close all open file descriptors of a task. Instead,
callers can specify an upper bound.
This is e.g. useful for scenarios where specific file descriptors are
created with well-known numbers that are supposed to be excluded from
getting closed.
For extra paranoia close_range() comes with a flags argument. This can e.g.
be used to implement extension. Once can imagine userspace wanting to stop
at the first error instead of ignoring errors under certain circumstances.
There might be other valid ideas in the future. In any case, a flag
argument doesn't hurt and keeps us on the safe side.

From an implementation side this is kept rather dumb. It saw some input
from David and Jann but all nonsense is obviously my own!
- Errors to close file descriptors are currently ignored. (Could be changed
  by setting a flag in the future if needed.)
- __close_range() is a rather simplistic wrapper around __close_fd().
  My reasoning behind this is based on the nature of how __close_fd() needs
  to release an fd. But maybe I misunderstood specifics:
  We take the files_lock and rcu-dereference the fdtable of the calling
  task, we find the entry in the fdtable, get the file and need to release
  files_lock before calling filp_close().
  In the meantime the fdtable might have been altered so we can't just
  retake the spinlock and keep the old rcu-reference of the fdtable
  around. Instead we need to grab a fresh reference to the fdtable.
  If my reasoning is correct then there's really no point in fancyfying
  __close_range(): We just need to rcu-dereference the fdtable of the
  calling task once to cap the max_fd value correctly and then go on
  calling __close_fd() in a loop.

/* References */
[1]: https://lore.kernel.org/lkml/20190516165021.GD17978@ZenIV.linux.org.uk/
[2]: 9e4f2f3a6b/Modules/_posixsubprocess.c (L220)
[3]: https://sourceware.org/bugzilla/show_bug.cgi?id=10353#c7
[4]: 5238e95759/src/basic/fd-util.c (L217)
[5]: ddf4b77e11/src/lxc/start.c (L236)
[6]: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/grantpt.c;h=2030e07fa6e652aac32c775b8c6e005844c3c4eb;hb=HEAD#l17
     Note that this is an internal implementation that is not exported.
     Currently, libc seems to not provide an exported version of this
     because of missing kernel support to do this.
     Note, in a recent patch series Florian made grantpt() a nop thereby
     removing the code referenced here.
[7]: https://github.com/rust-lang/rust/issues/12148
[8]: 5f47c0613e/src/libstd/sys/unix/process2.rs (L303-L308)
     Rust's solution is slightly different but is equally unperformant.
     Rust calls getdtablesize() which is a glibc library function that
     simply returns the current RLIMIT_NOFILE or OPEN_MAX values. Rust then
     goes on to call close() on each fd. That's obviously overkill for most
     tasks. Rarely, tasks - especially non-demons - hit RLIMIT_NOFILE or
     OPEN_MAX.
     Let's be nice and assume an unprivileged user with RLIMIT_NOFILE set
     to 1024. Even in this case, there's a very high chance that in the
     common case Rust is calling the close() syscall 1021 times pointlessly
     if the task just has 0, 1, and 2 open.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kyle Evans <self@kyle-evans.net>
Cc: Jann Horn <jannh@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dmitry V. Levin <ldv@altlinux.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: linux-api@vger.kernel.org
2020-06-17 00:05:19 +02:00
Borislav Petkov
e370f886fe EDAC: Remove edac_get_dimm_by_index()
It is unused now.

Signed-off-by: Borislav Petkov <bp@suse.de>
2020-06-16 19:25:20 +02:00
Colton Lewis
2fd2bc7f49 gpu: host1x: Correct trivial kernel-doc inconsistencies
Silence documentation build warnings by adding kernel-doc fields.

./include/linux/host1x.h:69: warning: Function parameter or member 'parent' not described in 'host1x_client'
./include/linux/host1x.h:69: warning: Function parameter or member 'usecount' not described in 'host1x_client'
./include/linux/host1x.h:69: warning: Function parameter or member 'lock' not described in 'host1x_client'

Signed-off-by: Colton Lewis <colton.w.lewis@protonmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2020-06-16 18:59:45 +02:00
Ilya Dryomov
22d2cfdffa libceph: move away from global osd_req_flags
osd_req_flags is overly general and doesn't suit its only user
(read_from_replica option) well:

- applying osd_req_flags in account_request() affects all OSD
  requests, including linger (i.e. watch and notify).  However,
  linger requests should always go to the primary even though
  some of them are reads (e.g. notify has side effects but it
  is a read because it doesn't result in mutation on the OSDs).

- calls to class methods that are reads are allowed to go to
  the replica, but most such calls issued for "rbd map" and/or
  exclusive lock transitions are requested to be resent to the
  primary via EAGAIN, doubling the latency.

Get rid of global osd_req_flags and set read_from_replica flag
only on specific OSD requests instead.

Fixes: 8ad44d5e0d ("libceph: read_from_replica option")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2020-06-16 16:01:53 +02:00
Georgi Djakov
12a400b016 interconnect: Mark all dummy functions as static inline
There are a few dummy stub functions that are not marked as static inline
yet. Currently this header file is not included in any other file outside
of drivers/interconnect/, but that might not be the case in the future.
If this file gets included and the framework is disabled, we will be see
warnings. Let's fix this in advance.

Link: https://lore.kernel.org/r/20200228145945.13579-1-georgi.djakov@linaro.org
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
2020-06-16 16:43:23 +03:00