Commit graph

1089033 commits

Author SHA1 Message Date
Jiri Olsa
d31e0386a2 bpf: Fix sparse warnings in kprobe_multi_resolve_syms
Adding missing __user tags to fix sparse warnings:

kernel/trace/bpf_trace.c:2370:34: warning: incorrect type in argument 2 (different address spaces)
kernel/trace/bpf_trace.c:2370:34:    expected void const [noderef] __user *from
kernel/trace/bpf_trace.c:2370:34:    got void const *usyms
kernel/trace/bpf_trace.c:2376:51: warning: incorrect type in argument 2 (different address spaces)
kernel/trace/bpf_trace.c:2376:51:    expected char const [noderef] __user *src
kernel/trace/bpf_trace.c:2376:51:    got char const *
kernel/trace/bpf_trace.c:2443:49: warning: incorrect type in argument 1 (different address spaces)
kernel/trace/bpf_trace.c:2443:49:    expected void const *usyms
kernel/trace/bpf_trace.c:2443:49:    got void [noderef] __user *[assigned] usyms

Fixes: 0dcac27254 ("bpf: Add multi kprobe link")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220330110510.398558-1-jolsa@kernel.org
2022-03-30 14:10:06 +02:00
Delyan Kratunov
522574fd78 bpftool: Explicit errno handling in skeletons
Andrii noticed that since f97b8b9bd6 ("bpftool: Fix a bug in subskeleton
code generation") the subskeleton code allows bpf_object__destroy_subskeleton
to overwrite the errno that subskeleton__open would return with. While this
is not currently an issue, let's make it future-proof.

This patch explicitly tracks err in subskeleton__open and skeleton__create
(i.e. calloc failure is explicitly ENOMEM) and ensures that errno is -err on
the error return path. The skeleton code had to be changed since maps and
progs codegen is shared with subskeletons.

Fixes: f97b8b9bd6 ("bpftool: Fix a bug in subskeleton code generation")
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/3b6bfbb770c79ae64d8de26c1c1bd9d53a4b85f8.camel@fb.com
2022-03-30 14:06:59 +02:00
Takashi Iwai
21b5954d61 ASoC: Fixes for v5.18
A few fixes that came in during the merge window, all fairly routine.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmJEQ4YACgkQJNaLcl1U
 h9DaQAf/VgSwgkvvN9ch08wPBHxZUvZ4JJ+mMGPUy1Qq5bKxZn6BPD8PdxnQY2Ni
 LwFS3sAdVcZPnwIGt6HhmEtTN/fB85mIM8gF6XLlYurcPzY1mpwM8F52IwxVKPDt
 DCOMEmC95NHf+Cw+ukH0SjGELF7xVe7GVdrDrnEqZUuEsu0V5pKsvSAuKJEvgXnU
 YcYEOB+KDhQUnGVoYZJ38yQ+PLKn9XYz8XHddksqc1n89h8gk9pkj0YJDIe2ePnb
 8vxycHmjj+z0CgAy8v9uQDMkjpnFyzvtZ3YQtZccekZiTKnPY8LhUKnAly/MIfGJ
 Usxuo4CA3FDNQLwzNFoIYFNq2HdKdA==
 =DQFJ
 -----END PGP SIGNATURE-----

Merge tag 'asoc-fix-v5.18' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.18

A few fixes that came in during the merge window, all fairly routine.
2022-03-30 14:04:22 +02:00
Thomas Gleixner
d6d6d50f1e x86/fpu/xstate: Consolidate size calculations
Use the offset calculation to do the size calculation which avoids yet
another series of CPUID instructions for each invocation.

  [ Fix the FP/SSE only case which missed to take the xstate
    header into account, as
    Reported-by: kernel test robot <oliver.sang@intel.com> ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/87o81pgbp2.ffs@tglx
2022-03-30 12:04:09 +02:00
Thomas Gleixner
781c64bfcb x86/fpu/xstate: Handle supervisor states in XSTATE permissions
The size calculation in __xstate_request_perm() fails to take supervisor
states into account because the permission bitmap is only relevant for user
states.

Up to 5.17 this does not matter because there are no supervisor states
supported, but the (re-)enabling of ENQCMD makes them available.

Fixes: 7c1ef59145 ("x86/cpufeatures: Re-enable ENQCMD")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.681768598@linutronix.de
2022-03-30 11:35:22 +02:00
Thomas Gleixner
7aa5128b5f x86/fpu/xsave: Handle compacted offsets correctly with supervisor states
So far the cached fixed compacted offsets worked, but with (re-)enabling
of ENQCMD this does no longer work with KVM fpstate.

KVM does not have supervisor features enabled for the guest FPU, which
means that KVM has then a different XSAVE area layout than the host FPU
state. This in turn breaks the copy from/to UABI functions when invoked for
a guest state.

Remove the pre-calculated compacted offsets and calculate the offset
of each component at runtime based on the XCOMP_BV field in the XSAVE
header.

The runtime overhead is not interesting because these copy from/to UABI
functions are not used in critical fast paths. KVM uses them to save and
restore FPU state during migration. The host uses them for ptrace and for
the slow path of 32bit signal handling.

Fixes: 7c1ef59145 ("x86/cpufeatures: Re-enable ENQCMD")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.627636809@linutronix.de
2022-03-30 11:20:36 +02:00
Thomas Gleixner
6afbb58cc2 x86/fpu: Cache xfeature flags from CPUID
In preparation for runtime calculation of XSAVE offsets cache the feature
flags for each XSTATE component during feature enumeration via CPUID(0xD).

EDX has two relevant bits:
    0	Supervisor component
    1	Feature storage must be 64 byte aligned

These bits are currently only evaluated during init, but the alignment bit
must be cached to make runtime calculation of XSAVE offsets efficient.

Cache the full EDX content and use it for the existing alignment and
supervisor checks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.573656209@linutronix.de
2022-03-30 11:05:32 +02:00
Thomas Gleixner
35a77d4503 x86/fpu/xsave: Initialize offset/size cache early
Reading XSTATE feature information from CPUID over and over does not make
sense. The information has to be cached anyway, so it can be done early.

Prepare for runtime calculation of XSTATE offsets and allow
consolidation of the size calculation functions in a later step.

Rename the function while at it as it does not setup any features.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.519411939@linutronix.de
2022-03-30 10:55:44 +02:00
Thomas Gleixner
d47f71f6de x86/fpu: Remove unused supervisor only offsets
No users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.465066249@linutronix.de
2022-03-30 10:44:51 +02:00
Mohan Kumar
6ddc2f7496 ALSA: hda: Avoid unsol event during RPM suspending
There is a corner case with unsol event handling during codec runtime
suspending state. When the codec runtime suspend call initiated, the
codec->in_pm atomic variable would be 0, currently the codec runtime
suspend function calls snd_hdac_enter_pm() which will just increments
the codec->in_pm atomic variable. Consider unsol event happened just
after this step and before snd_hdac_leave_pm() in the codec runtime
suspend function. The snd_hdac_power_up_pm() in the unsol event
flow in hdmi_present_sense_via_verbs() function would just increment
the codec->in_pm atomic variable without calling pm_runtime_get_sync
function.

As codec runtime suspend flow is already in progress and in parallel
unsol event is also accessing the codec verbs, as soon as codec
suspend flow completes and clocks are  switched off before completing
the unsol event handling as both functions doesn't wait for each other.
This will result in below errors

[  589.428020] tegra-hda 3510000.hda: azx_get_response timeout, switching
to polling mode: last cmd=0x505f2f57
[  589.428344] tegra-hda 3510000.hda: spurious response 0x80000074:0x5,
last cmd=0x505f2f57
[  589.428547] tegra-hda 3510000.hda: spurious response 0x80000065:0x5,
last cmd=0x505f2f57

To avoid this, the unsol event flow should not perform any codec verb
related operations during RPM_SUSPENDING state.

Signed-off-by: Mohan Kumar <mkumard@nvidia.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220329155940.26331-1-mkumard@nvidia.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-03-30 10:20:27 +02:00
Jason Wang
1c80cf031e vdpa: mlx5: synchronize driver status with CVQ
Currently, CVQ doesn't have any synchronization with the driver
status. Then CVQ emulation code run in the middle of:

1) device reset
2) device status changed
3) map updating

The will lead several unexpected issue like trying to execute CVQ
command after the driver has been teared down.

Fixing this by using reslock to synchronize CVQ emulation code with
the driver status changing:

- protect the whole device reset, status changing and set_map()
  updating with reslock
- protect the CVQ handler with the reslock and check
  VIRTIO_CONFIG_S_DRIVER_OK in the CVQ handler

This will guarantee that:

1) CVQ handler won't work if VIRTIO_CONFIG_S_DRIVER_OK is not set
2) CVQ handler will see a consistent state of the driver instead of
   the partial one when it is running in the middle of the
   teardown_driver() or setup_driver().

Cc: 5262912ef3 ("vdpa/mlx5: Add support for control VQ and MAC setting")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220329042109.4029-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
2022-03-30 04:18:14 -04:00
Jason Wang
55ebf0d60e vdpa: mlx5: prevent cvq work from hogging CPU
A userspace triggerable infinite loop could happen in
mlx5_cvq_kick_handler() if userspace keeps sending a huge amount of
cvq requests.

Fixing this by introducing a quota and re-queue the work if we're out
of the budget (currently the implicit budget is one) . While at it,
using a per device work struct to avoid on demand memory allocation
for cvq.

Fixes: 5262912ef3 ("vdpa/mlx5: Add support for control VQ and MAC setting")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220329042109.4029-1-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
2022-03-30 04:18:14 -04:00
Michael S. Tsirkin
c18c86808b Revert "virtio_config: introduce a new .enable_cbs method"
This reverts commit d50497eb4e.

The new callback ended up not being used, and it's asymmetrical:
just enable, no disable.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-03-30 04:18:14 -04:00
Michael S. Tsirkin
7414539c5f Revert "virtio: use virtio_device_ready() in virtio_device_restore()"
This reverts commit 8d65bc9a5b.

We reverted the problematic changes, no more need for work
arounds on restore.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-03-30 04:18:14 -04:00
Kai-Heng Feng
f30741cded ALSA: hda/realtek: Fix audio regression on Mi Notebook Pro 2020
Commit 5aec989130 ("ALSA: hda/realtek - ALC236 headset MIC recording
issue") is to solve recording issue met on AL236, by matching codec
variant ALC269_TYPE_ALC257 and ALC269_TYPE_ALC256.

This match can be too broad and Mi Notebook Pro 2020 is broken by the
patch.

Instead, use codec ID to be narrow down the scope, in order to make
ALC256 unaffected.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215484
Fixes: 5aec989130 ("ALSA: hda/realtek - ALC236 headset MIC recording issue")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Link: https://lore.kernel.org/r/20220330061335.1015533-1-kai.heng.feng@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-03-30 10:17:35 +02:00
Paul Kocialkowski
67bae5f28c
drm: of: Properly try all possible cases for bridge/panel detection
While bridge/panel detection was initially relying on the usual
port/ports-based of graph detection, it was recently changed to
perform the lookup on any child node that is not port/ports
instead when such a node is available, with no fallback on the
usual way.

This results in breaking detection when a child node is present
but does not contain any panel or bridge node, even when the
usual port/ports-based of graph is there.

In order to support both situations properly, this commit reworks
the logic to try both options and not just one of the two: it will
only return -EPROBE_DEFER when both have failed.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Fixes: 80253168db ("drm: of: Lookup if child node has panel or bridge")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220329132732.628474-1-paul.kocialkowski@bootlin.com
2022-03-30 10:16:05 +02:00
Linus Torvalds
d888c83fce fs: fix fd table size alignment properly
Jason Donenfeld reports that my commit 1c24a18639 ("fs: fd tables have
to be multiples of BITS_PER_LONG") doesn't work, and the reason is an
embarrassing brown-paper-bag bug.

Yes, we want to align the number of fds to BITS_PER_LONG, and yes, the
reason they might not be aligned is because the incoming 'max_fd'
argument might not be aligned.

But aligining the argument - while simple - will cause a "infinitely
big" maxfd (eg NR_OPEN_MAX) to just overflow to zero.  Which most
definitely isn't what we want either.

The obvious fix was always just to do the alignment last, but I had
moved it earlier just to make the patch smaller and the code look
simpler.  Duh.  It certainly made _me_ look simple.

Fixes: 1c24a18639 ("fs: fd tables have to be multiples of BITS_PER_LONG")
Reported-and-tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Fedor Pchelkin <aissur0002@gmail.com>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-29 23:29:18 -07:00
Christophe JAILLET
7968778914 PCI: Remove the deprecated "pci-dma-compat.h" API
Now that all usages of the functions defined in "pci-dma-compat.h" have
been removed, it is time to remove this file as well.

In order not to break builds, move the "#include <linux/dma-mapping.h>"
that was in "pci-dma-compat.h" into "include/linux/pci.h"

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-03-30 07:22:13 +02:00
Peter Zijlstra
aa8e73eed7 crypto: x86/sm3 - Fixup SLS
This missed the big asm update due to being merged through the crypto
tree.

Fixes: f94909ceb1 ("x86: Prepare asm files for straight-line-speculation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-03-30 16:33:11 +12:00
Colin Ian King
a6b758b042 scsi: bnx2i: Fix spelling mistake "mis-match" -> "mismatch"
There are a few spelling mistakes in some error messages. Fix them.

Link: https://lore.kernel.org/r/20220319231445.21696-1-colin.i.king@gmail.com
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-30 00:07:25 -04:00
Colin Ian King
7ff897b2a5 scsi: bnx2fc: Fix spelling mistake "mis-match" -> "mismatch"
There are a few spelling mistakes in some error messages. Fix them.

Link: https://lore.kernel.org/r/20220319231122.21476-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-30 00:06:44 -04:00
Christophe JAILLET
16ed828b87 scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one()
The error handling path of the probe releases a resource that is not freed
in the remove function. In some cases, a ioremap() must be undone.

Add the missing iounmap() call in the remove function.

Link: https://lore.kernel.org/r/247066a3104d25f9a05de8b3270fc3c848763bcc.1647673264.git.christophe.jaillet@wanadoo.fr
Fixes: 45804fbb00 ("[SCSI] 53c700: Amiga Zorro NCR53c710 SCSI")
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-30 00:05:42 -04:00
Tom Rix
37a9bd7090 scsi: aic7xxx: Use standard PCI subsystem, subdevice defines
Common defines should be used over custom defines.

Change and remove these defines:

 - PCIR_SUBVEND_0 to PCI_SUBSYSTEM_VENDOR_ID

 - PCIR_SUBDEV_0 to PCI_SUBSYSTEM_ID

Link: https://lore.kernel.org/r/20220322144648.2467777-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-30 00:00:59 -04:00
Krzysztof Kozlowski
5ca0faf9c2 scsi: ufs: qcom: Drop custom Android boot parameters
The QCOM UFS driver requires an androidboot.bootdevice command line
argument matching the UFS device name.  If the name is different, it
refuses to probe.  This androidboot.bootdevice is provided by stock/vendor
(from an Android-based device) bootloader.

This does not make sense from Linux point of view.  Driver should be able
to boot regardless of bootloader.  Driver should not depend on some Android
custom environment data.

Link: https://lore.kernel.org/r/20220321151853.24138-1-krzk@kernel.org
Tested-by: Amit Pundir <amit.pundir@linaro.org>
Tested-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Brian Masney <bmasney@redhat.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:59:12 -04:00
Jackie Liu
99241e119f scsi: core: sysfs: Remove comments that conflict with the actual logic
Christoph Hellwig Says:
=======================
I think we should just handle the error properly and remove the comment.
There's no good reason to ignore bsg registration errors.

In fact, after commit 92c4b58b15 ("scsi: core: Register sysfs attributes
earlier"), we are already forced to return errno.

We discuss this issue in [1].

[1] https://lore.kernel.org/all/20211022010201.426746-1-liu.yun@linux.dev/

Link: https://lore.kernel.org/r/20220329021251.123805-1-liu.yun@linux.dev
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:54:26 -04:00
Dan Carpenter
066f4c3194 scsi: hisi_sas: Remove stray fallthrough annotation
This case statement doesn't fall through any more so remove the fallthrough
annotation.

Link: https://lore.kernel.org/r/20220317075214.GC25237@kili
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:53:03 -04:00
Randy Dunlap
41b8c2a314 scsi: virtio-scsi: Eliminate anonymous module_init & module_exit
Eliminate anonymous module_init() and module_exit(), which can lead to
confusion or ambiguity when reading System.map, crashes/oops/bugs, or an
initcall_debug log.

Give each of these init and exit functions unique driver-specific names to
eliminate the anonymous names.

Example 1: (System.map)
 ffffffff832fc78c t init
 ffffffff832fc79e t init
 ffffffff832fc8f8 t init

Example 2: (initcall_debug log)
 calling  init+0x0/0x12 @ 1
 initcall init+0x0/0x12 returned 0 after 15 usecs
 calling  init+0x0/0x60 @ 1
 initcall init+0x0/0x60 returned 0 after 2 usecs
 calling  init+0x0/0x9a @ 1
 initcall init+0x0/0x9a returned 0 after 74 usecs

Link: https://lore.kernel.org/r/20220316192010.19001-6-rdunlap@infradead.org
Fixes: 4fe74b1cb0 ("[SCSI] virtio-scsi: SCSI driver for QEMU based virtual machines")
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:49:56 -04:00
Colin Ian King
fac952bb54 scsi: isci: Fix spelling mistake "doesnt" -> "doesn't"
There are a few spelling mistakes in dev_warn and dev_err messages.  Fix
these.

Link: https://lore.kernel.org/r/20220316235615.56683-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:46:23 -04:00
John Garry
eaba83b5b8 scsi: core: Fix sbitmap depth in scsi_realloc_sdev_budget_map()
In commit edb854a368 ("scsi: core: Reallocate device's budget map on
queue depth change"), the sbitmap for the device budget map may be
reallocated after the slave device depth is configured.

When the sbitmap is reallocated we use the result from
scsi_device_max_queue_depth() for the sbitmap size, but don't resize to
match the actual device queue depth.

Fix by resizing the sbitmap after reallocating the budget sbitmap. We do
this instead of init'ing the sbitmap to the device queue depth as the user
may want to change the queue depth later via sysfs or other.

Link: https://lore.kernel.org/r/1647423870-143867-1-git-send-email-john.garry@huawei.com
Fixes: edb854a368 ("scsi: core: Reallocate device's budget map on queue depth change")
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:45:01 -04:00
Finn Thain
63221571ef scsi: aha152x: Stop using struct scsi_pointer
Remove aha152x_cmd_priv.scsi_pointer by moving the necessary members into
aha152x_cmd_priv proper.

Tested with an Adaptec SlimSCSI APA-1460A card.

Link: https://lore.kernel.org/r/bdc1264b6dd331150bffb737958cab8c9c068fa1.1648070977.git.fthain@linux-m68k.org
Cc: Christoph Hellwig <hch@infradead.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:42:18 -04:00
Tyrel Datwyler
0bade8e532 scsi: ibmvscsis: Increase INITIAL_SRP_LIMIT to 1024
The adapter request_limit is hardcoded to be INITIAL_SRP_LIMIT which is
currently an arbitrary value of 800. Increase this value to 1024 which
better matches the characteristics of the typical IBMi Initiator that
supports 32 LUNs and a queue depth of 32.

This change also has the secondary benefit of being a power of two as
required by the kfifo API. Since, Commit ab9bb6318b ("Partially revert
"kfifo: fix kfifo_alloc() and kfifo_init()"") the size of IU pool for each
target has been rounded down to 512 when attempting to kfifo_init() those
pools with the current request_limit size of 800.

Link: https://lore.kernel.org/r/20220322194443.678433-1-tyreld@linux.ibm.com
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:39:24 -04:00
Kevin Groeneveld
bc5519c18a scsi: sr: Fix typo in CDROM(CLOSETRAY|EJECT) handling
Commit 2e27f576ab ("scsi: scsi_ioctl: Call scsi_cmd_ioctl() from
scsi_ioctl()") seems to have a typo as it is checking ret instead of cmd in
the if statement checking for CDROMCLOSETRAY and CDROMEJECT.  This changes
the behaviour of these ioctls as the cdrom_ioctl handling of these is more
restrictive than the scsi_ioctl version.

Link: https://lore.kernel.org/r/20220323002242.21157-1-kgroeneveld@lenbrook.com
Fixes: 2e27f576ab ("scsi: scsi_ioctl: Call scsi_cmd_ioctl() from scsi_ioctl()")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Groeneveld <kgroeneveld@lenbrook.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:37:45 -04:00
Christophe JAILLET
f16aa285e6 scsi: pmcraid: Remove the PMCRAID_PASSTHROUGH_IOCTL ioctl implementation
The whole passthrough ioctl path looks completely broken. For example it
DMA maps the scatterlist and after that copies data to it, which is
prohibited by the DMA API contract.

Moreover, in pmcraid_alloc_sglist(), the pointer returned by a
sgl_alloc_order() call is not recorded anywhere which is pointless.

So remove the PMCRAID_PASSTHROUGH_IOCTL ioctl implementation entirely.
Should it be needed, we should reimplement it using the proper block layer
request mapping helpers.

Link: https://lore.kernel.org/r/7f27a70bec3f3dcaf46a29b1c630edd4792e71c0.1648298857.git.christophe.jaillet@wanadoo.fr
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:32:26 -04:00
Tomas Henzl
f06aa52cb2 scsi: core: scsi_logging: Fix a BUG
The request_queue may be NULL in a request, for example when it comes from
scsi_ioctl_reset(). Check it before use.

Fixes: f3fa33acca ("block: remove the ->rq_disk field in struct request")
Link: https://lore.kernel.org/r/20220324134603.28463-1-thenzl@redhat.com
Reported-by: Changhui Zhong <czhong@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:29:19 -04:00
Keoseong Park
8ee15ea779 scsi: ufs: core: Remove unused field in struct ufs_hba
Remove unused fields 'rpm_lvl_attr' and 'spm_lvl_attr' in struct ufs_hba.
Commit cbb6813ee7 ("scsi: ufs: sysfs: attribute group for existing sysfs
entries.") removed all code using that field.

Link: https://lore.kernel.org/r/413601558.101648105683746.JavaMail.epsvc@epcpadp4
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Keoseong Park <keosung.park@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:27:13 -04:00
James Smart
c26bd6602e scsi: lpfc: Fix locking for lpfc_sli_iocbq_lookup()
The rules changed for lpfc_sli_iocbq_lookup() vs locking. Prior, the
routine properly took out the lock. In newly refactored code, the locks
must be held when calling the routine.

Fix lpfc_sli_process_sol_iocb() to take the locks before calling the
routine.

Fix lpfc_sli_handle_fast_ring_event() to not release the locks to call the
routine.

Link: https://lore.kernel.org/r/20220323205545.81814-3-jsmart2021@gmail.com
Fixes: 1b64aa9eae ("scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4")
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:22:16 -04:00
James Smart
7294a9bcaa scsi: lpfc: Fix broken SLI4 abort path
There was a merge error in ther 14.2.0.0 patches that resulted in the SLI4
path using the SLI3 issue_abort_iotag() routine. This resulted in txcmplq
corruption.

Fix to use the SLI4 routine when SLI4.

Link: https://lore.kernel.org/r/20220323205545.81814-2-jsmart2021@gmail.com
Fixes: 31a59f7570 ("scsi: lpfc: SLI path split: Refactor Abort paths")
Cc: <stable@vger.kernel.org> # v5.2+
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:22:15 -04:00
James Smart
4f3beb36b1 scsi: lpfc: Update lpfc version to 14.2.0.1
Update lpfc version to 14.2.0.1

Link: https://lore.kernel.org/r/20220317032737.45308-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:19:38 -04:00
James Smart
df0101197c scsi: lpfc: Fix queue failures when recovering from PCI parity error
When recovering from a pci-parity error the driver is failing to re-create
queues, causing recovery to fail. Looking deeper, it was found that the
interrupt vector count allocated on the recovery was fewer than the vectors
originally allocated. This disparity resulted in CPU map entries with stale
information. When the driver tries to re-create the queues, it attempts to
use the stale information which indicates an eq/interrupt vector that was
no longer created.

Fix by clearng the cpup map array before enabling and requesting the IRQs
in the lpfc_sli_reset_slot_s4 routine().

Link: https://lore.kernel.org/r/20220317032737.45308-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:19:38 -04:00
James Smart
a4691038b4 scsi: lpfc: Fix unload hang after back to back PCI EEH faults
When injecting EEH errors the port is getting hung up waiting on the node
list to empty, message number 0233. The driver is stuck at this point and
also can't unload. The driver makes transport remoteport delete calls which
try to abort I/O's, but the EEH daemon has already called the driver to
detach and the detachment has set the global FC_UNLOADING flag.  There are
several code paths that will avoid I/O cleanup if the FC_UNLOADING flag is
set, resulting in transports waiting for I/O while the driver is waiting on
transports to clean up.

Additionally, during study of the list, a locking issue was found in
lpfc_sli_abort_iocb_ring that could corrupt the list.

A special case was added to the lpfc_cleanup() routine to call
lpfc_sli_flush_rings() if the driver is FC_UNLOADING and if the pci-slot
is offline (e.g. EEH).

The SLI4 part of lpfc_sli_abort_iocb_ring() is changed to use the
ring_lock.  Also added code to cancel the I/Os if the pci-slot is offline
and added checks and returns for the FC_UNLOADING and HBA_IOQ_FLUSH flags
to prevent trying to send an I/O that we cannot handle.

Link: https://lore.kernel.org/r/20220317032737.45308-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:19:38 -04:00
James Smart
35ed9613d8 scsi: lpfc: Improve PCI EEH Error and Recovery Handling
Following EEH errors, the driver can crash or hang when deleting the
localport or when attempting to unload.

The EEH handlers in the driver did not notify the NVMe-FC transport before
tearing the driver down. This was delayed until the resume steps. This
worked for SCSI because lpfc_block_scsi() would notify the
scsi_fc_transport that the target was not available but it would not clean
up all the references to the ndlp.

The SLI3 prep for dev reset handler did the lpfc_offline_prep() and
lpfc_offline() calls to get the port stopped before restarting. The SLI4
version of the prep for dev reset just destroyed the queues and did not
stop NVMe from continuing.  Also because the port was not really stopped
the localport destroy would hang because the transport was still waiting
for I/O. Additionally, a devloss tmo can fire and post events to a stopped
worker thread creating another hang condition.

lpfc_sli4_prep_dev_for_reset() is modified to call lpfc_offline_prep() and
lpfc_offline() rather than just lpfc_scsi_dev_block() to ensure both SCSI
and NVMe transports are notified to block I/O to the driver.

Logic is added to devloss handler and worker thread to clean up ndlp
references and quiesce appropriately.

Link: https://lore.kernel.org/r/20220317032737.45308-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:19:37 -04:00
Xiaoguang Wang
a6968f7a36 scsi: target: tcmu: Fix possible page UAF
tcmu_try_get_data_page() looks up pages under cmdr_lock, but it does not
take refcount properly and just returns page pointer. When
tcmu_try_get_data_page() returns, the returned page may have been freed by
tcmu_blocks_release().

We need to get_page() under cmdr_lock to avoid concurrent
tcmu_blocks_release().

Link: https://lore.kernel.org/r/20220311132206.24515-1-xiaoguang.wang@linux.alibaba.com
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:07:56 -04:00
Wenchao Hao
ebfe3e0c5e scsi: libiscsi: Remove unnecessary memset() in iscsi_conn_setup()
iscsi_cls_conn is alloced by kzalloc(), the whole iscsi_cls_conn is zero
filled already including the dd_data. So it is unnecessary to call memset
again.

Link: https://lore.kernel.org/r/20220317150116.194140-1-haowenchao@huawei.com
Reviewed-by: Wu Bo <wubo40@huawei.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 23:01:28 -04:00
Damien Le Moal
87d663d408 scsi: mpt3sas: Fix use after free in _scsih_expander_node_remove()
The function mpt3sas_transport_port_remove() called in
_scsih_expander_node_remove() frees the port field of the sas_expander
structure, leading to the following use-after-free splat from KASAN when
the ioc_info() call following that function is executed (e.g. when doing
rmmod of the driver module):

[ 3479.371167] ==================================================================
[ 3479.378496] BUG: KASAN: use-after-free in _scsih_expander_node_remove+0x710/0x750 [mpt3sas]
[ 3479.386936] Read of size 1 at addr ffff8881c037691c by task rmmod/1531
[ 3479.393524]
[ 3479.395035] CPU: 18 PID: 1531 Comm: rmmod Not tainted 5.17.0-rc8+ #1436
[ 3479.401712] Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.1 06/02/2021
[ 3479.409263] Call Trace:
[ 3479.411743]  <TASK>
[ 3479.413875]  dump_stack_lvl+0x45/0x59
[ 3479.417582]  print_address_description.constprop.0+0x1f/0x120
[ 3479.423389]  ? _scsih_expander_node_remove+0x710/0x750 [mpt3sas]
[ 3479.429469]  kasan_report.cold+0x83/0xdf
[ 3479.433438]  ? _scsih_expander_node_remove+0x710/0x750 [mpt3sas]
[ 3479.439514]  _scsih_expander_node_remove+0x710/0x750 [mpt3sas]
[ 3479.445411]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
[ 3479.452032]  scsih_remove+0x525/0xc90 [mpt3sas]
[ 3479.458212]  ? mpt3sas_expander_remove+0x1d0/0x1d0 [mpt3sas]
[ 3479.465529]  ? down_write+0xde/0x150
[ 3479.470746]  ? up_write+0x14d/0x460
[ 3479.475840]  ? kernfs_find_ns+0x137/0x310
[ 3479.481438]  pci_device_remove+0x65/0x110
[ 3479.487013]  __device_release_driver+0x316/0x680
[ 3479.493180]  driver_detach+0x1ec/0x2d0
[ 3479.498499]  bus_remove_driver+0xe7/0x2d0
[ 3479.504081]  pci_unregister_driver+0x26/0x250
[ 3479.510033]  _mpt3sas_exit+0x2b/0x6cf [mpt3sas]
[ 3479.516144]  __x64_sys_delete_module+0x2fd/0x510
[ 3479.522315]  ? free_module+0xaa0/0xaa0
[ 3479.527593]  ? __cond_resched+0x1c/0x90
[ 3479.532951]  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[ 3479.539607]  ? syscall_enter_from_user_mode+0x21/0x70
[ 3479.546161]  ? trace_hardirqs_on+0x1c/0x110
[ 3479.551828]  do_syscall_64+0x35/0x80
[ 3479.556884]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3479.563402] RIP: 0033:0x7f1fc482483b
...
[ 3479.943087] ==================================================================

Fix this by introducing the local variable port_id to store the port ID
value before executing mpt3sas_transport_port_remove(). This local variable
is then used in the call to ioc_info() instead of dereferencing the freed
port structure.

Link: https://lore.kernel.org/r/20220322055702.95276-1-damien.lemoal@opensource.wdc.com
Fixes: 7d310f2410 ("scsi: mpt3sas: Get device objects using sas_address & portID")
Cc: stable@vger.kernel.org
Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-29 22:53:24 -04:00
NeilBrown
eb07d5a4da SUNRPC: handle malloc failure in ->request_prepare
If ->request_prepare() detects an error, it sets ->rq_task->tk_status.
This is easy for callers to ignore.
The only caller is xprt_request_enqueue_receive() and it does ignore the
error, as does call_encode() which calls it.  This can result in a
request being queued to receive a reply without an allocated receive buffer.

So instead of setting rq_task->tk_status, return an error, and store in
->tk_status only in call_encode();

The call to xprt_request_enqueue_receive() is now earlier in
call_encode(), where the error can still be handled.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-29 22:15:46 -04:00
ChenXiaoSong
b243874f6f NFSv4: fix open failure with O_ACCMODE flag
open() with O_ACCMODE|O_DIRECT flags secondly will fail.

Reproducer:
  1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/
  2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT)
  3. close(fd)
  4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT)

Server nfsd4_decode_share_access() will fail with error nfserr_bad_xdr when
client use incorrect share access mode of 0.

Fix this by using NFS4_SHARE_ACCESS_BOTH share access mode in client,
just like firstly opening.

Fixes: ce4ef7c0a8 ("NFS: Split out NFS v4 file operations")
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-29 22:14:00 -04:00
ChenXiaoSong
ab0fc21bc7 Revert "NFSv4: Handle the special Linux file open access mode"
This reverts commit 44942b4e45.

After secondly opening a file with O_ACCMODE|O_DIRECT flags,
nfs4_valid_open_stateid() will dereference NULL nfs4_state when lseek().

Reproducer:
  1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/
  2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT)
  3. close(fd)
  4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT)
  5. lseek(fd)

Reported-by: Lyu Tao <tao.lyu@epfl.ch>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-29 22:14:00 -04:00
Jakub Kicinski
77c9387c0c Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2022-03-29

We've added 16 non-merge commits during the last 1 day(s) which contain
a total of 24 files changed, 354 insertions(+), 187 deletions(-).

The main changes are:

1) x86 specific bits of fprobe/rethook, from Masami and Peter.

2) ice/xsk fixes, from Maciej and Magnus.

3) Various small fixes, from Andrii, Yonghong, Geliang and others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Fix clang compilation errors
  ice: xsk: Fix indexing in ice_tx_xsk_pool()
  ice: xsk: Stop Rx processing when ntc catches ntu
  ice: xsk: Eliminate unnecessary loop iteration
  xsk: Do not write NULL in SW ring at allocation failure
  x86,kprobes: Fix optprobe trampoline to generate complete pt_regs
  x86,rethook: Fix arch_rethook_trampoline() to generate a complete pt_regs
  x86,rethook,kprobes: Replace kretprobe with rethook on x86
  kprobes: Use rethook for kretprobe if possible
  bpftool: Fix generated code in codegen_asserts
  selftests/bpf: fix selftest after random: Urandom_read tracepoint removal
  bpf: Fix maximum permitted number of arguments check
  bpf: Sync comments for bpf_get_stack
  fprobe: Fix sparse warning for acccessing __rcu ftrace_hash
  fprobe: Fix smatch type mismatch warning
  bpf/bpftool: Add unprivileged_bpf_disabled check against value of 2
====================

Link: https://lore.kernel.org/r/20220329234924.39053-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-29 18:59:15 -07:00
Linus Torvalds
965181d7ef NFS client updates for Linux 5.18
Highlights include:
 
 Features:
 - Switch NFS to use readahead instead of the obsolete readpages.
 - Readdir fixes to improve cacheability of large directories when there
   are multiple readers and writers.
 - Readdir performance improvements when doing a seekdir() immediately
   after opening the directory (common when re-exporting NFS).
 - NFS swap improvements from Neil Brown.
 - Loosen up memory allocation to permit direct reclaim and write back
   in cases where there is no danger of deadlocking the writeback code or
   NFS swap.
 - Avoid sillyrename when the NFSv4 server claims to support the
   necessary features to recover the unlinked but open file after reboot.
 
 Bugfixes:
 - Patch from Olga to add a mount option to control NFSv4.1 session
   trunking discovery, and default it to being off.
 - Fix a lockup in nfs_do_recoalesce().
 - Two fixes for list iterator variables being used when pointing to the
   list head.
 - Fix a kernel memory scribble when reading from a non-socket transport
   in /sys/kernel/sunrpc.
 - Fix a race where reconnecting to a server could leave the TCP socket
   stuck forever in the connecting state.
 - Patch from Neil to fix a shutdown race which can leave the SUNRPC
   transport timer primed after we free the struct xprt itself.
 - Patch from Xin Xiong to fix reference count leaks in the NFSv4.2 copy
   offload.
 - Sunrpc patch from Olga to avoid resending a task on an offlined
   transport.
 
 Cleanups:
 - Patches from Dave Wysochanski to clean up the fscache code
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmJDLLUACgkQZwvnipYK
 APJZ5Q/9FH5S38RcUOxpFS+XaVdmxltJTMQv9Bz5mcyljq6PwRjTn73Qs7MfBQfD
 k90O9Pn/EXgQpFWNW4o/saQGLAO1MbVaesdNNF92Le9NziMz7lr95fsFXOqpg9EQ
 rEVOImWinv/N0+P0yv5tgjx6Fheu70L4dnwsxepnmtv2R1VjXOVchK7WjsV+9qey
 Xvn15OzhtVCGhRvvKfmGuGW+2I0D29tRNn2X0Pi+LjxAEQ0We/rcsJeGDMGYtbeC
 2U4ADk7KB0kJq04WYCH8k8tTrljL3PUONzO6QP1lltmAZ0FOwXf/AjyaD3SML2Xu
 A1yc2bEn3/S6wCQ3BiMxoE7UCSewg/ZyQD7B9GjZ/nWYNXREhLKvtVZt+ULTCA6o
 UKAI2vlEHljzQI2YfbVVZFyL8KCZqi+BFcJ/H/dtFKNazIkiefkaF+703Pflazis
 HjhA5G6IT0Pk6qZ6OBp+Z/4kCDYZYXXi9ysiNuMPWCiYEfZzt7ocUjm44uRAi+vR
 Qpf0V01Rg4iplTj3YFXqb9TwxrudzC7AcQ7hbLJia0kD/8djG+aGZs3SJWeO+VuE
 Xa41oxAx9jqOSRvSKPvj7nbexrwn8PtQI2NtDRqjPXlg6+Jbk23u8lIEVhsCH877
 xoMUHtTKLyqECskzRN4nh7qoBD3QfiLuI74aDcPtGEuXzU1+Ox8=
 =jeGt
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:

   - Switch NFS to use readahead instead of the obsolete readpages.

   - Readdir fixes to improve cacheability of large directories when
     there are multiple readers and writers.

   - Readdir performance improvements when doing a seekdir() immediately
     after opening the directory (common when re-exporting NFS).

   - NFS swap improvements from Neil Brown.

   - Loosen up memory allocation to permit direct reclaim and write back
     in cases where there is no danger of deadlocking the writeback code
     or NFS swap.

   - Avoid sillyrename when the NFSv4 server claims to support the
     necessary features to recover the unlinked but open file after
     reboot.

  Bugfixes:

   - Patch from Olga to add a mount option to control NFSv4.1 session
     trunking discovery, and default it to being off.

   - Fix a lockup in nfs_do_recoalesce().

   - Two fixes for list iterator variables being used when pointing to
     the list head.

   - Fix a kernel memory scribble when reading from a non-socket
     transport in /sys/kernel/sunrpc.

   - Fix a race where reconnecting to a server could leave the TCP
     socket stuck forever in the connecting state.

   - Patch from Neil to fix a shutdown race which can leave the SUNRPC
     transport timer primed after we free the struct xprt itself.

   - Patch from Xin Xiong to fix reference count leaks in the NFSv4.2
     copy offload.

   - Sunrpc patch from Olga to avoid resending a task on an offlined
     transport.

  Cleanups:

   - Patches from Dave Wysochanski to clean up the fscache code"

* tag 'nfs-for-5.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (91 commits)
  NFSv4/pNFS: Fix another issue with a list iterator pointing to the head
  NFS: Don't loop forever in nfs_do_recoalesce()
  SUNRPC: Don't return error values in sysfs read of closed files
  SUNRPC: Do not dereference non-socket transports in sysfs
  NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error
  SUNRPC don't resend a task on an offlined transport
  NFS: replace usage of found with dedicated list iterator variable
  SUNRPC: avoid race between mod_timer() and del_timer_sync()
  pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod
  pNFS/flexfiles: Ensure pNFS allocation modes are consistent with nfsiod
  NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod
  NFS: Avoid writeback threads getting stuck in mempool_alloc()
  NFS: nfsiod should not block forever in mempool_alloc()
  SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent
  SUNRPC: Fix unx_lookup_cred() allocation
  NFS: Fix memory allocation in rpc_alloc_task()
  NFS: Fix memory allocation in rpc_malloc()
  SUNRPC: Improve accuracy of socket ENOBUFS determination
  SUNRPC: Replace internal use of SOCKWQ_ASYNC_NOSPACE
  SUNRPC: Fix socket waits for write buffer space
  ...
2022-03-29 18:55:37 -07:00
Dave Chinner
919edbadeb xfs: drop async cache flushes from CIL commits.
Jan Kara reported a performance regression in dbench that he
bisected down to commit bad77c375e ("xfs: CIL checkpoint
flushes caches unconditionally").

Whilst developing the journal flush/fua optimisations this cache was
part of, it appeared to made a significant difference to
performance. However, now that this patchset has settled and all the
correctness issues fixed, there does not appear to be any
significant performance benefit to asynchronous cache flushes.

In fact, the opposite is true on some storage types and workloads,
where additional cache flushes that can occur from fsync heavy
workloads have measurable and significant impact on overall
throughput.

Local dbench testing shows little difference on dbench runs with
sync vs async cache flushes on either fast or slow SSD storage, and
no difference in streaming concurrent async transaction workloads
like fs-mark.

Fast NVME storage.

From `dbench -t 30`, CIL scale:

clients		async			sync
		BW	Latency		BW	Latency
1		 935.18   0.855		 915.64   0.903
8		2404.51   6.873		2341.77   6.511
16		3003.42   6.460		2931.57   6.529
32		3697.23   7.939		3596.28   7.894
128		7237.43  15.495		7217.74  11.588
512		5079.24  90.587		5167.08  95.822

fsmark, 32 threads, create w/ 64 byte xattr w/32k logbsize

	create		chown		unlink
async   1m41s		1m16s		2m03s
sync	1m40s		1m19s		1m54s

Slower SATA SSD storage:

From `dbench -t 30`, CIL scale:

clients		async			sync
		BW	Latency		BW	Latency
1		  78.59  15.792		  83.78  10.729
8		 367.88  92.067		 404.63  59.943
16		 564.51  72.524		 602.71  76.089
32		 831.66 105.984		 870.26 110.482
128		1659.76 102.969		1624.73  91.356
512		2135.91 223.054		2603.07 161.160

fsmark, 16 threads, create w/32k logbsize

	create		unlink
async   5m06s		4m15s
sync	5m00s		4m22s

And on Jan's test machine:

                   5.18-rc8-vanilla       5.18-rc8-patched
Amean     1        71.22 (   0.00%)       64.94 *   8.81%*
Amean     2        93.03 (   0.00%)       84.80 *   8.85%*
Amean     4       150.54 (   0.00%)      137.51 *   8.66%*
Amean     8       252.53 (   0.00%)      242.24 *   4.08%*
Amean     16      454.13 (   0.00%)      439.08 *   3.31%*
Amean     32      835.24 (   0.00%)      829.74 *   0.66%*
Amean     64     1740.59 (   0.00%)     1686.73 *   3.09%*

Performance and cache flush behaviour is restored to pre-regression
levels.

As such, we can now consider the async cache flush mechanism an
unnecessary exercise in premature optimisation and hence we can
now remove it and the infrastructure it requires completely.

Fixes: bad77c375e ("xfs: CIL checkpoint flushes caches unconditionally")
Reported-and-tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-29 18:22:02 -07:00