Commit graph

1059881 commits

Author SHA1 Message Date
Manivannan Sadhasivam
8135cc5b27 MAINTAINERS: Update the entry for MHI bus
Since Hemant is not carrying out any maintainership duties let's make
him as a dedicated reviewer. Also add the new mailing lists dedicated
for MHI in subspace mailing list server.

Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20211019133901.173966-1-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-19 15:49:14 +02:00
Heiko Carstens
1a446b2473 s390: update defconfigs
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-19 15:39:54 +02:00
Heiko Carstens
1254cfbc5f samples: add s390 support for ftrace direct call samples
Add s390 support for ftrace direct call samples, which also enables
ftrace direct call selftests within ftrace selftests.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20211012133802.2460757-5-hca@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-19 15:39:54 +02:00
Heiko Carstens
c316eb4460 samples: add HAVE_SAMPLE_FTRACE_DIRECT config option
Add HAVE_SAMPLE_FTRACE_DIRECT config option which can be selected by
architectures which have support for ftrace direct call samples.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20211012133802.2460757-4-hca@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-19 15:39:53 +02:00
Heiko Carstens
3d487acf1b s390: make STACK_FRAME_OVERHEAD available via asm-offsets.h
Make STACK_FRAME_OVERHEAD available via asm-offsets.h. This allows to
add s390 specific asm code to e.g. ftrace samples, without requiring
to add random header files, which might cause all sort of problems on
other architectures. asm-offsets.h can be assumed to be non-problematic.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20211012133802.2460757-3-hca@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-19 15:39:53 +02:00
Heiko Carstens
2ab3a0a9fa s390/ftrace: add HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALL support
This is the s390 variant of commit 562955fe6a ("ftrace/x86: Add
register_ftrace_direct() for custom trampolines").

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20211012133802.2460757-2-hca@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-10-19 15:39:53 +02:00
Cai Huoqing
05be946337 net: ethernet: ixp4xx: Make use of dma_pool_zalloc() instead of dma_pool_alloc/memset()
Replacing dma_pool_alloc/memset() with dma_pool_zalloc()
to simplify the code.

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 13:24:26 +01:00
Christophe JAILLET
07fab5a469 ieee802154: Remove redundant 'flush_workqueue()' calls
'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.

Remove the redundant 'flush_workqueue()' calls.

This was generated with coccinelle:

@@
expression E;
@@
- 	flush_workqueue(E);
	destroy_workqueue(E);

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 13:23:38 +01:00
Christoph Hellwig
97eeb5fc14 partitions/ibm: use bdev_nr_sectors instead of open coding it
Use the proper helper to read the block device size and switch various
places to pass the size in terms of sectors which is more practical.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211019062024.2171074-4-hch@lst.de
[axboe: fix comment typo]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 06:17:33 -06:00
Christoph Hellwig
f9831b8857 partitions/efi: use bdev_nr_bytes instead of open coding it
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211019062024.2171074-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 06:16:50 -06:00
Christoph Hellwig
946e993730 block/ioctl: use bdev_nr_sectors and bdev_nr_bytes
Use the proper helper to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211019062024.2171074-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 06:16:50 -06:00
Leon Romanovsky
cb3dc8901b devlink: Remove extra device_lock assert checks
PCI core code in the pci_call_probe() has a path that doesn't hold
device_lock. It happens because the ->probe() is called through the
workqueue mechanism.

   349 static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
   350                           const struct pci_device_id *id)
   351 {
   352
....
   377         if (cpu < nr_cpu_ids)
   378                 error = work_on_cpu(cpu, local_pci_probe, &ddi);

Luckily enough, the core still ensures that only single flow is executed,
so it safe to remove the assert checks that anyway were added for annotations
purposes.

Fixes: b88f7b1203 ("devlink: Annotate devlink API calls")
Reported-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 13:16:14 +01:00
David S. Miller
04ee2752a5 linux-can-fixes-for-5.15-20211019
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmFub7gTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCpyVqK+u3vqf6yCACAiYQuk1sRh7/b6G4Qk2QUTWUxhG/T
 NWAOVuYMJkvGAMRAxjA7ecwRqA19/WIS2ys00t8UW62mvFGrGfcSBXWj3baPnx7w
 rfhyd+gk1LqOjWcaQriDmLO2ERY27roNA7FMtcMCoRlbYkrnYcFyKUMdhQXSW3/R
 /5KIEWBFzmr6ISUBUpkOeDs3irjy3ScIE0fEBt1+Uu3XydFWt7qnrMqBGlwoc9Xs
 du5VN5nWTZzdB9ju7p0PyrYdwoHsa1L1XbnIJd1hTLCua38028nuL9v7OGy3QvwN
 1Tz10PTnTk+2KMCkR4AwuzEBcHBVIcNmTrpTQAKiecB0QbXQlR9es0J5
 =9TSz
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-5.15-20211019' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2021-10-19

this is a pull request of a single patch for net/master.

The patch is by me and fixes the error handling in case of a FC
timeout in the TX path of the ISOTOP CAN protocol.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 13:14:36 +01:00
Andrea Righi
480d42dc00 blk-wbt: prevent NULL pointer dereference in wb_timer_fn
The timer callback used to evaluate if the latency is exceeded can be
executed after the corresponding disk has been released, causing the
following NULL pointer dereference:

[ 119.987108] BUG: kernel NULL pointer dereference, address: 0000000000000098
[ 119.987617] #PF: supervisor read access in kernel mode
[ 119.987971] #PF: error_code(0x0000) - not-present page
[ 119.988325] PGD 7c4a4067 P4D 7c4a4067 PUD 7bf63067 PMD 0
[ 119.988697] Oops: 0000 [#1] SMP NOPTI
[ 119.988959] CPU: 1 PID: 9353 Comm: cloud-init Not tainted 5.15-rc5+arighi #rc5+arighi
[ 119.989520] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[ 119.990055] RIP: 0010:wb_timer_fn+0x44/0x3c0
[ 119.990376] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00
[ 119.991578] RSP: 0000:ffffb5f580957da8 EFLAGS: 00010246
[ 119.991937] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
[ 119.992412] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88f476d7f780
[ 119.992895] RBP: ffffb5f580957dd0 R08: 0000000000000000 R09: 0000000000000000
[ 119.993371] R10: 0000000000000004 R11: 0000000000000002 R12: ffff88f476c84500
[ 119.993847] R13: ffff88f4434390c0 R14: 0000000000000000 R15: ffff88f4bdc98c00
[ 119.994323] FS: 00007fb90bcd9c00(0000) GS:ffff88f4bdc80000(0000) knlGS:0000000000000000
[ 119.994952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 119.995380] CR2: 0000000000000098 CR3: 000000007c0d6000 CR4: 00000000000006e0
[ 119.995906] Call Trace:
[ 119.996130] ? blk_stat_free_callback_rcu+0x30/0x30
[ 119.996505] blk_stat_timer_fn+0x138/0x140
[ 119.996830] call_timer_fn+0x2b/0x100
[ 119.997136] __run_timers.part.0+0x1d1/0x240
[ 119.997470] ? kvm_clock_get_cycles+0x11/0x20
[ 119.997826] ? ktime_get+0x3e/0xa0
[ 119.998110] ? native_apic_msr_write+0x2c/0x30
[ 119.998456] ? lapic_next_event+0x20/0x30
[ 119.998779] ? clockevents_program_event+0x94/0xf0
[ 119.999150] run_timer_softirq+0x2a/0x50
[ 119.999465] __do_softirq+0xcb/0x26f
[ 119.999764] irq_exit_rcu+0x8c/0xb0
[ 120.000057] sysvec_apic_timer_interrupt+0x43/0x90
[ 120.000429] ? asm_sysvec_apic_timer_interrupt+0xa/0x20
[ 120.000836] asm_sysvec_apic_timer_interrupt+0x12/0x20

In this case simply return from the timer callback (no action
required) to prevent the NULL pointer dereference.

BugLink: https://bugs.launchpad.net/bugs/1947557
Link: https://lore.kernel.org/linux-mm/YWRNVTk9N8K0RMst@arighi-desktop/
Fixes: 34dbad5d26 ("blk-stat: convert to callback-based statistics reporting")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Link: https://lore.kernel.org/r/YW6N2qXpBU3oc50q@arighi-desktop
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 06:13:41 -06:00
Michael Schmitz
86d46fdaa1 block: ataflop: fix breakage introduced at blk-mq refactoring
Refactoring of the Atari floppy driver when converting to blk-mq
has broken the state machine in not-so-subtle ways:

finish_fdc() must be called when operations on the floppy device
have completed. This is crucial in order to relase the ST-DMA
lock, which protects against concurrent access to the ST-DMA
controller by other drivers (some DMA related, most just related
to device register access - broken beyond compare, I know).

When rewriting the driver's old do_request() function, the fact
that finish_fdc() was called only when all queued requests had
completed appears to have been overlooked. Instead, the new
request function calls finish_fdc() immediately after the last
request has been queued. finish_fdc() executes a dummy seek after
most requests, and this overwrites the state machine's interrupt
hander that was set up to wait for completion of the read/write
request just prior. To make matters worse, finish_fdc() is called
before device interrupts are re-enabled, making certain that the
read/write interupt is missed.

Shifting the finish_fdc() call into the read/write request
completion handler ensures the driver waits for the request to
actually complete. With a queue depth of 2, we won't see long
request sequences, so calling finish_fdc() unconditionally just
adds a little overhead for the dummy seeks, and keeps the code
simple.

While we're at it, kill ataflop_commit_rqs() which does nothing
but run finish_fdc() unconditionally, again likely wiping out an
in-flight request.

Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Fixes: 6ec3938cff ("ataflop: convert to blk-mq")
CC: linux-block@vger.kernel.org
CC: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Link: https://lore.kernel.org/r/20211019061321.26425-1-schmitzmic@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 06:11:44 -06:00
luo penghao
3c71e0c9ab ethernet: Remove redundant statement
The variable will be assigned again later in the if condition,
there is no meaning there.

drivers/net/ethernet/broadcom/tg3.c:5750:2 warning:

Value stored to 'current_link_up' is never read.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 13:10:44 +01:00
Zheyu Ma
c69b2f4687 cavium: Fix return values of the probe function
During the process of driver probing, the probe function should return < 0
for failure, otherwise, the kernel will treat value > 0 as success.

Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 13:09:57 +01:00
Zheyu Ma
e211210098 mISDN: Fix return values of the probe function
During the process of driver probing, the probe function should return < 0
for failure, otherwise, the kernel will treat value > 0 as success.

Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 13:09:28 +01:00
Robert Hancock
92817dad7d net: phylink: Support disabling autonegotiation for PCS
The auto-negotiation state in the PCS as set by
phylink_mii_c22_pcs_config was previously always enabled when the
driver is configured for in-band autonegotiation, even if
autonegotiation was disabled on the interface with ethtool. Update the
code to set the BMCR_ANENABLE bit based on the interface's
autonegotiation enabled state.

Update phylink_mii_c22_pcs_get_state to not check
autonegotiation-related fields when autonegotiation is disabled.

Update phylink_mac_pcs_get_state to initialize the state based on the
interface's configured speed, duplex and pause parameters rather than
to unknown when autonegotiation is disabled, before calling the
driver's pcs_get_state functions, as they are not likely to provide
meaningful data for these fields when autonegotiation is disabled. In
this case the driver is really just filling in the link state field.

Note that in cases where there is a downstream PHY connected, such as
with SGMII and a copper PHY, the configuration set by ethtool is
handled by phy_ethtool_ksettings_set and not propagated to the PCS.
This is correct since SGMII or 1000Base-X autonegotiation with the PCS
should normally still be used even if the copper side has disabled it.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 13:07:36 +01:00
Sebastian Andrzej Siewior
e22db7bd55 net: sched: Allow statistics reads from softirq.
Eric reported that the rate estimator reads statics from the softirq
which in turn triggers a warning introduced in the statistics rework.

The warning is too cautious. The updates happen in the softirq context
so reads from softirq are fine since the writes can not be preempted.
The updates/writes happen during qdisc_run() which ensures one writer
and the softirq context.
The remaining bad context for reading statistics remains in hard-IRQ
because it may preempt a writer.

Fixes: 29cbcd8582 ("net: sched: Remove Qdisc::running sequence counter")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 13:07:35 +01:00
Russell King (Oracle)
dc90604b58 net: phylink: rejig SFP interface selection in ksettings_set()
Commit ea269a6f72 ("net: phylink: Update SFP selected interface on
advertising changes") added a better solution to selecting the
interface mode for SFPs using the advertisement mask. This method will
work for mvneta and mvpp2 when selecting between 2500base-X and
1000base-X without needing to use the basex helper, or indicate that
we support both 1000base-X and 2500base-X when in either of these two
interface modes.

Hence, we need to eliminate the validation prior to selecting the
interface, otherwise when we clean up mvneta's validation function, we
will end up locking to 2500base-X as we validate with an interface mode
of PHY_INERFACE_MODE_2500BASEX.

The supported mask will already have been reduced down to the union of
support for the SFP and MAC already, so we can be confident that the
advertisement mask is already appropriately restricted. We only need to
select the appropriate interface, and then revalidate with the new
interface mode.

We get rid of the check for pl->sfp_port too, this is meaningless here
as it doesn't get cleared when a module is removed, so it doesn't
indicate if a module is present. Just rely on pl->sfp_bus.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19 13:07:35 +01:00
Tom Lendacky
e7d445ab26 x86/sme: Use #define USE_EARLY_PGTABLE_L5 in mem_encrypt_identity.c
When runtime support for converting between 4-level and 5-level pagetables
was added to the kernel, the SME code that built pagetables was updated
to use the pagetable functions, e.g. p4d_offset(), etc., in order to
simplify the code. However, the use of the pagetable functions in early
boot code requires the use of the USE_EARLY_PGTABLE_L5 #define in order to
ensure that the proper definition of pgtable_l5_enabled() is used.

Without the #define, pgtable_l5_enabled() is #defined as
cpu_feature_enabled(X86_FEATURE_LA57). In early boot, the CPU features
have not yet been discovered and populated, so pgtable_l5_enabled() will
return false even when 5-level paging is enabled. This causes the SME code
to always build 4-level pagetables to perform the in-place encryption.
If 5-level paging is enabled, switching to the SME pagetables results in
a page-fault that kills the boot.

Adding the #define results in pgtable_l5_enabled() using the
__pgtable_l5_enabled variable set in early boot and the SME code building
pagetables for the proper paging level.

Fixes: aad983913d ("x86/mm/encrypt: Simplify sme_populate_pgd() and sme_populate_pgd_large()")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org> # 4.18.x
Link: https://lkml.kernel.org/r/2cb8329655f5c753905812d951e212022a480475.1634318656.git.thomas.lendacky@amd.com
2021-10-19 14:07:17 +02:00
Jens Axboe
6155631a0c block: align blkdev_dio inlined bio to a cacheline
We get all sorts of unreliable and funky results since the bio is
designed to align on a cacheline, which it does not when inlined like
this.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:55:52 -06:00
Jens Axboe
e028f167ec block: move blk_mq_tag_to_rq() inline
This is in the fast path of driver issue or completion, and it's a single
array index operation. Move it inline to avoid a function call for it.

This does mean making struct blk_mq_tags block layer public, but there's
not really much in there.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:55:41 -06:00
Jens Axboe
df87eb0fce block: get rid of plug list sorting
Even if we have multiple queues in the plug list, chances that they
are very interspersed is minimal. Don't bother spending CPU cycles
sorting the list.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:55:26 -06:00
Jens Axboe
87c037d11b block: return whether or not to unplug through boolean
Instead of returning the same queue request through a request pointer,
use a boolean to accomplish the same.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:55:04 -06:00
Christoph Hellwig
8a7d267b4a block: don't call blk_status_to_errno in blk_update_request
We only need to call it to resolve the blk_status_t -> errno mapping for
tracing, so move the conversion into the tracepoints that are not called
at all when tracing isn't enabled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:54:57 -06:00
Borislav Petkov
c688bd5dc9 x86/sev: Carve out HV call's return value verification
Carve out the verification of the HV call return value into a separate
helper and make it more readable.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/YVbYWz%2B8J7iMTJjc@zn.tnic
2021-10-19 13:54:47 +02:00
Jens Axboe
db9a02baa2 block: move bdev_read_only() into the header
This is called for every write in the fast path, move it inline next
to get_disk_ro() which is called internally.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:53:22 -06:00
Jens Axboe
5ca7a8b3f6 io_uring: inform block layer of how many requests we are submitting
The block layer can use this knowledge to make smarter decisions on
how to handle the request, if it knows that N more may be coming. Switch
to using blk_start_plug_nr_ios() to pass in that information.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:56 -06:00
Pavel Begunkov
88459b50b4 io_uring: simplify io_file_supports_nowait()
Make sure that REQ_F_SUPPORT_NOWAIT is always set io_prep_rw(), and so
we can stop caring about setting it down the line simplifying
io_file_supports_nowait().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/60c8f1f5e2cb45e00f4897b2cec10c5b3669da91.1634425438.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:56 -06:00
Pavel Begunkov
35645ac3c1 io_uring: combine REQ_F_NOWAIT_{READ,WRITE} flags
Merge REQ_F_NOWAIT_READ and REQ_F_NOWAIT_WRITE into one flag, i.e.
REQ_F_SUPPORT_NOWAIT. First it gets rid of dependence on CONFIG_64BIT
but also simplifies the code.

One thing to consider is when we don't have ->{read,write}_iter and go
through loop_rw_iter(). Just fail it with -EAGAIN if we expect nowait
behaviour but not sure whether it supports it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f832a20e5186c2e79c6519280c238f559a1d2bbc.1634425438.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:56 -06:00
Pavel Begunkov
e74ead135b io_uring: arm poll for non-nowait files
Don't check if we can do nowait before arming apoll, there are several
reasons for that. First, we don't care much about files that don't
support nowait. Second, it may be useful -- we don't want to be taking
away extra workers from io-wq when it can go in some async. Even if it
will go through io-wq eventually, it make difference in the numbers of
workers actually used. And the last one, it's needed to clean nowait in
future commits.

[kernel test robot: fix unused-var]

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9d06f3cb2c8b686d970269a87986f154edb83043.1634425438.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:56 -06:00
Noah Goldstein
b10841c98c fs/io_uring: Prioritise checking faster conditions first in io_write
This commit reorders the conditions in a branch in io_write. The
reorder to check 'ret2 == -EAGAIN' first as checking
'(req->ctx->flags & IORING_SETUP_IOPOLL)' will likely be more
expensive due to 2x memory derefences.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Link: https://lore.kernel.org/r/20211017013229.4124279-1-goldstein.w.n@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:56 -06:00
Pavel Begunkov
5cb03d6342 io_uring: clean io_prep_rw()
We already store req->file in a variable in io_prep_rw(), just use it
instead of a couple of left references to kicob->ki_filp.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2f5889fc7ab670daefd5ccaedd99416d8355f0ad.1634314022.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:56 -06:00
Pavel Begunkov
578c0ee234 io_uring: optimise fixed rw rsrc node setting
Move fixed rw io_req_set_rsrc_node() from rw prep into
io_import_fixed(), if we're using fixed buffers it will always be called
during submission as we save the state in advance,

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/68c06f66d5aa9661f1e4b88d08c52d23528297ec.1634314022.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:56 -06:00
Pavel Begunkov
caa8fe6e86 io_uring: return iovec from __io_import_iovec
We pass iovec** into __io_import_iovec(), which should keep it,
initialise and modify accordingly. It's expensive, return it directly
from __io_import_iovec encoding errors with ERR_PTR if needed.

io_import_iovec keeps the old interface, but it's inline and so
everything is optimised nicely.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6230e9769982f03a8f86fa58df24666088c44d3e.1634314022.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:56 -06:00
Pavel Begunkov
d1d681b084 io_uring: optimise io_import_iovec fixed path
Delay loading req->rw.{addr,len} in io_import_iovec until it's really
needed, so removing extra loads for the fixed path, which doesn't use
them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3cc48dd0c4f1a37c4ce9aab5784281a2d83ad8be.1634314022.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:55 -06:00
Pavel Begunkov
9882131cd9 io_uring: kill io_wq_current_is_worker() in iopoll
Don't decide about locking based on io_wq_current_is_worker(), it's not
consistent with all other code and is expensive, use issue_flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7546d5a58efa4360173541c6fe02ee6b8c7b4ea7.1634314022.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:55 -06:00
Pavel Begunkov
9983028e76 io_uring: optimise req->ctx reloads
Don't load req->ctx in advance, it takes an extra register and the field
stays valid even after opcode handlers. It also optimises out req->ctx
load in io_iopoll_req_issued() once it's inlined.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1e45ff671c44be0eb904f2e448a211734893fa0b.1634314022.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:55 -06:00
Pavel Begunkov
607b6fb801 io_uring: rearrange io_read()/write()
Combine force_nonblock branches (which is already optimised by
compiler), flip branches so the most hot/common path is the first, e.g.
as with non on-stack iov setup, and add extra likely/unlikely
attributions for errror paths.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2c2536c5896d70994de76e387ea09a0402173a3f.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:55 -06:00
Pavel Begunkov
5e49c973fc io_uring: clean up io_import_iovec
Make io_import_iovec taking struct io_rw_state instead of an iter
pointer. First it takes care of initialising iovec pointer, which can be
forgotten. Even more, we can not init it if not needed, e.g. in case of
IORING_OP_READ_FIXED or IORING_OP_READ. Also hide saving iter_state
inside of it by splitting out an inline function of it to avoid extra
ifs.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b1bbc213a95e5272d4da5867bb977d9acb6f2109.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:55 -06:00
Pavel Begunkov
51aac424ae io_uring: optimise io_import_iovec nonblock passing
First, change IO_URING_F_NONBLOCK to take sign bit of the int, so
checking for it can be turned into test + sign-based-jump, makes the
binary smaller and may be faster.

Then, instead of passing need_lock boolean into io_import_iovec() just
give it issue_flags, which is already stored somewhere. Saves some space
on stack, a couple of test + cmov operations and other conversions.

note: we still leave
force_nonblock = issue_flags & IO_URING_F_NONBLOCK
variable, but it's optimised out by the compiler into testing
issue_flags directly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ee96547e692f6c975c229cd82fc721679571a734.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:55 -06:00
Pavel Begunkov
c88598a92a io_uring: optimise read/write iov state storing
Currently io_read() and io_write() keep separate pointers to an iter and
to struct iov_iter_state, which is not great for register spilling and
requires more on-stack copies. They are both either on-stack or in
req->async_data at the same time, so use struct io_rw_state and keep a
pointer only to it, so having all the state with just one pointer.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5c5e7ffd7dc25fc35075c70411ba99df72f237fa.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:55 -06:00
Pavel Begunkov
538941e268 io_uring: encapsulate rw state
Add a new struct io_rw_state storing all iov related bits: fast iov,
iterator and iterator state. Not much changes here, simply convert
struct io_async_rw to use it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e8245ffcb568b228a009ec1eb79c993c813679f1.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:55 -06:00
Pavel Begunkov
258f3a7f84 io_uring: optimise rw comletion handlers
Don't override req->result in io_complete_rw_iopoll() when it's already
of the same value, we have an if just above it, so move the assignment
there. Also, add one simle unlikely() in __io_complete_rw_common().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8dfeb4f84026a20172bcf82c05010abe955874ae.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:55 -06:00
Pavel Begunkov
f80a50a632 io_uring: prioritise read success path over fails
Rearrange io_read return handling so first we expect it completing
successfully and only then checking for errors, which is a colder path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c91c7c2da11815ec8b04b5d872f60dc4cde662c5.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:55 -06:00
Pavel Begunkov
04f34081c5 io_uring: consistent typing for issue_flags
Some of the functions keep issue_flags as int, change those to unsigned.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/04ad43797783bc9cc7567f287ab545518f8e8cf2.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:55 -06:00
Pavel Begunkov
ab40940247 io_uring: optimise rsrc referencing
Apparently, percpu_ref_put/get() are expensive enough if done per
request, get them in a batch and cache on the submission side to avoid
getting it over and over again. Also, if we're completing under
uring_lock, return refs back into the cache instead of
perfcpu_ref_put(). Pretty similar to how we do tctx->cached_refs
accounting, but fall back to normal putting when we already changed a
rsrc node by the time of free.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b40d8c5bc77d3c9550df8a319117a374ac85f8f4.1633817310.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:55 -06:00
Pavel Begunkov
a46be971ed io_uring: optimise io_req_set_rsrc_node()
io_req_set_rsrc_node() reloads loads req->ctx, however it's already in
registers in all use cases, so better to pass it as a parameter.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/67a25557b8a51e90bfd578447a6f1671911b05ae.1633817310.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 05:49:55 -06:00