linux-xiaomi-chiron

Author	SHA1	Message	Date
Manivannan Sadhasivam	8135cc5b27	MAINTAINERS: Update the entry for MHI bus Since Hemant is not carrying out any maintainership duties let's make him as a dedicated reviewer. Also add the new mailing lists dedicated for MHI in subspace mailing list server. Reviewed-by: Hemant Kumar <hemantk@codeaurora.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20211019133901.173966-1-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-10-19 15:49:14 +02:00
Heiko Carstens	1a446b2473	s390: update defconfigs Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2021-10-19 15:39:54 +02:00
Heiko Carstens	1254cfbc5f	samples: add s390 support for ftrace direct call samples Add s390 support for ftrace direct call samples, which also enables ftrace direct call selftests within ftrace selftests. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20211012133802.2460757-5-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2021-10-19 15:39:54 +02:00
Heiko Carstens	c316eb4460	samples: add HAVE_SAMPLE_FTRACE_DIRECT config option Add HAVE_SAMPLE_FTRACE_DIRECT config option which can be selected by architectures which have support for ftrace direct call samples. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20211012133802.2460757-4-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2021-10-19 15:39:53 +02:00
Heiko Carstens	3d487acf1b	s390: make STACK_FRAME_OVERHEAD available via asm-offsets.h Make STACK_FRAME_OVERHEAD available via asm-offsets.h. This allows to add s390 specific asm code to e.g. ftrace samples, without requiring to add random header files, which might cause all sort of problems on other architectures. asm-offsets.h can be assumed to be non-problematic. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20211012133802.2460757-3-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2021-10-19 15:39:53 +02:00
Heiko Carstens	2ab3a0a9fa	s390/ftrace: add HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALL support This is the s390 variant of commit `562955fe6a` ("ftrace/x86: Add register_ftrace_direct() for custom trampolines"). Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20211012133802.2460757-2-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2021-10-19 15:39:53 +02:00
Cai Huoqing	05be946337	net: ethernet: ixp4xx: Make use of dma_pool_zalloc() instead of dma_pool_alloc/memset() Replacing dma_pool_alloc/memset() with dma_pool_zalloc() to simplify the code. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-19 13:24:26 +01:00
Christophe JAILLET	07fab5a469	ieee802154: Remove redundant 'flush_workqueue()' calls 'destroy_workqueue()' already drains the queue before destroying it, so there is no need to flush it explicitly. Remove the redundant 'flush_workqueue()' calls. This was generated with coccinelle: @@ expression E; @@ - flush_workqueue(E); destroy_workqueue(E); Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-19 13:23:38 +01:00
Christoph Hellwig	97eeb5fc14	partitions/ibm: use bdev_nr_sectors instead of open coding it Use the proper helper to read the block device size and switch various places to pass the size in terms of sectors which is more practical. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062024.2171074-4-hch@lst.de [axboe: fix comment typo] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 06:17:33 -06:00
Christoph Hellwig	f9831b8857	partitions/efi: use bdev_nr_bytes instead of open coding it Use the proper helper to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062024.2171074-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 06:16:50 -06:00
Christoph Hellwig	946e993730	block/ioctl: use bdev_nr_sectors and bdev_nr_bytes Use the proper helper to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062024.2171074-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 06:16:50 -06:00
Leon Romanovsky	cb3dc8901b	devlink: Remove extra device_lock assert checks PCI core code in the pci_call_probe() has a path that doesn't hold device_lock. It happens because the ->probe() is called through the workqueue mechanism. 349 static int pci_call_probe(struct pci_driver drv, struct pci_dev dev, 350 const struct pci_device_id *id) 351 { 352 .... 377 if (cpu < nr_cpu_ids) 378 error = work_on_cpu(cpu, local_pci_probe, &ddi); Luckily enough, the core still ensures that only single flow is executed, so it safe to remove the assert checks that anyway were added for annotations purposes. Fixes: `b88f7b1203` ("devlink: Annotate devlink API calls") Reported-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-19 13:16:14 +01:00
David S. Miller	04ee2752a5	linux-can-fixes-for-5.15-20211019 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmFub7gTHG1rbEBwZW5n dXRyb25peC5kZQAKCRCpyVqK+u3vqf6yCACAiYQuk1sRh7/b6G4Qk2QUTWUxhG/T NWAOVuYMJkvGAMRAxjA7ecwRqA19/WIS2ys00t8UW62mvFGrGfcSBXWj3baPnx7w rfhyd+gk1LqOjWcaQriDmLO2ERY27roNA7FMtcMCoRlbYkrnYcFyKUMdhQXSW3/R /5KIEWBFzmr6ISUBUpkOeDs3irjy3ScIE0fEBt1+Uu3XydFWt7qnrMqBGlwoc9Xs du5VN5nWTZzdB9ju7p0PyrYdwoHsa1L1XbnIJd1hTLCua38028nuL9v7OGy3QvwN 1Tz10PTnTk+2KMCkR4AwuzEBcHBVIcNmTrpTQAKiecB0QbXQlR9es0J5 =9TSz -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-5.15-20211019' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2021-10-19 this is a pull request of a single patch for net/master. The patch is by me and fixes the error handling in case of a FC timeout in the TX path of the ISOTOP CAN protocol. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-19 13:14:36 +01:00
Andrea Righi	480d42dc00	blk-wbt: prevent NULL pointer dereference in wb_timer_fn The timer callback used to evaluate if the latency is exceeded can be executed after the corresponding disk has been released, causing the following NULL pointer dereference: [ 119.987108] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ 119.987617] #PF: supervisor read access in kernel mode [ 119.987971] #PF: error_code(0x0000) - not-present page [ 119.988325] PGD 7c4a4067 P4D 7c4a4067 PUD 7bf63067 PMD 0 [ 119.988697] Oops: 0000 [#1] SMP NOPTI [ 119.988959] CPU: 1 PID: 9353 Comm: cloud-init Not tainted 5.15-rc5+arighi #rc5+arighi [ 119.989520] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 [ 119.990055] RIP: 0010:wb_timer_fn+0x44/0x3c0 [ 119.990376] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00 [ 119.991578] RSP: 0000:ffffb5f580957da8 EFLAGS: 00010246 [ 119.991937] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004 [ 119.992412] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88f476d7f780 [ 119.992895] RBP: ffffb5f580957dd0 R08: 0000000000000000 R09: 0000000000000000 [ 119.993371] R10: 0000000000000004 R11: 0000000000000002 R12: ffff88f476c84500 [ 119.993847] R13: ffff88f4434390c0 R14: 0000000000000000 R15: ffff88f4bdc98c00 [ 119.994323] FS: 00007fb90bcd9c00(0000) GS:ffff88f4bdc80000(0000) knlGS:0000000000000000 [ 119.994952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 119.995380] CR2: 0000000000000098 CR3: 000000007c0d6000 CR4: 00000000000006e0 [ 119.995906] Call Trace: [ 119.996130] ? blk_stat_free_callback_rcu+0x30/0x30 [ 119.996505] blk_stat_timer_fn+0x138/0x140 [ 119.996830] call_timer_fn+0x2b/0x100 [ 119.997136] __run_timers.part.0+0x1d1/0x240 [ 119.997470] ? kvm_clock_get_cycles+0x11/0x20 [ 119.997826] ? ktime_get+0x3e/0xa0 [ 119.998110] ? native_apic_msr_write+0x2c/0x30 [ 119.998456] ? lapic_next_event+0x20/0x30 [ 119.998779] ? clockevents_program_event+0x94/0xf0 [ 119.999150] run_timer_softirq+0x2a/0x50 [ 119.999465] __do_softirq+0xcb/0x26f [ 119.999764] irq_exit_rcu+0x8c/0xb0 [ 120.000057] sysvec_apic_timer_interrupt+0x43/0x90 [ 120.000429] ? asm_sysvec_apic_timer_interrupt+0xa/0x20 [ 120.000836] asm_sysvec_apic_timer_interrupt+0x12/0x20 In this case simply return from the timer callback (no action required) to prevent the NULL pointer dereference. BugLink: https://bugs.launchpad.net/bugs/1947557 Link: https://lore.kernel.org/linux-mm/YWRNVTk9N8K0RMst@arighi-desktop/ Fixes: `34dbad5d26` ("blk-stat: convert to callback-based statistics reporting") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Link: https://lore.kernel.org/r/YW6N2qXpBU3oc50q@arighi-desktop Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 06:13:41 -06:00
Michael Schmitz	86d46fdaa1	block: ataflop: fix breakage introduced at blk-mq refactoring Refactoring of the Atari floppy driver when converting to blk-mq has broken the state machine in not-so-subtle ways: finish_fdc() must be called when operations on the floppy device have completed. This is crucial in order to relase the ST-DMA lock, which protects against concurrent access to the ST-DMA controller by other drivers (some DMA related, most just related to device register access - broken beyond compare, I know). When rewriting the driver's old do_request() function, the fact that finish_fdc() was called only when all queued requests had completed appears to have been overlooked. Instead, the new request function calls finish_fdc() immediately after the last request has been queued. finish_fdc() executes a dummy seek after most requests, and this overwrites the state machine's interrupt hander that was set up to wait for completion of the read/write request just prior. To make matters worse, finish_fdc() is called before device interrupts are re-enabled, making certain that the read/write interupt is missed. Shifting the finish_fdc() call into the read/write request completion handler ensures the driver waits for the request to actually complete. With a queue depth of 2, we won't see long request sequences, so calling finish_fdc() unconditionally just adds a little overhead for the dummy seeks, and keeps the code simple. While we're at it, kill ataflop_commit_rqs() which does nothing but run finish_fdc() unconditionally, again likely wiping out an in-flight request. Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Fixes: `6ec3938cff` ("ataflop: convert to blk-mq") CC: linux-block@vger.kernel.org CC: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Link: https://lore.kernel.org/r/20211019061321.26425-1-schmitzmic@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 06:11:44 -06:00
luo penghao	3c71e0c9ab	ethernet: Remove redundant statement The variable will be assigned again later in the if condition, there is no meaning there. drivers/net/ethernet/broadcom/tg3.c:5750:2 warning: Value stored to 'current_link_up' is never read. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-19 13:10:44 +01:00
Zheyu Ma	c69b2f4687	cavium: Fix return values of the probe function During the process of driver probing, the probe function should return < 0 for failure, otherwise, the kernel will treat value > 0 as success. Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-19 13:09:57 +01:00
Zheyu Ma	e211210098	mISDN: Fix return values of the probe function During the process of driver probing, the probe function should return < 0 for failure, otherwise, the kernel will treat value > 0 as success. Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-19 13:09:28 +01:00
Robert Hancock	92817dad7d	net: phylink: Support disabling autonegotiation for PCS The auto-negotiation state in the PCS as set by phylink_mii_c22_pcs_config was previously always enabled when the driver is configured for in-band autonegotiation, even if autonegotiation was disabled on the interface with ethtool. Update the code to set the BMCR_ANENABLE bit based on the interface's autonegotiation enabled state. Update phylink_mii_c22_pcs_get_state to not check autonegotiation-related fields when autonegotiation is disabled. Update phylink_mac_pcs_get_state to initialize the state based on the interface's configured speed, duplex and pause parameters rather than to unknown when autonegotiation is disabled, before calling the driver's pcs_get_state functions, as they are not likely to provide meaningful data for these fields when autonegotiation is disabled. In this case the driver is really just filling in the link state field. Note that in cases where there is a downstream PHY connected, such as with SGMII and a copper PHY, the configuration set by ethtool is handled by phy_ethtool_ksettings_set and not propagated to the PCS. This is correct since SGMII or 1000Base-X autonegotiation with the PCS should normally still be used even if the copper side has disabled it. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-19 13:07:36 +01:00
Sebastian Andrzej Siewior	e22db7bd55	net: sched: Allow statistics reads from softirq. Eric reported that the rate estimator reads statics from the softirq which in turn triggers a warning introduced in the statistics rework. The warning is too cautious. The updates happen in the softirq context so reads from softirq are fine since the writes can not be preempted. The updates/writes happen during qdisc_run() which ensures one writer and the softirq context. The remaining bad context for reading statistics remains in hard-IRQ because it may preempt a writer. Fixes: `29cbcd8582` ("net: sched: Remove Qdisc::running sequence counter") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-19 13:07:35 +01:00
Russell King (Oracle)	dc90604b58	net: phylink: rejig SFP interface selection in ksettings_set() Commit `ea269a6f72` ("net: phylink: Update SFP selected interface on advertising changes") added a better solution to selecting the interface mode for SFPs using the advertisement mask. This method will work for mvneta and mvpp2 when selecting between 2500base-X and 1000base-X without needing to use the basex helper, or indicate that we support both 1000base-X and 2500base-X when in either of these two interface modes. Hence, we need to eliminate the validation prior to selecting the interface, otherwise when we clean up mvneta's validation function, we will end up locking to 2500base-X as we validate with an interface mode of PHY_INERFACE_MODE_2500BASEX. The supported mask will already have been reduced down to the union of support for the SFP and MAC already, so we can be confident that the advertisement mask is already appropriately restricted. We only need to select the appropriate interface, and then revalidate with the new interface mode. We get rid of the check for pl->sfp_port too, this is meaningless here as it doesn't get cleared when a module is removed, so it doesn't indicate if a module is present. Just rely on pl->sfp_bus. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-19 13:07:35 +01:00
Tom Lendacky	e7d445ab26	x86/sme: Use #define USE_EARLY_PGTABLE_L5 in mem_encrypt_identity.c When runtime support for converting between 4-level and 5-level pagetables was added to the kernel, the SME code that built pagetables was updated to use the pagetable functions, e.g. p4d_offset(), etc., in order to simplify the code. However, the use of the pagetable functions in early boot code requires the use of the USE_EARLY_PGTABLE_L5 #define in order to ensure that the proper definition of pgtable_l5_enabled() is used. Without the #define, pgtable_l5_enabled() is #defined as cpu_feature_enabled(X86_FEATURE_LA57). In early boot, the CPU features have not yet been discovered and populated, so pgtable_l5_enabled() will return false even when 5-level paging is enabled. This causes the SME code to always build 4-level pagetables to perform the in-place encryption. If 5-level paging is enabled, switching to the SME pagetables results in a page-fault that kills the boot. Adding the #define results in pgtable_l5_enabled() using the __pgtable_l5_enabled variable set in early boot and the SME code building pagetables for the proper paging level. Fixes: `aad983913d` ("x86/mm/encrypt: Simplify sme_populate_pgd() and sme_populate_pgd_large()") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> # 4.18.x Link: https://lkml.kernel.org/r/2cb8329655f5c753905812d951e212022a480475.1634318656.git.thomas.lendacky@amd.com	2021-10-19 14:07:17 +02:00
Jens Axboe	6155631a0c	block: align blkdev_dio inlined bio to a cacheline We get all sorts of unreliable and funky results since the bio is designed to align on a cacheline, which it does not when inlined like this. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:55:52 -06:00
Jens Axboe	e028f167ec	block: move blk_mq_tag_to_rq() inline This is in the fast path of driver issue or completion, and it's a single array index operation. Move it inline to avoid a function call for it. This does mean making struct blk_mq_tags block layer public, but there's not really much in there. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:55:41 -06:00
Jens Axboe	df87eb0fce	block: get rid of plug list sorting Even if we have multiple queues in the plug list, chances that they are very interspersed is minimal. Don't bother spending CPU cycles sorting the list. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:55:26 -06:00
Jens Axboe	87c037d11b	block: return whether or not to unplug through boolean Instead of returning the same queue request through a request pointer, use a boolean to accomplish the same. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:55:04 -06:00
Christoph Hellwig	8a7d267b4a	block: don't call blk_status_to_errno in blk_update_request We only need to call it to resolve the blk_status_t -> errno mapping for tracing, so move the conversion into the tracepoints that are not called at all when tracing isn't enabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:54:57 -06:00
Borislav Petkov	c688bd5dc9	x86/sev: Carve out HV call's return value verification Carve out the verification of the HV call return value into a separate helper and make it more readable. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/YVbYWz%2B8J7iMTJjc@zn.tnic	2021-10-19 13:54:47 +02:00
Jens Axboe	db9a02baa2	block: move bdev_read_only() into the header This is called for every write in the fast path, move it inline next to get_disk_ro() which is called internally. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:53:22 -06:00
Jens Axboe	5ca7a8b3f6	io_uring: inform block layer of how many requests we are submitting The block layer can use this knowledge to make smarter decisions on how to handle the request, if it knows that N more may be coming. Switch to using blk_start_plug_nr_ios() to pass in that information. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:56 -06:00
Pavel Begunkov	88459b50b4	io_uring: simplify io_file_supports_nowait() Make sure that REQ_F_SUPPORT_NOWAIT is always set io_prep_rw(), and so we can stop caring about setting it down the line simplifying io_file_supports_nowait(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/60c8f1f5e2cb45e00f4897b2cec10c5b3669da91.1634425438.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:56 -06:00
Pavel Begunkov	35645ac3c1	io_uring: combine REQ_F_NOWAIT_{READ,WRITE} flags Merge REQ_F_NOWAIT_READ and REQ_F_NOWAIT_WRITE into one flag, i.e. REQ_F_SUPPORT_NOWAIT. First it gets rid of dependence on CONFIG_64BIT but also simplifies the code. One thing to consider is when we don't have ->{read,write}_iter and go through loop_rw_iter(). Just fail it with -EAGAIN if we expect nowait behaviour but not sure whether it supports it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f832a20e5186c2e79c6519280c238f559a1d2bbc.1634425438.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:56 -06:00
Pavel Begunkov	e74ead135b	io_uring: arm poll for non-nowait files Don't check if we can do nowait before arming apoll, there are several reasons for that. First, we don't care much about files that don't support nowait. Second, it may be useful -- we don't want to be taking away extra workers from io-wq when it can go in some async. Even if it will go through io-wq eventually, it make difference in the numbers of workers actually used. And the last one, it's needed to clean nowait in future commits. [kernel test robot: fix unused-var] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9d06f3cb2c8b686d970269a87986f154edb83043.1634425438.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:56 -06:00
Noah Goldstein	b10841c98c	fs/io_uring: Prioritise checking faster conditions first in io_write This commit reorders the conditions in a branch in io_write. The reorder to check 'ret2 == -EAGAIN' first as checking '(req->ctx->flags & IORING_SETUP_IOPOLL)' will likely be more expensive due to 2x memory derefences. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Link: https://lore.kernel.org/r/20211017013229.4124279-1-goldstein.w.n@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:56 -06:00
Pavel Begunkov	5cb03d6342	io_uring: clean io_prep_rw() We already store req->file in a variable in io_prep_rw(), just use it instead of a couple of left references to kicob->ki_filp. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2f5889fc7ab670daefd5ccaedd99416d8355f0ad.1634314022.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:56 -06:00
Pavel Begunkov	578c0ee234	io_uring: optimise fixed rw rsrc node setting Move fixed rw io_req_set_rsrc_node() from rw prep into io_import_fixed(), if we're using fixed buffers it will always be called during submission as we save the state in advance, Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/68c06f66d5aa9661f1e4b88d08c52d23528297ec.1634314022.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:56 -06:00
Pavel Begunkov	caa8fe6e86	io_uring: return iovec from __io_import_iovec We pass iovec** into __io_import_iovec(), which should keep it, initialise and modify accordingly. It's expensive, return it directly from __io_import_iovec encoding errors with ERR_PTR if needed. io_import_iovec keeps the old interface, but it's inline and so everything is optimised nicely. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6230e9769982f03a8f86fa58df24666088c44d3e.1634314022.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:56 -06:00
Pavel Begunkov	d1d681b084	io_uring: optimise io_import_iovec fixed path Delay loading req->rw.{addr,len} in io_import_iovec until it's really needed, so removing extra loads for the fixed path, which doesn't use them. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3cc48dd0c4f1a37c4ce9aab5784281a2d83ad8be.1634314022.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:55 -06:00
Pavel Begunkov	9882131cd9	io_uring: kill io_wq_current_is_worker() in iopoll Don't decide about locking based on io_wq_current_is_worker(), it's not consistent with all other code and is expensive, use issue_flags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7546d5a58efa4360173541c6fe02ee6b8c7b4ea7.1634314022.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:55 -06:00
Pavel Begunkov	9983028e76	io_uring: optimise req->ctx reloads Don't load req->ctx in advance, it takes an extra register and the field stays valid even after opcode handlers. It also optimises out req->ctx load in io_iopoll_req_issued() once it's inlined. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1e45ff671c44be0eb904f2e448a211734893fa0b.1634314022.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:55 -06:00
Pavel Begunkov	607b6fb801	io_uring: rearrange io_read()/write() Combine force_nonblock branches (which is already optimised by compiler), flip branches so the most hot/common path is the first, e.g. as with non on-stack iov setup, and add extra likely/unlikely attributions for errror paths. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2c2536c5896d70994de76e387ea09a0402173a3f.1634144845.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:55 -06:00
Pavel Begunkov	5e49c973fc	io_uring: clean up io_import_iovec Make io_import_iovec taking struct io_rw_state instead of an iter pointer. First it takes care of initialising iovec pointer, which can be forgotten. Even more, we can not init it if not needed, e.g. in case of IORING_OP_READ_FIXED or IORING_OP_READ. Also hide saving iter_state inside of it by splitting out an inline function of it to avoid extra ifs. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b1bbc213a95e5272d4da5867bb977d9acb6f2109.1634144845.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:55 -06:00
Pavel Begunkov	51aac424ae	io_uring: optimise io_import_iovec nonblock passing First, change IO_URING_F_NONBLOCK to take sign bit of the int, so checking for it can be turned into test + sign-based-jump, makes the binary smaller and may be faster. Then, instead of passing need_lock boolean into io_import_iovec() just give it issue_flags, which is already stored somewhere. Saves some space on stack, a couple of test + cmov operations and other conversions. note: we still leave force_nonblock = issue_flags & IO_URING_F_NONBLOCK variable, but it's optimised out by the compiler into testing issue_flags directly. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ee96547e692f6c975c229cd82fc721679571a734.1634144845.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:55 -06:00
Pavel Begunkov	c88598a92a	io_uring: optimise read/write iov state storing Currently io_read() and io_write() keep separate pointers to an iter and to struct iov_iter_state, which is not great for register spilling and requires more on-stack copies. They are both either on-stack or in req->async_data at the same time, so use struct io_rw_state and keep a pointer only to it, so having all the state with just one pointer. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5c5e7ffd7dc25fc35075c70411ba99df72f237fa.1634144845.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:55 -06:00
Pavel Begunkov	538941e268	io_uring: encapsulate rw state Add a new struct io_rw_state storing all iov related bits: fast iov, iterator and iterator state. Not much changes here, simply convert struct io_async_rw to use it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e8245ffcb568b228a009ec1eb79c993c813679f1.1634144845.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:55 -06:00
Pavel Begunkov	258f3a7f84	io_uring: optimise rw comletion handlers Don't override req->result in io_complete_rw_iopoll() when it's already of the same value, we have an if just above it, so move the assignment there. Also, add one simle unlikely() in __io_complete_rw_common(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8dfeb4f84026a20172bcf82c05010abe955874ae.1634144845.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:55 -06:00
Pavel Begunkov	f80a50a632	io_uring: prioritise read success path over fails Rearrange io_read return handling so first we expect it completing successfully and only then checking for errors, which is a colder path. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c91c7c2da11815ec8b04b5d872f60dc4cde662c5.1634144845.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:55 -06:00
Pavel Begunkov	04f34081c5	io_uring: consistent typing for issue_flags Some of the functions keep issue_flags as int, change those to unsigned. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/04ad43797783bc9cc7567f287ab545518f8e8cf2.1634144845.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:55 -06:00
Pavel Begunkov	ab40940247	io_uring: optimise rsrc referencing Apparently, percpu_ref_put/get() are expensive enough if done per request, get them in a batch and cache on the submission side to avoid getting it over and over again. Also, if we're completing under uring_lock, return refs back into the cache instead of perfcpu_ref_put(). Pretty similar to how we do tctx->cached_refs accounting, but fall back to normal putting when we already changed a rsrc node by the time of free. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b40d8c5bc77d3c9550df8a319117a374ac85f8f4.1633817310.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:55 -06:00
Pavel Begunkov	a46be971ed	io_uring: optimise io_req_set_rsrc_node() io_req_set_rsrc_node() reloads loads req->ctx, however it's already in registers in all use cases, so better to pass it as a parameter. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/67a25557b8a51e90bfd578447a6f1671911b05ae.1633817310.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 05:49:55 -06:00

... 143 144 145 146 147 ...

1059881 commits