linux-xiaomi-chiron

Author	SHA1	Message	Date
Jakub Kicinski	44ea62813f	spi: don't include ptp_clock_kernel.h in spi.h Commit `b42faeee71` ("spi: Add a PTP system timestamp to the transfer structure") added an include of ptp_clock_kernel.h to spi.h for struct ptp_system_timestamp but a forward declaration is enough. Let's use that to limit the number of objects we have to rebuild every time we touch networking headers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20210904013140.2377609-1-kuba@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>	2022-01-07 17:14:30 +00:00
Fabio Estevam	530792efa6	regmap: Call regmap_debugfs_exit() prior to _init() Since commit `cffa4b2122` ("regmap: debugfs: Fix a memory leak when calling regmap_attach_dev"), the following debugfs error is seen on i.MX boards: debugfs: Directory 'dummy-iomuxc-gpr@20e0000' with parent 'regmap' already present! In the attempt to fix the memory leak, the above commit added a NULL check for map->debugfs_name. For the first debufs entry, map->debugfs_name is NULL and then the new name is allocated via kasprintf(). For the second debugfs entry, map->debugfs_name() is no longer NULL, so it will keep using the old entry name and the duplicate name error is seen. Quoting Mark Brown: "That means that if the device gets freed we'll end up with the old debugfs file hanging around pointing at nothing. ... To be more explicit this means we need a call to regmap_debugfs_exit() which will clean up all the existing debugfs stuff before we loose references to it." Call regmap_debugfs_exit() prior to regmap_debugfs_init() to fix the problem. Tested on i.MX6Q and i.MX6SX boards. Fixes: `cffa4b2122` ("regmap: debugfs: Fix a memory leak when calling regmap_attach_dev") Suggested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Fabio Estevam <festevam@denx.de> Link: https://lore.kernel.org/r/20220107163307.335404-1-festevam@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>	2022-01-07 17:14:29 +00:00
Jason Wang	5322c68e58	iavf: remove an unneeded variable The variable `ret_code' used for returning is never changed in function `iavf_shutdown_adminq'. So that it can be removed and just return its initial value 0 at the end of `iavf_shutdown_adminq' function. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-07 09:04:21 -08:00
Yang Li	a127adf2fc	i40e: remove variables set but not used The code that uses variables pe_cntx_size and pe_filt_size has been removed, so they should be removed as well. Eliminate the following clang warnings: drivers/net/ethernet/intel/i40e/i40e_common.c:4139:20: warning: variable 'pe_filt_size' set but not used. drivers/net/ethernet/intel/i40e/i40e_common.c:4139:6: warning: variable 'pe_cntx_size' set but not used. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-07 09:04:21 -08:00
Mateusz Palczewski	17b33d4319	i40e: Remove non-inclusive language Remove non-inclusive language from the driver. Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-07 09:04:21 -08:00
Mateusz Palczewski	9c83ca8a63	i40e: Update FW API version Update FW API versions to the newest supported NVM images. Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-07 09:04:21 -08:00
Jedrzej Jagielski	ef39584ddb	i40e: Minimize amount of busy-waiting during AQ send The i40e_asq_send_command will now use a non blocking usleep_range if possible (non-atomic context), instead of busy-waiting udelay. The usleep_range function uses hrtimers to provide better performance and removes the negative impact of busy-waiting in time-critical environments. 1. Rename i40e_asq_send_command to i40e_asq_send_command_atomic and add 5th parameter to inform if called from an atomic context. Call inside usleep_range (if non-atomic) or udelay (if atomic). 2. Change i40e_asq_send_command to invoke i40e_asq_send_command_atomic(..., false). 3. Change two functions: - i40e_aq_set_vsi_uc_promisc_on_vlan - i40e_aq_set_vsi_mc_promisc_on_vlan to explicitly use i40e_asq_send_command_atomic(..., true) instead of i40e_asq_send_command, as they use spinlocks and do some work in an atomic context. All other calls to i40e_asq_send_command remain unchanged. Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-07 09:04:21 -08:00
Nikunj A Dadhania	fffb532378	KVM: x86: Check for rmaps allocation With TDP MMU being the default now, access to mmu_rmaps_stat debugfs file causes following oops: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 7 PID: 3185 Comm: cat Not tainted 5.16.0-rc4+ #204 RIP: 0010:pte_list_count+0x6/0x40 Call Trace: <TASK> ? kvm_mmu_rmaps_stat_show+0x15e/0x320 seq_read_iter+0x126/0x4b0 ? aa_file_perm+0x124/0x490 seq_read+0xf5/0x140 full_proxy_read+0x5c/0x80 vfs_read+0x9f/0x1a0 ksys_read+0x67/0xe0 __x64_sys_read+0x19/0x20 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fca6fc13912 Return early when rmaps are not present. Reported-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220105040337.4234-1-nikunj@amd.com> Cc: stable@vger.kernel.org Fixes: `3bcd0662d6` ("KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-07 12:04:01 -05:00
Karen Sornek	cfb1d572c9	i40e: Add ensurance of MacVlan resources for every trusted VF Trusted VF can use up every resource available, leaving nothing to other trusted VFs. Introduce define, which calculates MacVlan resources available based on maximum available MacVlan resources, bare minimum for each VF and number of currently allocated VFs. Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-01-07 09:03:44 -08:00
Wanpeng Li	597cb7968c	KVM: SEV: Mark nested locking of kvm->lock Both source and dest vms' kvm->locks are held in sev_lock_two_vms. Mark one with a different subtype to avoid false positives from lockdep. Fixes: `c9d61dcb0b` (KVM: SEV: accept signals in sev_lock_two_vms) Reported-by: Yiru Xu <xyru1999@gmail.com> Tested-by: Jinrong Liang <cloudliang@tencent.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1641364863-26331-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-01-07 12:01:55 -05:00
Dave Hansen	2056e2989b	x86/sgx: Fix NULL pointer dereference on non-SGX systems == Problem == Nathan Chancellor reported an oops when aceessing the 'sgx_total_bytes' sysfs file: https://lore.kernel.org/all/YbzhBrimHGGpddDM@archlinux-ax161/ The sysfs output code accesses the sgx_numa_nodes[] array unconditionally. However, this array is allocated during SGX initialization, which only occurs on systems where SGX is supported. If the sysfs file is accessed on systems without SGX support, sgx_numa_nodes[] is NULL and an oops occurs. == Solution == To fix this, hide the entire nodeX/x86/ attribute group on systems without SGX support using the ->is_visible attribute group callback. Unfortunately, SGX is initialized via a device_initcall() which occurs _after_ the ->is_visible() callback. Instead of moving SGX initialization earlier, call sysfs_update_group() during SGX initialization to update the group visiblility. This update requires moving the SGX sysfs code earlier in sgx/main.c. There are no code changes other than the addition of arch_update_sysfs_visibility() and a minor whitespace fixup to arch_node_attr_is_visible() which checkpatch caught. CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-sgx@vger.kernel.org Cc: x86@kernel.org Fixes: `50468e4313` ("x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lkml.kernel.org/r/20220104171527.5E8416A8@davehans-spike.ostc.intel.com	2022-01-07 08:47:23 -08:00
Kevin Bracey	c25af830ab	sch_cake: revise Diffserv docs Documentation incorrectly stated that CS1 is equivalent to LE for diffserv8. But when LE was added to the table, CS1 was pushed into tin 1, leaving only LE in tin 0. Also "TOS1" no longer exists, as that is the same codepoint as LE. Make other tweaks properly distinguishing codepoints from classes and putting current Diffserve codepoints ahead of legacy ones. Signed-off-by: Kevin Bracey <kevin@bracey.fi> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20220106215637.3132391-1-kevin@bracey.fi Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-07 08:41:29 -08:00
Mauro Carvalho Chehab	87d6576ddf	scripts: sphinx-pre-install: Fix ctex support on Debian The name of the package with ctexhook.sty is different on Debian/Ubuntu. Reported-by: Akira Yokosawa <akiyks@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Tested-by: Akira Yokosawa <akiyks@gmail.com> Link: https://lore.kernel.org/r/63882425609a2820fac78f5e94620abeb7ed5f6f.1641429634.git.mchehab@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2022-01-07 09:33:13 -07:00
Jonathan Corbet	db67eb748e	docs: discourage use of list tables Our documentation encourages the use of list-table formats, but that advice runs counter to the objective of keeping the plain-text documentation as useful and readable as possible. Turn that advice around the other way so that people don't keep adding these tables. Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2022-01-07 09:33:13 -07:00
Thorsten Leemhuis	bf33a9d42d	docs: 5.Posting.rst: describe Fixes: and Link: tags Explain Fixes: and Link: tags in Documentation/process/5.Posting.rst, which are missing in this file for unknown reasons and only described in Documentation/process/submitting-patches.rst. Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info> CC: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://lore.kernel.org/r/c4a5f5e25fa84b26fd383bba6eafde4ab57c9de7.1641314856.git.linux@leemhuis.info Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2022-01-07 09:33:13 -07:00
Christian Löhle	689d8014d9	Documentation: kgdb: Replace deprecated remotebaud Using set remotebaud to set the baud rate was deprecated in gdb-7.7 and completely removed from the command parser in gdb-7.8 (released in 2014). Adopt set serial baud instead. Signed-off-by: Christian Loehle <cloehle@hyperstone.com> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/4050689967ed46baaa3bfadda53a0e73@hyperstone.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2022-01-07 09:33:13 -07:00
James Clark	7cc4c09269	docs: automarkup.py: Fix invalid HTML link output and broken URI fragments Since commit `d18b01789a` ("docs: Add automatic cross-reference for documentation pages"), references that were already explicitly defined with "ref:" and referred to other pages with a path have been doubled. This is reported as the following error by Firefox: Start tag "a" seen but an element of the same type was already open. End tag "a" violates nesting rules. As well as the invalid HTML, this also obscures the URI fragment links to subsections because the second link overrides the first. For example on the page admin-guide/hw-vuln/mds.html the last link should be to the "Default Mitigations" subsection using a # URI fragment: admin-guide/hw-vuln/l1tf.html#default-mitigations But it is obsured by a second link to the whole page: admin-guide/hw-vuln/l1tf.html The full HTML with the double <a> tags looks like this: <a class="reference internal" href="l1tf.html#default-mitigations"> <span class="std std-ref"> <a class="reference internal" href="l1tf.html"> <span class="doc">L1TF - L1 Terminal Fault</span> </a> </span> </a> After this commit, there is only a single link: <a class="reference internal" href="l1tf.html#default-mitigations"> <span class="std std-ref">Documentation/admin-guide/hw-vuln//l1tf.rst</span> </a> Now that the second link is removed, the browser correctly jumps to the default-mitigations subsection when clicking the link. The fix is to check that nodes in the document to be modified are not already references. A reference is counted as any text that is a descendant of a reference type node. Only plain text should be converted to new references, otherwise the doubling occurs. Testing ======= * Test that the build stdout is the same (ignoring ordering), and that no new warnings are printed. * Diff all .html files and check that the only modifications occur to the bad double links. * The auto linking of bare references to pages without "ref:" is still working. Fixes: `d18b01789a` ("docs: Add automatic cross-reference for documentation pages") Reviewed-by: Nícolas F. R. A. Prado <n@nfraprado.net> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20220105143640.330602-2-james.clark@arm.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2022-01-07 09:32:58 -07:00
Dan Carpenter	dc35616e6c	netrom: fix api breakage in nr_setsockopt() This needs to copy an unsigned int from user space instead of a long to avoid breaking user space with an API change. I have updated all the integer overflow checks from ULONG to UINT as well. This is a slight API change but I do not expect it to affect anything in real life. Fixes: `3087a6f36e` ("netrom: fix copying in user data in nr_setsockopt") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-07 14:11:05 +00:00
Dan Carpenter	9371937092	ax25: uninitialized variable in ax25_setsockopt() The "opt" variable is unsigned long but we only copy 4 bytes from the user so the lower 4 bytes are uninitialized. I have changed the integer overflow checks from ULONG to UINT as well. This is a slight API change but I don't expect it to break anything. Fixes: `a7b75c5a8c` ("net: pass a sockptr_t into ->setsockopt") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-07 14:10:26 +00:00
David S. Miller	b69c5b5886	Merge branch 'octeontx2-ptp-bugs' Subbaraya Sundeep says: ==================== octeontx2: Fix PTP bugs This patchset addresses two problems found when using ptp. Patch 1 - Increases the refcount of ptp device before use which was missing and it lead to refcount increment after use bug when module is loaded and unloaded couple of times. Patch 2 - PTP resources allocated by VF are not being freed during VF teardown. This patch fixes that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-07 14:04:19 +00:00
Rakesh Babu Saladi	eabd0f88b0	octeontx2-nicvf: Free VF PTP resources. When a VF is removed respective PTP resources are not being freed currently. This patch fixes it. Fixes: `43510ef4dd` ("octeontx2-nicvf: Add PTP hardware clock support to NIX VF") Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-07 14:04:19 +00:00
Subbaraya Sundeep	93440f4888	octeontx2-af: Increment ptp refcount before use Before using the ptp pci device by AF driver increment the reference count of it. Fixes: `a8b90c9d26` ("octeontx2-af: Add PTP device id for CN10K and 95O silcons") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-07 14:04:19 +00:00
Miaoqian Lin	69c1b87516	spi: spi-meson-spifc: Add missing pm_runtime_disable() in meson_spifc_probe If the probe fails, we should use pm_runtime_disable() to balance pm_runtime_enable(). Add missing pm_runtime_disable() for meson_spifc_probe. Fixes: `c3e4bc5434` ("spi: meson: Add support for Amlogic Meson SPIFC") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220107075424.7774-1-linmq006@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>	2022-01-07 13:36:38 +00:00
Qinghua Jin	c8c9cb6d9f	spi: atmel: Fix typo Change 'actualy' to 'actually' Signed-off-by: Qinghua Jin <qhjin.dev@gmail.com> Link: https://lore.kernel.org/r/20220107024631.396862-1-qhjin.dev@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>	2022-01-07 13:36:37 +00:00
Watson Chow	bfff546aae	regulator: Add MAX20086-MAX20089 driver The MAX20086-MAX20089 are dual/quad power protectors for cameras. Add a driver that supports controlling the outputs individually. Additional features, such as overcurrent detection, may be added later if needed. Signed-off-by: Watson Chow <watson.chow@avnet.com> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://lore.kernel.org/r/20220106224350.16957-3-laurent.pinchart+renesas@ideasonboard.com Signed-off-by: Mark Brown <broonie@kernel.org>	2022-01-07 13:36:35 +00:00
Laurent Pinchart	764aaa4e03	dt-bindings: regulators: Add bindings for Maxim MAX20086-MAX20089 The MAX20086-MAX20089 are dual/quad power protectors for cameras. Add corresponding DT bindings. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://lore.kernel.org/r/20220106224350.16957-2-laurent.pinchart+renesas@ideasonboard.com Signed-off-by: Mark Brown <broonie@kernel.org>	2022-01-07 13:36:34 +00:00
Qu Wenruo	36c86a9e1b	btrfs: output more debug messages for uncommitted transaction Print extra information about how many dirty bytes an uncommitted has at the end of mount. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:27 +01:00
Filipe Manana	c2f822635d	btrfs: respect the max size in the header when activating swap file If we extended the size of a swapfile after its header was created (by the mkswap utility) and then try to activate it, we will map the entire file when activating the swap file, instead of limiting to the max size defined in the swap file's header. Currently test case generic/643 from fstests fails because we do not respect that size limit defined in the swap file's header. So fix this by not mapping file ranges beyond the max size defined in the swap header. This is the same type of bug that iomap used to have, and was fixed in commit `36ca7943ac` ("mm/swap: consider max pages in iomap_swapfile_add_extent"). Fixes: `ed46ff3d42` ("Btrfs: support swap files") CC: stable@vger.kernel.org # 5.4+ Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:27 +01:00
Yang Li	be8d1a2ab9	btrfs: fix argument list that the kdoc format and script verified The warnings were found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/btrfs/extent_io.c:3210: warning: Function parameter or member 'bio_ctrl' not described in 'btrfs_bio_add_page' fs/btrfs/extent_io.c:3210: warning: Excess function parameter 'bio' description in 'btrfs_bio_add_page' fs/btrfs/extent_io.c:3210: warning: Excess function parameter 'prev_bio_flags' description in 'btrfs_bio_add_page' fs/btrfs/space-info.c:1602: warning: Excess function parameter 'root' description in 'btrfs_reserve_metadata_bytes' fs/btrfs/space-info.c:1602: warning: Function parameter or member 'fs_info' not described in 'btrfs_reserve_metadata_bytes' Note: this is fixing only the warnings regarding parameter list, the first line is not strictly conforming to the kdoc format as the btrfs codebase does not stick to that and keeps the first line more free form (because it's only for internal use). Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add note ] Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:27 +01:00
Su Yue	4a9e803e5b	btrfs: remove unnecessary parameter type from compression_decompress_bio btrfs_decompress_bio, the only caller of compression_decompress_bio gets type from @cb and passes it to compression_decompress_bio. However, compression_decompress_bio can get compression type directly from @cb. So remove the parameter and access it through @cb. No functional change. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:27 +01:00
Qu Wenruo	856e47946c	btrfs: selftests: dump extent io tree if extent-io-tree test failed When code modifying extent-io-tree get modified and got that selftest failed, it can take some time to pin down the cause. To make it easier to expose the problem, dump the extent io tree if the selftest failed. This can save developers debug time, especially since the selftest we can not use the trace events, thus have to manually add debug trace points. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:27 +01:00
Qu Wenruo	2ae8ae3d3d	btrfs: scrub: cleanup the argument list of scrub_stripe() The argument list of btrfs_stripe() has similar problems of scrub_chunk(): - Duplicated and ambiguous @base argument Can be fetched from btrfs_block_group::bg. - Ambiguous argument @length It's again device extent length - Ambiguous argument @num The instinctive guess would be mirror number, but in fact it's stripe index. Fix it by: - Remove @base parameter - Rename @length to @dev_extent_len - Rename @num to @stripe_index Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:27 +01:00
Qu Wenruo	d04fbe19ae	btrfs: scrub: cleanup the argument list of scrub_chunk() The argument list of scrub_chunk() has the following problems: - Duplicated @chunk_offset It is the same as btrfs_block_group::start. - Confusing @length The most instinctive guess is chunk length, and one may want to delete it, but the truth is, it's the device extent length. Fix this by: - Remove @chunk_offset Use btrfs_block_group::start instead. - Rename @length to @dev_extent_len Also rename the caller to remove the ambiguous naming. - Rename @cache to @bg The "_cache" suffix for btrfs_block_group has been removed for a while. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:26 +01:00
Qu Wenruo	f26c923860	btrfs: remove reada infrastructure Currently there is only one user for btrfs metadata readahead, and that's scrub. But even for the single user, it's not providing the correct functionality it needs, as scrub needs reada for commit root, which current readahead can't provide. (Although it's pretty easy to add such feature). Despite this, there are some extra problems related to metadata readahead: - Duplicated feature with btrfs_path::reada - Partly duplicated feature of btrfs_fs_info::buffer_radix Btrfs already caches its metadata in buffer_radix, while readahead tries to read the tree block no matter if it's already cached. - Poor layer separation Metadata readahead works kinda at device level. This is definitely not the correct layer it should be, since metadata is at btrfs logical address space, it should not bother device at all. This brings extra chance for bugs to sneak in, while brings unnecessary complexity. - Dead code In the very beginning of scrub.c we have #undef DEBUG, rendering all the debug related code useless and unable to test. Thus here I purpose to remove the metadata readahead mechanism completely. [BENCHMARK] There is a full benchmark for the scrub performance difference using the old btrfs_reada_add() and btrfs_path::reada. For the worst case (no dirty metadata, slow HDD), there could be a 5% performance drop for scrub. For other cases (even SATA SSD), there is no distinguishable performance difference. The number is reported scrub speed, in MiB/s. The resolution is limited by the reported duration, which only has a resolution of 1 second. Old New Diff SSD 455.3 466.332 +2.42% HDD 103.927 98.012 -5.69% Comprehensive test methodology is in the cover letter of the patch. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:26 +01:00
Qu Wenruo	dcf62b204c	btrfs: scrub: use btrfs_path::reada for extent tree readahead For scrub, we trigger two readaheads for two trees, extent tree to get where to scrub, and csum tree to get the data checksum. For csum tree we already trigger readahead in btrfs_lookup_csums_range(), by setting path->reada. But for extent tree we don't have any path based readahead. Add the readahead for extent tree as well, so we can later remove the btrfs_reada_add() based readahead. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:26 +01:00
Qu Wenruo	2522dbe86b	btrfs: scrub: remove the unnecessary path parameter for scrub_raid56_parity() In function scrub_stripe() we allocated two btrfs_path's, one @path for extent tree search and another @ppath for full stripe extent tree search for RAID56. This is totally umncessary, as the @ppath usage is completely inside scrub_raid56_parity(), thus we can move the path allocation into scrub_raid56_parity() completely. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:26 +01:00
Nikolay Borisov	c122799643	btrfs: refactor unlock_up The purpose of this function is to unlock all nodes in a btrfs path which are above 'lowest_unlock' and whose slot used is different than 0. As such it used slightly awkward structure of 'if' as well as somewhat cryptic "no_skip" control variable which denotes whether we should check the current level of skipability or no. This patch does the following (cosmetic) refactorings: * Renames 'no_skip' to 'check_skip' and makes it a boolean. This variable controls whether we are below the lowest_unlock/skip_level levels. * Consolidates the 2 conditions which warrant checking whether the current level should be skipped under 1 common if (check_skip) branch, this increase indentation level but is not critical. * Consolidates the 'skip_level < i && i >= lowest_unlock' and 'i >= lowest_unlock && i > skip_level' condition into a common branch since those are identical. * Eliminates the local extent_buffer variable as in this case it doesn't bring anything to function readability. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:26 +01:00
Filipe Manana	1b58ae0e4d	btrfs: skip transaction commit after failure to create subvolume At ioctl.c:create_subvol(), when we fail to create a subvolume we always commit the transaction. In most cases this is a no-op, since all the error paths, except for one, abort the transaction - the only exception is when we fail to insert the new root item into the root tree, in that case we don't abort the transaction because we didn't do anything that is irreversible - however we end up committing the transaction which although is not a functional problem, it adds unnecessary rotation of the backup roots in the superblock and unnecessary work. So change that to commit a transaction only when no error happened, otherwise just call btrfs_end_transaction() to release our reference on the transaction. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:26 +01:00
Naohiro Aota	82187d2ecd	btrfs: zoned: fix chunk allocation condition for zoned allocator The ZNS specification defines a limit on the number of "active" zones. That limit impose us to limit the number of block groups which can be used for an allocation at the same time. Not to exceed the limit, we reuse the existing active block groups as much as possible when we can't activate any other zones without sacrificing an already activated block group in commit `a85f05e59b` ("btrfs: zoned: avoid chunk allocation if active block group has enough space"). However, the check is wrong in two ways. First, it checks the condition for every raid index (ffe_ctl->index). Even if it reaches the condition and "ffe_ctl->max_extent_size >= ffe_ctl->min_alloc_size" is met, there can be other block groups having enough space to hold ffe_ctl->num_bytes. (Actually, this won't happen in the current zoned code as it only supports SINGLE profile. But, it can happen once it enables other RAID types.) Second, it checks the active zone availability depending on the raid index. The raid index is just an index for space_info->block_groups, so it has nothing to do with chunk allocation. These mistakes are causing a faulty allocation in a certain situation. Consider we are running zoned btrfs on a device whose max_active_zone == 0 (no limit). And, suppose no block group have a room to fit ffe_ctl->num_bytes but some room to meet ffe_ctl->min_alloc_size (i.e. max_extent_size > num_bytes >= min_alloc_size). In this situation, the following occur: - With SINGLE raid_index, it reaches the chunk allocation checking code - The check returns true because we can activate a new zone (no limit) - But, before allocating the chunk, it iterates to the next raid index (RAID5) - Since there are no RAID5 block groups on zoned mode, it again reaches the check code - The check returns false because of btrfs_can_activate_zone()'s "if (raid_index != BTRFS_RAID_SINGLE)" part - That results in returning -ENOSPC without allocating a new chunk As a result, we end up hitting -ENOSPC too early. Move the check to the right place in the can_allocate_chunk() hook, and do the active zone check depending on the allocation flag, not on the raid index. CC: stable@vger.kernel.org # 5.16 Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:26 +01:00
Naohiro Aota	50475cd577	btrfs: add extent allocator hook to decide to allocate chunk or not Introduce a new hook for an extent allocator policy. With the new hook, a policy can decide to allocate a new block group or not. If not, it will return -ENOSPC, so btrfs_reserve_extent() will cut the allocation size in half and retry the allocation if min_alloc_size is large enough. The hook has a place holder and will be replaced with the real implementation in the next patch. CC: stable@vger.kernel.org # 5.16 Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:26 +01:00
Naohiro Aota	1ada69f61c	btrfs: zoned: unset dedicated block group on allocation failure Allocating an extent from a block group can fail for various reasons. When an allocation from a dedicated block group (for tree-log or relocation data) fails, we need to unregister it as a dedicated one so that we can allocate a new block group for the dedicated one. However, we are returning early when the block group in case it is read-only, fully used, or not be able to activate the zone. As a result, we keep the non-usable block group as a dedicated one, leading to further allocation failure. With many block groups, the allocator will iterate hopeless loop to find a free extent, results in a hung task. Fix the issue by delaying the return and doing the proper cleanups. CC: stable@vger.kernel.org # 5.16 Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:26 +01:00
Johannes Thumshirn	7367271000	btrfs: zoned: drop redundant check for REQ_OP_ZONE_APPEND and btrfs_is_zoned REQ_OP_ZONE_APPEND can only work on zoned devices, so it is redundant to check if the filesystem is zoned when REQ_OP_ZONE_APPEND is set as the bio's bio_op. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:26 +01:00
Johannes Thumshirn	554aed7da2	btrfs: zoned: sink zone check into btrfs_repair_one_zone Sink zone check into btrfs_repair_one_zone() so we don't need to do it in all callers. Also as btrfs_repair_one_zone() doesn't return a sensible error, make it a boolean function and return false in case it got called on a non-zoned filesystem and true on a zoned filesystem. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:26 +01:00
Johannes Thumshirn	8fdf54fe69	btrfs: zoned: simplify btrfs_check_meta_write_pointer btrfs_check_meta_write_pointer() will always be called with a NULL 'cache_ret' argument. As there's no need to check if we have a valid block_group passed in remove these checks. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:25 +01:00
Johannes Thumshirn	869f4cdc73	btrfs: zoned: encapsulate inode locking for zoned relocation Encapsulate the inode lock needed for serializing the data relocation writes on a zoned filesystem into a helper. This streamlines the code reading flow and hides special casing for zoned filesystems. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:25 +01:00
Anand Jain	a26d60dedf	btrfs: sysfs: add devinfo/fsid to retrieve actual fsid from the device In the case of the seed device, the fsid can be different from the mounted sprout fsid. The userland has to read the device superblock to know the fsid but, that idea fails if the device is missing. So add a sysfs interface devinfo/<devid>/fsid to show the fsid of the device. For example: $ cd /sys/fs/btrfs/b10b02a5-f9de-4276-b9e8-2bfd09a578a8 $ cat devinfo/1/fsid c44d771f-639d-4df3-99ec-5bc7ad2af93b $ cat devinfo/3/fsid b10b02a5-f9de-4276-b9e8-2bfd09a578a8 Though it's related to seeding, the name of the sysfs file is plain fsid as it matches what blkid says. A path to the device's fsid will aid scripting. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:25 +01:00
Josef Bacik	c18e323564	btrfs: reserve extra space for the free space tree Filipe reported a problem where sometimes he'd get an ENOSPC abort when running delayed refs with generic/619 and the free space tree enabled. This is partly because we do not reserve space for modifying the free space tree, nor do we have a block rsv associated with that tree. The delayed_refs_rsv tracks the amount of space required to run delayed refs. This means 1 modification means 1 change to the extent root. With the free space tree this turns into 2 changes, because modifying 1 extent means updating the extent tree and potentially updating the free space tree to either remove that entry or add the free space. Thus if we have the FST enabled, simply double the reservation size for our modification. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:25 +01:00
Josef Bacik	9506f95382	btrfs: include the free space tree in the global rsv minimum calculation Filipe reported a problem where generic/619 was failing with an ENOSPC abort while running delayed refs, like the following BTRFS: Transaction aborted (error -28) WARNING: CPU: 3 PID: 522920 at fs/btrfs/free-space-tree.c:1049 add_to_free_space_tree+0xe5/0x110 [btrfs] CPU: 3 PID: 522920 Comm: kworker/u16:19 Tainted: G W 5.16.0-rc2-btrfs-next-106 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs] RIP: 0010:add_to_free_space_tree+0xe5/0x110 [btrfs] RSP: 0000:ffffa65087fb7b20 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffff9131eeaa RDI: 00000000ffffffff RBP: ffff8d62e26481b8 R08: ffffffff9ad97ce0 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffe4 R13: ffff8d61c25fe688 R14: ffff8d61ebd88800 R15: ffff8d61ebd88a90 FS: 0000000000000000(0000) GS:ffff8d64ed400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa46a8b1000 CR3: 0000000148d18003 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __btrfs_free_extent+0x516/0x950 [btrfs] __btrfs_run_delayed_refs+0x2b1/0x1250 [btrfs] btrfs_run_delayed_refs+0x86/0x210 [btrfs] flush_space+0x403/0x630 [btrfs] ? call_rcu_tasks_generic+0x50/0x80 ? lock_release+0x223/0x4a0 ? btrfs_get_alloc_profile+0xb5/0x290 [btrfs] ? do_raw_spin_unlock+0x4b/0xa0 btrfs_async_reclaim_metadata_space+0x139/0x320 [btrfs] process_one_work+0x24c/0x5b0 worker_thread+0x55/0x3c0 ? process_one_work+0x5b0/0x5b0 kthread+0x17c/0x1a0 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 There's a couple of reasons for this, but in generic/619's case the largest reason is because it is a very small file system, ad we do not reserve enough space for the global reserve. With the free space tree we now have the free space tree that we need to modify when running delayed refs. This means we need the global reserve to take this into account when it calculates the minimum size it needs to be. This is especially important for very small file systems. Fix this by adjusting the minimum global block rsv size math to include the size of the free space tree when calculating the size. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:25 +01:00
Qu Wenruo	c9d328c0c4	btrfs: scrub: merge SCRUB_PAGES_PER_RD_BIO and SCRUB_PAGES_PER_WR_BIO These two values were introduced in commit `ff023aac31` ("Btrfs: add code to scrub to copy read data to another disk") as an optimization. But the truth is, block layer scheduler can do whatever it wants to merge/split bios to improve performance. Doing such "optimization" is not really going to affect much, especially considering how good current block layer optimizations are doing. Remove such old and immature optimization from our code. Since we're here, also change BUG_ON()s using these two macros to use ASSERT()s. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:25 +01:00
Qu Wenruo	0bb3acdc48	btrfs: update SCRUB_MAX_PAGES_PER_BLOCK Use BTRFS_MAX_METADATA_BLOCKSIZE and SZ_4K (minimal sectorsize) to calculate this value. And remove one stale comment on the value, in fact with recent subpage support, BTRFS_MAX_METADATA_BLOCKSIZE * PAGE_SIZE is already beyond BTRFS_STRIPE_LEN, just we don't use the full page. Also since we're here, update the BUG_ON() related to SCRUB_MAX_PAGES_PER_BLOCK to ASSERT(). As those ASSERT() are really only for developers to catch early obvious bugs, not to let end users suffer. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-01-07 14:18:25 +01:00

... 5 6 7 8 9 ...

1067029 commits