Commit graph

1067029 commits

Author SHA1 Message Date
Jakub Kicinski
44ea62813f
spi: don't include ptp_clock_kernel.h in spi.h
Commit b42faeee71 ("spi: Add a PTP system timestamp
to the transfer structure") added an include of ptp_clock_kernel.h
to spi.h for struct ptp_system_timestamp but a forward declaration
is enough. Let's use that to limit the number of objects we have
to rebuild every time we touch networking headers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210904013140.2377609-1-kuba@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-07 17:14:30 +00:00
Fabio Estevam
530792efa6
regmap: Call regmap_debugfs_exit() prior to _init()
Since commit cffa4b2122 ("regmap: debugfs: Fix a memory leak when
calling regmap_attach_dev"), the following debugfs error is seen
on i.MX boards:

debugfs: Directory 'dummy-iomuxc-gpr@20e0000' with parent 'regmap' already present!

In the attempt to fix the memory leak, the above commit added a NULL check
for map->debugfs_name. For the first debufs entry, map->debugfs_name is NULL
and then the new name is allocated via kasprintf().

For the second debugfs entry, map->debugfs_name() is no longer NULL, so
it will keep using the old entry name and the duplicate name error is seen.

Quoting Mark Brown:

"That means that if the device gets freed we'll end up with the old debugfs
file hanging around pointing at nothing.
...
To be more explicit this means we need a call to regmap_debugfs_exit()
which will clean up all the existing debugfs stuff before we loose
references to it."

Call regmap_debugfs_exit() prior to regmap_debugfs_init() to fix
the problem.

Tested on i.MX6Q and i.MX6SX boards.

Fixes: cffa4b2122 ("regmap: debugfs: Fix a memory leak when calling regmap_attach_dev")
Suggested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Fabio Estevam <festevam@denx.de>
Link: https://lore.kernel.org/r/20220107163307.335404-1-festevam@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-07 17:14:29 +00:00
Jason Wang
5322c68e58 iavf: remove an unneeded variable
The variable `ret_code' used for returning is never changed in function
`iavf_shutdown_adminq'. So that it can be removed and just return its
initial value 0 at the end of `iavf_shutdown_adminq' function.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07 09:04:21 -08:00
Yang Li
a127adf2fc i40e: remove variables set but not used
The code that uses variables pe_cntx_size and pe_filt_size
has been removed, so they should be removed as well.

Eliminate the following clang warnings:
drivers/net/ethernet/intel/i40e/i40e_common.c:4139:20:
warning: variable 'pe_filt_size' set but not used.
drivers/net/ethernet/intel/i40e/i40e_common.c:4139:6:
warning: variable 'pe_cntx_size' set but not used.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07 09:04:21 -08:00
Mateusz Palczewski
17b33d4319 i40e: Remove non-inclusive language
Remove non-inclusive language from the driver.

Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07 09:04:21 -08:00
Mateusz Palczewski
9c83ca8a63 i40e: Update FW API version
Update FW API versions to the newest supported NVM images.

Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07 09:04:21 -08:00
Jedrzej Jagielski
ef39584ddb i40e: Minimize amount of busy-waiting during AQ send
The i40e_asq_send_command will now use a non blocking usleep_range if
possible (non-atomic context), instead of busy-waiting udelay. The
usleep_range function uses hrtimers to provide better performance and
removes the negative impact of busy-waiting in time-critical
environments.

1. Rename i40e_asq_send_command to i40e_asq_send_command_atomic
   and add 5th parameter to inform if called from an atomic context.
   Call inside usleep_range (if non-atomic) or udelay (if atomic).

2. Change i40e_asq_send_command to invoke
   i40e_asq_send_command_atomic(..., false).

3. Change two functions:
    - i40e_aq_set_vsi_uc_promisc_on_vlan
    - i40e_aq_set_vsi_mc_promisc_on_vlan
   to explicitly use i40e_asq_send_command_atomic(..., true)
   instead of i40e_asq_send_command, as they use spinlocks and do some
   work in an atomic context.
   All other calls to i40e_asq_send_command remain unchanged.

Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07 09:04:21 -08:00
Nikunj A Dadhania
fffb532378 KVM: x86: Check for rmaps allocation
With TDP MMU being the default now, access to mmu_rmaps_stat debugfs
file causes following oops:

BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 3185 Comm: cat Not tainted 5.16.0-rc4+ #204
RIP: 0010:pte_list_count+0x6/0x40
 Call Trace:
  <TASK>
  ? kvm_mmu_rmaps_stat_show+0x15e/0x320
  seq_read_iter+0x126/0x4b0
  ? aa_file_perm+0x124/0x490
  seq_read+0xf5/0x140
  full_proxy_read+0x5c/0x80
  vfs_read+0x9f/0x1a0
  ksys_read+0x67/0xe0
  __x64_sys_read+0x19/0x20
  do_syscall_64+0x3b/0xc0
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7fca6fc13912

Return early when rmaps are not present.

Reported-by: Vasant Hegde <vasant.hegde@amd.com>
Tested-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220105040337.4234-1-nikunj@amd.com>
Cc: stable@vger.kernel.org
Fixes: 3bcd0662d6 ("KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 12:04:01 -05:00
Karen Sornek
cfb1d572c9 i40e: Add ensurance of MacVlan resources for every trusted VF
Trusted VF can use up every resource available, leaving nothing
to other trusted VFs.
Introduce define, which calculates MacVlan resources available based
on maximum available MacVlan resources, bare minimum for each VF and
number of currently allocated VFs.

Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07 09:03:44 -08:00
Wanpeng Li
597cb7968c KVM: SEV: Mark nested locking of kvm->lock
Both source and dest vms' kvm->locks are held in sev_lock_two_vms.
Mark one with a different subtype to avoid false positives from lockdep.

Fixes: c9d61dcb0b (KVM: SEV: accept signals in sev_lock_two_vms)
Reported-by: Yiru Xu <xyru1999@gmail.com>
Tested-by: Jinrong Liang <cloudliang@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1641364863-26331-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07 12:01:55 -05:00
Dave Hansen
2056e2989b x86/sgx: Fix NULL pointer dereference on non-SGX systems
== Problem ==

Nathan Chancellor reported an oops when aceessing the
'sgx_total_bytes' sysfs file:

	https://lore.kernel.org/all/YbzhBrimHGGpddDM@archlinux-ax161/

The sysfs output code accesses the sgx_numa_nodes[] array
unconditionally.  However, this array is allocated during SGX
initialization, which only occurs on systems where SGX is
supported.

If the sysfs file is accessed on systems without SGX support,
sgx_numa_nodes[] is NULL and an oops occurs.

== Solution ==

To fix this, hide the entire nodeX/x86/ attribute group on
systems without SGX support using the ->is_visible attribute
group callback.

Unfortunately, SGX is initialized via a device_initcall() which
occurs _after_ the ->is_visible() callback.  Instead of moving
SGX initialization earlier, call sysfs_update_group() during
SGX initialization to update the group visiblility.

This update requires moving the SGX sysfs code earlier in
sgx/main.c.  There are no code changes other than the addition of
arch_update_sysfs_visibility() and a minor whitespace fixup to
arch_node_attr_is_visible() which checkpatch caught.

CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-sgx@vger.kernel.org
Cc: x86@kernel.org
Fixes: 50468e4313 ("x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lkml.kernel.org/r/20220104171527.5E8416A8@davehans-spike.ostc.intel.com
2022-01-07 08:47:23 -08:00
Kevin Bracey
c25af830ab sch_cake: revise Diffserv docs
Documentation incorrectly stated that CS1 is equivalent to LE for
diffserv8. But when LE was added to the table, CS1 was pushed into tin
1, leaving only LE in tin 0.

Also "TOS1" no longer exists, as that is the same codepoint as LE.

Make other tweaks properly distinguishing codepoints from classes and
putting current Diffserve codepoints ahead of legacy ones.

Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://lore.kernel.org/r/20220106215637.3132391-1-kevin@bracey.fi
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07 08:41:29 -08:00
Mauro Carvalho Chehab
87d6576ddf scripts: sphinx-pre-install: Fix ctex support on Debian
The name of the package with ctexhook.sty is different on
Debian/Ubuntu.

Reported-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Tested-by: Akira Yokosawa <akiyks@gmail.com>
Link: https://lore.kernel.org/r/63882425609a2820fac78f5e94620abeb7ed5f6f.1641429634.git.mchehab@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07 09:33:13 -07:00
Jonathan Corbet
db67eb748e docs: discourage use of list tables
Our documentation encourages the use of list-table formats, but that advice
runs counter to the objective of keeping the plain-text documentation as
useful and readable as possible.  Turn that advice around the other way so
that people don't keep adding these tables.

Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07 09:33:13 -07:00
Thorsten Leemhuis
bf33a9d42d docs: 5.Posting.rst: describe Fixes: and Link: tags
Explain Fixes: and Link: tags in Documentation/process/5.Posting.rst,
which are missing in this file for unknown reasons and only described in
Documentation/process/submitting-patches.rst.

Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
CC: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://lore.kernel.org/r/c4a5f5e25fa84b26fd383bba6eafde4ab57c9de7.1641314856.git.linux@leemhuis.info
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07 09:33:13 -07:00
Christian Löhle
689d8014d9 Documentation: kgdb: Replace deprecated remotebaud
Using set remotebaud to set the baud rate was deprecated in
gdb-7.7 and completely removed from the command parser in gdb-7.8
(released in 2014). Adopt set serial baud instead.

Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/4050689967ed46baaa3bfadda53a0e73@hyperstone.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07 09:33:13 -07:00
James Clark
7cc4c09269 docs: automarkup.py: Fix invalid HTML link output and broken URI fragments
Since commit d18b01789a ("docs: Add automatic cross-reference for
documentation pages"), references that were already explicitly defined
with "ref:" and referred to other pages with a path have been doubled.
This is reported as the following error by Firefox:

  Start tag "a" seen but an element of the same type was already open.
  End tag "a" violates nesting rules.

As well as the invalid HTML, this also obscures the URI fragment links
to subsections because the second link overrides the first. For example
on the page admin-guide/hw-vuln/mds.html the last link should be to the
"Default Mitigations" subsection using a # URI fragment:

  admin-guide/hw-vuln/l1tf.html#default-mitigations

But it is obsured by a second link to the whole page:

  admin-guide/hw-vuln/l1tf.html

The full HTML with the double <a> tags looks like this:

  <a class="reference internal" href="l1tf.html#default-mitigations">
    <span class="std std-ref">
      <a class="reference internal" href="l1tf.html">
        <span class="doc">L1TF - L1 Terminal Fault</span>
      </a>
    </span>
  </a>

After this commit, there is only a single link:

  <a class="reference internal" href="l1tf.html#default-mitigations">
    <span class="std std-ref">Documentation/admin-guide/hw-vuln//l1tf.rst</span>
  </a>

Now that the second link is removed, the browser correctly jumps to the
default-mitigations subsection when clicking the link.

The fix is to check that nodes in the document to be modified are not
already references. A reference is counted as any text that is a
descendant of a reference type node. Only plain text should be converted
to new references, otherwise the doubling occurs.

Testing
=======

 * Test that the build stdout is the same (ignoring ordering), and that
   no new warnings are printed.

 * Diff all .html files and check that the only modifications occur
   to the bad double links.

 * The auto linking of bare references to pages without "ref:" is still
   working.

Fixes: d18b01789a ("docs: Add automatic cross-reference for documentation pages")
Reviewed-by: Nícolas F. R. A. Prado <n@nfraprado.net>
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20220105143640.330602-2-james.clark@arm.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-01-07 09:32:58 -07:00
Dan Carpenter
dc35616e6c netrom: fix api breakage in nr_setsockopt()
This needs to copy an unsigned int from user space instead of a long to
avoid breaking user space with an API change.

I have updated all the integer overflow checks from ULONG to UINT as
well.  This is a slight API change but I do not expect it to affect
anything in real life.

Fixes: 3087a6f36e ("netrom: fix copying in user data in nr_setsockopt")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07 14:11:05 +00:00
Dan Carpenter
9371937092 ax25: uninitialized variable in ax25_setsockopt()
The "opt" variable is unsigned long but we only copy 4 bytes from
the user so the lower 4 bytes are uninitialized.

I have changed the integer overflow checks from ULONG to UINT as well.
This is a slight API change but I don't expect it to break anything.

Fixes: a7b75c5a8c ("net: pass a sockptr_t into ->setsockopt")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07 14:10:26 +00:00
David S. Miller
b69c5b5886 Merge branch 'octeontx2-ptp-bugs'
Subbaraya Sundeep says:

====================
octeontx2: Fix PTP bugs

This patchset addresses two problems found when using
ptp.
Patch 1 - Increases the refcount of ptp device before use
which was missing and it lead to refcount increment after use
bug when module is loaded and unloaded couple of times.
Patch 2 - PTP resources allocated by VF are not being freed
during VF teardown. This patch fixes that.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07 14:04:19 +00:00
Rakesh Babu Saladi
eabd0f88b0 octeontx2-nicvf: Free VF PTP resources.
When a VF is removed respective PTP resources are not
being freed currently. This patch fixes it.

Fixes: 43510ef4dd ("octeontx2-nicvf: Add PTP hardware clock support to NIX VF")
Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07 14:04:19 +00:00
Subbaraya Sundeep
93440f4888 octeontx2-af: Increment ptp refcount before use
Before using the ptp pci device by AF driver increment
the reference count of it.

Fixes: a8b90c9d26 ("octeontx2-af: Add PTP device id for CN10K and 95O silcons")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07 14:04:19 +00:00
Miaoqian Lin
69c1b87516
spi: spi-meson-spifc: Add missing pm_runtime_disable() in meson_spifc_probe
If the probe fails, we should use pm_runtime_disable() to balance
pm_runtime_enable().
Add missing pm_runtime_disable() for meson_spifc_probe.

Fixes: c3e4bc5434 ("spi: meson: Add support for Amlogic Meson SPIFC")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20220107075424.7774-1-linmq006@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-07 13:36:38 +00:00
Qinghua Jin
c8c9cb6d9f
spi: atmel: Fix typo
Change 'actualy' to 'actually'

Signed-off-by: Qinghua Jin <qhjin.dev@gmail.com>
Link: https://lore.kernel.org/r/20220107024631.396862-1-qhjin.dev@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-07 13:36:37 +00:00
Watson Chow
bfff546aae
regulator: Add MAX20086-MAX20089 driver
The MAX20086-MAX20089 are dual/quad power protectors for cameras. Add a
driver that supports controlling the outputs individually. Additional
features, such as overcurrent detection, may be added later if needed.

Signed-off-by: Watson Chow <watson.chow@avnet.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://lore.kernel.org/r/20220106224350.16957-3-laurent.pinchart+renesas@ideasonboard.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-07 13:36:35 +00:00
Laurent Pinchart
764aaa4e03
dt-bindings: regulators: Add bindings for Maxim MAX20086-MAX20089
The MAX20086-MAX20089 are dual/quad power protectors for cameras. Add
corresponding DT bindings.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://lore.kernel.org/r/20220106224350.16957-2-laurent.pinchart+renesas@ideasonboard.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2022-01-07 13:36:34 +00:00
Qu Wenruo
36c86a9e1b btrfs: output more debug messages for uncommitted transaction
Print extra information about how many dirty bytes an uncommitted
has at the end of mount.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:27 +01:00
Filipe Manana
c2f822635d btrfs: respect the max size in the header when activating swap file
If we extended the size of a swapfile after its header was created (by the
mkswap utility) and then try to activate it, we will map the entire file
when activating the swap file, instead of limiting to the max size defined
in the swap file's header.

Currently test case generic/643 from fstests fails because we do not
respect that size limit defined in the swap file's header.

So fix this by not mapping file ranges beyond the max size defined in the
swap header.

This is the same type of bug that iomap used to have, and was fixed in
commit 36ca7943ac ("mm/swap: consider max pages in
iomap_swapfile_add_extent").

Fixes: ed46ff3d42 ("Btrfs: support swap files")
CC: stable@vger.kernel.org # 5.4+
Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:27 +01:00
Yang Li
be8d1a2ab9 btrfs: fix argument list that the kdoc format and script verified
The warnings were found by running scripts/kernel-doc, which is
caused by using 'make W=1'.

fs/btrfs/extent_io.c:3210: warning: Function parameter or member
'bio_ctrl' not described in 'btrfs_bio_add_page'
fs/btrfs/extent_io.c:3210: warning: Excess function parameter 'bio'
description in 'btrfs_bio_add_page'
fs/btrfs/extent_io.c:3210: warning: Excess function parameter
'prev_bio_flags' description in 'btrfs_bio_add_page'
fs/btrfs/space-info.c:1602: warning: Excess function parameter 'root'
description in 'btrfs_reserve_metadata_bytes'
fs/btrfs/space-info.c:1602: warning: Function parameter or member
'fs_info' not described in 'btrfs_reserve_metadata_bytes'

Note: this is fixing only the warnings regarding parameter list, the
first line is not strictly conforming to the kdoc format as the btrfs
codebase does not stick to that and keeps the first line more free form
(because it's only for internal use).

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:27 +01:00
Su Yue
4a9e803e5b btrfs: remove unnecessary parameter type from compression_decompress_bio
btrfs_decompress_bio, the only caller of compression_decompress_bio gets
type from @cb and passes it to compression_decompress_bio.
However, compression_decompress_bio can get compression type directly
from @cb.

So remove the parameter and access it through @cb.  No functional
change.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:27 +01:00
Qu Wenruo
856e47946c btrfs: selftests: dump extent io tree if extent-io-tree test failed
When code modifying extent-io-tree get modified and got that selftest
failed, it can take some time to pin down the cause.

To make it easier to expose the problem, dump the extent io tree if the
selftest failed.

This can save developers debug time, especially since the selftest we
can not use the trace events, thus have to manually add debug trace
points.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:27 +01:00
Qu Wenruo
2ae8ae3d3d btrfs: scrub: cleanup the argument list of scrub_stripe()
The argument list of btrfs_stripe() has similar problems of
scrub_chunk():

- Duplicated and ambiguous @base argument
  Can be fetched from btrfs_block_group::bg.

- Ambiguous argument @length
  It's again device extent length

- Ambiguous argument @num
  The instinctive guess would be mirror number, but in fact it's stripe
  index.

Fix it by:

- Remove @base parameter

- Rename @length to @dev_extent_len

- Rename @num to @stripe_index

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:27 +01:00
Qu Wenruo
d04fbe19ae btrfs: scrub: cleanup the argument list of scrub_chunk()
The argument list of scrub_chunk() has the following problems:

- Duplicated @chunk_offset
  It is the same as btrfs_block_group::start.

- Confusing @length
  The most instinctive guess is chunk length, and one may want to delete
  it, but the truth is, it's the device extent length.

Fix this by:

- Remove @chunk_offset
  Use btrfs_block_group::start instead.

- Rename @length to @dev_extent_len
  Also rename the caller to remove the ambiguous naming.

- Rename @cache to @bg
  The "_cache" suffix for btrfs_block_group has been removed for a while.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:26 +01:00
Qu Wenruo
f26c923860 btrfs: remove reada infrastructure
Currently there is only one user for btrfs metadata readahead, and
that's scrub.

But even for the single user, it's not providing the correct
functionality it needs, as scrub needs reada for commit root, which
current readahead can't provide. (Although it's pretty easy to add such
feature).

Despite this, there are some extra problems related to metadata
readahead:

- Duplicated feature with btrfs_path::reada

- Partly duplicated feature of btrfs_fs_info::buffer_radix
  Btrfs already caches its metadata in buffer_radix, while readahead
  tries to read the tree block no matter if it's already cached.

- Poor layer separation
  Metadata readahead works kinda at device level.
  This is definitely not the correct layer it should be, since metadata
  is at btrfs logical address space, it should not bother device at all.

  This brings extra chance for bugs to sneak in, while brings
  unnecessary complexity.

- Dead code
  In the very beginning of scrub.c we have #undef DEBUG, rendering all
  the debug related code useless and unable to test.

Thus here I purpose to remove the metadata readahead mechanism
completely.

[BENCHMARK]
There is a full benchmark for the scrub performance difference using the
old btrfs_reada_add() and btrfs_path::reada.

For the worst case (no dirty metadata, slow HDD), there could be a 5%
performance drop for scrub.
For other cases (even SATA SSD), there is no distinguishable performance
difference.

The number is reported scrub speed, in MiB/s.
The resolution is limited by the reported duration, which only has a
resolution of 1 second.

	Old		New		Diff
SSD	455.3		466.332		+2.42%
HDD	103.927 	98.012		-5.69%

Comprehensive test methodology is in the cover letter of the patch.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:26 +01:00
Qu Wenruo
dcf62b204c btrfs: scrub: use btrfs_path::reada for extent tree readahead
For scrub, we trigger two readaheads for two trees, extent tree to get
where to scrub, and csum tree to get the data checksum.

For csum tree we already trigger readahead in
btrfs_lookup_csums_range(), by setting path->reada.
But for extent tree we don't have any path based readahead.

Add the readahead for extent tree as well, so we can later remove the
btrfs_reada_add() based readahead.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:26 +01:00
Qu Wenruo
2522dbe86b btrfs: scrub: remove the unnecessary path parameter for scrub_raid56_parity()
In function scrub_stripe() we allocated two btrfs_path's, one @path for
extent tree search and another @ppath for full stripe extent tree search
for RAID56.

This is totally umncessary, as the @ppath usage is completely inside
scrub_raid56_parity(), thus we can move the path allocation into
scrub_raid56_parity() completely.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:26 +01:00
Nikolay Borisov
c122799643 btrfs: refactor unlock_up
The purpose of this function is to unlock all nodes in a btrfs path
which are above 'lowest_unlock' and whose slot used is different than 0.
As such it used slightly awkward structure of 'if' as well as somewhat
cryptic "no_skip" control variable which denotes whether we should
check the current level of skipability or no.

This patch does the following (cosmetic) refactorings:

* Renames 'no_skip' to 'check_skip' and makes it a boolean. This
  variable controls whether we are below the lowest_unlock/skip_level
  levels.

* Consolidates the 2 conditions which warrant checking whether the
  current level should be skipped under 1 common if (check_skip) branch,
  this increase indentation level but is not critical.

* Consolidates the 'skip_level < i && i >= lowest_unlock' and
  'i >= lowest_unlock && i > skip_level' condition into a common branch
  since those are identical.

* Eliminates the local extent_buffer variable as in this case it doesn't
  bring anything to function readability.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:26 +01:00
Filipe Manana
1b58ae0e4d btrfs: skip transaction commit after failure to create subvolume
At ioctl.c:create_subvol(), when we fail to create a subvolume we always
commit the transaction. In most cases this is a no-op, since all the error
paths, except for one, abort the transaction - the only exception is when
we fail to insert the new root item into the root tree, in that case we
don't abort the transaction because we didn't do anything that is
irreversible - however we end up committing the transaction which although
is not a functional problem, it adds unnecessary rotation of the backup
roots in the superblock and unnecessary work.

So change that to commit a transaction only when no error happened,
otherwise just call btrfs_end_transaction() to release our reference on
the transaction.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:26 +01:00
Naohiro Aota
82187d2ecd btrfs: zoned: fix chunk allocation condition for zoned allocator
The ZNS specification defines a limit on the number of "active"
zones. That limit impose us to limit the number of block groups which
can be used for an allocation at the same time. Not to exceed the
limit, we reuse the existing active block groups as much as possible
when we can't activate any other zones without sacrificing an already
activated block group in commit a85f05e59b ("btrfs: zoned: avoid
chunk allocation if active block group has enough space").

However, the check is wrong in two ways. First, it checks the
condition for every raid index (ffe_ctl->index). Even if it reaches
the condition and "ffe_ctl->max_extent_size >=
ffe_ctl->min_alloc_size" is met, there can be other block groups
having enough space to hold ffe_ctl->num_bytes. (Actually, this won't
happen in the current zoned code as it only supports SINGLE
profile. But, it can happen once it enables other RAID types.)

Second, it checks the active zone availability depending on the
raid index. The raid index is just an index for
space_info->block_groups, so it has nothing to do with chunk allocation.

These mistakes are causing a faulty allocation in a certain
situation. Consider we are running zoned btrfs on a device whose
max_active_zone == 0 (no limit). And, suppose no block group have a
room to fit ffe_ctl->num_bytes but some room to meet
ffe_ctl->min_alloc_size (i.e. max_extent_size > num_bytes >=
min_alloc_size).

In this situation, the following occur:

- With SINGLE raid_index, it reaches the chunk allocation checking
  code
- The check returns true because we can activate a new zone (no limit)
- But, before allocating the chunk, it iterates to the next raid index
  (RAID5)
- Since there are no RAID5 block groups on zoned mode, it again
  reaches the check code
- The check returns false because of btrfs_can_activate_zone()'s "if
  (raid_index != BTRFS_RAID_SINGLE)" part
- That results in returning -ENOSPC without allocating a new chunk

As a result, we end up hitting -ENOSPC too early.

Move the check to the right place in the can_allocate_chunk() hook,
and do the active zone check depending on the allocation flag, not on
the raid index.

CC: stable@vger.kernel.org # 5.16
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:26 +01:00
Naohiro Aota
50475cd577 btrfs: add extent allocator hook to decide to allocate chunk or not
Introduce a new hook for an extent allocator policy. With the new
hook, a policy can decide to allocate a new block group or not. If
not, it will return -ENOSPC, so btrfs_reserve_extent() will cut the
allocation size in half and retry the allocation if min_alloc_size is
large enough.

The hook has a place holder and will be replaced with the real
implementation in the next patch.

CC: stable@vger.kernel.org # 5.16
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:26 +01:00
Naohiro Aota
1ada69f61c btrfs: zoned: unset dedicated block group on allocation failure
Allocating an extent from a block group can fail for various reasons.
When an allocation from a dedicated block group (for tree-log or
relocation data) fails, we need to unregister it as a dedicated one so
that we can allocate a new block group for the dedicated one.

However, we are returning early when the block group in case it is
read-only, fully used, or not be able to activate the zone. As a result,
we keep the non-usable block group as a dedicated one, leading to
further allocation failure. With many block groups, the allocator will
iterate hopeless loop to find a free extent, results in a hung task.

Fix the issue by delaying the return and doing the proper cleanups.

CC: stable@vger.kernel.org # 5.16
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:26 +01:00
Johannes Thumshirn
7367271000 btrfs: zoned: drop redundant check for REQ_OP_ZONE_APPEND and btrfs_is_zoned
REQ_OP_ZONE_APPEND can only work on zoned devices, so it is redundant to
check if the filesystem is zoned when REQ_OP_ZONE_APPEND is set as the
bio's bio_op.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:26 +01:00
Johannes Thumshirn
554aed7da2 btrfs: zoned: sink zone check into btrfs_repair_one_zone
Sink zone check into btrfs_repair_one_zone() so we don't need to do it
in all callers.

Also as btrfs_repair_one_zone() doesn't return a sensible error, make it
a boolean function and return false in case it got called on a non-zoned
filesystem and true on a zoned filesystem.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:26 +01:00
Johannes Thumshirn
8fdf54fe69 btrfs: zoned: simplify btrfs_check_meta_write_pointer
btrfs_check_meta_write_pointer() will always be called with a NULL
'cache_ret' argument.

As there's no need to check if we have a valid block_group passed in
remove these checks.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:25 +01:00
Johannes Thumshirn
869f4cdc73 btrfs: zoned: encapsulate inode locking for zoned relocation
Encapsulate the inode lock needed for serializing the data relocation
writes on a zoned filesystem into a helper.

This streamlines the code reading flow and hides special casing for
zoned filesystems.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:25 +01:00
Anand Jain
a26d60dedf btrfs: sysfs: add devinfo/fsid to retrieve actual fsid from the device
In the case of the seed device, the fsid can be different from the mounted
sprout fsid.  The userland has to read the device superblock to know the
fsid but, that idea fails if the device is missing. So add a sysfs
interface devinfo/<devid>/fsid to show the fsid of the device.

For example:
  $ cd /sys/fs/btrfs/b10b02a5-f9de-4276-b9e8-2bfd09a578a8

  $ cat devinfo/1/fsid
  c44d771f-639d-4df3-99ec-5bc7ad2af93b
  $ cat  devinfo/3/fsid
  b10b02a5-f9de-4276-b9e8-2bfd09a578a8

Though it's related to seeding, the name of the sysfs file is plain fsid as it
matches what blkid says.  A path to the device's fsid will aid scripting.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:25 +01:00
Josef Bacik
c18e323564 btrfs: reserve extra space for the free space tree
Filipe reported a problem where sometimes he'd get an ENOSPC abort when
running delayed refs with generic/619 and the free space tree enabled.
This is partly because we do not reserve space for modifying the free
space tree, nor do we have a block rsv associated with that tree.

The delayed_refs_rsv tracks the amount of space required to run delayed
refs.  This means 1 modification means 1 change to the extent root.
With the free space tree this turns into 2 changes, because modifying 1
extent means updating the extent tree and potentially updating the free
space tree to either remove that entry or add the free space.  Thus if
we have the FST enabled, simply double the reservation size for our
modification.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:25 +01:00
Josef Bacik
9506f95382 btrfs: include the free space tree in the global rsv minimum calculation
Filipe reported a problem where generic/619 was failing with an ENOSPC
abort while running delayed refs, like the following

  BTRFS: Transaction aborted (error -28)
  WARNING: CPU: 3 PID: 522920 at fs/btrfs/free-space-tree.c:1049 add_to_free_space_tree+0xe5/0x110 [btrfs]
  CPU: 3 PID: 522920 Comm: kworker/u16:19 Tainted: G        W         5.16.0-rc2-btrfs-next-106 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
  RIP: 0010:add_to_free_space_tree+0xe5/0x110 [btrfs]
  RSP: 0000:ffffa65087fb7b20 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffffffff9131eeaa RDI: 00000000ffffffff
  RBP: ffff8d62e26481b8 R08: ffffffff9ad97ce0 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffe4
  R13: ffff8d61c25fe688 R14: ffff8d61ebd88800 R15: ffff8d61ebd88a90
  FS:  0000000000000000(0000) GS:ffff8d64ed400000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fa46a8b1000 CR3: 0000000148d18003 CR4: 0000000000370ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   __btrfs_free_extent+0x516/0x950 [btrfs]
   __btrfs_run_delayed_refs+0x2b1/0x1250 [btrfs]
   btrfs_run_delayed_refs+0x86/0x210 [btrfs]
   flush_space+0x403/0x630 [btrfs]
   ? call_rcu_tasks_generic+0x50/0x80
   ? lock_release+0x223/0x4a0
   ? btrfs_get_alloc_profile+0xb5/0x290 [btrfs]
   ? do_raw_spin_unlock+0x4b/0xa0
   btrfs_async_reclaim_metadata_space+0x139/0x320 [btrfs]
   process_one_work+0x24c/0x5b0
   worker_thread+0x55/0x3c0
   ? process_one_work+0x5b0/0x5b0
   kthread+0x17c/0x1a0
   ? set_kthread_struct+0x40/0x40
   ret_from_fork+0x22/0x30

There's a couple of reasons for this, but in generic/619's case the
largest reason is because it is a very small file system, ad we do not
reserve enough space for the global reserve.

With the free space tree we now have the free space tree that we need to
modify when running delayed refs.  This means we need the global reserve
to take this into account when it calculates the minimum size it needs
to be.  This is especially important for very small file systems.

Fix this by adjusting the minimum global block rsv size math to include
the size of the free space tree when calculating the size.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:25 +01:00
Qu Wenruo
c9d328c0c4 btrfs: scrub: merge SCRUB_PAGES_PER_RD_BIO and SCRUB_PAGES_PER_WR_BIO
These two values were introduced in commit ff023aac31 ("Btrfs: add code
to scrub to copy read data to another disk") as an optimization.

But the truth is, block layer scheduler can do whatever it wants to
merge/split bios to improve performance.

Doing such "optimization" is not really going to affect much, especially
considering how good current block layer optimizations are doing.
Remove such old and immature optimization from our code.

Since we're here, also change BUG_ON()s using these two macros to use
ASSERT()s.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:25 +01:00
Qu Wenruo
0bb3acdc48 btrfs: update SCRUB_MAX_PAGES_PER_BLOCK
Use BTRFS_MAX_METADATA_BLOCKSIZE and SZ_4K (minimal sectorsize) to
calculate this value.

And remove one stale comment on the value, in fact with recent subpage
support, BTRFS_MAX_METADATA_BLOCKSIZE * PAGE_SIZE is already beyond
BTRFS_STRIPE_LEN, just we don't use the full page.

Also since we're here, update the BUG_ON() related to
SCRUB_MAX_PAGES_PER_BLOCK to ASSERT().

As those ASSERT() are really only for developers to catch early obvious
bugs, not to let end users suffer.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07 14:18:25 +01:00