Commit graph

1105317 commits

Author SHA1 Message Date
Leon Romanovsky
6cd2126ac6 net/mlx5: Cleanup XFRM attributes struct
Remove everything that is not used or from mlx5_accel_esp_xfrm_attrs,
together with change type of spi to store proper type from the beginning.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:18 -07:00
Leon Romanovsky
1c4a59b9fa net/mlx5: Remove not-supported ICV length
mlx5 doesn't allow to configure any AEAD ICV length other than 128,
so remove the logic that configures other unsupported values.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:17 -07:00
Leon Romanovsky
effbe26751 net/mlx5: Simplify IPsec capabilities logic
Reduce number of hard-coded IPsec capabilities by making sure
that mlx5_ipsec_device_caps() sets only supported bits.

As part of this change, remove _ACCEL_ notations from the capabilities
names as they represent IPsec-capable device, so it is aligned with
MLX5_CAP_IPSEC() macro. And prepare the code to IPsec full offload mode.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:17 -07:00
Leon Romanovsky
a8444b0bdd net/mlx5: Don't advertise IPsec netdev support for non-IPsec device
Device that lacks proper IPsec capabilities won't pass mlx5e_ipsec_init()
later, so no need to advertise HW netdev offload support for something that
isn't going to work anyway.

Fixes: 8ad893e516 ("net/mlx5e: Remove dependency in IPsec initialization flows")
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:17 -07:00
Leon Romanovsky
b7242ffc56 net/mlx5: Make sure that no dangling IPsec FS pointers exist
The IPsec FS code was implemented with anti-pattern there failures
in create functions left the system with dangling pointers that were
cleaned in global routines.

The less error prone approach is to make sure that failed function
cleans everything internally.

As part of this change, we remove the batch of one liners and rewrite
get/put functions to remove ambiguity.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:17 -07:00
Leon Romanovsky
82f7bdba37 net/mlx5: Clean IPsec FS add/delete rules
Reuse existing struct to pass parameters instead of open code them.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:16 -07:00
Leon Romanovsky
b73e67287b net/mlx5: Simplify HW context interfaces by using SA entry
SA context logic used multiple structures to store same data
over and over. By simplifying the SA context interfaces, we
can remove extra structs.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:16 -07:00
Leon Romanovsky
a534e24d72 net/mlx5: Remove indirections from esp functions
This change cleanups the mlx5 esp interface.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:16 -07:00
Leon Romanovsky
c6e3b421c7 net/mlx5: Merge various control path IPsec headers into one file
The mlx5 IPsec code has logical separation between code that operates
with XFRM objects (ipsec.c), HW objects (ipsec_offload.c), flow steering
logic (ipsec_fs.c) and data path (ipsec_rxtx.c).

Such separation makes sense for C-files, but isn't needed at all for
H-files as they are included in batch anyway.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:15 -07:00
Leon Romanovsky
2ea36e2e4a net/mlx5: Remove useless validity check
All callers build xfrm attributes with help of mlx5e_ipsec_build_accel_xfrm_attrs()
function that ensure validity of attributes. There is no need to recheck
them again.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:15 -07:00
Leon Romanovsky
c674df973a net/mlx5: Store IPsec ESN update work in XFRM state
mlx5 IPsec code updated ESN through workqueue with allocation calls
in the data path, which can be saved easily if the work is created
during XFRM state initialization routine.

The locking used later in the work didn't protect from anything because
change of HW context is possible during XFRM state add or delete only,
which can cancel work and make sure that it is not running.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:15 -07:00
Leon Romanovsky
a05a54694e net/mlx5: Reduce useless indirection in IPsec FS add/delete flows
There is no need in one-liners wrappers to call internal functions.
Let's remove them.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:14 -07:00
Leon Romanovsky
021a429bdb net/mlx5: Don't hide fallback to software IPsec in FS code
The XFRM code performs fallback to software IPsec if .xdo_dev_state_add()
returns -EOPNOTSUPP. This is what mlx5 did very deep in its stack trace,
despite have all the knowledge that IPsec is not going to work in very
early stage.

This is achieved by making sure that priv->ipsec pointer is valid for
fully working and supported hardware crypto IPsec engine.

In case, the hardware IPsec is not supported, the XFRM code will set NULL
to xso->dev and it will prevent from calls to various .xdo_dev_state_*()
callbacks.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:14 -07:00
Leon Romanovsky
9af1968ee1 net/mlx5: Check IPsec TX flow steering namespace in advance
Ensure that flow steering is usable as early as possible, to understand
if crypto IPsec is supported or not.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:14 -07:00
Leon Romanovsky
301e0be800 net/mlx5: Simplify IPsec flow steering init/cleanup functions
Remove multiple function wrappers to make sure that IPsec FS initialization
and cleanup functions present in one place to help with code readability.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03 22:59:12 -07:00
Manikanta Pubbisetty
f9eec4947a ath11k: Add support for targets without trustzone
Add the support to attach WCN6750 and map iommu domain
for targets which do not have the support of TrustZone.

Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00573-QCAMSLSWPLZ-1
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.5.0.1-01100-QCAHKSWPL_SILICONZ-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.4.0.1-00192-QCAHKSWPL_SILICONZ-1

Signed-off-by: Manikanta Pubbisetty <quic_mpubbise@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220328062032.28881-1-quic_mpubbise@quicinc.com
2022-05-04 08:47:40 +03:00
Ping-Ke Shih
7ba49f4c68 rtw89: 8852c: add 8852ce to Makefile and Kconfig
This initial vesion is usable now. It can support STA, AP and monitor
modes, so we can add 8852ce to Kconfig and Makefile.

We are still working on some features, such as deep power save, and BT
coexistence. But, this version still can have a good WiFi-only performance
already, and will continue to fine tune power consumption.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220503120001.79272-9-pkshih@realtek.com
2022-05-04 08:32:05 +03:00
Ping-Ke Shih
68bf56e3b0 rtw89: 8852c: fix warning of FIELD_PREP() mask type
To fix the compiler warning of clang with i386 config, but not complain
by gcc:

           __write_ctrl(R_AX_PWR_RATE_CTRL, B_AX_FORCE_PWR_BY_RATE_VALUE_MASK,
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/net/wireless/realtek/rtw89/rtw8852c.c:2621:13: note: expanded from macro '__write_ctrl'
           u32 _wrt = FIELD_PREP(__msk, _val);                     \
                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/bitfield.h:114:3: note: expanded from macro 'FIELD_PREP'
                   __BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_PREP: ");    \
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/bitfield.h:71:53: note: expanded from macro '__BF_FIELD_CHECK'
                   BUILD_BUG_ON_MSG(__bf_cast_unsigned(_mask, _mask) >     \
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
   note: (skipping 1 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
   include/linux/compiler_types.h:352:22: note: expanded from macro 'compiletime_assert'
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
           ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler_types.h:340:23: note: expanded from macro '_compiletime_assert'
           __compiletime_assert(condition, msg, prefix, suffix)
           ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler_types.h:332:9: note: expanded from macro '__compiletime_assert'
                   if (!(condition))                                       \
                         ^~~~~~~~~

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220503120001.79272-8-pkshih@realtek.com
2022-05-04 08:32:05 +03:00
Ping-Ke Shih
55cf5b7e2d rtw89: 8852c: correct register definitions used by 8852c
First one could affect SER because of false alarm event. Second one can
affect spur elimination.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220503120001.79272-7-pkshih@realtek.com
2022-05-04 08:32:05 +03:00
Ping-Ke Shih
62440fbefa rtw89: correct AID settings of beamformee
Without this fix, it would cause IOT issue due to AID mismatch.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220503120001.79272-6-pkshih@realtek.com
2022-05-04 08:32:05 +03:00
Ping-Ke Shih
39a7652103 rtw89: ps: fine tune polling interval while changing low power mode
By experiments, it spends ~45/1090~2480us to enter/leave low power mode,
so the old polling interval 1000us can waste time. Use smaller polling
interval depends on experimental results to reduce the time to transition
state.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220503120001.79272-5-pkshih@realtek.com
2022-05-04 08:32:04 +03:00
Ping-Ke Shih
78af3cc673 rtw89: 8852c: add basic and remaining chip_info
The chip_info include BT coexistence tables, size and number of hardware
components, and supported functions.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220503120001.79272-4-pkshih@realtek.com
2022-05-04 08:32:04 +03:00
Ping-Ke Shih
e212d5d48d rtw89: 8852c: add chip_ops::bb_ctrl_btc_preagc
Add to configure BT share RX path and related settings.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220503120001.79272-3-pkshih@realtek.com
2022-05-04 08:32:04 +03:00
Ping-Ke Shih
5309cd5ec9 rtw89: 8852c: rfk: get calibrated channels to notify firmware
The commit 16b44ed0ff ("rtw89: add RF H2C to notify firmware") is to
add firmware command, and this commit is to prepare the channels. Then,
firmware can get proper channels.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220503120001.79272-2-pkshih@realtek.com
2022-05-04 08:32:04 +03:00
Tetsuo Handa
eeff214dbf wfx: avoid flush_workqueue(system_highpri_wq) usage
Flushing system-wide workqueues is dangerous and will be forbidden.
Replace system_highpri_wq with per "struct wfx_dev" bh_wq.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/f15574a6-aba4-72bc-73af-26fdcdf9fb63@I-love.SAKURA.ne.jp
2022-05-04 08:29:00 +03:00
Allison Henderson
fd92000878 xfs: Set up infrastructure for log attribute replay
Currently attributes are modified directly across one or more
transactions. But they are not logged or replayed in the event of an
error. The goal of log attr replay is to enable logging and replaying
of attribute operations using the existing delayed operations
infrastructure.  This will later enable the attributes to become part of
larger multi part operations that also must first be recorded to the
log.  This is mostly of interest in the scheme of parent pointers which
would need to maintain an attribute containing parent inode information
any time an inode is moved, created, or removed.  Parent pointers would
then be of interest to any feature that would need to quickly derive an
inode path from the mount point. Online scrub, nfs lookups and fs grow
or shrink operations are all features that could take advantage of this.

This patch adds two new log item types for setting or removing
attributes as deferred operations.  The xfs_attri_log_item will log an
intent to set or remove an attribute.  The corresponding
xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
freed once the transaction is done.  Both log items use a generic
xfs_attr_log_format structure that contains the attribute name, value,
flags, inode, and an op_flag that indicates if the operations is a set
or remove.

[dchinner: added extra little bits needed for intent whiteouts]

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 12:41:02 +10:00
Allison Henderson
9a39cdabc1 xfs: Return from xfs_attr_set_iter if there are no more rmtblks to process
During an attr rename operation, blocks are saved for later removal
as rmtblkno2. The rmtblkno is used in the case of needing to alloc
more blocks if not enough were available.  However, in the case
that no further blocks need to be added or removed, we can return as soon
as xfs_attr_node_addname completes, rather than rolling the transaction
with an -EAGAIN return.  This extra loop does not hurt anything right
now, but it will be a problem later when we get into log items because
we end up with an empty log transaction.  So, add a simple check to
cut out the unneeded iteration.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 12:40:02 +10:00
Allison Henderson
7b3ec2b20e xfs: Fix double unlock in defer capture code
The new deferred attr patch set uncovered a double unlock in the
recent port of the defer ops capture and continue code.  During log
recovery, we're allowed to hold buffers to a transaction that's being
used to replay an intent item.  When we capture the resources as part
of scheduling a continuation of an intent chain, we call xfs_buf_hold
to retain our reference to the buffer beyond the transaction commit,
but we do /not/ call xfs_trans_bhold to maintain the buffer lock.
This means that xfs_defer_ops_continue needs to relock the buffers
before xfs_defer_restore_resources joins then tothe new transaction.

Additionally, the buffers should not be passed back via the dres
structure since they need to remain locked unlike the inodes.  So
simply set dr_bufs to zero after populating the dres structure.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 12:39:02 +10:00
Dave Chinner
86810a9ebd Merge branch 'guilt/xfs-5.19-fuzz-fixes' into xfs-5.19-for-next 2022-05-04 12:38:02 +10:00
Dave Chinner
166afc45ed xfs: fix reflink inefficiencies
As Dave Chinner has complained about on IRC, there are a couple of
 things about reflink that are very inefficient.  First of all, we
 limited the size of all bunmapi operations to avoid flooding the log
 with defer ops in the worst case, but recent changes to the defer ops
 code have solved that problem, so get rid of the bunmapi length clamp.
 
 Second, the log reservations for reflink operations are far far larger
 than they need to be.  Shrink them to exactly what we need to handle
 each deferred RUI and CUI log item, and no more.  Also reduce logcount
 because we don't need 8 rolls per operation.  Introduce a transaction
 reservation compatibility layer to avoid changing the minimum log size
 calculations.
 
 v2: better document the use of EFIs to track when refcount updates
     should be continued in a new transaction, disentangle the alternate
     log space reservation code
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmJq5V4ACgkQ+H93GTRK
 tOtfRw//XDKmVXMIi8V9YINW5mY2B1C4q4RGCrzSvzqVDMH6ADbpKSV672kqdtly
 S2zbfoi/nJgWiZWup4Vs3WiAqqZJezrGQyIqb16nXyH/VT6FINmG1VwNBn5NAmsL
 eNKHQUEn69q5SnCNddu2niT9HJ+NQec3gV/qLIE9eo7K2PiQs6VX7zgRahgt3bfl
 1iQQIPaRnX3qTTgfKye7pYMxsaDopkYcEBQfZkfTe/jUGTPmdpjabYO51e2+jbTg
 7df3kGYHn5sdQp4KA5jtH4icICGpps2jtUqUFY3kVaSknrlY3eDSPNg0MHsnQMP7
 bxV0yKcAmvaSeZwrvMV8IxqlEmU8X2AQar6R3XkdKidHYmxqubAx8+IxowPNQeu5
 HeKmWIqtYWvuQsKjVcdGg32wsV55yJq9C42PxhMcov8HaJPQc8gBPFTULn7WH0gJ
 swTGOIba8RV459ZZzMznCayxjbnUO2jsNj6ewC5v+S2WXyerVA/APTCaMC4UBDfT
 BDN4IiSXCwn0UkseujERNZi4M4TKZ9fTMtVloadlnfQJWCy+GPqKmsHehRDQMOeW
 6737sF4vW1lj9VfTFY5oq/bpG6lKYMkL0tzLSoWjyh9VsinhHqt7Byl//V2kUlct
 Ndj4t6pBxbJJGhHVA6nFOy0ULG9cHxTDYqm1nb1OXM3LZK6Pd+c=
 =KU+/
 -----END PGP SIGNATURE-----

Merge tag 'reflink-speedups-5.19_2022-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.19-for-next

xfs: fix reflink inefficiencies

As Dave Chinner has complained about on IRC, there are a couple of
things about reflink that are very inefficient.  First of all, we
limited the size of all bunmapi operations to avoid flooding the log
with defer ops in the worst case, but recent changes to the defer
ops code have solved that problem, so get rid of the bunmapi length
clamp.

Second, the log reservations for reflink operations are far far
larger than they need to be.  Shrink them to exactly what we need to
handle each deferred RUI and CUI log item, and no more.  Also reduce
logcount because we don't need 8 rolls per operation.  Introduce a
transaction reservation compatibility layer to avoid changing the
minimum log size calculations.

Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 12:37:40 +10:00
Dave Chinner
956f1b8f80 xfs: fix rmap inefficiencies
Reduce the performance impact of the reverse mapping btree when reflink
 is enabled by using the much faster non-overlapped btree lookup
 functions when we're searching the rmap index with a fully specified
 key.  If we find the exact record we're looking for, great!  We don't
 have to perform the full overlapped scan.  For filesystems with high
 sharing factors this reduces the xfs_scrub runtime by a good 15%%.
 
 This has been shown to reduce the fstests runtime for realtime rmap
 configurations by 30%%, since the lack of AGs severely limits
 scalability.
 
 v2: simplify the non-overlapped lookup code per dave comments
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmJq5VkACgkQ+H93GTRK
 tOvpZBAAiwV94XsmJnQkffzLTvlVUy68vLevg+6zi3ALbg8QHN8XoXR/SgBYRAZ0
 8KjoLGbLEV35u7f6NVJOXRvFk1/7yzvwutziuSSYx24ajxUFLK/OCHel1/YAQ6wh
 sdcfGdCtK9/iiSPyc6HXSekpnz0bVhRRtfKiHtstZgGlL7qGXDR3+NTbAjkxgNcu
 gHnZpO+16Y3bqsZQK5itoPHNqrdLT4GnSfGexmTXykYdAukooF6ZOE2MUyh/vX1j
 Em/ZJ5agEPLTYhmHWu5n+Phqmb+vLueFtl3jkBf50VYojokm2dt/MU6Z6zctB4p4
 xh6UmkQ7LhGckvKTh3NW88RP3/sBt5YNFPGo9xpx1aPyu5Os+5NcOjKX7XEtp4Xo
 ufyLy9y1muzQrMheIsSfpWAkZmu3/BLSBGH7gFcyHxIVZiuVfzTfOm7WKteoOFoW
 FUr46H+SUXqKA1h4lXIUsaH6T+D6Z32XTh1RoUx6B2rYQsB1kIDT9wTNYUNeS44e
 FZPDe/zZ2FSVpqcndjyOhhQdv+llK6m2c93acgL/MMNvOvH9cIDCkLUX/irs9h33
 r5V4q1PlYm+QMLxd1/h597aYjQLYoJFBDeLIEtnWsoY4nvjLkcAE/Nyr39e5tqes
 YJ+k724XzW5vC7QGJjOH6gAXFO6zRGMOEIgn6+oUKiVS2EVG57w=
 =73GF
 -----END PGP SIGNATURE-----

Merge tag 'rmap-speedups-5.19_2022-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.19-for-next

xfs: fix rmap inefficiencies

Reduce the performance impact of the reverse mapping btree when
reflink is enabled by using the much faster non-overlapped btree
lookup functions when we're searching the rmap index with a fully
specified key.  If we find the exact record we're looking for,
great!  We don't have to perform the full overlapped scan.  For
filesystems with high sharing factors this reduces the xfs_scrub
runtime by a good 15%%.

This has been shown to reduce the fstests runtime for realtime rmap
configurations by 30%%, since the lack of AGs severely limits
scalability.

Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 12:37:18 +10:00
Dave Chinner
5e116e99dc Merge branch 'guilt/xlog-intent-whiteouts' into xfs-5.19-for-next 2022-05-04 12:37:02 +10:00
Dave Chinner
9cf4f6160c Merge branch 'guilt/xfs-5.19-misc-2' into xfs-5.19-for-next 2022-05-04 12:36:54 +10:00
Dave Chinner
f0f5f65806 xfs: validate v5 feature fields
We don't check that the v4 feature flags taht v5 requires to be set
are actually set anywhere. Do this check when we see that the
filesystem is a v5 filesystem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 12:17:18 +10:00
Dave Chinner
dd0d2f9755 xfs: set XFS_FEAT_NLINK correctly
While xfs_has_nlink() is not used in kernel, it is used in userspace
(e.g. by xfs_db) so we need to set the XFS_FEAT_NLINK flag correctly
in xfs_sb_version_to_features().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 12:14:13 +10:00
Dave Chinner
1eb70f54c4 xfs: validate inode fork size against fork format
xfs_repair catches fork size/format mismatches, but the in-kernel
verifier doesn't, leading to null pointer failures when attempting
to perform operations on the fork. This can occur in the
xfs_dir_is_empty() where the in-memory fork format does not match
the size and so the fork data pointer is accessed incorrectly.

Note: this causes new failures in xfs/348 which is testing mode vs
ftype mismatches. We now detect a regular file that has been changed
to a directory or symlink mode as being corrupt because the data
fork is for a symlink or directory should be in local form when
there are only 3 bytes of data in the data fork. Hence the inode
verify for the regular file now fires w/ -EFSCORRUPTED because
the inode fork format does not match the format the corrupted mode
says it should be in.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 12:13:53 +10:00
Dave Chinner
dc04db2aa7 xfs: detect self referencing btree sibling pointers
To catch the obvious graph cycle problem and hence potential endless
looping.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 12:13:35 +10:00
Dave Chinner
0d227466be xfs: intent item whiteouts
When we log modifications based on intents, we add both intent
and intent done items to the modification being made. These get
written to the log to ensure that the operation is re-run if the
intent done is not found in the log.

However, for operations that complete wholly within a single
checkpoint, the change in the checkpoint is atomic and will never
need replay. In this case, we don't need to actually write the
intent and intent done items to the journal because log recovery
will never need to manually restart this modification.

Log recovery currently handles intent/intent done matching by
inserting the intent into the AIL, then removing it when a matching
intent done item is found. Hence for all the intent-based operations
that complete within a checkpoint, we spend all that time parsing
the intent/intent done items just to cancel them and do nothing with
them.

Hence it follows that the only time we actually need intents in the
log is when the modification crosses checkpoint boundaries in the
log and so may only be partially complete in the journal. Hence if
we commit and intent done item to the CIL and the intent item is in
the same checkpoint, we don't actually have to write them to the
journal because log recovery will always cancel the intents.

We've never really worried about the overhead of logging intents
unnecessarily like this because the intents we log are generally
very much smaller than the change being made. e.g. freeing an extent
involves modifying at lease two freespace btree blocks and the AGF,
so the EFI/EFD overhead is only a small increase in space and
processing time compared to the overall cost of freeing an extent.

However, delayed attributes change this cost equation dramatically,
especially for inline attributes. In the case of adding an inline
attribute, we only log the inode core and attribute fork at present.
With delayed attributes, we now log the attr intent which includes
the name and value, the inode core adn attr fork, and finally the
attr intent done item. We increase the number of items we log from 1
to 3, and the number of log vectors (regions) goes up from 3 to 7.
Hence we tripple the number of objects that the CIL has to process,
and more than double the number of log vectors that need to be
written to the journal.

At scale, this means delayed attributes cause a non-pipelined CIL to
become CPU bound processing all the extra items, resulting in a > 40%
performance degradation on 16-way file+xattr create worklaods.
Pipelining the CIL (as per 5.15) reduces the performance degradation
to 20%, but now the limitation is the rate at which the log items
can be written to the iclogs and iclogs be dispatched for IO and
completed.

Even log IO completion is slowed down by these intents, because it
now has to process 3x the number of items in the checkpoint.
Processing completed intents is especially inefficient here, because
we first insert the intent into the AIL, then remove it from the AIL
when the intent done is processed. IOWs, we are also doing expensive
operations in log IO completion we could completely avoid if we
didn't log completed intent/intent done pairs.

Enter log item whiteouts.

When an intent done is committed, we can check to see if the
associated intent is in the same checkpoint as we are currently
committing the intent done to. If so, we can mark the intent log
item with a whiteout and immediately free the intent done item
rather than committing it to the CIL. We can basically skip the
entire formatting and CIL insertion steps for the intent done item.

However, we cannot remove the intent item from the CIL at this point
because the unlocked per-cpu CIL item lists do not permit removal
without holding the CIL context lock exclusively. Transaction commit
only holds the context lock shared, hence the best we can do is mark
the intent item with a whiteout so that the CIL push can release it
rather than writing it to the log.

This means we never write the intent to the log if the intent done
has also been committed to the same checkpoint, but we'll always
write the intent if the intent done has not been committed or has
been committed to a different checkpoint. This will result in
correct log recovery behaviour in all cases, without the overhead of
logging unnecessary intents.

This intent whiteout concept is generic - we can apply it to all
intent/intent done pairs that have a direct 1:1 relationship. The
way deferred ops iterate and relog intents mean that all intents
currently have a 1:1 relationship with their done intent, and hence
we can apply this cancellation to all existing intent/intent done
implementations.

For delayed attributes with a 16-way 64kB xattr create workload,
whiteouts reduce the amount of journalled metadata from ~2.5GB/s
down to ~600MB/s and improve the creation rate from 9000/s to
14000/s.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 11:50:29 +10:00
Dave Chinner
3512fc1e84 xfs: whiteouts release intents that are not in the AIL
When we release an intent that a whiteout applies to, it will not
have been committed to the journal and so won't be in the AIL. Hence
when we drop the last reference to the intent, we do not want to try
to remove it from the AIL as that will trigger a filesystem
shutdown. Hence make the removal of intents from the AIL conditional
on them actually being in the AIL so we do the correct thing.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 11:46:47 +10:00
Dave Chinner
c23ab603e3 xfs: add log item method to return related intents
To apply a whiteout to an intent item when an intent done item is
committed, we need to be able to retrieve the intent item from the
the intent done item. Add a log item op method for doing this, and
wire all the intent done items up to it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 11:46:39 +10:00
Dave Chinner
22b1afc57e xfs: factor and move some code in xfs_log_cil.c
In preparation for adding support for intent item whiteouts.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 11:46:30 +10:00
Dave Chinner
bb7b1c9c5d xfs: tag transactions that contain intent done items
Intent whiteouts will require extra work to be done during
transaction commit if the transaction contains an intent done item.

To determine if a transaction contains an intent done item, we want
to avoid having to walk all the items in the transaction to check if
they are intent done items. Hence when we add an intent done item to
a transaction, tag the transaction to indicate that it contains such
an item.

We don't tag the transaction when the defer ops is relogging an
intent to move it forward in the log. Whiteouts will never apply to
these cases, so we don't need to bother looking for them.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 11:46:21 +10:00
Dave Chinner
f5b81200b6 xfs: add log item flags to indicate intents
We currently have a couple of helper functions that try to infer
whether the log item is an intent or intent done item from the
combinations of operations it supports.  This is incredibly fragile
and not very efficient as it requires checking specific combinations
of ops.

We need to be able to identify intent and intent done items quickly
and easily in upcoming patches, so simply add intent and intent done
type flags to the log item ops flags. These are static flags to
begin with, so intent items should have been typed like this from
the start.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 11:46:09 +10:00
Dave Chinner
5ddd658ea8 xfs: don't commit the first deferred transaction without intents
If the first operation in a string of defer ops has no intents,
then there is no reason to commit it before running the first call
to xfs_defer_finish_one(). This allows the defer ops to be used
effectively for non-intent based operations without requiring an
unnecessary extra transaction commit when first called.

This fixes a regression in per-attribute modification transaction
count when delayed attributes are not being used.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 11:46:00 +10:00
Dave Chinner
b2c28035ce xfs: hide log iovec alignment constraints
Callers currently have to round out the size of buffers to match the
aligment constraints of log iovecs and xlog_write(). They should not
need to know this detail, so introduce a new function to calculate
the iovec length (for use in ->iop_size implementations). Also
modify xlog_finish_iovec() to round up the length to the correct
alignment so the callers don't need to do this, either.

Convert the only user - inode forks - of this alignment rounding to
use the new interface.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 11:45:50 +10:00
Dave Chinner
c230a4a85b xfs: fix potential log item leak
Ever since we added shadown format buffers to the log items, log
items need to handle the item being released with shadow buffers
attached. Due to the fact this requirement was added at the same
time we added new rmap/reflink intents, we missed the cleanup of
those items.

In theory, this means shadow buffers can be leaked in a very small
window when a shutdown is initiated. Testing with KASAN shows this
leak does not happen in practice - we haven't identified a single
leak in several years of shutdown testing since ~v4.8 kernels.

However, the intent whiteout cleanup mechanism results in every
cancelled intent in exactly the same state as this tiny race window
creates and so if intents down clean up shadow buffers on final
release we will leak the shadow buffer for just about every intent
we create.

Hence we start with this patch to close this condition off and
ensure that when whiteouts start to be used we don't leak lots of
memory.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 11:45:11 +10:00
Dave Chinner
cb512c9216 xfs: zero inode fork buffer at allocation
When we first allocate or resize an inline inode fork, we round up
the allocation to 4 byte alingment to make journal alignment
constraints. We don't clear the unused bytes, so we can copy up to
three uninitialised bytes into the journal. Zero those bytes so we
only ever copy zeros into the journal.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-04 11:44:55 +10:00
Jakub Kicinski
0a806ecc40 Merge branch 'bnxt_en-bug-fixes'
Michael Chan says:

====================
bnxt_en: Bug fixes

This patch series includes 3 fixes:
 - Fix an occasional VF open failure.
 - Fix a PTP spinlock usage before initialization
 - Fix unnecesary RX packet drops under high TX traffic load.
====================

Link: https://lore.kernel.org/r/1651540392-2260-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03 17:41:35 -07:00
Michael Chan
195af57914 bnxt_en: Fix unnecessary dropping of RX packets
In bnxt_poll_p5(), we first check cpr->has_more_work.  If it is true,
we are in NAPI polling mode and we will call __bnxt_poll_cqs() to
continue polling.  It is possible to exhanust the budget again when
__bnxt_poll_cqs() returns.

We then enter the main while loop to check for new entries in the NQ.
If we had previously exhausted the NAPI budget, we may call
__bnxt_poll_work() to process an RX entry with zero budget.  This will
cause packets to be dropped unnecessarily, thinking that we are in the
netpoll path.  Fix it by breaking out of the while loop if we need
to process an RX NQ entry with no budget left.  We will then exit
NAPI and stay in polling mode.

Fixes: 389a877a3b ("bnxt_en: Process the NQ under NAPI continuous polling.")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03 17:41:32 -07:00
Michael Chan
2b156fb57d bnxt_en: Initiallize bp->ptp_lock first before using it
bnxt_ptp_init() calls bnxt_ptp_init_rtc() which will acquire the ptp_lock
spinlock.  The spinlock is not initialized until later.  Move the
bnxt_ptp_init_rtc() call after the spinlock is initialized.

Fixes: 24ac1ecd52 ("bnxt_en: Add driver support to use Real Time Counter for PTP")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-03 17:41:32 -07:00