Do not cast the struct xprt to a sock_xprt unless we know it is a UDP or
TCP transport. Otherwise the call to lock the mutex will scribble over
whatever structure is actually there. This has been seen to cause hard
system lockups when the underlying transport was RDMA.
Fixes: b49ea673e1 ("SUNRPC: lock against ->sock changing during sysfs read")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The concern is that since nfsiod is sometimes required to kick off a
commit, it can get locked up waiting forever in mempool_alloc() instead
of failing gracefully and leaving the commit until later.
Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY,
then fall back to a non-blocking attempt to allocate from the memory
pool.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When in a low memory situation, we do want rpciod to kick off direct
reclaim in the case where that helps, however we don't want it looping
forever in mempool_alloc().
So first try allocating from the slab using GFP_KERNEL | __GFP_NORETRY,
and then fall back to a GFP_NOWAIT allocation from the mempool.
Ditto for rpc_alloc_task()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The socket's SOCKWQ_ASYNC_NOSPACE can be cleared by various actors in
the socket layer, so replace it with our own flag in the transport
sock_state field.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Avoid socket state races due to repeated calls to ->connect() using the
same socket. If connect() returns 0 due to the connection having
completed, but we are in fact in a closing state, then we may leave the
XPRT_CONNECTING flag set on the transport.
Reported-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
Fixes: 3be232f11a ("SUNRPC: Prevent immediate close+reconnect")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Introduce a new mount option -- trunkdiscovery,notrunkdiscovery -- to
toggle whether or not the client will engage in actively discovery
of trunking locations.
v2 make notrunkdiscovery default
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: 1976b2b314 ("NFSv4.1 query for fs_location attr on a new file system")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding
possible deadlocks with "fs_reclaim". These deadlocks could, I believe,
eventuate if a buffered read on the swapfile was attempted.
We don't need coherence with the page cache for a swap file, and
buffered writes are forbidden anyway. There is no other need for
i_rwsem during direct IO. So never take it for swap_rw()
2/ generic_write_checks() explicitly forbids writes to swap, and
performs checks that are not needed for swap. So bypass it
for swap_rw().
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If we are swapping over NFSv4, we may not be able to allocate memory to
start the state-manager thread at the time when we need it.
So keep it always running when swap is enabled, and just signal it to
start.
This requires updating and testing the cl_swapper count on the root
rpc_clnt after following all ->cl_parent links.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
NFS_RPC_SWAPFLAGS is only used for READ requests.
It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to
requests. This is not needed for swap READ - though it is for writes
where it is set via a different mechanism.
RPC_TASK_ROOTCREDS causes the 'machine' credential to be used.
This is not needed as the root credential is saved when the swap file is
opened, and this is used for all IO.
So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of
RPC_TASK_ROOTCREDS, that isn't needed either.
Remove both.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When memory is short, new worker threads cannot be created and we depend
on the minimum one rpciod thread to be able to handle everything. So it
must not block waiting for memory.
mempools are particularly a problem as memory can only be released back
to the mempool by an async rpc task running. If all available workqueue
threads are waiting on the mempool, no thread is available to return
anything.
lookup_cred() can block on a mempool or kmalloc - and this can cause
deadlocks. So add a new RPCAUTH_LOOKUP flag for async lookups and don't
block on memory. If the -ENOMEM gets back to call_refreshresult(), wait
a short while and try again. HZ>>4 is chosen as it is used elsewhere
for -ENOMEM retries.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Avoid clearing the entire readdir page cache if we're just doing forced
readdirplus for the 'ls -l' heuristic.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Instead of using a linear index to address the pages, use the cookie of
the first entry, since that is what we use to match the page anyway.
This allows us to avoid re-reading the entire cache on a seekdir() type
of operation. The latter is very common when re-exporting NFS, and is a
major performance drain.
The change does affect our duplicate cookie detection, since we can no
longer rely on the page index as a linear offset for detecting whether
we looped backwards. However since we no longer do a linear search
through all the pages on each call to nfs_readdir(), this is less of a
concern than it was previously.
The other downside is that invalidate_mapping_pages() no longer can use
the page index to avoid clearing pages that have been read. A subsequent
patch will restore the functionality this provides to the 'ls -l'
heuristic.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The heuristic for readdirplus is designed to try to detect 'ls -l' and
similar patterns. It does so by looking for cache hit/miss patterns in
both the attribute cache and in the dcache of the files in a given
directory, and then sets a flag for the readdirplus code to interpret.
The problem with this approach is that a single attribute or dcache miss
can cause the NFS code to force a refresh of the attributes for the
entire set of files contained in the directory.
To be able to make a more nuanced decision, let's sample the number of
hits and misses in the set of open directory descriptors. That allows us
to set thresholds at which we start preferring READDIRPLUS over regular
READDIR, or at which we start to force a re-read of the remaining
readdir cache using READDIRPLUS.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The current NFS readdir code will always try to maximise the amount of
readahead it performs on the assumption that we can cache anything that
isn't immediately read by the process.
There are several cases where this assumption breaks down, including
when the 'ls -l' heuristic kicks in to try to force use of readdirplus
as a batch replacement for lookup/getattr.
This patch therefore tries to tone down the amount of readahead we
perform, and adjust it to try to match the amount of data being
requested by user space.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the page cache entry that was last read gets invalidated for some
reason, then make sure we can re-create it on the next call to readdir.
This, combined with the cache page validation, allows us to reuse the
cached value of page-index on successive calls to nfs_readdir.
Credit is due to Benjamin Coddington for showing that the concept works,
and that it allows for improved cache sharing between processes even in
the case where pages are lost due to LRU or active invalidation.
Suggested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Now that we have more fine grained attribute revalidation, let's just
get rid of NFS_INO_REVAL_PAGECACHE.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
In 4.1+, the server is allowed to set a flag
NFS4_RESULT_PRESERVE_UNLINKED in reply to the OPEN, that tells
the client that it does not need to do a silly rename of an
opened file when it's being removed.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
NFS is one of the last two users of the deprecated ->readpages aop.
This conversion looks straightforward, but I have only compile-tested
it.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Current release - regressions:
- bpf: fix crash due to out of bounds access into reg2btf_ids
- mvpp2: always set port pcs ops, avoid null-deref
- eth: marvell: fix driver load from initrd
- eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC"
Current release - new code bugs:
- mptcp: fix race in overlapping signal events
Previous releases - regressions:
- xen-netback: revert hotplug-status changes causing devices to
not be configured
- dsa:
- avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held
- fix panic when removing unoffloaded port from bridge
- dsa: microchip: fix bridging with more than two member ports
Previous releases - always broken:
- bpf:
- fix crash due to incorrect copy_map_value when both spin lock
and timer are present in a single value
- fix a bpf_timer initialization issue with clang
- do not try bpf_msg_push_data with len 0
- add schedule points in batch ops
- nf_tables:
- unregister flowtable hooks on netns exit
- correct flow offload action array size
- fix a couple of memory leaks
- vsock: don't check owner in vhost_vsock_stop() while releasing
- gso: do not skip outer ip header in case of ipip and net_failover
- smc: use a mutex for locking "struct smc_pnettable"
- openvswitch: fix setting ipv6 fields causing hw csum failure
- mptcp: fix race in incoming ADD_ADDR option processing
- sysfs: add check for netdevice being present to speed_show
- sched: act_ct: fix flow table lookup after ct clear or switching
zones
- eth: intel: fixes for SR-IOV forwarding offloads
- eth: broadcom: fixes for selftests and error recovery
- eth: mellanox: flow steering and SR-IOV forwarding fixes
Misc:
- make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor
friends not report freed skbs as drops
- force inlining of checksum functions in net/checksum.h
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmIX2ssACgkQMUZtbf5S
IrvImQ//b+JILp0M/jz6q25n5U7qxuNmJypq659kR19jnwGH520XTwnFE9/FB3gw
UnlCb28+jdMX1HHQJaUKkKYTilfFvyMoRPAMbLFO51Y02dVALTjD7C2wJ1AyEiTV
eKhOcGHLbDzLom3+FnK566adOlGsIZfr4bR4zlGcthU0wTvU6S2K3WTkVJMASJzJ
JizNgN+SvpdpmnYj+wsg2cj/5W4R/IPdxCrkZMkEMomJnVxA61RV+wsCcsT+Cjrf
wu+cknUiVIGQNtCT4hz8VZ3tOoAeX+Xg/4YbaxVxnvunTQh+D+eIza40IEqewlEq
KFOXGuPXsse6ZJ7IqVZt1hgBxJ8bpItxEBNSgU3KqJKMTTKOpWWjZxkTYeIERMry
Ywb/ciZ7pwbo2CNhICh6+xefQvGbU0jgsiMgSkQvXZ9b9IsdPM4bwgvjFsyqnEMz
0HVpqN02F7MM44mD4P0TQct9OSemu6sVqQFrpk8+CvPfaSEctCv/iJ6WR/xxUgSp
uPvKYlv7BqOKZtqzGOk215WEvTUf8dy9cxcQwoYBOBxs8h2XQSRXEWCsGWCOg5+V
xLnlnreXHXKWcUrAmsJlZh6XmWGk9lBDqLX7hKCYZzMgU8nNopSDKKcDpVDkaBzC
DrK8Y3y+lBhpBwCHt/GZw8Qg9aDDsczFpOfPZBVJy+jH+7AGK7M=
=LT/x
-----END PGP SIGNATURE-----
Merge tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and netfilter.
Current release - regressions:
- bpf: fix crash due to out of bounds access into reg2btf_ids
- mvpp2: always set port pcs ops, avoid null-deref
- eth: marvell: fix driver load from initrd
- eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC"
Current release - new code bugs:
- mptcp: fix race in overlapping signal events
Previous releases - regressions:
- xen-netback: revert hotplug-status changes causing devices to not
be configured
- dsa:
- avoid call to __dev_set_promiscuity() while rtnl_mutex isn't
held
- fix panic when removing unoffloaded port from bridge
- dsa: microchip: fix bridging with more than two member ports
Previous releases - always broken:
- bpf:
- fix crash due to incorrect copy_map_value when both spin lock
and timer are present in a single value
- fix a bpf_timer initialization issue with clang
- do not try bpf_msg_push_data with len 0
- add schedule points in batch ops
- nf_tables:
- unregister flowtable hooks on netns exit
- correct flow offload action array size
- fix a couple of memory leaks
- vsock: don't check owner in vhost_vsock_stop() while releasing
- gso: do not skip outer ip header in case of ipip and net_failover
- smc: use a mutex for locking "struct smc_pnettable"
- openvswitch: fix setting ipv6 fields causing hw csum failure
- mptcp: fix race in incoming ADD_ADDR option processing
- sysfs: add check for netdevice being present to speed_show
- sched: act_ct: fix flow table lookup after ct clear or switching
zones
- eth: intel: fixes for SR-IOV forwarding offloads
- eth: broadcom: fixes for selftests and error recovery
- eth: mellanox: flow steering and SR-IOV forwarding fixes
Misc:
- make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor
friends not report freed skbs as drops
- force inlining of checksum functions in net/checksum.h"
* tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
net: mv643xx_eth: process retval from of_get_mac_address
ping: remove pr_err from ping_lookup
Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"
openvswitch: Fix setting ipv6 fields causing hw csum failure
ipv6: prevent a possible race condition with lifetimes
net/smc: Use a mutex for locking "struct smc_pnettable"
bnx2x: fix driver load from initrd
Revert "xen-netback: Check for hotplug-status existence before watching"
Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
net/mlx5e: Fix VF min/max rate parameters interchange mistake
net/mlx5e: Add missing increment of count
net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
net/mlx5e: Add feature check for set fec counters
net/mlx5e: TC, Skip redundant ct clear actions
net/mlx5e: TC, Reject rules with forward and drop actions
net/mlx5e: TC, Reject rules with drop and modify hdr action
net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
net/mlx5: Fix possible deadlock on rule deletion
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmIXp/gQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgppUzD/9wm9OHebVIjFPDCb7pznbrc7zvUXTXMLGJ
dFh/Un8NVXXoOMppkMV2rXyJhhKb2kcjaqa5OlT64/3SJkmESHbSpQT1lbDcQfGu
Yu4K2JU5KwSMNBNbSJJo4G4gM6VcgD22k4d6PbeTYZVuXY0rzk/KsG5lZZZFrQzG
mQnobyPtsjtOwUMWAlpABYqVU9sZU4o0YUkHUaAHn+Xsm13CDJDsDPZx7/XNctvp
mdoWI5RsffWxeVt3iB/f1AVBuczgsu4NO5aofF4XCjmnj5r1tGsXWqFFP4aiHP2i
Iiu6twdA1KAgSihEHrsmDSiwwG3B4rLmtu739f5AEi8w3Haj9QBK/tgADRI2XLGq
ulRMIHEE2qCApy9NWigyMn6fxt9ne8njAH8coljFsb1sV+2a+mn47f434Gzqo5Jl
cKZeYElrJGfeNdjDTeG76fVGR89fKEKBvhml8cf/jph8RcPmLSNsTHPxOZcVSLKU
X46FYi5rj9Y9VcPlF8UXmoDZt8+cBVDA12OzxIiKQDMzWeQkb4ShqE3IPYcpZ+f0
dJN6oZTIMc283kHWva5ok9egSnC7Ly6xv3+Mf6jBVgAuefgVAncRGBCLKFTLZL6D
CprhBmby/ubMkWstB2t9o0pFBxC7ttrpGexig/wvEe/OpDz+1d4rXEIEEwRFDLmR
x+XsGKgOKQ==
=7WMB
-----END PGP SIGNATURE-----
Merge tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request:
- send H2CData PDUs based on MAXH2CDATA (Varun Prakash)
- fix passthrough to namespaces with unsupported features (Christoph
Hellwig)
- Clear iocb->private at poll completion (Stefano)
* tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block:
nvme-tcp: send H2CData PDUs based on MAXH2CDATA
nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
nvme: don't return an error from nvme_configure_metadata
block: clear iocb->private in blkdev_bio_end_io_async()
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEjUuTAak14xi+SF7M4CHKc/GJqRAFAmIV/fQACgkQ4CHKc/GJ
qRBhaAgAoz81fjhlkcCaHdgxVTEx6L93iJQJiZWoE3gTKk2jruun3sIYmPSOiY+b
bWR1datDnvaS/Xv04rZ6pm6XPjCT+LmrOQCOlZMjptc6HKoKuDZTvcQ0u5CYOfQS
I9ZRtPaHjSmhntS8BErxGes5+PF1hz/q2rGuODt4/DQCNPZNdHXMdym9w4Z4xHXm
TuH2VXzv5JXhYlUEDz2HP8LXmbvxA9rGaMgngpX92pCL8uTLqANoZCT+zHEj3cKw
db6/A8S7Y4PsfF0JphNup+wcsWj+yfIrfAQwnTgNXR4hlhbUxOHTJqXlQGK/NW7C
tg2nXxQQn14MwPlkatdxFYzg1TbMYg==
=U6Fu
-----END PGP SIGNATURE-----
Merge tag 'slab-for-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
- Build fix (workaround) for clang.
- Fix a /proc/kcore based slabinfo script broken by struct slab changes
in 5.17-rc1.
* tag 'slab-for-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
tools/cgroup/slabinfo: update to work with struct slab
slab: remove __alloc_size attribute from __kmalloc_track_caller
As per NVMe/TCP specification (revision 1.0a, section 3.6.2.3)
Maximum Host to Controller Data length (MAXH2CDATA): Specifies the
maximum number of PDU-Data bytes per H2CData PDU in bytes. This value
is a multiple of dwords and should be no less than 4,096.
Current code sets H2CData PDU data_length to r2t_length,
it does not check MAXH2CDATA value. Fix this by setting H2CData PDU
data_length to min(req->h2cdata_left, queue->maxh2cdata).
Also validate MAXH2CDATA value returned by target in ICResp PDU,
if it is not a multiple of dword or if it is less than 4096 return
-EINVAL from nvme_tcp_init_connection().
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Commit c37495d625 ("slab: add __alloc_size attributes for better
bounds checking") added __alloc_size attributes to a bunch of kmalloc
function prototypes. Unfortunately the change to __kmalloc_track_caller
seems to cause clang to generate broken code and the first time this is
called when booting, the box will crash.
While the compiler problems are being reworked and attempted to be
solved [1], let's just drop the attribute to solve the issue now. Once
it is resolved it can be added back.
[1] https://github.com/ClangBuiltLinux/linux/issues/1599
Fixes: c37495d625 ("slab: add __alloc_size attributes for better bounds checking")
Cc: stable <stable@vger.kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: llvm@lists.linux.dev
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20220218131358.3032912-1-gregkh@linuxfoundation.org
Where commit 4ef0c5c6b5 ("kernel/sched: Fix sched_fork() access an
invalid sched_task_group") fixed a fork race vs cgroup, it opened up a
race vs syscalls by not placing the task on the runqueue before it
gets exposed through the pidhash.
Commit 13765de814 ("sched/fair: Fix fault in reweight_entity") is
trying to fix a single instance of this, instead fix the whole class
of issues, effectively reverting this commit.
Fixes: 4ef0c5c6b5 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Tested-by: Zhang Qiao <zhangqiao22@huawei.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/YgoeCbwj5mbCR0qA@hirez.programming.kicks-ass.net
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmIPE0oQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpq56D/9yUd82CT1c9drWjbANnIuOxxOAauNwDCdH
wLAz5FptjRC5ju3Ai9FzklfeXlU8SEBR4/TLp+lO8elA/NS9tOAW3+v/0pZ4oZ4G
MwxnE7Zj00AiRZ60w9zcb3Vq5TTXV71kcqbcx8YAh7b8niMDow8boLBY/FkB1EZc
1pwCLjF5I5USmU8zlST5G6ctqcqHmV8TxIU3wijlYmZnkyxMo6UxqCpO5BOZRgGt
S5VDvjoVDIQO757wLnaB5tA2NzEmaBG44yFoGkJqlChP7IgJrjTZQGKsqsIKLZzH
Btot/FpEjRYzFdgumHxJ9Zjv/5QlzFT115H14e85CtfdeIhfifd8G4JKtPGU24IV
5p/czOP8SA8qnEL6ply6osltzA9OkQBt1xZ6JW+0lK/izcHTIxVevrD8KsNHzh2z
lskEn0Sowb1viaGrwuQfWyX7xtJCX2hGOsD2gH95B4FcsPNbfp0u4uGimSYS02bB
2jJIhrhvQZTFyd/qeOGQ4MnJxoLF4Q63l+NzCqVSTBdSTJK/iGiofyqxL0WHB65L
/q8J4fTEvDK6Fc3GETwHP70yoGitJ1p2+5n7uB/Ynfh7RHq/b7Ispjv5AenAyo52
2GtY476aaYMdHVOM8h1Fnc8q1Qu4vwFY3Ly23FAhmRxilAXk5roPB6TP6Q6puCKe
wqdhwjgVAA==
=f22m
-----END PGP SIGNATURE-----
Merge tag 'block-5.17-2022-02-17' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Surprise removal fix (Christoph)
- Ensure that pages are zeroed before submitted for userspace IO
(Haimin)
- Fix blk-wbt accounting issue with BFQ (Laibin)
- Use bsize for discard granularity in loop (Ming)
- Fix missing zone handling in blk_complete_request() (Pankaj)
* tag 'block-5.17-2022-02-17' of git://git.kernel.dk/linux-block:
block/wbt: fix negative inflight counter when remove scsi device
block: fix surprise removal for drivers calling blk_set_queue_dying
block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern
block: loop:use kstatfs.f_bsize of backing file to set discard granularity
block: Add handling for zone append command in blk_complete_request
Alexei Starovoitov says:
====================
pull-request: bpf 2022-02-17
We've added 8 non-merge commits during the last 7 day(s) which contain
a total of 8 files changed, 119 insertions(+), 15 deletions(-).
The main changes are:
1) Add schedule points in map batch ops, from Eric.
2) Fix bpf_msg_push_data with len 0, from Felix.
3) Fix crash due to incorrect copy_map_value, from Kumar.
4) Fix crash due to out of bounds access into reg2btf_ids, from Kumar.
5) Fix a bpf_timer initialization issue with clang, from Yonghong.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add schedule points in batch ops
bpf: Fix crash due to out of bounds access into reg2btf_ids.
selftests: bpf: Check bpf_msg_push_data return value
bpf: Fix a bpf_timer initialization issue
bpf: Emit bpf_timer in vmlinux BTF
selftests/bpf: Add test for bpf_timer overwriting crash
bpf: Fix crash due to incorrect copy_map_value
bpf: Do not try bpf_msg_push_data with len 0
====================
Link: https://lore.kernel.org/r/20220217190000.37925-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
netfilter.
Current release - regressions:
- dsa: lantiq_gswip: fix use after free in gswip_remove()
- smc: avoid overwriting the copies of clcsock callback functions
Current release - new code bugs:
- iwlwifi:
- fix use-after-free when no FW is present
- mei: fix the pskb_may_pull check in ipv4
- mei: retry mapping the shared area
- mvm: don't feed the hardware RFKILL into iwlmei
Previous releases - regressions:
- ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
- tipc: fix wrong publisher node address in link publications
- iwlwifi: mvm: don't send SAR GEO command for 3160 devices,
avoid FW assertion
- bgmac: make idm and nicpm resource optional again
- atl1c: fix tx timeout after link flap
Previous releases - always broken:
- vsock: remove vsock from connected table when connect is
interrupted by a signal
- ping: change destination interface checks to match raw sockets
- crypto: af_alg - get rid of alg_memory_allocated to avoid confusing
semantics (and null-deref) after SO_RESERVE_MEM was added
- ipv6: make exclusive flowlabel checks per-netns
- bonding: force carrier update when releasing slave
- sched: limit TC_ACT_REPEAT loops
- bridge: multicast: notify switchdev driver whenever MC processing
gets disabled because of max entries reached
- wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found
- iwlwifi: fix locking when "HW not ready"
- phy: mediatek: remove PHY mode check on MT7531
- dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
- dsa: lan9303:
- fix polarity of reset during probe
- fix accelerated VLAN handling
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmIOm44ACgkQMUZtbf5S
IruWWBAAmNJBxVoVUahidwIVKHnKYeHClzDee3B6sYSupRW22Eeuh7Q8fqPMO4J7
KO9nP/vGibFuhKfjApvS6wPvpYWCuXAoSozfOa+JWNrFg9uVpdyHIhfdXl0WPZqx
A4+p2vIs1ldV0yOac/7ZGMWg57dzlYUurkld3xFwf7KyOOhV5/PkxkxpGN9eFkv1
NTFWTaIsVzFMXMjNXkGfGHmt/8mSmZHgsH+tYd+KXsjbs2UpbGM3SyfHBlUf3aA0
bceT4h07xA6C4rlUCbmalRqwvtcdM15MwlDBtSBXm5fXy0c59XxQOqj/dLhPuAO4
42sQlO2MhqDrZjR0tOjmuP2cpc7llj1lIZe1Qs3nKiNFHcuOJEHGw1PYCO85jKdn
xiWquuoe3G5YkQeoOoi+HqmXcP6aBZpUbROvYNjSJhcci4Ck0Qjna5J1rk8IRjb8
AkDf68dodn8I+W5dx/EnopH/ShPQcqGw1+tH4215UB7b40Ecpc+laqFAHRcgs654
ONuJVdRC4k3TyES1B9z8vawLcGYWa06fz8Mh/dS3gnLDphe5ZiH2tTrESfBdYixH
idmuO5C/YDhsVelVuO+B0RT/yziPb3Lr+BTplSfkODCXT6LuOdCYUcHx5nGZ1TYW
EeZ9hMSaxp2E06llEyD6JQQ+0Q17wnDGjLxtOMk+A8fmNX2F17g=
=Nrzq
-----END PGP SIGNATURE-----
Merge tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and netfilter.
Current release - regressions:
- dsa: lantiq_gswip: fix use after free in gswip_remove()
- smc: avoid overwriting the copies of clcsock callback functions
Current release - new code bugs:
- iwlwifi:
- fix use-after-free when no FW is present
- mei: fix the pskb_may_pull check in ipv4
- mei: retry mapping the shared area
- mvm: don't feed the hardware RFKILL into iwlmei
Previous releases - regressions:
- ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
- tipc: fix wrong publisher node address in link publications
- iwlwifi: mvm: don't send SAR GEO command for 3160 devices, avoid FW
assertion
- bgmac: make idm and nicpm resource optional again
- atl1c: fix tx timeout after link flap
Previous releases - always broken:
- vsock: remove vsock from connected table when connect is
interrupted by a signal
- ping: change destination interface checks to match raw sockets
- crypto: af_alg - get rid of alg_memory_allocated to avoid confusing
semantics (and null-deref) after SO_RESERVE_MEM was added
- ipv6: make exclusive flowlabel checks per-netns
- bonding: force carrier update when releasing slave
- sched: limit TC_ACT_REPEAT loops
- bridge: multicast: notify switchdev driver whenever MC processing
gets disabled because of max entries reached
- wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found
- iwlwifi: fix locking when "HW not ready"
- phy: mediatek: remove PHY mode check on MT7531
- dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
- dsa: lan9303:
- fix polarity of reset during probe
- fix accelerated VLAN handling"
* tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
bonding: force carrier update when releasing slave
nfp: flower: netdev offload check for ip6gretap
ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
ipv4: fix data races in fib_alias_hw_flags_set
net: dsa: lan9303: add VLAN IDs to master device
net: dsa: lan9303: handle hwaccel VLAN tags
vsock: remove vsock from connected table when connect is interrupted by a signal
Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
ping: fix the dif and sdif check in ping_lookup
net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
net: sched: limit TC_ACT_REPEAT loops
tipc: fix wrong notification node addresses
net: dsa: lantiq_gswip: fix use after free in gswip_remove()
ipv6: per-netns exclusive flowlabel checks
net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
CDC-NCM: avoid overflow in sanity checking
mctp: fix use after free
net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()
bonding: fix data-races around agg_select_timer
dpaa2-eth: Initialize mutex used in one step timestamping path
...
Various block drivers call blk_set_queue_dying to mark a disk as dead due
to surprise removal events, but since commit 8e141f9eb8 that doesn't
work given that the GD_DEAD flag needs to be set to stop I/O.
Replace the driver calls to blk_set_queue_dying with a new (and properly
documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the
only remaining caller.
Fixes: 8e141f9eb8 ("block: drain file system I/O on del_gendisk")
Reported-by: Markus Blöchl <markus.bloechl@ipetronik.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmILjZgTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXuPAB/96ls6ZaUrvAEseODu9N+qbCKCjqjGx
LBqP5gzzG71LoFLlvvGp7quwDXkSlfzMmZM1oxBrsRN4TiexdQ4b9tR2jNqxUBS0
pR7efRpR4YN1TrvxCr0lgfbKV6F3EC4YqMwK2Tf7zMRkzAbMQmw500JTkAdtplUT
opSZKlWt1Q66Iudz7vwz2e2D/NnAg03icB7lZUNulV1WTDfDfBUZyAphAnOqpty9
Y9UlVjFYSozOA6dv3HClYiJPjxhzUev88W/Sld81+2+gGCngWRT4a2OyUvCqNWXD
P8EswLHuqECpkz6uxnbQOLDBgdiMXigtANNky/E4Za8r9MqpZT3+ueLf
=/jBG
-----END PGP SIGNATURE-----
Merge tag 'hyperv-fixes-signed-20220215' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Rework use of DMA_BIT_MASK in vmbus to work around a clang bug
(Michael Kelley)
- Fix NUMA topology (Long Li)
- Fix a memory leak in vmbus (Miaoqian Lin)
- One minor clean-up patch (Cai Huoqing)
* tag 'hyperv-fixes-signed-20220215' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: utils: Make use of the helper macro LIST_HEAD()
Drivers: hv: vmbus: Rework use of DMA_BIT_MASK(64)
Drivers: hv: vmbus: Fix memory leak in vmbus_add_channel_kobj
PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmII/BUACgkQEsHwGGHe
VUr5VBAAhK4/Hqe5wgbbKE2CwUfvCd7KBbKQb0FbBguO9Sg9NkQhD0ROE5qQCsRE
AMBJKuC4AahoVIsPuJ2nr73iaM5UY3xx6Dth6qPnA7VEwpWM/h/A2t0D9bgm5PdR
1FK5QnmIP+o/astYTvP8YpIlhRqHK97fZM0KYZX8SfLG2sbnYiBANj7YiwafNLQs
7FR/J+LH2PRAU1sBrdrhvd/u4jvSTjcbutPEETGuTTMoPiJj4TGa0XyZNQR4/br+
WTnbzLFJpRabBMVE2ELzbzSXnEeUZpSKe79G9LW5AFaGa9UpyooXVUn4PcPNXTia
Q8zkMKusNFUlQc0pbcUxaS/g+UnC7koFn/XvPxp5k9tL7x4exq9hhOn/F+K9Ctuw
jUVrg63+/VFYtMwbZpYd81k5I/rx+o7t8rkmUrk0Wz/gpE9CDgbjfhGyAFqmXAFU
mGAGcFHbBaG6J5/XGqOGZR42yajlg9lwxpaj+taCtfbf46/48E6mTtLG2qQRDAqW
QlGVS8H9t+vmwfO8oAt2tWwLTyZqt+6VmNTbwerKSEblEC+yB/lLO5AcWsnNwHVl
9ZwSRTPw8ejkj/AdyoTSedMJHzcsAXT8PtIr2rSjuX1b8SubN8GmbsApyQmzV3G0
n5DLTcKvpCPjtWjtpF44yjP1rfdDLFBIh+RYHF7iNPFo0uhQFOg=
=+cTD
-----END PGP SIGNATURE-----
Merge tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
"Fix a case where objtool would mistakenly warn about instructions
being unreachable"
* tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asm
Merge misc fixes from Andrew Morton:
"5 patches.
Subsystems affected by this patch series: binfmt, procfs, and mm
(vmscan, memcg, and kfence)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kfence: make test case compatible with run time set sample interval
mm: memcg: synchronize objcg lists with a dedicated spinlock
mm: vmscan: remove deadlock due to throttling failing to make progress
fs/proc: task_mmu.c: don't read mapcount for migration entry
fs/binfmt_elf: fix PT_LOAD p_align values for loaders
The parameter kfence_sample_interval can be set via boot parameter and
late shell command, which is convenient for automated tests and KFENCE
parameter optimization. However, KFENCE test case just uses
compile-time CONFIG_KFENCE_SAMPLE_INTERVAL, which will make KFENCE test
case not run as users desired. Export kfence_sample_interval, so that
KFENCE test case can use run-time-set sample interval.
Link: https://lkml.kernel.org/r/20220207034432.185532-1-liupeng256@huawei.com
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Christian Knig <christian.koenig@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander reported a circular lock dependency revealed by the mmap1 ltp
test:
LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
WARNING: possible circular locking dependency detected
5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
------------------------------------------------------
mmap1/202299 is trying to acquire lock:
00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
but task is already holding lock:
00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&sighand->siglock){-.-.}-{2:2}:
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
__lock_task_sighand+0x90/0x190
cgroup_freeze_task+0x2e/0x90
cgroup_migrate_execute+0x11c/0x608
cgroup_update_dfl_csses+0x246/0x270
cgroup_subtree_control_write+0x238/0x518
kernfs_fop_write_iter+0x13e/0x1e0
new_sync_write+0x100/0x190
vfs_write+0x22c/0x2d8
ksys_write+0x6c/0xf8
__do_syscall+0x1da/0x208
system_call+0x82/0xb0
-> #0 (css_set_lock){..-.}-{2:2}:
check_prev_add+0xe0/0xed8
validate_chain+0x736/0xb20
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
obj_cgroup_release+0x4a/0xe0
percpu_ref_put_many.constprop.0+0x150/0x168
drain_obj_stock+0x94/0xe8
refill_obj_stock+0x94/0x278
obj_cgroup_charge+0x164/0x1d8
kmem_cache_alloc+0xac/0x528
__sigqueue_alloc+0x150/0x308
__send_signal+0x260/0x550
send_signal+0x7e/0x348
force_sig_info_to_task+0x104/0x180
force_sig_fault+0x48/0x58
__do_pgm_check+0x120/0x1f0
pgm_check_handler+0x11e/0x180
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&sighand->siglock);
lock(css_set_lock);
lock(&sighand->siglock);
lock(css_set_lock);
*** DEADLOCK ***
2 locks held by mmap1/202299:
#0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
#1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
stack backtrace:
CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
dump_stack_lvl+0x76/0x98
check_noncircular+0x136/0x158
check_prev_add+0xe0/0xed8
validate_chain+0x736/0xb20
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
obj_cgroup_release+0x4a/0xe0
percpu_ref_put_many.constprop.0+0x150/0x168
drain_obj_stock+0x94/0xe8
refill_obj_stock+0x94/0x278
obj_cgroup_charge+0x164/0x1d8
kmem_cache_alloc+0xac/0x528
__sigqueue_alloc+0x150/0x308
__send_signal+0x260/0x550
send_signal+0x7e/0x348
force_sig_info_to_task+0x104/0x180
force_sig_fault+0x48/0x58
__do_pgm_check+0x120/0x1f0
pgm_check_handler+0x11e/0x180
INFO: lockdep is turned off.
In this example a slab allocation from __send_signal() caused a
refilling and draining of a percpu objcg stock, resulted in a releasing
of another non-related objcg. Objcg release path requires taking the
css_set_lock, which is used to synchronize objcg lists.
This can create a circular dependency with the sighandler lock, which is
taken with the locked css_set_lock by the freezer code (to freeze a
task).
In general it seems that using css_set_lock to synchronize objcg lists
makes any slab allocations and deallocation with the locked css_set_lock
and any intervened locks risky.
To fix the problem and make the code more robust let's stop using
css_set_lock to synchronize objcg lists and use a new dedicated spinlock
instead.
Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
Fixes: bf4f059954 ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Revert a recent change that attempted to avoid issues with
conflicting address ranges during PCI initialization, because it
turned out to introduce a regression (Hans de Goede).
- Revert a change that limited EC GPE wakeups from suspend-to-idle
to systems based on Intel hardware, because it turned out that
systems based on hardware from other vendors depended on that
functionality too (Mario Limonciello).
- Fix two issues related to the handling of wakeup interrupts and
wakeup events signaled through the EC GPE during suspend-to-idle
on x86 (Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmIGkq4SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxs3MQAKhmnM5mHOqFIM7VM9SoutEoOIXPWLCz
CNjjqfwA63GvxTp//Dks4KGCLF90mfFOv8Mt7vlzVHyQDvTv2w0YddmBxQAgXTU4
v1yfHcRssND1w2kgIg7wThWSxSKvP/4XgqwrhTmEKHb1qPFrpq9pt7+4RIco9362
pacxW1QPX2+5yNDs67JuTsjh2mksKC8CUgiA8BUa+57jrIXvQYWqqZ490PpkbXW0
Nh1naNwT1xDqte5U98PrzZZRt2qUMuKrG2ro0lE97rb067zSoIQo2XCODlo7T12O
7vFzypOTvNJZjd57U9SoKyEDCzDnHkNh2O2jiNKaqzMJHqgh0bkMrMHysPbdTwMu
VH0fw+VElRONyQ5knUcvTG3IRfEuTBl2iqDoO+hLb+cmkL48KL9yJXH7dTRMrNJ8
zqHSCqpON8rZgLfDzrxoVvMXv9Al5ra5wM41EljiUWUDFEB1HbtleysHIMFMxEKy
gBvO6zAyEoh5ie2pxYdSAyY+pq+d1JFeAgpbXpF3miQFE2gchC1BjNjUCguekfty
t0kwOefLOphgShpIm/04F2WBsYNM+QSIoJWRvE5LsEVTdD/j59MweCTJN2KNSIpj
vapl8ymAr6umMrPhxpjY6K4DD000+LYpV9WJ8Q1Wv2HWV9OkevL5KnEXoQ94czpb
ybb/Nd4SwHV+
=aGYp
-----END PGP SIGNATURE-----
Merge tag 'acpi-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert two commits that turned out to be problematic and fix two
issues related to wakeup from suspend-to-idle on x86.
Specifics:
- Revert a recent change that attempted to avoid issues with
conflicting address ranges during PCI initialization, because it
turned out to introduce a regression (Hans de Goede).
- Revert a change that limited EC GPE wakeups from suspend-to-idle to
systems based on Intel hardware, because it turned out that systems
based on hardware from other vendors depended on that functionality
too (Mario Limonciello).
- Fix two issues related to the handling of wakeup interrupts and
wakeup events signaled through the EC GPE during suspend-to-idle on
x86 (Rafael Wysocki)"
* tag 'acpi-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
x86/PCI: revert "Ignore E820 reservations for bridge windows on newer systems"
PM: s2idle: ACPI: Fix wakeup interrupts handling
ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE
ACPI: PM: Revert "Only mark EC GPE for wakeup on Intel systems"
Ensure that NFS clients cannot send file size or offset values that
can cause the NFS server to crash or to return incorrect or
surprising results. In particular, fix how the NFS server handles
values larger than OFFSET_MAX.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmIDzr4ACgkQM2qzM29m
f5eJRBAAikdh0PYOlZbvy9M1eY6wq3k+Y10JsnCZk4T8Uq0NJF/7CJ3R4/4+xGOh
ZA/2vE1dN4IfqnIOdxw1cXbzzgAO5p/nDLMo9wC6NimrVLkE+S8j38oWvEHOCJXC
TzUbIKkxqBBcfDw4pO4BT42iHx+cqVUuRFd2qkob1ZRoe+BKI+F4+7QNVc8iEw5z
j85i2/h6JohsItzekRbMO1q1iXxBc+IZRYafjibtVRWxRuNUWP8C1cv0eXrlSy3O
L07kZRwzrd52PAi1Q8K07Ip+yTHUMZptyHoB6S863uuz/mOzlpXewvXHMGA1btlr
POHYG/lBXpDS0e2pjksyXXp2I7HJV/HuaMyyLveWRO0qleBc3G5PsvIJNBW7xl5f
NPGdgfaa+8ZeOCGolvPruykL9Eh7QAyWTdPKz1J+NuhjkAB4p6ba9QcKVwP7kYTi
I8zdeUPgbjuFW35hal0ZIlNi2RfcuSGk1FKjotrQ6J3XNIaqPkUWK+1Zz3MzqPUW
+1ElzoXQugJASPBkEZuf1aXr8/vRjKT16l8EX1kbtJ5wjj2OPbnWWZk03ZncLVfv
CzbJTZLqiM0JuRqXvYpUGAQdryWcwvTCAuWxcqrt4ALNWW6Z4Y35Vl8H4sTh8wkr
Q3m6bAVYJx3FmFop7y5ubVH137k1SFJ0NzGJJK0mYoZQSMZoPZI=
=64n/
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull more nfsd fixes from Chuck Lever:
"Ensure that NFS clients cannot send file size or offset values that
can cause the NFS server to crash or to return incorrect or surprising
results.
In particular, fix how the NFS server handles values larger than
OFFSET_MAX"
* tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Deprecate NFS_OFFSET_MAX
NFSD: Fix offset type in I/O trace points
NFSD: COMMIT operations must not return NFS?ERR_INVAL
NFSD: Clamp WRITE offsets
NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes
NFSD: Fix ia_size underflow
NFSD: Fix the behavior of READ near OFFSET_MAX
NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there
was a kernel-wide OFFSET_MAX value. As a clean up, replace the last
few uses of it with its generic equivalent, and get rid of it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
After commit e3728b50cd ("ACPI: PM: s2idle: Avoid possible race
related to the EC GPE") wakeup interrupts occurring immediately after
the one discarded by acpi_s2idle_wake() may be missed. Moreover, if
the SCI triggers again immediately after the rearming in
acpi_s2idle_wake(), that wakeup may be missed too.
The problem is that pm_system_irq_wakeup() only calls pm_system_wakeup()
when pm_wakeup_irq is 0, but that's not the case any more after the
interrupt causing acpi_s2idle_wake() to run until pm_wakeup_irq is
cleared by the pm_wakeup_clear() call in s2idle_loop(). However,
there may be wakeup interrupts occurring in that time frame and if
that happens, they will be missed.
To address that issue first move the clearing of pm_wakeup_irq to
the point at which it is known that the interrupt causing
acpi_s2idle_wake() to tun will be discarded, before rearming the SCI
for wakeup. Moreover, because that only reduces the size of the
time window in which the issue may manifest itself, allow
pm_system_irq_wakeup() to register two second wakeup interrupts in
a row and, when discarding the first one, replace it with the second
one. [Of course, this assumes that only one wakeup interrupt can be
discarded in one go, but currently that is the case and I am not
aware of any plans to change that.]
Fixes: e3728b50cd ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE")
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Using DMA_BIT_MASK(64) as an initializer for a global variable
causes problems with Clang 12.0.1. The compiler doesn't understand
that value 64 is excluded from the shift at compile time, resulting
in a build error.
While this is a compiler problem, avoid the issue by setting up
the dma_mask memory as part of struct hv_device, and initialize
it using dma_set_mask().
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Vitaly Chikunov <vt@altlinux.org>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Fixes: 743b237c3a ("scsi: storvsc: Add Isolation VM support for storvsc driver")
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/1644176216-12531-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The concurrent positioning ranges log page 47h is a general purpose log
page and not a subpage of the indentify device log. Using
ata_identify_page_supported() to test for concurrent positioning ranges
support is thus wrong. ata_log_supported() must be used.
Furthermore, unlike other advanced ATA features (e.g. NCQ priority),
accesses to the concurrent positioning ranges log page are not gated by
a feature bit from the device IDENTIFY data. Since many older drives
react badly to the READ LOG EXT and/or READ LOG DMA EXT commands isued
to read device log pages, avoid problems with older drives by limiting
the concurrent positioning ranges support detection to drives
implementing at least the ACS-4 ATA standard (major version 11). This
additional condition effectively turns ata_dev_config_cpr() into a nop
for older drives, avoiding problems in the field.
Fixes: fe22e1c2f7 ("libata: support concurrent positioning ranges log")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215519
Cc: stable@vger.kernel.org
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Abderraouf Adjal <adjal.arf@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
fix regression introduced as part of moving to the new mount API.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmH7/AUACgkQ8vlZVpUN
gaOsuQf/TFH8QNBSeEkT5ybnrS51KGTv88mdUVMcsmSMhmAFxiGJLFtMLFu9LG7b
bJYCg+Q9Rieb1qqqtGNyLe4p3ewShSzBFu8p7hzKMfu0EEcrJwTYVywSX0oYhMMm
9o+V6CPcGYVZtImihdsmDvgMRRkzoevHQFx+OLhkaq4Qd9ZEdohchYIhRFNXwd+w
CJiL0TFAnrb4QfWgtq3HyY7aoQumf8YI15C+RTfykzCBhZRFRKXjVXPdIjfGe4O2
Fpjr4gSsgYK0Er0LLJvESeFFVpFz+NV7q9W/Vj5ahaKJDpiVGzL/OPZsnafzHPPy
CSa+iP3ZLcTb+KRTOZ1mgjvS34Cmyw==
=DpdZ
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Various bug fixes for ext4 fast commit and inline data handling.
Also fix regression introduced as part of moving to the new mount API"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
fs/ext4: fix comments mentioning i_mutex
ext4: fix incorrect type issue during replay_del_range
jbd2: fix kernel-doc descriptions for jbd2_journal_shrink_{scan,count}()
ext4: fix potential NULL pointer dereference in ext4_fill_super()
jbd2: refactor wait logic for transaction updates into a common function
jbd2: cleanup unused functions declarations from jbd2.h
ext4: fix error handling in ext4_fc_record_modified_inode()
ext4: remove redundant max inline_size check in ext4_da_write_inline_data_begin()
ext4: fix error handling in ext4_restore_inline_data()
ext4: fast commit may miss file actions
ext4: fast commit may not fallback for ineligible commit
ext4: modify the logic of ext4_mb_new_blocks_simple
ext4: prevent used blocks from being allocated during fast commit replay
* A couple of fixes when handling an exception while a SError has been delivered
* Workaround for Cortex-A510's single-step erratum
RISCV:
* Make CY, TM, and IR counters accessible in VU mode
* Fix SBI implementation version
x86:
* Report deprecation of x87 features in supported CPUID
* Preparation for fixing an interrupt delivery race on AMD hardware
* Sparse fix
All except POWER and s390:
* Rework guest entry code to correctly mark noinstr areas and fix vtime'
accounting (for x86, this was already mostly correct but not entirely;
for ARM, MIPS and RISC-V it wasn't)
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmH+E4AUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNujwf+ON/8pBWyMPdjiY5l5SyNLpRup8Su
zkQoMEDICI7khYUz2bEAjOazFWHmHPsdogAlG82QeJCbFmCqyMb6iX0uWj53BdGP
P2bOM/tXbulvKBBeiTritkUUNO+hBmmSF+57AOJSW+Enhc7HFwk54cuft6f30r+d
JRaEOhPOP34hQ+wFQQhZZh72BaZBqgnrYwZDp1TiC0Wh8v7P4Nf+NFtEgba2nsxC
xfz5PrEhvegtU8Ee9JAF2bAl7851WJq557P2cOpghtUMgh4t6GzCcUOCKIie67oQ
0Vaf+OieAopdT+QNazSEWO9zxl7eTpWjk2hrwsDrgKHAL/YmuWJuSyEYIA==
=C0LZ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- A couple of fixes when handling an exception while a SError has
been delivered
- Workaround for Cortex-A510's single-step erratum
RISC-V:
- Make CY, TM, and IR counters accessible in VU mode
- Fix SBI implementation version
x86:
- Report deprecation of x87 features in supported CPUID
- Preparation for fixing an interrupt delivery race on AMD hardware
- Sparse fix
All except POWER and s390:
- Rework guest entry code to correctly mark noinstr areas and fix
vtime' accounting (for x86, this was already mostly correct but not
entirely; for ARM, MIPS and RISC-V it wasn't)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Use ERR_PTR_USR() to return -EFAULT as a __user pointer
KVM: x86: Report deprecated x87 features in supported CPUID
KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata
KVM: arm64: Stop handle_exit() from handling HVC twice when an SError occurs
KVM: arm64: Avoid consuming a stale esr value when SError occur
RISC-V: KVM: Fix SBI implementation version
RISC-V: KVM: make CY, TM, and IR counters accessible in VU mode
kvm/riscv: rework guest entry logic
kvm/arm64: rework guest entry logic
kvm/x86: rework guest entry logic
kvm/mips: rework guest entry logic
kvm: add guest_state_{enter,exit}_irqoff()
KVM: x86: Move delivery of non-APICv interrupt into vendor code
kvm: Move KVM_GET_XSAVE2 IOCTL definition at the end of kvm.h
- Limit the length of ioend chains in writeback so that we don't trip
the softlockup watchdog and to limit long tail latency on clearing
PageWriteback.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHxg5QACgkQ+H93GTRK
tOv/5g/9H1grcwR5RuUm0Xae5R9BMeu3GTOwgN7GIeo7YkQHm5F0O2o0jTXOZAuP
1LZ00CrBHPgHf/sRi0MkGJnRMWQEtrlsexVfDQn05yAB159n68qAwrHSUH5MHHtV
vz2S+X6hBdbPvie22HKKwr+rqEz3oe0vdJ94jRDHAamKL3PM1WbK3R48hJFN2Wua
EGWxid0d6HGSpUVObSRnJyKh4MhKniPWMJTDIAthmEQIUPBJa4aIju38jq7skiep
TLYz9GnnrUyDYlmNSgmvG/a/BcAz2GJQJ1i6FlkuAL8Wjlh+ni30j5P3Bifb8cKf
i1N7YzgaV/5frx1d70ovswblgijF/4iz22CQfX5I1NBYy9rCuVe1N+GltWX/wYG7
ShqP4cjzgrzeEir0ELCN8ubU15SrQ4XDCgj+wHO1f0hD7ldRfKOQObFhdR/k5J+T
MqhjSg5x3N0Y39Gp27T+cDvKM2tYZjb8aqrfU6hk8FUUwgXSovJbx97B6jr4xGHH
/W/7MREuY65Lz+KQ2FOWJJk/AGnzxtyINFUSzDGV5IlR/eQDNkbOJBanRx34iZX9
j3pVEXwWvjRkKm+fhAHxXG0mBNhYwzwuiZgLuL7MQKfal8LShCKnc9PKfAV9PgNI
NcEnmhV6wzC4gJidsFTqsUZqQcURFGUDX0inpaswgr3MEXRUe9s=
=a5K9
-----END PGP SIGNATURE-----
Merge tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fix from Darrick Wong:
"A single bugfix for iomap.
The fix should eliminate occasional complaints about stall warnings
when a lot of writeback IO completes all at once and we have to then
go clearing status on a large number of folios.
Summary:
- Limit the length of ioend chains in writeback so that we don't trip
the softlockup watchdog and to limit long tail latency on clearing
PageWriteback"
* tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs, iomap: limit individual ioend chain lengths in writeback