Mat Martineau says:
====================
mptcp: Fixes for 5.17
Patch 1 fixes a MPTCP selftest bug that combined the results of two
separate tests in the test output.
Patch 2 fixes a problem where advertised IPv6 addresses were not actually
available for incoming MP_JOIN requests.
====================
Link: https://lore.kernel.org/r/20220210012508.226880-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This change updates mptcp_pm_nl_create_listen_socket() to create
listening sockets bound to IPv6 addresses (where IPv6 is supported).
Fixes: 1729cf186d ("mptcp: create the listening socket for new port")
Acked-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This function also writes the name of the test with its ID, making clear
a new test has been executed.
Without that, the ADD_ADDR results from this test was appended at the
end of the previous test causing confusions. Especially when the second
test was failing, we had:
17 signal invalid addresses syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ]
add[fail] got 2 ADD_ADDR[s] expected 3
In fact, this 17th test was OK but not the 18th one.
Now we have:
17 signal invalid addresses syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ]
18 signal addresses race test syn[fail] got 2 JOIN[s] syn expected 3
- synack[fail] got 2 JOIN[s] synack expected
- ack[fail] got 2 JOIN[s] ack expected 3
add[fail] got 2 ADD_ADDR[s] expected 3
Fixes: 33c563ad28 ("selftests: mptcp: add_addr and echo race test")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
1) Conntrack sets on CHECKSUM_UNNECESSARY for UDP packet with no checksum,
from Kevin Mitchell.
2) skb->priority support for nfqueue, from Nicolas Dichtel.
3) Remove conntrack extension register API, from Florian Westphal.
4) Move nat destroy hook to nf_nat_hook instead, to remove
nf_ct_ext_destroy(), also from Florian.
5) Wrap pptp conntrack NAT hooks into single structure, from Florian Westphal.
6) Support for tcp option set to noop for nf_tables, also from Florian.
7) Do not run x_tables comment match from packet path in nf_tables,
from Florian Westphal.
8) Replace spinlock by cmpxchg() loop to update missed ct event,
from Florian Westphal.
9) Wrap cttimeout hooks into single structure, from Florian.
10) Add fast nft_cmp expression for up to 16-bytes.
11) Use cb->ctx to store context in ctnetlink dump, instead of using
cb->args[], from Florian Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: ctnetlink: use dump structure instead of raw args
nfqueue: enable to set skb->priority
netfilter: nft_cmp: optimize comparison for 16-bytes
netfilter: cttimeout: use option structure
netfilter: ecache: don't use nf_conn spinlock
netfilter: nft_compat: suppress comment match
netfilter: exthdr: add support for tcp option removal
netfilter: conntrack: pptp: use single option structure
netfilter: conntrack: remove extension register api
netfilter: conntrack: handle ->destroy hook via nat_ops instead
netfilter: conntrack: move extension sizes into core
netfilter: conntrack: make all extensions 8-byte alignned
netfilter: nfqueue: enable to get skb->priority
netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY
====================
Link: https://lore.kernel.org/r/20220209133616.165104-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit
9652dc2eb9 ("tcp: relax listening_hash operations")
removed the need to disable bottom half while acquiring
listening_hash.lock. There are still two callers left which disable
bottom half before the lock is acquired.
On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
as a lock to ensure that resources, that are protected by disabling
bottom halves, remain protected.
This leads to a circular locking dependency if the lock acquired with
disabled bottom halves is also acquired with enabled bottom halves
followed by disabling bottom halves. This is the reverse locking order.
It has been observed with inet_listen_hashbucket:🔒
local_bh_disable() + spin_lock(&ilb->lock):
inet_listen()
inet_csk_listen_start()
sk->sk_prot->hash() := inet_hash()
local_bh_disable()
__inet_hash()
spin_lock(&ilb->lock);
acquire(&ilb->lock);
Reverse order: spin_lock(&ilb2->lock) + local_bh_disable():
tcp_seq_next()
listening_get_next()
spin_lock(&ilb2->lock);
acquire(&ilb2->lock);
tcp4_seq_show()
get_tcp4_sock()
sock_i_ino()
read_lock_bh(&sk->sk_callback_lock);
acquire(softirq_ctrl) // <---- whoops
acquire(&sk->sk_callback_lock)
Drop local_bh_disable() around __inet_hash() which acquires
listening_hash->lock. Split inet_unhash() and acquire the
listen_hashbucket lock without disabling bottom halves; the inet_ehash
lock with disabled bottom halves.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx.de
Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net
Link: https://lore.kernel.org/r/YgQOebeZ10eNx1W6@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-02-09
We've added 126 non-merge commits during the last 16 day(s) which contain
a total of 201 files changed, 4049 insertions(+), 2215 deletions(-).
The main changes are:
1) Add custom BPF allocator for JITs that pack multiple programs into a huge
page to reduce iTLB pressure, from Song Liu.
2) Add __user tagging support in vmlinux BTF and utilize it from BPF
verifier when generating loads, from Yonghong Song.
3) Add per-socket fast path check guarding from cgroup/BPF overhead when
used by only some sockets, from Pavel Begunkov.
4) Continued libbpf deprecation work of APIs/features and removal of their
usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko
and various others.
5) Improve BPF instruction set documentation by adding byte swap
instructions and cleaning up load/store section, from Christoph Hellwig.
6) Switch BPF preload infra to light skeleton and remove libbpf dependency
from it, from Alexei Starovoitov.
7) Fix architecture-agnostic macros in libbpf for accessing syscall
arguments from BPF progs for non-x86 architectures,
from Ilya Leoshkevich.
8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be
of 16-bit field with anonymous zero padding, from Jakub Sitnicki.
9) Add new bpf_copy_from_user_task() helper to read memory from a different
task than current. Add ability to create sleepable BPF iterator progs,
from Kenny Yu.
10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and
utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski.
11) Generate temporary netns names for BPF selftests to avoid naming
collisions, from Hangbin Liu.
12) Implement bpf_core_types_are_compat() with limited recursion for
in-kernel usage, from Matteo Croce.
13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5
to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor.
14) Misc minor fixes to libbpf and selftests from various folks.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits)
selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
libbpf: Fix compilation warning due to mismatched printf format
selftests/bpf: Test BPF_KPROBE_SYSCALL macro
libbpf: Add BPF_KPROBE_SYSCALL macro
libbpf: Fix accessing the first syscall argument on s390
libbpf: Fix accessing the first syscall argument on arm64
libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390
libbpf: Fix accessing syscall arguments on riscv
libbpf: Fix riscv register names
libbpf: Fix accessing syscall arguments on powerpc
selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro
libbpf: Add PT_REGS_SYSCALL_REGS macro
selftests/bpf: Fix an endianness issue in bpf_syscall_macro test
bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE
bpf: Fix leftover header->pages in sparc and powerpc code.
libbpf: Fix signedness bug in btf_dump_array_data()
selftests/bpf: Do not export subtest as standalone test
bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
...
====================
Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the commit c504e5c2f9 ("net: skb: introduce kfree_skb_reason()")
drop reason is introduced to the tracepoint of kfree_skb. Therefore,
drop_monitor is able to report the drop reason to users by netlink.
The drop reasons are reported as string to users, which is exactly
the same as what we do when reporting it to ftrace.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220209060838.55513-1-imagedong@tencent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The file is not closed when ferror() fails.
Fixes: 00d674cb35 ("kconfig: refactor conf_write_dep()")
Fixes: 57ddd07c45 ("kconfig: refactor conf_write_autoconf()")
Reported-by: Ryan Cai <ycaibb@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
If no driver is attached to a device or the driver does not provide the
path_event function, an FCES path-event on this device could end up in a
kernel-panic. Verify the driver availability before the path_event
function call.
Fixes: 32ef938815 ("s390/cio: Add support for FCES status notification")
Cc: stable@vger.kernel.org
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Suggested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
As reported by Jeff, dereferencing the openat2 syscall argument in
audit_match_perm() to obtain the open_how::flags can result in an
oops/page-fault. This patch fixes this by using the open_how struct
that we store in the audit_context with audit_openat2_how().
Independent of this patch, Richard Guy Briggs posted a similar patch
to the audit mailing list roughly 40 minutes after this patch was
posted.
Cc: stable@vger.kernel.org
Fixes: 1c30e3af8a ("audit: add support for the openat2 syscall")
Reported-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
it will cause hwmon node of power1_label is not created.
v2:
the hwmon node of "power1_label" is always needed for all ASICs.
and the patch will remove ASIC type check for "power1_label".
Fixes: ae07970a06 ("drm/amd/pm: add support for hwmon control of slow and fast PPT limit on vangogh")
Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Even if can_apply_edp_fast_boot is set to 1 at boot, this flag will
be cleared to 0 at S3 resume.
[How]
Keep eDP Vdd on when eDP stream is already enabled.
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Jasdeep Dhillon <jdhillon@amd.com>
Signed-off-by: Zhan Liu <Zhan.Liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix clamping to match register field size
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Jasdeep Dhillon <jdhillon@amd.com>
Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
pflip interrupt order are mapped 1 to 1 to otg id.
e.g. if irq_src=26 corresponds to otg0 then 27->otg1, 28->otg2...
Linux DM registers pflip interrupts per number of crtcs.
In fused pipe case crtc numbers can be less than otg id.
e.g. if one pipe out of 3(otg#0-2) is fused adev->mode_info.num_crtc=2
so DM only registers irq_src 26,27.
This is a bug since if pipe#2 remains unfused DM never gets
otg2 pflip interrupt (irq_src=28)
That may results in gfx failure due to pflip timeout.
[How]
Register pflip interrupts per max num of otg instead of num_crtc
Signed-off-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Confirmed with hardware team, there is harvesting for gc 10.3.1.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
A number of BIOS versions have a problem with the watermarks table not
being configured properly. This manifests as a very scary looking warning
during resume from s0i3. This should be harmless in most cases and is well
understood, so decrease the assertion to a clearer warning about the problem.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jakub Sitnicki says:
====================
Following the recent split-up of the bpf_sock dst_port field, apply the same to
technique to the bpf_sk_lookup remote_port field to make uAPI more user
friendly.
v1 -> v2:
- Remove remote_port range check and cast to be16 in TEST_RUN for sk_lookup
(kernel test robot)
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Extend the context access tests for sk_lookup prog to cover the surprising
case of a 4-byte load from the remote_port field, where the expected value
is actually shifted by 16 bits.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220209184333.654927-3-jakub@cloudflare.com
remote_port is another case of a BPF context field documented as a 32-bit
value in network byte order for which the BPF context access converter
generates a load of a zero-padded 16-bit integer in network byte order.
First such case was dst_port in bpf_sock which got addressed in commit
4421a58271 ("bpf: Make dst_port field in struct bpf_sock 16-bit wide").
Loading 4-bytes from the remote_port offset and converting the value with
bpf_ntohl() leads to surprising results, as the expected value is shifted
by 16 bits.
Reduce the confusion by splitting the field in two - a 16-bit field holding
a big-endian integer, and a 16-bit zero-padding anonymous field that
follows it.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220209184333.654927-2-jakub@cloudflare.com
Commit 7f7b4236f2 ("x86/PCI: Ignore E820 reservations for bridge windows
on newer systems") fixes the touchpad not working on laptops like
the Lenovo IdeaPad 3 15IIL05 and the Lenovo IdeaPad 5 14IIL05, as well as
fixing thunderbolt hotplug issues on the Lenovo Yoga C940.
Unfortunately it turns out that this is causing issues with suspend/resume
on Lenovo ThinkPad X1 Carbon Gen 2 laptops. So, per the no regressions
policy, rever this. Note I'm looking into another fix for the issues this
fixed.
Fixes: 7f7b4236f2 ("x86/PCI: Ignore E820 reservations for bridge windows on newer systems")
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2029207
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The original version of the IORT PMCG definition had an oversight
wherein there was no way to describe the second register page for an
implementation using the recommended RELOC_CTRS feature. Although the
spec was fixed, and the final patches merged to ACPICA and Linux written
against the new version, it seems that some old firmware based on the
original revision has survived and turned up in the wild.
Add a check for the original PMCG definition, and avoid filling in the
second memory resource with nonsense if so. Otherwise it is likely that
something horrible will happen when the PMCG driver attempts to probe.
Reported-by: Michael Petlan <mpetlan@redhat.com>
Fixes: 24e5160493 ("ACPI/IORT: Add support for PMCG")
Cc: <stable@vger.kernel.org> # 5.2.x
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/75628ae41c257fb73588f7bf1c4459160e04be2b.1643916258.git.robin.murphy@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ensure that NFS clients cannot send file size or offset values that
can cause the NFS server to crash or to return incorrect or
surprising results. In particular, fix how the NFS server handles
values larger than OFFSET_MAX.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmIDzr4ACgkQM2qzM29m
f5eJRBAAikdh0PYOlZbvy9M1eY6wq3k+Y10JsnCZk4T8Uq0NJF/7CJ3R4/4+xGOh
ZA/2vE1dN4IfqnIOdxw1cXbzzgAO5p/nDLMo9wC6NimrVLkE+S8j38oWvEHOCJXC
TzUbIKkxqBBcfDw4pO4BT42iHx+cqVUuRFd2qkob1ZRoe+BKI+F4+7QNVc8iEw5z
j85i2/h6JohsItzekRbMO1q1iXxBc+IZRYafjibtVRWxRuNUWP8C1cv0eXrlSy3O
L07kZRwzrd52PAi1Q8K07Ip+yTHUMZptyHoB6S863uuz/mOzlpXewvXHMGA1btlr
POHYG/lBXpDS0e2pjksyXXp2I7HJV/HuaMyyLveWRO0qleBc3G5PsvIJNBW7xl5f
NPGdgfaa+8ZeOCGolvPruykL9Eh7QAyWTdPKz1J+NuhjkAB4p6ba9QcKVwP7kYTi
I8zdeUPgbjuFW35hal0ZIlNi2RfcuSGk1FKjotrQ6J3XNIaqPkUWK+1Zz3MzqPUW
+1ElzoXQugJASPBkEZuf1aXr8/vRjKT16l8EX1kbtJ5wjj2OPbnWWZk03ZncLVfv
CzbJTZLqiM0JuRqXvYpUGAQdryWcwvTCAuWxcqrt4ALNWW6Z4Y35Vl8H4sTh8wkr
Q3m6bAVYJx3FmFop7y5ubVH137k1SFJ0NzGJJK0mYoZQSMZoPZI=
=64n/
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull more nfsd fixes from Chuck Lever:
"Ensure that NFS clients cannot send file size or offset values that
can cause the NFS server to crash or to return incorrect or surprising
results.
In particular, fix how the NFS server handles values larger than
OFFSET_MAX"
* tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Deprecate NFS_OFFSET_MAX
NFSD: Fix offset type in I/O trace points
NFSD: COMMIT operations must not return NFS?ERR_INVAL
NFSD: Clamp WRITE offsets
NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes
NFSD: Fix ia_size underflow
NFSD: Fix the behavior of READ near OFFSET_MAX
Pull crypto fixes from Herbert Xu:
"Fix two regressions:
- Potential boot failure due to missing cryptomgr on initramfs
- Stack overflow in octeontx2"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: api - Move cryptomgr soft dependency into algapi
crypto: octeontx2 - Avoid stack variable overflow
Currently if we get IO error while doing send then we abort without
logging information about which file caused issue. So log it to help
with debugging.
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Dāvis Mosāns <davispuh@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When using the flushoncommit mount option, during almost every transaction
commit we trigger a warning from __writeback_inodes_sb_nr():
$ cat fs/fs-writeback.c:
(...)
static void __writeback_inodes_sb_nr(struct super_block *sb, ...
{
(...)
WARN_ON(!rwsem_is_locked(&sb->s_umount));
(...)
}
(...)
The trace produced in dmesg looks like the following:
[947.473890] WARNING: CPU: 5 PID: 930 at fs/fs-writeback.c:2610 __writeback_inodes_sb_nr+0x7e/0xb3
[947.481623] Modules linked in: nfsd nls_cp437 cifs asn1_decoder cifs_arc4 fscache cifs_md4 ipmi_ssif
[947.489571] CPU: 5 PID: 930 Comm: btrfs-transacti Not tainted 95.16.3-srb-asrock-00001-g36437ad63879 #186
[947.497969] RIP: 0010:__writeback_inodes_sb_nr+0x7e/0xb3
[947.502097] Code: 24 10 4c 89 44 24 18 c6 (...)
[947.519760] RSP: 0018:ffffc90000777e10 EFLAGS: 00010246
[947.523818] RAX: 0000000000000000 RBX: 0000000000963300 RCX: 0000000000000000
[947.529765] RDX: 0000000000000000 RSI: 000000000000fa51 RDI: ffffc90000777e50
[947.535740] RBP: ffff888101628a90 R08: ffff888100955800 R09: ffff888100956000
[947.541701] R10: 0000000000000002 R11: 0000000000000001 R12: ffff888100963488
[947.547645] R13: ffff888100963000 R14: ffff888112fb7200 R15: ffff888100963460
[947.553621] FS: 0000000000000000(0000) GS:ffff88841fd40000(0000) knlGS:0000000000000000
[947.560537] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[947.565122] CR2: 0000000008be50c4 CR3: 000000000220c000 CR4: 00000000001006e0
[947.571072] Call Trace:
[947.572354] <TASK>
[947.573266] btrfs_commit_transaction+0x1f1/0x998
[947.576785] ? start_transaction+0x3ab/0x44e
[947.579867] ? schedule_timeout+0x8a/0xdd
[947.582716] transaction_kthread+0xe9/0x156
[947.585721] ? btrfs_cleanup_transaction.isra.0+0x407/0x407
[947.590104] kthread+0x131/0x139
[947.592168] ? set_kthread_struct+0x32/0x32
[947.595174] ret_from_fork+0x22/0x30
[947.597561] </TASK>
[947.598553] ---[ end trace 644721052755541c ]---
This is because we started using writeback_inodes_sb() to flush delalloc
when committing a transaction (when using -o flushoncommit), in order to
avoid deadlocks with filesystem freeze operations. This change was made
by commit ce8ea7cc6e ("btrfs: don't call btrfs_start_delalloc_roots
in flushoncommit"). After that change we started producing that warning,
and every now and then a user reports this since the warning happens too
often, it spams dmesg/syslog, and a user is unsure if this reflects any
problem that might compromise the filesystem's reliability.
We can not just lock the sb->s_umount semaphore before calling
writeback_inodes_sb(), because that would at least deadlock with
filesystem freezing, since at fs/super.c:freeze_super() sync_filesystem()
is called while we are holding that semaphore in write mode, and that can
trigger a transaction commit, resulting in a deadlock. It would also
trigger the same type of deadlock in the unmount path. Possibly, it could
also introduce some other locking dependencies that lockdep would report.
To fix this call try_to_writeback_inodes_sb() instead of
writeback_inodes_sb(), because that will try to read lock sb->s_umount
and then will only call writeback_inodes_sb() if it was able to lock it.
This is fine because the cases where it can't read lock sb->s_umount
are during a filesystem unmount or during a filesystem freeze - in those
cases sb->s_umount is write locked and sync_filesystem() is called, which
calls writeback_inodes_sb(). In other words, in all cases where we can't
take a read lock on sb->s_umount, writeback is already being triggered
elsewhere.
An alternative would be to call btrfs_start_delalloc_roots() with a
number of pages different from LONG_MAX, for example matching the number
of delalloc bytes we currently have, in which case we would end up
starting all delalloc with filemap_fdatawrite_wbc() and not with an
async flush via filemap_flush() - that is only possible after the rather
recent commit e076ab2a2c ("btrfs: shrink delalloc pages instead of
full inodes"). However that creates a whole new can of worms due to new
lock dependencies, which lockdep complains, like for example:
[ 8948.247280] ======================================================
[ 8948.247823] WARNING: possible circular locking dependency detected
[ 8948.248353] 5.17.0-rc1-btrfs-next-111 #1 Not tainted
[ 8948.248786] ------------------------------------------------------
[ 8948.249320] kworker/u16:18/933570 is trying to acquire lock:
[ 8948.249812] ffff9b3de1591690 (sb_internal#2){.+.+}-{0:0}, at: find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.250638]
but task is already holding lock:
[ 8948.251140] ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs]
[ 8948.252018]
which lock already depends on the new lock.
[ 8948.252710]
the existing dependency chain (in reverse order) is:
[ 8948.253343]
-> #2 (&root->delalloc_mutex){+.+.}-{3:3}:
[ 8948.253950] __mutex_lock+0x90/0x900
[ 8948.254354] start_delalloc_inodes+0x78/0x400 [btrfs]
[ 8948.254859] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
[ 8948.255408] btrfs_commit_transaction+0x32f/0xc00 [btrfs]
[ 8948.255942] btrfs_mksubvol+0x380/0x570 [btrfs]
[ 8948.256406] btrfs_mksnapshot+0x81/0xb0 [btrfs]
[ 8948.256870] __btrfs_ioctl_snap_create+0x17f/0x190 [btrfs]
[ 8948.257413] btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs]
[ 8948.257961] btrfs_ioctl+0x1196/0x3630 [btrfs]
[ 8948.258418] __x64_sys_ioctl+0x83/0xb0
[ 8948.258793] do_syscall_64+0x3b/0xc0
[ 8948.259146] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 8948.259709]
-> #1 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}:
[ 8948.260330] __mutex_lock+0x90/0x900
[ 8948.260692] btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs]
[ 8948.261234] btrfs_commit_transaction+0x32f/0xc00 [btrfs]
[ 8948.261766] btrfs_set_free_space_cache_v1_active+0x38/0x60 [btrfs]
[ 8948.262379] btrfs_start_pre_rw_mount+0x119/0x180 [btrfs]
[ 8948.262909] open_ctree+0x1511/0x171e [btrfs]
[ 8948.263359] btrfs_mount_root.cold+0x12/0xde [btrfs]
[ 8948.263863] legacy_get_tree+0x30/0x50
[ 8948.264242] vfs_get_tree+0x28/0xc0
[ 8948.264594] vfs_kern_mount.part.0+0x71/0xb0
[ 8948.265017] btrfs_mount+0x11d/0x3a0 [btrfs]
[ 8948.265462] legacy_get_tree+0x30/0x50
[ 8948.265851] vfs_get_tree+0x28/0xc0
[ 8948.266203] path_mount+0x2d4/0xbe0
[ 8948.266554] __x64_sys_mount+0x103/0x140
[ 8948.266940] do_syscall_64+0x3b/0xc0
[ 8948.267300] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 8948.267790]
-> #0 (sb_internal#2){.+.+}-{0:0}:
[ 8948.268322] __lock_acquire+0x12e8/0x2260
[ 8948.268733] lock_acquire+0xd7/0x310
[ 8948.269092] start_transaction+0x44c/0x6e0 [btrfs]
[ 8948.269591] find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.270087] btrfs_reserve_extent+0x14b/0x280 [btrfs]
[ 8948.270588] cow_file_range+0x17e/0x490 [btrfs]
[ 8948.271051] btrfs_run_delalloc_range+0x345/0x7a0 [btrfs]
[ 8948.271586] writepage_delalloc+0xb5/0x170 [btrfs]
[ 8948.272071] __extent_writepage+0x156/0x3c0 [btrfs]
[ 8948.272579] extent_write_cache_pages+0x263/0x460 [btrfs]
[ 8948.273113] extent_writepages+0x76/0x130 [btrfs]
[ 8948.273573] do_writepages+0xd2/0x1c0
[ 8948.273942] filemap_fdatawrite_wbc+0x68/0x90
[ 8948.274371] start_delalloc_inodes+0x17f/0x400 [btrfs]
[ 8948.274876] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
[ 8948.275417] flush_space+0x1f2/0x630 [btrfs]
[ 8948.275863] btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs]
[ 8948.276438] process_one_work+0x252/0x5a0
[ 8948.276829] worker_thread+0x55/0x3b0
[ 8948.277189] kthread+0xf2/0x120
[ 8948.277506] ret_from_fork+0x22/0x30
[ 8948.277868]
other info that might help us debug this:
[ 8948.278548] Chain exists of:
sb_internal#2 --> &fs_info->delalloc_root_mutex --> &root->delalloc_mutex
[ 8948.279601] Possible unsafe locking scenario:
[ 8948.280102] CPU0 CPU1
[ 8948.280508] ---- ----
[ 8948.280915] lock(&root->delalloc_mutex);
[ 8948.281271] lock(&fs_info->delalloc_root_mutex);
[ 8948.281915] lock(&root->delalloc_mutex);
[ 8948.282487] lock(sb_internal#2);
[ 8948.282800]
*** DEADLOCK ***
[ 8948.283333] 4 locks held by kworker/u16:18/933570:
[ 8948.283750] #0: ffff9b3dc00a9d48 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0
[ 8948.284609] #1: ffffa90349dafe70 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0
[ 8948.285637] #2: ffff9b3e14db5040 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}, at: btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs]
[ 8948.286674] #3: ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs]
[ 8948.287596]
stack backtrace:
[ 8948.287975] CPU: 3 PID: 933570 Comm: kworker/u16:18 Not tainted 5.17.0-rc1-btrfs-next-111 #1
[ 8948.288677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 8948.289649] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
[ 8948.290298] Call Trace:
[ 8948.290517] <TASK>
[ 8948.290700] dump_stack_lvl+0x59/0x73
[ 8948.291026] check_noncircular+0xf3/0x110
[ 8948.291375] ? start_transaction+0x228/0x6e0 [btrfs]
[ 8948.291826] __lock_acquire+0x12e8/0x2260
[ 8948.292241] lock_acquire+0xd7/0x310
[ 8948.292714] ? find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.293241] ? lock_is_held_type+0xea/0x140
[ 8948.293601] start_transaction+0x44c/0x6e0 [btrfs]
[ 8948.294055] ? find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.294518] find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.294957] ? _raw_spin_unlock+0x29/0x40
[ 8948.295312] ? btrfs_get_alloc_profile+0x124/0x290 [btrfs]
[ 8948.295813] btrfs_reserve_extent+0x14b/0x280 [btrfs]
[ 8948.296270] cow_file_range+0x17e/0x490 [btrfs]
[ 8948.296691] btrfs_run_delalloc_range+0x345/0x7a0 [btrfs]
[ 8948.297175] ? find_lock_delalloc_range+0x247/0x270 [btrfs]
[ 8948.297678] writepage_delalloc+0xb5/0x170 [btrfs]
[ 8948.298123] __extent_writepage+0x156/0x3c0 [btrfs]
[ 8948.298570] extent_write_cache_pages+0x263/0x460 [btrfs]
[ 8948.299061] extent_writepages+0x76/0x130 [btrfs]
[ 8948.299495] do_writepages+0xd2/0x1c0
[ 8948.299817] ? sched_clock_cpu+0xd/0x110
[ 8948.300160] ? lock_release+0x155/0x4a0
[ 8948.300494] filemap_fdatawrite_wbc+0x68/0x90
[ 8948.300874] ? do_raw_spin_unlock+0x4b/0xa0
[ 8948.301243] start_delalloc_inodes+0x17f/0x400 [btrfs]
[ 8948.301706] ? lock_release+0x155/0x4a0
[ 8948.302055] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
[ 8948.302564] flush_space+0x1f2/0x630 [btrfs]
[ 8948.302970] btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs]
[ 8948.303510] process_one_work+0x252/0x5a0
[ 8948.303860] ? process_one_work+0x5a0/0x5a0
[ 8948.304221] worker_thread+0x55/0x3b0
[ 8948.304543] ? process_one_work+0x5a0/0x5a0
[ 8948.304904] kthread+0xf2/0x120
[ 8948.305184] ? kthread_complete_and_exit+0x20/0x20
[ 8948.305598] ret_from_fork+0x22/0x30
[ 8948.305921] </TASK>
It all comes from the fact that btrfs_start_delalloc_roots() takes the
delalloc_root_mutex, in the transaction commit path we are holding a
read lock on one of the superblock's freeze semaphores (via
sb_start_intwrite()), the async reclaim task can also do a call to
btrfs_start_delalloc_roots(), which ends up triggering writeback with
calls to filemap_fdatawrite_wbc(), resulting in extent allocation which
in turn can call btrfs_start_transaction(), which will result in taking
the freeze semaphore via sb_start_intwrite(), forming a nasty dependency
on all those locks which can be taken in different orders by different
code paths.
So just adopt the simple approach of calling try_to_writeback_inodes_sb()
at btrfs_start_delalloc_flush().
Link: https://lore.kernel.org/linux-btrfs/20220130005258.GA7465@cuci.nl/
Link: https://lore.kernel.org/linux-btrfs/43acc426-d683-d1b6-729d-c6bc4a2fff4d@gmail.com/
Link: https://lore.kernel.org/linux-btrfs/6833930a-08d7-6fbc-0141-eb9cdfd6bb4d@gmail.com/
Link: https://lore.kernel.org/linux-btrfs/20190322041731.GF16651@hungrycats.org/
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
[ add more link reports ]
Signed-off-by: David Sterba <dsterba@suse.com>
Once we start writeback (have called btrfs_run_delalloc_range()), we
allocate an extent, create an extent map point to that extent, with a
generation of (u64)-1, created the ordered extent and then clear the
DELALLOC bit from the range in the inode's io tree.
Such extent map can pass the first call of defrag_collect_targets(), as
its generation is (u64)-1, meets any possible minimal generation check.
And the range will not have DELALLOC bit, also passing the DELALLOC bit
check.
It will only be re-checked in the second call of
defrag_collect_targets(), which will wait for writeback.
But at that stage we have already spent our time waiting for some IO we
may or may not want to defrag.
Let's reject such extents early so we won't waste our time.
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a user report about "btrfs filesystem defrag" causing 120s
timeout problem.
For btrfs_defrag_file() it will iterate all file extents if called from
defrag ioctl, thus it can take a long time.
There is no reason not to release the CPU during such a long operation.
Add cond_resched() after defragged one cluster.
CC: stable@vger.kernel.org # 5.16
Link: https://lore.kernel.org/linux-btrfs/10e51417-2203-f0a4-2021-86c8511cc367@gmx.com
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 3ba442d533 ("fs: move binfmt_misc sysctl to its own file") did
not go unnoticed, binfmt-support stopped to work on my Debian system
since v5.17-rc2 (did not check with -rc1).
The existance of the /proc/sys/fs/binfmt_misc is a precondition for
attempting to mount the binfmt_misc fs, which in turn triggers the
autoload of the binfmt_misc module. Without it, no module is loaded and
no binfmt is available at boot.
Building as built-in or manually loading the module and mounting the fs
works fine, it's therefore only a matter of interaction with user-space.
I could try to improve the Debian systemd configuration but I can't say
anything about the other distributions.
This patch restores a working system right after boot.
Fixes: 3ba442d533 ("fs: move binfmt_misc sysctl to its own file")
Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
VFs by default are able to see all tagged traffic regardless of trust
and VLAN filters. Based on legacy devices (i.e. ixgbe, i40e), customers
expect VFs to receive all VLAN tagged traffic with a matching
destination MAC.
Add an ethtool private flag 'vf-vlan-pruning' and set the default to
off so VFs will receive all VLAN traffic directed towards them. When
the flag is turned on, VF will only be able to receive untagged
traffic or traffic with VLAN tags it has created interfaces for.
Also, the flag cannot be changed while any VFs are allocated. This was
done to simplify the implementation. So, if this flag is needed, then
the PF admin must enable it. If the user tries to enable the flag while
VFs are active, then print an unsupported message with the
vf-vlan-pruning flag included. In case multiple flags were specified, this
makes it clear to the user which flag failed.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently there is only support for 802.1Q port VLANs on SR-IOV VFs. Add
support to also allow 802.1ad port VLANs when double VLAN mode is
enabled.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In order for the driver to support 802.1ad VLAN filtering and offloads,
it needs to advertise those VLAN features and also support modifying
those VLAN features, so make the necessary changes to
ice_set_netdev_features(). By default, enable CTAG insertion/stripping
and CTAG filtering for both Single and Double VLAN Modes (SVM/DVM).
Also, in DVM, enable STAG filtering by default. This is done by
setting the feature bits in netdev->features. Also, in DVM, support
toggling of STAG insertion/stripping, but don't enable them by
default. This is done by setting the feature bits in
netdev->hw_features.
Since 802.1ad VLAN filtering and offloads are only supported in DVM, make
sure they are not enabled by default and that they cannot be enabled
during runtime, when the device is in SVM.
Add an implementation for the ndo_fix_features() callback. This is
needed since the hardware cannot support multiple VLAN ethertypes for
VLAN insertion/stripping simultaneously and all supported VLAN filtering
must either be enabled or disabled together.
Disable inner VLAN stripping by default when DVM is enabled. If a VSI
supports stripping the inner VLAN in DVM, then it will have to configure
that during runtime. For example if a VF is configured in a port VLAN
while DVM is enabled it will be allowed to offload inner VLANs.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In order to support configuring the device in Double VLAN Mode (DVM),
the DDP and FW have to support DVM. If both support DVM, the PF that
downloads the package needs to update the default recipes, set the
VLAN mode, and update boost TCAM entries.
To support updating the default recipes in DVM, add support for
updating an existing switch recipe's lkup_idx and mask. This is done
by first calling the get recipe AQ (0x0292) with the desired recipe
ID. Then, if that is successful update one of the lookup indices
(lkup_idx) and its associated mask if the mask is valid otherwise
the already existing mask will be used.
The VLAN mode of the device has to be configured while the global
configuration lock is held while downloading the DDP, specifically after
the DDP has been downloaded. If supported, the device will default to
DVM.
Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add support for the VF driver to be able to request
VIRTCHNL_VF_OFFLOAD_VLAN_V2, negotiate its VLAN capabilities via
VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS, add/delete VLAN filters, and
enable/disable VLAN offloads.
VFs supporting VIRTCHNL_OFFLOAD_VLAN_V2 will be able to use the
following virtchnl opcodes:
VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS
VIRTCHNL_OP_ADD_VLAN_V2
VIRTCHNL_OP_DEL_VLAN_V2
VIRTCHNL_OP_ENABLE_VLAN_STRIPPING_V2
VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2
VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2
VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2
Legacy VF drivers may expect the initial VLAN stripping settings to be
configured by the PF, so the PF initializes VLAN stripping based on the
VIRTCHNL_OP_GET_VF_RESOURCES opcode. However, with VLAN support via
VIRTCHNL_VF_OFFLOAD_VLAN_V2, this function is only expected to be used
for VFs that only support VIRTCHNL_VF_OFFLOAD_VLAN, which will only
be supported when a port VLAN is configured. Update the function
based on the new expectations. Also, change the message when the PF
can't enable/disable VLAN stripping to a dev_dbg() as this isn't fatal.
When a VF isn't in a port VLAN and it only supports
VIRTCHNL_VF_OFFLOAD_VLAN when Double VLAN Mode (DVM) is enabled, then
the PF needs to reject the VIRTCHNL_VF_OFFLOAD_VLAN capability and
configure the VF in software only VLAN mode. To do this add the new
function ice_vf_vsi_cfg_legacy_vlan_mode(), which updates the VF's
inner and outer ice_vsi_vlan_ops functions and sets up software only
VLAN mode.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently the driver only supports 802.1Q VLAN insertion and stripping.
However, once Double VLAN Mode (DVM) is fully supported, then both 802.1Q
and 802.1ad VLAN insertion and stripping will be supported. Unfortunately
the VSI context parameters only allow for one VLAN ethertype at a time
for VLAN offloads so only one or the other VLAN ethertype offload can be
supported at once.
To support this, multiple changes are needed.
Rx path changes:
[1] In DVM, the Rx queue context l2tagsel field needs to be cleared so
the outermost tag shows up in the l2tag2_2nd field of the Rx flex
descriptor. In Single VLAN Mode (SVM), the l2tagsel field should remain
1 to support SVM configurations.
[2] Modify the ice_test_staterr() function to take a __le16 instead of
the ice_32b_rx_flex_desc union pointer so this function can be used for
both rx_desc->wb.status_error0 and rx_desc->wb.status_error1.
[3] Add the new inline function ice_get_vlan_tag_from_rx_desc() that
checks if there is a VLAN tag in l2tag1 or l2tag2_2nd.
[4] In ice_receive_skb(), add a check to see if NETIF_F_HW_VLAN_STAG_RX
is enabled in netdev->features. If it is, then this is the VLAN
ethertype that needs to be added to the stripping VLAN tag. Since
ice_fix_features() prevents CTAG_RX and STAG_RX from being enabled
simultaneously, the VLAN ethertype will only ever be 802.1Q or 802.1ad.
Tx path changes:
[1] In DVM, the VLAN tag needs to be placed in the l2tag2 field of the Tx
context descriptor. The new define ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN was
added to the list of tx_flags to handle this case.
[2] When the stack requests the VLAN tag to be offloaded on Tx, the
driver needs to set either ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN or
ICE_TX_FLAGS_HW_VLAN, so the tag is inserted in l2tag2 or l2tag1
respectively. To determine which location to use, set a bit in the Tx
ring flags field during ring allocation that can be used to determine
which field to use in the Tx descriptor. In DVM, always use l2tag2,
and in SVM, always use l2tag1.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a new outer_vlan_ops member to the ice_vsi structure as outer VLAN
ops are only available when the device is in Double VLAN Mode (DVM).
Depending on the VSI type, the requirements for what operations to
use/allow differ.
By default all VSI's have unsupported inner and outer VSI VLAN ops. This
implementation was chosen to prevent unexpected crashes due to null
pointer dereferences. Instead, if a VSI calls an unsupported op, it will
just return -EOPNOTSUPP.
Add implementations to support modifying outer VLAN fields for VSI
context. This includes the ability to modify VLAN stripping, insertion,
and the port VLAN based on the outer VLAN handling fields of the VSI
context.
These functions should only ever be used if DVM is enabled because that
means the firmware supports the outer VLAN fields in the VSI context. If
the device is in DVM, then always use the outer_vlan_ops, else use the
vlan_ops since the device is in Single VLAN Mode (SVM).
Also, move adding the untagged VLAN 0 filter from ice_vsi_setup() to
ice_vsi_vlan_setup() as the latter function is specific to the PF and
all other VSI types that need an untagged VLAN 0 filter already do this
in their specific flows. Without this change, Flow Director is failing
to initialize because it does not implement any VSI VLAN ops.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Current operations act on inner VLAN fields. To support double VLAN, outer
VLAN operations and functions will be implemented. Add the "inner" naming
to existing VLAN operations to distinguish them from the upcoming outer
values and functions. Some spacing adjustments are made to align
values.
Note that the inner is not talking about a tunneled VLAN, but the second
VLAN in the packet. For SVM the driver uses inner or single VLAN
filtering and offloads and in Double VLAN Mode the driver uses the
inner filtering and offloads for SR-IOV VFs in port VLANs in order to
support offloading the guest VLAN while a port VLAN is configured.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently the proto argument is unused. This is because the driver only
supports 802.1Q VLAN filtering. This policy is enforced via netdev
features that the driver sets up when configuring the netdev, so the
proto argument won't ever be anything other than 802.1Q. However, this
will allow for future iterations of the driver to seemlessly support
802.1ad filtering. Begin using the proto argument and extend the related
structures to support its use.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The current vf->port_vlan_info variable is a packed u16 that contains
the port VLAN ID and QoS/prio value. This is fine, but changes are
incoming that allow for an 802.1ad port VLAN. Add flexibility by
changing the vf->port_vlan_info member to be an ice_vlan structure.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a new struct for VLAN related information. Currently this holds
VLAN ID and priority values, but will be expanded to hold TPID value.
This reduces the changes necessary if any other values are added in
future. Remove the action argument from these calls as it's always
ICE_FWD_VSI.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Incoming changes to support 802.1Q and/or 802.1ad VLAN filtering and
offloads require more flexibility when configuring VLANs. The VSI VLAN
interface will allow flexibility for configuring VLANs for all VSI
types. Add new files to separate the VSI VLAN ops and move functions to
make the code more organized.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There are multiple places where VLAN 0 is being added. Create a function
to be called in order to minimize changes as the implementation is expanded
to support double VLAN and avoid duplicated code.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add functions to configure Tx VLAN antispoof based on iproute
configuration and/or VLAN mode and VF driver support. This is needed
later so the driver can control when it can be configured. Also, add
functions that can be used to enable and disable MAC and VLAN
spoofcheck. Move spoofchk configuration during VSI setup into the
SR-IOV initialization path and into the post VSI rebuild flow for VF
VSIs.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The SIDA MEMOPs must only be used for secure guests, otherwise userspace
can do unwanted memory accesses.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+SKTgaM0CPnbq/vKEXu8gLWmHHwFAmH6S9MACgkQEXu8gLWm
HHwpjxAAqPo76yBxTsSp0gLmczY07F2J4OgJtkHkvqPss2X3pfZuEgfyzr40dzyB
7uC/q+l09/dSFhKhHRcxAR/LB5m+zGagwJQFOBbXI/Rn5AfKlQksDwstGL4Y+f1e
93feXq824HZ5d7tpyIBKbxdP09u5mtL1zswryUSjwGuJQntNmi+lXqFU1FUv+LmW
e6MxHLyuPZlIIyHN3I7CeRBURwF/V1dUKmBNRPFu7sZ0FcFHKvafMwsiiMPu/ImB
bOQW5pPKFxrOjY6Zk6iOL/nkqCRDOt9rBIqpRBatzINANrDuaCXEjklZO5Owuc4z
r3KTPauj/yZGPeV6eXTdpcbocMFnDSrGkMNQNMyqGLCu+s3BxRek7EkV0Q81U89D
7CixloS4ZGPOIkXyhii772DelsBX5mxe1AtW9BA+Dcjr6SbvGItVP3pgwOMJiZrm
miJgVLmCd7sN+ZgKUdA1HUjIpNPG8lmoGsDasDY2irBQI8DNTJrTPAHSDa2nTTcF
wFpOhPy0lxcyHmXuqFFVIPx8AWLdnUAB0U8uNr5XpO6xonmVEgEaG+wgWlLkfjJa
vTEdl7yirYvgiZ89BHrSYLY760IuNJFbmQVIeYBfoTNwQJ9WyjUmA6rFijokR8pH
lHI9HdQvbW5HGcCx6HykJUQz1i6tX8Ir3OjRfDjDaIbmuxS6J5o=
=PBt2
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-kernel-access' from emailed bundle
Pull s390 kvm fix from Christian Borntraeger:
"Add missing check for the MEMOP ioctl
The SIDA MEMOPs must only be used for secure guests, otherwise
userspace can do unwanted memory accesses"
* tag 'kvm-s390-kernel-access' from emailed bundle:
KVM: s390: Return error on SIDA memop on normal guest
DRM bridge drivers are now attaching their DSI device at probe time,
which requires us to register our DSI host in order to let the bridge
to probe: this recently started producing an endless -EPROBE_DEFER
loop on some machines that are using external bridges, like the
parade-ps8640, found on the ACER Chromebook R13.
Now that the DSI hosts/devices probe sequence is documented, we can
do adjustments to the mtk_dsi driver as to both fix now and make sure
to avoid this situation in the future: for this, following what is
documented in drm_bridge.c, move the mtk_dsi component_add() to the
mtk_dsi_ops.attach callback and delete it in the detach callback;
keeping in mind that we are registering a drm_bridge for our DSI,
which is only used/attached if the DSI Host is bound, it wouldn't
make sense to keep adding our bridge at probe time (as it would
be useless to have it if mtk_dsi_ops.attach() fails!), so also move
that one to the dsi host attach function (and remove it in detach).
Cc: <stable@vger.kernel.org> # 5.15.x
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Jagan Teki <jagan@amarulasolutions.com>
Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there
was a kernel-wide OFFSET_MAX value. As a clean up, replace the last
few uses of it with its generic equivalent, and get rid of it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NFSv3 and NFSv4 use u64 offset values on the wire. Record these values
verbatim without the implicit type case to loff_t.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>