Refactor nested_map() to specify that it explicityl wants 4K mappings
(the existing behavior) and push the implementation down into
__nested_map(), which can be used in subsequent commits to create huge
page mappings.
No function change intended.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
nested_map() does not take a parameter named eptp_memslot. Drop the
comment referring to it.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The current EPT mapping code in the selftests only supports mapping 4K
pages. This commit extends that support with an option to map at 2M or
1G. This will be used in a future commit to create large page mappings
to test eager page splitting.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
x86_page_size is an enum used to communicate the desired page size with
which to map a range of memory. Under the hood they just encode the
desired level at which to map the page. This ends up being clunky in a
few ways:
- The name suggests it encodes the size of the page rather than the
level.
- In other places in x86_64/processor.c we just use a raw int to encode
the level.
Simplify this by adopting the kernel style of PG_LEVEL_XX enums and pass
around raw ints when referring to the level. This makes the code easier
to understand since these macros are very common in KVM MMU code.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 74fd41ed16 ("KVM: x86: nSVM: support PAUSE filtering when L0
doesn't intercept PAUSE") introduced passthrough support for nested pause
filtering, (when the host doesn't intercept PAUSE) (either disabled with
kvm module param, or disabled with '-overcommit cpu-pm=on')
Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards,
the feature was exposed as supported by KVM cpuid unconditionally, thus
if L1 could try to use it even when the L0 KVM can't really support it.
In this case the fallback caused KVM to intercept each PAUSE instruction;
in some cases, such intercept can slow down the nested guest so much
that it can fail to boot. Instead, before the problematic commit KVM
was already setting both thresholds to 0 in vmcb02, but after the first
userspace VM exit shrink_ple_window was called and would reset the
pause_filter_count to the default value.
To fix this, change the fallback strategy - ignore the guest threshold
values, but use/update the host threshold values unless the guest
specifically requests disabling PAUSE filtering (either simple or
advanced).
Also fix a minor bug: on nested VM exit, when PAUSE filter counter
were copied back to vmcb01, a dirty bit was not set.
Thanks a lot to Suravee Suthikulpanit for debugging this!
Fixes: 74fd41ed16 ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE")
Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220518072709.730031-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that these functions are always called with preemption disabled,
remove the preempt_disable()/preempt_enable() pair inside them.
No functional change intended.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-8-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On SVM, if preemption happens right after the call to finish_rcuwait
but before call to kvm_arch_vcpu_unblocking on SVM/AVIC, it itself
will re-enable AVIC, and then we will try to re-enable it again
in kvm_arch_vcpu_unblocking which will lead to a warning
in __avic_vcpu_load.
The same problem can happen if the vCPU is preempted right after the call
to kvm_arch_vcpu_blocking but before the call to prepare_to_rcuwait
and in this case, we will end up with AVIC enabled during sleep -
Ooops.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-7-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently nothing prevents preemption in kvm_vcpu_update_apicv.
On SVM, If the preemption happens after we update the
vcpu->arch.apicv_active, the preemption itself will
'update' the inhibition since the AVIC will be first disabled
on vCPU unload and then enabled, when the current task
is loaded again.
Then we will try to update it again, which will lead to a warning
in __avic_vcpu_load, that the AVIC is already enabled.
Fix this by disabling preemption in this code.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There are two issues in avic_kick_target_vcpus_fast
1. It is legal to issue an IPI request with APIC_DEST_NOSHORT
and a physical destination of 0xFF (or 0xFFFFFFFF in case of x2apic),
which must be treated as a broadcast destination.
Fix this by explicitly checking for it.
Also don’t use ‘index’ in this case as it gives no new information.
2. It is legal to issue a logical IPI request to more than one target.
Index field only provides index in physical id table of first
such target and therefore can't be used before we are sure
that only a single target was addressed.
Instead, parse the ICRL/ICRH, double check that a unicast interrupt
was requested, and use that info to figure out the physical id
of the target vCPU.
At that point there is no need to use the index field as well.
In addition to fixing the above issues, also skip the call to
kvm_apic_match_dest.
It is possible to do this now, because now as long as AVIC is not
inhibited, it is guaranteed that none of the vCPUs changed their
apic id from its default value.
This fixes boot of windows guest with AVIC enabled because it uses
IPI with 0xFF destination and no destination shorthand.
Fixes: 7223fd2d53 ("KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible")
Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
AVIC is now inhibited if the guest changes the apic id,
and therefore this code is no longer needed.
There are several ways this code was broken, including:
1. a vCPU was only allowed to change its apic id to an apic id
of an existing vCPU.
2. After such change, the vCPU whose apic id entry was overwritten,
could not correctly change its own apic id, because its own
entry is already overwritten.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Neither of these settings should be changed by the guest and it is
a burden to support it in the acceleration code, so just inhibit
this code instead.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These days there are too many AVIC/APICv inhibit
reasons, and it doesn't hurt to have some documentation
for them.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Assign shadow_me_value, not shadow_me_mask, to PAE root entries,
a.k.a. shadow PDPTRs, when host memory encryption is supported. The
"mask" is the set of all possible memory encryption bits, e.g. MKTME
KeyIDs, whereas "value" holds the actual value that needs to be
stuffed into host page tables.
Using shadow_me_mask results in a failed VM-Entry due to setting
reserved PA bits in the PDPTRs, and ultimately causes an OOPS due to
physical addresses with non-zero MKTME bits sending to_shadow_page()
into the weeds:
set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
BUG: unable to handle page fault for address: ffd43f00063049e8
PGD 86dfd8067 P4D 0
Oops: 0000 [#1] PREEMPT SMP
RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm]
kvm_mmu_free_roots+0xd1/0x200 [kvm]
__kvm_mmu_unload+0x29/0x70 [kvm]
kvm_mmu_unload+0x13/0x20 [kvm]
kvm_arch_destroy_vm+0x8a/0x190 [kvm]
kvm_put_kvm+0x197/0x2d0 [kvm]
kvm_vm_release+0x21/0x30 [kvm]
__fput+0x8e/0x260
____fput+0xe/0x10
task_work_run+0x6f/0xb0
do_exit+0x327/0xa90
do_group_exit+0x35/0xa0
get_signal+0x911/0x930
arch_do_signal_or_restart+0x37/0x720
exit_to_user_mode_prepare+0xb2/0x140
syscall_exit_to_user_mode+0x16/0x30
do_syscall_64+0x4e/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: e54f1ff244 ("KVM: x86/mmu: Add shadow_me_value and repurpose shadow_me_mask")
Signed-off-by: Yuan Yao <yuan.yao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20220608012015.19566-1-yuan.yao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Properly reset the SVE/SME flags on vcpu load
- Fix a vgic-v2 regression regarding accessing the pending
state of a HW interrupt from userspace (and make the code
common with vgic-v3)
- Fix access to the idreg range for protected guests
- Ignore 'kvm-arm.mode=protected' when using VHE
- Return an error from kvm_arch_init_vm() on allocation failure
- A bunch of small cleanups (comments, annotations, indentation)
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmKh/2cPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDNKAQALlEZytemZk/R+uqkt85+vn4NPPlQdxBIYPZ
9ftqHwd0k3MvXtT9+DvTyBgVaR24TNp0ofO0mIYYQZbT4Kr+1LjKu9Jpxj0F8pjV
HllS8bWsFPHeUlN7UWE7SPiheQkH00xrtVYZH8aY2F/X05H1hO+dxcKbzSdmKeX4
JgNhFKYPrYgut5x0nQcs/SU04KfHYPwUC0MeMq+kI9dqiqX0OeJxWl2QxFIazxBg
kmPBAY7570DUi7u4wzmA9IHyk2a/WORil3Hp6oujCQqz8/KhwdNZEOMvjlwqrdLn
q1/RN9Z2RgjtmOPVvzURwKpOahXS5F+x62CTHADtcLF9SBBKi+GE7ZDYNkMq5YRt
xyu2PRHwcQPhfi1njzw4OlPreScSOpPkGV6MQZwEykA06poq6d0Qwt/5qr/UGbF2
hZUKh6cseZmvi2+ZM/vp2S7WF5nXUBS8yZBFBTEAPv77lsJBUFM4ruy1JDhlyMit
c8aGUmXFGYcEk85jIePNyWwaEQRcvt976GSwhMqOnFXAc+XoyGmwDKaPoS8HQ7PV
FuUTuDfYtuhkN1OoUsIasg1/hDqkvC4edURLEScYWvotf0HXzS4kb8EWzP9Wd83c
O105bl+uySzovJuFPMLpky7KYBvvN6FgQI+a/tQXs7dq2NI/+Kn8TWUyOQADJgTD
rI3eXFcM
=qYdB
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.19, take #1
- Properly reset the SVE/SME flags on vcpu load
- Fix a vgic-v2 regression regarding accessing the pending
state of a HW interrupt from userspace (and make the code
common with vgic-v3)
- Fix access to the idreg range for protected guests
- Ignore 'kvm-arm.mode=protected' when using VHE
- Return an error from kvm_arch_init_vm() on allocation failure
- A bunch of small cleanups (comments, annotations, indentation)
The ptrace PEEKUSR/POKEUSR (aka PEEKUSER/POKEUSER) API allows a process
to read/write registers of another process.
To get/set a register, the API takes an index into an imaginary address
space called the "USER area", where the registers of the process are
laid out in some fashion.
The kernel then maps that index to a particular register in its own data
structures and gets/sets the value.
The API only allows a single machine-word to be read/written at a time.
So 4 bytes on 32-bit kernels and 8 bytes on 64-bit kernels.
The way floating point registers (FPRs) are addressed is somewhat
complicated, because double precision float values are 64-bit even on
32-bit CPUs. That means on 32-bit kernels each FPR occupies two
word-sized locations in the USER area. On 64-bit kernels each FPR
occupies one word-sized location in the USER area.
Internally the kernel stores the FPRs in an array of u64s, or if VSX is
enabled, an array of pairs of u64s where one half of each pair stores
the FPR. Which half of the pair stores the FPR depends on the kernel's
endianness.
To handle the different layouts of the FPRs depending on VSX/no-VSX and
big/little endian, the TS_FPR() macro was introduced.
Unfortunately the TS_FPR() macro does not take into account the fact
that the addressing of each FPR differs between 32-bit and 64-bit
kernels. It just takes the index into the "USER area" passed from
userspace and indexes into the fp_state.fpr array.
On 32-bit there are 64 indexes that address FPRs, but only 32 entries in
the fp_state.fpr array, meaning the user can read/write 256 bytes past
the end of the array. Because the fp_state sits in the middle of the
thread_struct there are various fields than can be overwritten,
including some pointers. As such it may be exploitable.
It has also been observed to cause systems to hang or otherwise
misbehave when using gdbserver, and is probably the root cause of this
report which could not be easily reproduced:
https://lore.kernel.org/linuxppc-dev/dc38afe9-6b78-f3f5-666b-986939e40fc6@keymile.com/
Rather than trying to make the TS_FPR() macro even more complicated to
fix the bug, or add more macros, instead add a special-case for 32-bit
kernels. This is more obvious and hopefully avoids a similar bug
happening again in future.
Note that because 32-bit kernels never have VSX enabled the code doesn't
need to consider TS_FPRWIDTH/OFFSET at all. Add a BUILD_BUG_ON() to
ensure that 32-bit && VSX is never enabled.
Fixes: 87fec0514f ("powerpc: PTRACE_PEEKUSR/PTRACE_POKEUSER of FPR registers in little endian builds")
Cc: stable@vger.kernel.org # v3.13+
Reported-by: Ariel Miculas <ariel.miculas@belden.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220609133245.573565-1-mpe@ellerman.id.au
This reverts commit 3380557fc7.
It turned out this "4-byte" ID might have been an honest mistake.
Regrettably, the chip Andreas has might be a counterfeit or is
damaged in some other way and shouldn't have ended up in a router.
Andreas reported his chip is returning just four bytes:
"98 f1 80 15 00 00 00 00".
However, according to Kioxia/Toshiba's datasheet, there should
have been at least another byte that would have contained the
correct OOB size that Andreas needed.
Miquel and Andreas are both favoring reverting the patch over
further, possibly hacky modifications:
"[Reverting] is the safest option here. Apart from this device, we
do not know how many devices have these damaged/counterfeit chips.
If it is just a couple and only on Fritzboxes, as suggested in the
Github issue the patch could be carried through OpenWrt[...]"
Thanks to several users on the openwrt forum and github issue,
who stayed along for the ride:
- Peter-vdL for reporting the issue and testing patches.
- neg2led and Hannu Nyman who did all the
datasheet digging and debugging.
Cc: Andreas Boehler <dev@aboehler.at>
Suggested-by: Andreas Boehler <dev@aboehler.at>
Suggested-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://github.com/openwrt/openwrt/issues/9962
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20220607185918.1048204-1-chunkeey@gmail.com
In order for a file to access its own directory entry set,
exfat_inode_info(ei) has two copied values. One is ei->dir, which is
a snapshot of exfat_chain of the parent directory, and the other is
ei->entry, which is the offset of the start of the directory entry set
in the parent directory.
Since the parent directory can be updated after the snapshot point,
it should be used only for accessing one's own directory entry set.
However, as of now, during renaming, it could try to traverse or to
allocate clusters via snapshot values, it does not make sense.
This potential problem has been revealed when exfat_update_parent_info()
was removed by commit d8dad2588a ("exfat: fix referencing wrong parent
directory information after renaming"). However, I don't think it's good
idea to bring exfat_update_parent_info() back.
Instead, let's use the updated exfat_chain of parent directory diectly.
Fixes: d8dad2588a ("exfat: fix referencing wrong parent directory information after renaming")
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Tested-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
The layout of 'struct kvm_vcpu_arch' has evolved significantly since
the initial port of KVM/arm64, so remove the stale comment suggesting
that a prefix of the structure is used exclusively from assembly code.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220609121223.2551-7-will@kernel.org
host_stage2_try() asserts that the KVM host lock is held, so there's no
need to duplicate the assertion in its wrappers.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220609121223.2551-6-will@kernel.org
has_vhe() expands to a compile-time constant when evaluated from the VHE
or nVHE code, alternatively checking a static key when called from
elsewhere in the kernel. On face value, this looks like a case of
premature optimization, but in fact this allows symbol references on
VHE-specific code paths to be dropped from the nVHE object.
Expand the comment in has_vhe() to make this clearer, hopefully
discouraging anybody from simplifying the code.
Cc: David Brazdil <dbrazdil@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220609121223.2551-5-will@kernel.org
Ignore 'kvm-arm.mode=protected' when using VHE so that kvm_get_mode()
only returns KVM_MODE_PROTECTED on systems where the feature is available.
Cc: David Brazdil <dbrazdil@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220609121223.2551-4-will@kernel.org
A protected VM accessing ID_AA64ISAR2_EL1 gets punished with an UNDEF,
while it really should only get a zero back if the register is not
handled by the hypervisor emulation (as mandated by the architecture).
Introduce all the missing ID registers (including the unallocated ones),
and have them to return 0.
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220609121223.2551-3-will@kernel.org
If we fail to allocate the 'supported_cpus' cpumask in kvm_arch_init_vm()
then be sure to return -ENOMEM instead of success (0) on the failure
path.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220609121223.2551-2-will@kernel.org
Fix libbpf's bpf_program__attach_uprobe() logic of determining
function's *file offset* (which is what kernel is actually expecting)
when attaching uprobe/uretprobe by function name. Previously calculation
was determining virtual address offset relative to base load address,
which (offset) is not always the same as file offset (though very
frequently it is which is why this went unnoticed for a while).
Fixes: 433966e3ae ("libbpf: Support function name-based attach uprobes")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Riham Selim <rihams@fb.com>
Cc: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220606220143.3796908-1-andrii@kernel.org
This change adjusts the Makefile to use "HOSTAR" as the archive tool
to keep the sanity of the build process for the bootstrap part in
check. For the rationale, please continue reading.
When cross compiling bpftool with buildroot, it leads to an invocation
like:
$ AR="/path/to/buildroot/host/bin/arc-linux-gcc-ar" \
CC="/path/to/buildroot/host/bin/arc-linux-gcc" \
...
make
Which in return fails while building the bootstrap section:
----------------------------------8<----------------------------------
make: Entering directory '/src/bpftool-v6.7.0/src'
... libbfd: [ on ]
... disassembler-four-args: [ on ]
... zlib: [ on ]
... libcap: [ OFF ]
... clang-bpf-co-re: [ on ] <-- triggers bootstrap
.
.
.
LINK /src/bpftool-v6.7.0/src/bootstrap/bpftool
/usr/bin/ld: /src/bpftool-v6.7.0/src/bootstrap/libbpf/libbpf.a:
error adding symbols: archive has no index; run ranlib
to add one
collect2: error: ld returned 1 exit status
make: *** [Makefile:211: /src/bpftool-v6.7.0/src/bootstrap/bpftool]
Error 1
make: *** Waiting for unfinished jobs....
AR /src/bpftool-v6.7.0/src/libbpf/libbpf.a
make[1]: Leaving directory '/src/bpftool-v6.7.0/libbpf/src'
make: Leaving directory '/src/bpftool-v6.7.0/src'
---------------------------------->8----------------------------------
This occurs because setting "AR" confuses the build process for the
bootstrap section and it calls "arc-linux-gcc-ar" to create and index
"libbpf.a" instead of the host "ar".
Signed-off-by: Shahab Vahedi <shahab@synopsys.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/bpf/8d297f0c-cfd0-ef6f-3970-6dddb3d9a87a@synopsys.com
Ronak Doshi says:
====================
vmxnet3: upgrade to version 7
vmxnet3 emulation has recently added several new features including
support for uniform passthrough(UPT). To make UPT work vmxnet3 has
to be enhanced as per the new specification. This patch series
extends the vmxnet3 driver to leverage these new features.
Compatibility is maintained using existing vmxnet3 versioning mechanism as
follows:
- new features added to vmxnet3 emulation are associated with new vmxnet3
version viz. vmxnet3 version 7.
- emulation advertises all the versions it supports to the driver.
- during initialization, vmxnet3 driver picks the highest version number
supported by both the emulation and the driver and configures emulation
to run at that version.
In particular, following changes are introduced:
Patch 1:
This patch introduces utility macros for vmxnet3 version 7 comparison
and updates Copyright information.
Patch 2:
This patch adds new capability registers to fine control enablement of
individual features based on emulation and passthrough.
Patch 3:
This patch adds support for large passthrough BAR register.
Patch 4:
This patch adds support for out of order rx completion processing.
Patch 5:
This patch introduces new command to set ring buffer sizes to pass this
information to the hardware.
Patch 6:
For better performance, hardware has a requirement to limit number of TSO
descriptors. This patch adds that support.
Patch 7:
With vmxnet3 version 7, new descriptor fields are used to indicate
encapsulation offload.
Patch 8:
With all vmxnet3 version 7 changes incorporated in the vmxnet3 driver,
with this patch, the driver can configure emulation to run at vmxnet3
version 7.
Changes in v2->v3:
- use correct byte ordering for ringBufSize
Changes in v2:
- use local rss_fields variable for the rss capability checks in patch 2
====================
Link: https://lore.kernel.org/r/20220608032353.964-1-doshir@vmware.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
With all vmxnet3 version 7 changes incorporated in the vmxnet3 driver,
the driver can configure emulation to run at vmxnet3 version 7, provided
the emulation advertises support for version 7.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Till vmxnet3 version 6, om field of transmit descriptor was used
to indicate encapsulated offload packet and msscof was used to
indirectly indicate TSO/CSO. From version 7 and later, ext1 field
will be used to indicate whether packet is encapsulated or not and
om fields will continue to indicate if the packet is TSO or CSO.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, vmxnet3 does not have a limit on number of descriptors
used for a TSO packet. However, with UPT, for hardware performance
reasons, this patch limits the number of transmit descriptors to 24
for a TSO packet.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This patch adds a new command to set ring buffer sizes. This is
required to pass the buffer size information to passthrough devices.
For performance reasons, with version7 and later, ring1 will contain
only mtu size buffers (bound to 3K). Packets > 3K will use both ring1
and ring2.
Also, ring sizes are round down to power of 2 and ring2 default
size is increased to 512.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, vmxnet3 processes rx completions in-order i.e. no
out of order completion descriptor is expected. With UPT, if
hardware supports LRO, then hardware can report out of order
rx completions. This patch enhances vmxnet3 to add this support.
This supports gets effective only when the corresponding feature
bit is set.
Also, minor enhancements are done for performance.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
For vmxnet3 to work in UPT mode, the BAR sizes have been increased.
The PT page has been extended to 2 pages and also includes OOB pages
as a part of PT BAR. This patch enhances vmxnet3 to use appropriate
BAR offsets based on the capability registered. To use new offsets,
VMXNET3_CAP_LARGE_BAR needs to be set by the device. If it is not set
then the device will use legacy PT page layout.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This patch enhances vmxnet3 to suuport capability registers which
allows it to enable features selectively. The DCR register tracks
the capabilities vmxnet3 device supports. The PTCR register states
the capabilities that the passthrough device supports.
With the help of these registers, vmxnet3 can enable only those
features which the passthrough device supoprts. This allows
smooth trasition to Uniform-Passthrough (UPT) mode if the virtual
nic requests it. If PTCR register returns nothing or error it means
UPT is not being requested and vnic will continue in emulation mode.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
vmxnet3 is currently at version 6 and this patch initiates the
preparation to accommodate changes for upto version 7. Introduced
utility macros for vmxnet3 version 7 comparison and update Copyright
information.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* Convert IRQ chips in Diolan and Intel GPIO drivers to be immutable
The following is an automated git shortlog grouped by driver:
crystalcove:
- Join function declarations and long lines
- Use specific type and API for IRQ number
- make irq_chip immutable
dln2:
- make irq_chip immutable
merrifield:
- make irq_chip immutable
sch:
- make irq_chip immutable
wcove:
- make irq_chip immutable
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSu93Raj3rZDNXzGZv7cr9lmVa5zAUCYqCTOQAKCRD7cr9lmVa5
zFpMAQC//bjzu/ZKCSBTnfLjXsNjje1G8yOuKxkljsUR8rMc5AD/VXJGijZCkfA4
QVALyw+7yisUPCzPzj+HFvCHTZNltQQ=
=tclG
-----END PGP SIGNATURE-----
Merge tag 'intel-gpio-v5.19-2' of gitolite.kernel.org:pub/scm/linux/kernel/git/andy/linux-gpio-intel into gpio/for-current
intel-gpio for v5.19-2
* Convert IRQ chips in Diolan and Intel GPIO drivers to be immutable
Systems with AST graphics can have multiple output; typically VGA
plus some other port. Record detected output chips in a bitmask and
initialize each output on its own.
Assume a VGA output by default and use SIL164 and DP501 if available.
For ASTDP assume that it can run in parallel with VGA.
Tested on AST2100.
v3:
* define a macro for each BIT(ast_tx_chip) (Patrik)
v2:
* make VGA/SIL164/DP501 mutually exclusive
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Fixes: a59b026419 ("drm/ast: Initialize encoder and connector for VGA in helper function")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Link: https://patchwork.freedesktop.org/patch/msgid/20220607092008.22123-2-tzimmermann@suse.de
(cherry picked from commit 7f35680ada)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
The revision of the imx-sdma IP that is in the i.MX8M series is the
same is that as that in the i.MX7 series but the imx7d MODULE_FIRMWARE
directive is wrapped in a condiditional which means it's not defined
when built for aarch64 SOC_IMX8M platforms and hence you get the
following errors when the driver loads on imx8m devices:
imx-sdma 302c0000.dma-controller: Direct firmware load for imx/sdma/sdma-imx7d.bin failed with error -2
imx-sdma 302c0000.dma-controller: external firmware not found, using ROM firmware
Add the SOC_IMX8M into the check so the firmware can load on i.MX8.
Fixes: 1474d48bd6 ("arm64: dts: imx8mq: Add SDMA nodes")
Fixes: 941acd566b ("dmaengine: imx-sdma: Only check ratio on parts that support 1:1")
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Cc: stable@vger.kernel.org # v5.2+
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Link: https://lore.kernel.org/r/20220606161034.3544803-1-pbrobinson@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
This reverts commit a8facc7b98 ("dmaengine: add verification of
DMA_INTERRUPT capability for dmatest") as it causes regression due to
the fact that DMA_INTERRUPT in linked to dma_prep_interrupt() so
checking that is incorrect here
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20220606174906.3979283-1-vkoul@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
of_find_device_by_node() takes reference, we should use put_device()
to release it when not need anymore.
Fixes: a074ae38f8 ("dmaengine: Add driver for TI DMA crossbar on DRA7x")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Link: https://lore.kernel.org/r/20220605042723.17668-1-linmq006@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
of_parse_phandle() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not needed anymore.
Add missing of_node_put() in to fix this.
Fixes: ec9bfa1e1a ("dmaengine: ti-dma-crossbar: dra7: Use bitops instead of idr")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20220605042723.17668-2-linmq006@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
This patch makes get_vq_group and set_group_asid optional. This is
needed to unbreak the vDPA parent that doesn't support multiple
address spaces.
Cc: Gautam Dawar <gautam.dawar@xilinx.com>
Fixes: aaca8373c4 ("vhost-vdpa: support ASID based IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220609041901.2029-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There are double "the" in message in file virtio_mmio.c
and virtio_pci_modern_dev.c, fix it.
Signed-off-by: Bo Liu <liubo03@inspur.com>
Message-Id: <20220609031106.2161-1-liubo03@inspur.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>