Commit graph

824682 commits

Author SHA1 Message Date
Sean Christopherson
152482580a KVM: Call kvm_arch_memslots_updated() before updating memslots
kvm_arch_memslots_updated() is at this point in time an x86-specific
hook for handling MMIO generation wraparound.  x86 stashes 19 bits of
the memslots generation number in its MMIO sptes in order to avoid
full page fault walks for repeat faults on emulated MMIO addresses.
Because only 19 bits are used, wrapping the MMIO generation number is
possible, if unlikely.  kvm_arch_memslots_updated() alerts x86 that
the generation has changed so that it can invalidate all MMIO sptes in
case the effective MMIO generation has wrapped so as to avoid using a
stale spte, e.g. a (very) old spte that was created with generation==0.

Given that the purpose of kvm_arch_memslots_updated() is to prevent
consuming stale entries, it needs to be called before the new generation
is propagated to memslots.  Invalidating the MMIO sptes after updating
memslots means that there is a window where a vCPU could dereference
the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
spte that was created with (pre-wrap) generation==0.

Fixes: e59dbe09f8 ("KVM: Introduce kvm_arch_memslots_updated()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:32 +01:00
Ben Gardon
4183683918 kvm: vmx: Add memcg accounting to KVM allocations
There are many KVM kernel memory allocations which are tied to the life of
the VM process and should be charged to the VM process's cgroup. If the
allocations aren't tied to the process, the OOM killer will not know
that killing the process will free the associated kernel memory.
Add __GFP_ACCOUNT flags to many of the allocations which are not yet being
charged to the VM process's cgroup.

Tested:
	Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch
	introduced no new failures.
	Ran a kernel memory accounting test which creates a VM to touch
	memory and then checks that the kernel memory allocated for the
	process is within certain bounds.
	With this patch we account for much more of the vmalloc and slab memory
	allocated for the VM.

Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:31 +01:00
Ben Gardon
1ec696470c kvm: svm: Add memcg accounting to KVM allocations
There are many KVM kernel memory allocations which are tied to the life of
the VM process and should be charged to the VM process's cgroup. If the
allocations aren't tied to the process, the OOM killer will not know
that killing the process will free the associated kernel memory.
Add __GFP_ACCOUNT flags to many of the allocations which are not yet being
charged to the VM process's cgroup.

Tested:
	Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch
	introduced no new failures.
	Ran a kernel memory accounting test which creates a VM to touch
	memory and then checks that the kernel memory allocated for the
	process is within certain bounds.
	With this patch we account for much more of the vmalloc and slab memory
	allocated for the VM.

Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:31 +01:00
Ben Gardon
254272ce65 kvm: x86: Add memcg accounting to KVM allocations
There are many KVM kernel memory allocations which are tied to the life of
the VM process and should be charged to the VM process's cgroup. If the
allocations aren't tied to the process, the OOM killer will not know
that killing the process will free the associated kernel memory.
Add __GFP_ACCOUNT flags to many of the allocations which are not yet being
charged to the VM process's cgroup.

Tested:
	Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch
	introduced no new failures.
	Ran a kernel memory accounting test which creates a VM to touch
	memory and then checks that the kernel memory allocated for the
	process is within certain bounds.
	With this patch we account for much more of the vmalloc and slab memory
	allocated for the VM.

There remain a few allocations which should be charged to the VM's
cgroup but are not. In x86, they include:
	vcpu->arch.pio_data
There allocations are unaccounted in this patch because they are mapped
to userspace, and accounting them to a cgroup causes problems. This
should be addressed in a future patch.

Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:30 +01:00
Ben Gardon
b12ce36a43 kvm: Add memcg accounting to KVM allocations
There are many KVM kernel memory allocations which are tied to the life of
the VM process and should be charged to the VM process's cgroup. If the
allocations aren't tied to the process, the OOM killer will not know
that killing the process will free the associated kernel memory.
Add __GFP_ACCOUNT flags to many of the allocations which are not yet being
charged to the VM process's cgroup.

Tested:
	Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch
	introduced no new failures.
	Ran a kernel memory accounting test which creates a VM to touch
	memory and then checks that the kernel memory allocated for the
	process is within certain bounds.
	With this patch we account for much more of the vmalloc and slab memory
	allocated for the VM.

There remain a few allocations which should be charged to the VM's
cgroup but are not. In they include:
        vcpu->run
        kvm->coalesced_mmio_ring
There allocations are unaccounted in this patch because they are mapped
to userspace, and accounting them to a cgroup causes problems. This
should be addressed in a future patch.

Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:29 +01:00
Paolo Bonzini
359a6c3ddc KVM: nVMX: do not start the preemption timer hrtimer unnecessarily
The preemption timer can be started even if there is a vmentry
failure during or after loading guest state.  That is pointless,
move the call after all conditions have been checked.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:29 +01:00
Yu Zhang
d92935979a kvm: vmx: Fix typos in vmentry/vmexit control setting
Previously, 'commit f99e3daf94 ("KVM: x86: Add Intel PT
virtualization work mode")' work mode' offered framework
to support Intel PT virtualization. However, the patch has
some typos in vmx_vmentry_ctrl() and vmx_vmexit_ctrl(), e.g.
used wrong flags and wrong variable, which will cause the
VM entry failure later.

Fixes: 'commit f99e3daf94 ("KVM: x86: Add Intel PT virtualization work mode")'
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:28 +01:00
Paolo Bonzini
b4b65b5642 KVM: x86: cleanup freeing of nested state
Ensure that the VCPU free path goes through vmx_leave_nested and
thus nested_vmx_vmexit, so that the cancellation of the timer does
not have to be in free_nested.  In addition, because some paths through
nested_vmx_vmexit do not go through sync_vmcs12, the cancellation of
the timer is moved there.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:27 +01:00
Luwei Kang
81b016676e KVM: x86: Sync the pending Posted-Interrupts
Some Posted-Interrupts from passthrough devices may be lost or
overwritten when the vCPU is in runnable state.

The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will
be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state
but not running on physical CPU). If a posted interrupt coming at this
time, the irq remmaping facility will set the bit of PIR (Posted
Interrupt Requests) without ON (Outstanding Notification).
So this interrupt can't be sync to APIC virtualization register and
will not be handled by Guest because ON is zero.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
[Eliminate the pi_clear_sn fast path. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:27 +01:00
Liu Jingqi
c029b5deb0 KVM: x86: expose MOVDIR64B CPU feature into VM.
MOVDIR64B moves 64-bytes as direct-store with 64-bytes write atomicity.
Direct store is implemented by using write combining (WC) for writing
data directly into memory without caching the data.

Availability of the MOVDIR64B instruction is indicated by the presence
of the CPUID feature flag MOVDIR64B (CPUID.0x07.0x0:ECX[bit 28]).

This patch exposes the movdir64b feature to the guest.

The release document ref below link:
https://software.intel.com/sites/default/files/managed/c5/15/\
architecture-instruction-set-extensions-programming-reference.pdf

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Cc: Xu Tao <tao3.xu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:26 +01:00
Liu Jingqi
74f2370bb6 KVM: x86: expose MOVDIRI CPU feature into VM.
MOVDIRI moves doubleword or quadword from register to memory through
direct store which is implemented by using write combining (WC) for
writing data directly into memory without caching the data.

Availability of the MOVDIRI instruction is indicated by the presence of
the CPUID feature flag MOVDIRI(CPUID.0x07.0x0:ECX[bit 27]).

This patch exposes the movdiri feature to the guest.

The release document ref below link:
https://software.intel.com/sites/default/files/managed/c5/15/\
architecture-instruction-set-extensions-programming-reference.pdf

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Cc: Xu Tao <tao3.xu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:26 +01:00
Kai Huang
8acc0993e3 kvm, x86, mmu: Use kernel generic dynamic physical address mask
AMD's SME/SEV is no longer the only case which reduces supported
physical address bits, since Intel introduced Multi-key Total Memory
Encryption (MKTME), which repurposes high bits of physical address as
keyID, thus effectively shrinks supported physical address bits. To
cover both cases (and potential similar future features), kernel MM
introduced generic dynamaic physical address mask instead of hard-coded
__PHYSICAL_MASK in 'commit 94d49eb30e ("x86/mm: Decouple dynamic
__PHYSICAL_MASK from AMD SME")'. KVM should use that too.

Change PT64_BASE_ADDR_MASK to use kernel dynamic physical address mask
when it is enabled, instead of sme_clr. PT64_DIR_BASE_ADDR_MASK is also
deleted since it is not used at all.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:25 +01:00
Paolo Bonzini
e0dfacbfe9 KVM: nVMX: remove useless is_protmode check
VMX is only accessible in protected mode, remove a confusing check
that causes the conditional to lack a final "else" branch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:24 +01:00
Sean Christopherson
34333cc6c2 KVM: nVMX: Ignore limit checks on VMX instructions using flat segments
Regarding segments with a limit==0xffffffff, the SDM officially states:

    When the effective limit is FFFFFFFFH (4 GBytes), these accesses may
    or may not cause the indicated exceptions.  Behavior is
    implementation-specific and may vary from one execution to another.

In practice, all CPUs that support VMX ignore limit checks for "flat
segments", i.e. an expand-up data or code segment with base=0 and
limit=0xffffffff.  This is subtly different than wrapping the effective
address calculation based on the address size, as the flat segment
behavior also applies to accesses that would wrap the 4g boundary, e.g.
a 4-byte access starting at 0xffffffff will access linear addresses
0xffffffff, 0x0, 0x1 and 0x2.

Fixes: f9eb4af67c ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:24 +01:00
Sean Christopherson
8570f9e881 KVM: nVMX: Apply addr size mask to effective address for VMX instructions
The address size of an instruction affects the effective address, not
the virtual/linear address.  The final address may still be truncated,
e.g. to 32-bits outside of long mode, but that happens irrespective of
the address size, e.g. a 32-bit address size can yield a 64-bit virtual
address when using FS/GS with a non-zero base.

Fixes: 064aea7747 ("KVM: nVMX: Decoding memory operands of VMX instructions")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:23 +01:00
Sean Christopherson
946c522b60 KVM: nVMX: Sign extend displacements of VMX instr's mem operands
The VMCS.EXIT_QUALIFCATION field reports the displacements of memory
operands for various instructions, including VMX instructions, as a
naturally sized unsigned value, but masks the value by the addr size,
e.g. given a ModRM encoded as -0x28(%ebp), the -0x28 displacement is
reported as 0xffffffd8 for a 32-bit address size.  Despite some weird
wording regarding sign extension, the SDM explicitly states that bits
beyond the instructions address size are undefined:

    In all cases, bits of this field beyond the instruction’s address
    size are undefined.

Failure to sign extend the displacement results in KVM incorrectly
treating a negative displacement as a large positive displacement when
the address size of the VMX instruction is smaller than KVM's native
size, e.g. a 32-bit address size on a 64-bit KVM.

The very original decoding, added by commit 064aea7747 ("KVM: nVMX:
Decoding memory operands of VMX instructions"), sort of modeled sign
extension by truncating the final virtual/linear address for a 32-bit
address size.  I.e. it messed up the effective address but made it work
by adjusting the final address.

When segmentation checks were added, the truncation logic was kept
as-is and no sign extension logic was introduced.  In other words, it
kept calculating the wrong effective address while mostly generating
the correct virtual/linear address.  As the effective address is what's
used in the segment limit checks, this results in KVM incorreclty
injecting #GP/#SS faults due to non-existent segment violations when
a nested VMM uses negative displacements with an address size smaller
than KVM's native address size.

Using the -0x28(%ebp) example, an EBP value of 0x1000 will result in
KVM using 0x100000fd8 as the effective address when checking for a
segment limit violation.  This causes a 100% failure rate when running
a 32-bit KVM build as L1 on top of a 64-bit KVM L0.

Fixes: f9eb4af67c ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:22 +01:00
Suthikulpanit, Suravee
c57cd3c89e svm: Fix improper check when deactivate AVIC
The function svm_refresh_apicv_exec_ctrl() always returning prematurely
as kvm_vcpu_apicv_active() always return false when calling from
the function arch/x86/kvm/x86.c:kvm_vcpu_deactivate_apicv().
This is because the apicv_active is set to false just before calling
refresh_apicv_exec_ctrl().

Also, we need to mark VMCB_AVIC bit as dirty instead of VMCB_INTR.

So, fix svm_refresh_apicv_exec_ctrl() to properly deactivate AVIC.

Fixes: 67034bb9dd ('KVM: SVM: Add irqchip_split() checks before enabling AVIC')
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:22 +01:00
Paolo Bonzini
f7589cca50 KVM: x86: cull apicv code when userspace irqchip is requested
Currently apicv_active can be true even if in-kernel LAPIC
emulation is disabled.  Avoid this by properly initializing
it in kvm_arch_vcpu_init, and then do not do anything to
deactivate APICv when it is actually not used

(Currently APICv is only deactivated by SynIC code that in turn
is only reachable when in-kernel LAPIC is in use.  However, it is
cleaner if kvm_vcpu_deactivate_apicv avoids relying on this.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:21 +01:00
Suthikulpanit, Suravee
98d90582be svm: Fix AVIC DFR and LDR handling
Current SVM AVIC driver makes two incorrect assumptions:
  1. APIC LDR register cannot be zero
  2. APIC DFR for all vCPUs must be the same

LDR=0 means the local APIC does not support logical destination mode.
Therefore, the driver should mark any previously assigned logical APIC ID
table entry as invalid, and return success.  Also, DFR is specific to
a particular local APIC, and can be different among all vCPUs
(as observed on Windows 10).

These incorrect assumptions cause Windows 10 and FreeBSD VMs to fail
to boot with AVIC enabled. So, instead of flush the whole logical APIC ID
table, handle DFR and LDR for each vCPU independently.

Fixes: 18f40c53e1 ('svm: Add VMEXIT handlers for AVIC')
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Julian Stecklina <jsteckli@amazon.de>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:20 +01:00
Gustavo A. R. Silva
90952cd388 kvm: Use struct_size() in kmalloc()
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:20 +01:00
Pavel Tatashin
b5179ec418 x86/kvmclock: set offset for kvm unstable clock
VMs may show incorrect uptime and dmesg printk offsets on hypervisors with
unstable clock. The problem is produced when VM is rebooted without exiting
from qemu.

The fix is to calculate clock offset not only for stable clock but for
unstable clock as well, and use kvm_sched_clock_read() which substracts
the offset for both clocks.

This is safe, because pvclock_clocksource_read() does the right thing and
makes sure that clock always goes forward, so once offset is calculated
with unstable clock, we won't get new reads that are smaller than offset,
and thus won't get negative results.

Thank you Jon DeVree for helping to reproduce this issue.

Fixes: 857baa87b6 ("sched/clock: Enable sched clock early")
Cc: stable@vger.kernel.org
Reported-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:19 +01:00
Sean Christopherson
4f44c4eec5 KVM: VMX: Reorder clearing of registers in the vCPU-run assembly flow
Move the clearing of the common registers (not 64-bit-only) to the start
of the flow that clears registers holding guest state.  This is
purely a cosmetic change so that the label doesn't point at a blank line
and a #define.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:18 +01:00
Sean Christopherson
fc2ba5a27a KVM: VMX: Call vCPU-run asm sub-routine from C and remove clobbering
...now that the sub-routine follows standard calling conventions.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:18 +01:00
Sean Christopherson
3b895ef486 KVM: VMX: Preserve callee-save registers in vCPU-run asm sub-routine
...to make it callable from C code.

Note that because KVM chooses to be ultra paranoid about guest register
values, all callee-save registers are still cleared after VM-Exit even
though the host's values are now reloaded from the stack.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:17 +01:00
Sean Christopherson
e75c3c3a04 KVM: VMX: Return VM-Fail from vCPU-run assembly via standard ABI reg
...to prepare for making the assembly sub-routine callable from C code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:17 +01:00
Sean Christopherson
77df549559 KVM: VMX: Pass @launched to the vCPU-run asm via standard ABI regs
...to prepare for making the sub-routine callable from C code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:16 +01:00
Sean Christopherson
a62fd5a76c KVM: VMX: Use RAX as the scratch register during vCPU-run
...to prepare for making the sub-routine callable from C code.  That
means returning the result in RAX.  Since RAX will be used to return the
result, use it as the scratch register as well to make the code readable
and to document that the scratch register is more or less arbitrary.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:15 +01:00
Sean Christopherson
ee2fc635ef KVM: VMX: Rename ____vmx_vcpu_run() to __vmx_vcpu_run()
...now that the name is no longer usurped by a defunct helper function.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:15 +01:00
Sean Christopherson
c823dd5c0f KVM: VMX: Fold __vmx_vcpu_run() back into vmx_vcpu_run()
...now that the code is no longer tagged with STACK_FRAME_NON_STANDARD.
Arguably, providing __vmx_vcpu_run() to break up vmx_vcpu_run() is
valuable on its own, but the previous split was purposely made as small
as possible to limit the effects STACK_FRAME_NON_STANDARD.  In other
words, the current split is now completely arbitrary and likely not the
most logical.

This also allows renaming ____vmx_vcpu_run() to __vmx_vcpu_run() in a
future patch.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:14 +01:00
Sean Christopherson
5e0781df18 KVM: VMX: Move vCPU-run code to a proper assembly routine
As evidenced by the myriad patches leading up to this moment, using
an inline asm blob for vCPU-run is nothing short of horrific.  It's also
been called "unholy", "an abomination" and likely a whole host of other
names that would violate the Code of Conduct if recorded here and now.

The code is relocated nearly verbatim, e.g. quotes, newlines, tabs and
__stringify need to be dropped, but other than those cosmetic changes
the only functional changees are to add the "call" and replace the final
"jmp" with a "ret".

Note that STACK_FRAME_NON_STANDARD is also dropped from __vmx_vcpu_run().

Suggested-by: Andi Kleen <ak@linux.intel.com>
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:13 +01:00
Sean Christopherson
63c73aa07f KVM: VMX: Create a stack frame in vCPU-run
...in preparation for moving to a proper assembly sub-routnine.
vCPU-run isn't a leaf function since it calls vmx_update_host_rsp()
and vmx_vmenter().  And since we need to save/restore RBP anyways,
unconditionally creating the frame costs a single MOV, i.e. don't
bother keying off CONFIG_FRAME_POINTER or using FRAME_BEGIN, etc...

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:13 +01:00
Sean Christopherson
c14f9dd50b KVM: VMX: Use #defines in place of immediates in VM-Enter inline asm
...to prepare for moving the inline asm to a proper asm sub-routine.
Eliminating the immediates allows a nearly verbatim move, e.g. quotes,
newlines, tabs and __stringify need to be dropped, but other than those
cosmetic changes the only function change will be to replace the final
"jmp" with a "ret".

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:48:12 +01:00
Sean Christopherson
95c7b77d6e KVM: x86: Explicitly #define the VCPU_REGS_* indices
Declaring the VCPU_REGS_* as enums allows for more robust C code, but it
prevents using the values in assembly files.  Expliciting #define the
indices in an asm-friendly file to prepare for VMX moving its transition
code to a proper assembly file, but keep the enums for general usage.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20 22:47:38 +01:00
Davidlohr Bueso
ec95e0fa21 drivers/IB,qib: Fix pinned/locked limit check in qib_get_user_pages()
The current check does not take into account the previous value of
pinned_vm; thus it is quite bogus as is. Fix this by checking the
new value after the (optimistic) atomic inc.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-20 14:42:41 -07:00
Trond Myklebust
a1231fda7e SUNRPC: Set memalloc_nofs_save() on all rpciod/xprtiod jobs
Set memalloc_nofs_save() on all the rpciod/xprtiod jobs so that we
ensure memory allocations for asynchronous rpc calls don't ever end
up recursing back to the NFS layer for memory reclaim.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:21 -05:00
Trond Myklebust
e9acf2105f NFS: Fix sparse annotations for nfs_set_open_stateid_locked()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:21 -05:00
Trond Myklebust
302fad7bd5 NFS: Fix up documentation warnings
Fix up some compiler warnings about function parameters, etc not being
correctly described or formatted.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:21 -05:00
Trond Myklebust
2dc23afffb NFS: ENOMEM should also be a fatal error.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:21 -05:00
Trond Myklebust
7dc58ca5d8 NFS: EINTR is also a fatal error.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:21 -05:00
Trond Myklebust
875bc3fbf2 NFS: Ensure NFS writeback allocations don't recurse back into NFS.
All the allocations that we can hit in the NFS layer and sunrpc layers
themselves are already marked as GFP_NOFS, but we need to ensure that
any calls to generic kernel functionality do the right thing as well.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:20 -05:00
Trond Myklebust
df3accb849 NFS: Pass error information to the pgio error cleanup routine
Allow the caller to pass error information when cleaning up a failed
I/O request so that we can conditionally take action to cancel the
request altogether if the error turned out to be fatal.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:20 -05:00
Trond Myklebust
078b5fd92c NFS: Clean up list moves of struct nfs_page
In several places we're just moving the struct nfs_page from one list to
another by first removing from the existing list, then adding to the new
one.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20 15:14:20 -05:00
Trond Myklebust
8127d82705 NFS: Don't recoalesce on error in nfs_pageio_complete_mirror()
If the I/O completion failed with a fatal error, then we should just
exit nfs_pageio_complete_mirror() rather than try to recoalesce.

Fixes: a7d42ddb30 ("nfs: add mirroring support to pgio layer")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.0+
2019-02-20 15:14:20 -05:00
Trond Myklebust
4d91969ed4 NFS: Fix an I/O request leakage in nfs_do_recoalesce
Whether we need to exit early, or just reprocess the list, we
must not lost track of the request which failed to get recoalesced.

Fixes: 03d5eb65b5 ("NFS: Fix a memory leak in nfs_do_recoalesce")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.0+
2019-02-20 15:14:20 -05:00
Trond Myklebust
f57dcf4c72 NFS: Fix I/O request leakages
When we fail to add the request to the I/O queue, we currently leave it
to the caller to free the failed request. However since some of the
requests that fail are actually created by nfs_pageio_add_request()
itself, and are not passed back the caller, this leads to a leakage
issue, which can again cause page locks to leak.

This commit addresses the leakage by freeing the created requests on
error, using desc->pg_completion_ops->error_cleanup()

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Fixes: a7d42ddb30 ("nfs: add mirroring support to pgio layer")
Cc: stable@vger.kernel.org # v4.0: c18b96a1b8: nfs: clean up rest of reqs
Cc: stable@vger.kernel.org # v4.0: d600ad1f2b: NFS41: pop some layoutget
Cc: stable@vger.kernel.org # v4.0+
2019-02-20 15:14:20 -05:00
Mike Marshall
6e356d4595 orangefs: remove two un-needed BUG_ONs...
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-02-20 15:12:52 -05:00
Jiri Olsa
b4409ae112 perf tools: Make rm_rf() remove single file
Let rm_rf() remove a file if it's provided by path, not just
directories.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190220122800.864-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-20 17:09:28 -03:00
Jiri Olsa
deb83da16c perf cpumap: Increase debug level for cpu_map__snprint verbose output
So it does not screw up single -v verbose output.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190220122800.864-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-20 17:08:39 -03:00
Daniel Vetter
4509209f8b Pull in char-misc-next from Greg
We need 32ea33a044 ("mei: bus: export to_mei_cl_device for mei
client devices drivers") for the mei-hdcp patches.

References: https://lkml.org/lkml/2019/2/19/356
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2019-02-20 20:51:33 +01:00
Alexandre Belloni
65a91e2e59 clk: at91: fix masterck name
The master clock is actually named masterck earlier in the driver. Having
"mck" in the parent list means that it can never be selected.

Fixes: 1eabdc2f9d ("clk: at91: add at91sam9x5 PMCs driver")
Fixes: a2038077de ("clk: at91: add sama5d2 PMC driver")
Fixes: 084b696bb5 ("clk: at91: add sama5d4 pmc driver")
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: <stable@vger.kernel.org> # v4.20+
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2019-02-20 11:40:21 -08:00