summaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm
AgeCommit message (Collapse)Author
2010-08-01KVM: VMX: VMCLEAR/VMPTRLD usage changesDongxiao Xu
Originally VMCLEAR/VMPTRLD is called on vcpu migration. To support hosted VMM coexistance, VMCLEAR is executed on vcpu schedule out, and VMPTRLD is executed on vcpu schedule in. This could also eliminate the IPI when doing VMCLEAR. Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01KVM: VMX: Some minor changes to code structureDongxiao Xu
Do some preparations for vmm coexistence support. Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01KVM: VMX: Define new functions to wrapper direct call of asm codeDongxiao Xu
Define vmcs_load() and kvm_cpu_vmxon() to avoid direct call of asm code. Also move VMXE bit operation out of kvm_cpu_vmxoff(). Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01KVM: MMU: Fix free memory accounting race in mmu_alloc_roots()Avi Kivity
We drop the mmu lock between freeing memory and allocating the roots; this allows some other vcpu to sneak in and allocate memory. While the race is benign (resulting only in temporary overallocation, not oom) it is simple and easy to fix by moving the freeing close to the allocation. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01KVM: inject #UD if instruction emulation fails and exit to userspaceGleb Natapov
Do not kill VM when instruction emulation fails. Inject #UD and report failure to userspace instead. Userspace may choose to reenter guest if vcpu is in userspace (cpl == 3) in which case guest OS will kill offending process and continue running. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freedGui Jianfeng
Currently, kvm_mmu_zap_page() returning the number of freed children sp. This might confuse the caller, because caller don't know the actual freed number. Let's make kvm_mmu_zap_page() return the number of pages it actually freed. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: MMU: Fix debug output error in walk_addr()Gui Jianfeng
Fix a debug output error in walk_addr Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: MMU: mark page table dirty when a pte is actually modifiedGui Jianfeng
Sometime cmpxchg_gpte doesn't modify gpte, in such case, don't mark page table page as dirty. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: SVM: Allow EFER.LMSLE to be set with nested svmJoerg Roedel
This patch enables setting of efer bit 13 which is allowed in all SVM capable processors. This is necessary for the SLES11 version of Xen 4.0 to boot with nested svm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: SVM: Dump vmcb contents on failed vmrunJoerg Roedel
This patch adds a function to dump the vmcb into the kernel log and calls it after a failed vmrun to ease debugging. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: Get rid of KVM_REQ_KICKAvi Kivity
KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal operation. This causes the fast path check for a clear vcpu->requests to fail all the time, triggering tons of atomic operations. Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: do not inject exception directly into vcpuGleb Natapov
Return exception as a result of instruction emulation and handle injection in KVM code. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: move interruptibility state tracking out of emulatorGleb Natapov
Emulator shouldn't access vcpu directly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: handle shadowed registers outside emulatorGleb Natapov
Emulator shouldn't access vcpu directly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: use shadowed register in emulate_sysexit()Gleb Natapov
emulate_sysexit() should use shadowed registers copy instead of looking into vcpu state directly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: set RFLAGS outside x86 emulator codeGleb Natapov
Removes the need for set_flags() callback. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: advance RIP outside x86 emulator codeGleb Natapov
Return new RIP as part of instruction emulation result instead of updating KVM's RIP from x86 emulator code. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: handle emulation failure case firstGleb Natapov
If emulation failed return immediately. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: do not inject #PF in (read|write)_emulated() callbacksGleb Natapov
Return error to x86 emulator instead of injection exception behind its back. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: remove export of emulator_write_emulated()Gleb Natapov
It is not called directly outside of the file it's defined in anymore. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: x86_emulate_insn() return -1 only in case of emulation ↵Gleb Natapov
failure Currently emulator returns -1 when emulation failed or IO is needed. Caller tries to guess whether emulation failed by looking at other variables. Make it easier for caller to recognise error condition by always returning -1 in case of failure. For this new emulator internal return value X86EMUL_IO_NEEDED is introduced. It is used to distinguish between error condition (which returns X86EMUL_UNHANDLEABLE) and condition that requires IO exit to userspace to continue emulation. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: fill in run->mmio details in (read|write)_emulated functionGleb Natapov
Fill in run->mmio details in (read|write)_emulated function just like pio does. There is no point in filling only vcpu fields there just to copy them into vcpu->run a little bit later. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: make (get|set)_dr() callback return error if it failsGleb Natapov
Make (get|set)_dr() callback return error if it fails instead of injecting exception behind emulator's back. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: make set_cr() callback return error if it failsGleb Natapov
Make set_cr() callback return error if it fails instead of injecting #GP behind emulator's back. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacksGleb Natapov
Use callbacks from x86_emulate_ops to access segments instead of calling into kvm directly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: add get_cached_segment_base() callback to x86_emulate_opsGleb Natapov
On VMX it is expensive to call get_cached_descriptor() just to get segment base since multiple vmcs_reads are done instead of only one. Introduce new call back get_cached_segment_base() for efficiency. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: add (set|get)_msr callbacks to x86_emulate_opsGleb Natapov
Add (set|get)_msr callbacks to x86_emulate_ops instead of calling them directly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: add (set|get)_dr callbacks to x86_emulate_opsGleb Natapov
Add (set|get)_dr callbacks to x86_emulate_ops instead of calling them directly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: handle "far address" source operandGleb Natapov
ljmp/lcall instruction operand contains address and segment. It can be 10 bytes long. Currently we decode it as two different operands. Fix it by introducing new kind of operand that can hold entire far address. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: cleanup nop emulationGleb Natapov
Make it more explicit what we are checking for. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: cleanup xchg emulationGleb Natapov
Dst operand is already initialized during decoding stage. No need to reinitialize. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: fix Move r/m16 to segment register decodingGleb Natapov
This instruction does not need generic decoding for its dst operand. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86 emulator: introduce read cacheGleb Natapov
Introduce read cache which is needed for instruction that require more then one exit to userspace. After returning from userspace the instruction will be re-executed with cached read value. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: VMX: Avoid writing HOST_CR0 every entryAvi Kivity
cr0.ts may change between entries, so we copy cr0 to HOST_CR0 before each entry. That is slow, so instead, set HOST_CR0 to have TS set unconditionally (which is a safe value), and issue a clts() just before exiting vcpu context if the task indeed owns the fpu. Saves ~50 cycles/exit. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: kvm_pdptr_read() may sleepAvi Kivity
Annotate it thusly. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: x86: avoid unnecessary bitmap allocation when memslot is cleanTakuya Yoshikawa
Although we always allocate a new dirty bitmap in x86's get_dirty_log(), it is only used as a zero-source of copy_to_user() and freed right after that when memslot is clean. This patch uses clear_user() instead of doing this unnecessary zero-source allocation. Performance improvement: as we can expect easily, the time needed to allocate a bitmap is completely reduced. In my test, the improved ioctl was about 4 to 10 times faster than the original one for clean slots. Furthermore, reducing memory allocations and copies will produce good effects to caches too. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: VMX: Simplify vmx_get_nmi_mask()Avi Kivity
!! is not needed due to the cast to bool. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: Avoid killing userspace through guest SRAO MCE on unmapped pagesHuang Ying
In common cases, guest SRAO MCE will cause corresponding poisoned page be un-mapped and SIGBUS be sent to QEMU-KVM, then QEMU-KVM will relay the MCE to guest OS. But it is reported that if the poisoned page is accessed in guest after unmapping and before MCE is relayed to guest OS, userspace will be killed. The reason is as follows. Because poisoned page has been un-mapped, guest access will cause guest exit and kvm_mmu_page_fault will be called. kvm_mmu_page_fault can not get the poisoned page for fault address, so kernel and user space MMIO processing is tried in turn. In user MMIO processing, poisoned page is accessed again, then userspace is killed by force_sig_info. To fix the bug, kvm_mmu_page_fault send HWPOISON signal to QEMU-KVM and do not try kernel and user space MMIO processing for poisoned page. [xiao: fix warning introduced by avi] Reported-by: Max Asbock <masbock@linux.vnet.ibm.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-07-28x86, cpu: Use AMD errata checking framework for erratum 383Hans Rosenfeld
Use the AMD errata checking framework instead of open-coding the test. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> LKML-Reference: <1280336972-865982-3-git-send-email-hans.rosenfeld@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-27Merge remote branch 'origin/x86/urgent' into x86/asmH. Peter Anvin
2010-07-23KVM: Use kmalloc() instead of vmalloc() for KVM_[GS]ET_MSRAvi Kivity
We don't need more than a page, and vmalloc() is slower (much slower recently due to a regression). Signed-off-by: Avi Kivity <avi@redhat.com>
2010-07-23KVM: MMU: fix conflict access permissions in direct spXiao Guangrong
In no-direct mapping, we mark sp is 'direct' when we mapping the guest's larger page, but its access is encoded form upper page-struct entire not include the last mapping, it will cause access conflict. For example, have this mapping: [W] / PDE1 -> |---| P[W] | | LPA \ PDE2 -> |---| [R] P have two children, PDE1 and PDE2, both PDE1 and PDE2 mapping the same lage page(LPA). The P's access is WR, PDE1's access is WR, PDE2's access is RO(just consider read-write permissions here) When guest access PDE1, we will create a direct sp for LPA, the sp's access is from P, is W, then we will mark the ptes is W in this sp. Then, guest access PDE2, we will find LPA's shadow page, is the same as PDE's, and mark the ptes is RO. So, if guest access PDE1, the incorrect #PF is occured. Fixed by encode the last mapping access into direct shadow page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-07-21x86: Remove redundant K6 MSRsBrian Gerst
MSR_K6_EFER is unused, and MSR_K6_STAR is redundant with MSR_STAR. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1279371808-24804-1-git-send-email-brgerst@gmail.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-07-19mm: add context argument to shrinker callbackDave Chinner
The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-07-12KVM: MMU: flush remote tlbs when overwriting spte with different pfnXiao Guangrong
After remove a rmap, we should flush all vcpu's tlb Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-07-06KVM: VMX: Fix host MSR_KERNEL_GS_BASE corruptionAvi Kivity
enter_lmode() and exit_lmode() modify the guest's EFER.LMA before calling vmx_set_efer(). However, the latter function depends on the value of EFER.LMA to determine whether MSR_KERNEL_GS_BASE needs reloading, via vmx_load_host_state(). With EFER.LMA changing under its feet, it took the wrong choice and corrupted userspace's %gs. This causes 32-on-64 host userspace to fault. Fix not touching EFER.LMA; instead ask vmx_set_efer() to change it. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-06-09KVM: MMU: Remove user access when allowing kernel access to gpte.w=0 pageAvi Kivity
If cr0.wp=0, we have to allow the guest kernel access to a page with pte.w=0. We do that by setting spte.w=1, since the host cr0.wp must remain set so the host can write protect pages. Once we allow write access, we must remove user access otherwise we mistakenly allow the user to write the page. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-06-09KVM: MMU: invalidate and flush on spte small->large page size changeMarcelo Tosatti
Always invalidate spte and flush TLBs when changing page size, to make sure different sized translations for the same address are never cached in a CPU's TLB. Currently the only case where this occurs is when a non-leaf spte pointer is overwritten by a leaf, large spte entry. This can happen after dirty logging is disabled on a memslot, for example. Noticed by Andrea. KVM-Stable-Tag Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-06-09KVM: SVM: Implement workaround for Erratum 383Joerg Roedel
This patch implements a workaround for AMD erratum 383 into KVM. Without this erratum fix it is possible for a guest to kill the host machine. This patch implements the suggested workaround for hypervisors which will be published by the next revision guide update. [jan: fix overflow warning on i386] [xiao: fix unused variable warning] Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-06-09KVM: SVM: Handle MCEs early in the vmexit processJoerg Roedel
This patch moves handling of the MC vmexits to an earlier point in the vmexit. The handle_exit function is too late because the vcpu might alreadry have changed its physical cpu. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>