summaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm
AgeCommit message (Collapse)Author
2012-10-29KVM: do not treat noslot pfn as a error pfnXiao Guangrong
This patch filters noslot pfn out from error pfns based on Marcelo comment: noslot pfn is not a error pfn After this patch, - is_noslot_pfn indicates that the gfn is not in slot - is_error_pfn indicates that the gfn is in slot but the error is occurred when translate the gfn to pfn - is_error_noslot_pfn indicates that the pfn either it is error pfns or it is noslot pfn And is_invalid_pfn can be removed, it makes the code more clean Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-29Merge remote-tracking branch 'master' into queueMarcelo Tosatti
Merge reason: development work has dependency on kvm patches merged upstream. Conflicts: arch/powerpc/include/asm/Kbuild arch/powerpc/include/asm/kvm_para.h Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-23KVM: Take kvm instead of vcpu to mmu_notifier_retryChristoffer Dall
The mmu_notifier_retry is not specific to any vcpu (and never will be) so only take struct kvm as a parameter. The motivation is the ARM mmu code that needs to call this from somewhere where we long let go of the vcpu pointer. Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-22KVM: apic: fix LDR calculation in x2apic modeGleb Natapov
Signed-off-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-22KVM: MMU: fix release noslot pfnXiao Guangrong
We can not directly call kvm_release_pfn_clean to release the pfn since we can meet noslot pfn which is used to cache mmio info into spte Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-22KVM: SVM: Cleanup error statementsBorislav Petkov
Use __func__ instead of the function name in svm_hardware_enable since those things tend to get out of sync. This also slims down printk line length in conjunction with using pr_err. No functionality change. Cc: Joerg Roedel <joro@8bytes.org> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-18KVM: VMX: report internal error for MMIO #PF due to delivery eventXiao Guangrong
The #PF with PFEC.RSV = 1 indicates that the guest is accessing MMIO, we can not fix it if it is caused by delivery event. Reporting internal error for this case Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-18KVM: VMX: report internal error for the unhandleable eventXiao Guangrong
VM exits during Event Delivery is really unexpected if it is not caused by Exceptions/EPT-VIOLATION/TASK_SWITCH, we'd better to report an internal and freeze the guest, the VMM has the chance to check the guest Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-18KVM: do not de-cache cr4 bits needlesslyGleb Natapov
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-17KVM: MMU: introduce FNAME(prefetch_gpte)Xiao Guangrong
The only difference between FNAME(update_pte) and FNAME(pte_prefetch) is that the former is allowed to prefetch gfn from dirty logged slot, so introduce a common function to prefetch spte Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-17KVM: MMU: move prefetch_invalid_gpte out of pagaing_tmp.hXiao Guangrong
The function does not depend on guest mmu mode, move it out from paging_tmpl.h Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-17KVM: MMU: cleanup FNAME(page_fault)Xiao Guangrong
Let it return emulate state instead of spte like __direct_map Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-17KVM: MMU: remove mmu_is_invalidXiao Guangrong
Remove mmu_is_invalid and use is_invalid_pfn instead Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-08KVM: x86: Make emulator_fix_hypercall staticJan Kiszka
No users outside of kvm/x86.c. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-08KVM: x86: Convert kvm_arch_vcpu_reset into private kvm_vcpu_resetJan Kiszka
There are no external callers of this function as there is no concept of resetting a vcpu from generic code. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-04Merge tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Avi Kivity: "Highlights of the changes for this release include support for vfio level triggered interrupts, improved big real mode support on older Intels, a streamlines guest page table walker, guest APIC speedups, PIO optimizations, better overcommit handling, and read-only memory." * tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits) KVM: s390: Fix vcpu_load handling in interrupt code KVM: x86: Fix guest debug across vcpu INIT reset KVM: Add resampling irqfds for level triggered interrupts KVM: optimize apic interrupt delivery KVM: MMU: Eliminate pointless temporary 'ac' KVM: MMU: Avoid access/dirty update loop if all is well KVM: MMU: Eliminate eperm temporary KVM: MMU: Optimize is_last_gpte() KVM: MMU: Simplify walk_addr_generic() loop KVM: MMU: Optimize pte permission checks KVM: MMU: Update accessed and dirty bits after guest pagetable walk KVM: MMU: Move gpte_access() out of paging_tmpl.h KVM: MMU: Optimize gpte_access() slightly KVM: MMU: Push clean gpte write protection out of gpte_access() KVM: clarify kvmclock documentation KVM: make processes waiting on vcpu mutex killable KVM: SVM: Make use of asm.h KVM: VMX: Make use of asm.h KVM: VMX: Make lto-friendly KVM: x86: lapic: Clean up find_highest_vector() and count_vectors() ... Conflicts: arch/s390/include/asm/processor.h arch/x86/kvm/i8259.c
2012-10-01Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/fpu update from Ingo Molnar: "The biggest change is the addition of the non-lazy (eager) FPU saving support model and enabling it on CPUs with optimized xsaveopt/xrstor FPU state saving instructions. There are also various Sparse fixes" Fix up trivial add-add conflict in arch/x86/kernel/traps.c * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, kvm: fix kvm's usage of kernel_fpu_begin/end() x86, fpu: remove cpu_has_xmm check in the fx_finit() x86, fpu: make eagerfpu= boot param tri-state x86, fpu: enable eagerfpu by default for xsaveopt x86, fpu: decouple non-lazy/eager fpu restore from xsave x86, fpu: use non-lazy fpu restore for processors supporting xsave lguest, x86: handle guest TS bit for lazy/non-lazy fpu host models x86, fpu: always use kernel_fpu_begin/end() for in-kernel FPU usage x86, kvm: use kernel_fpu_begin/end() in kvm_load/put_guest_fpu() x86, fpu: remove unnecessary user_fpu_end() in save_xstate_sig() x86, fpu: drop_fpu() before restoring new state from sigframe x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels x86, fpu: Consolidate inline asm routines for saving/restoring fpu state x86, signal: Cleanup ifdefs and is_ia32, is_x32
2012-10-01Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/cleanups from Ingo Molnar: "Smaller cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arch/x86: Remove unecessary semicolons x86, boot: Remove obsolete and unused constant RAMDISK
2012-10-01Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf update from Ingo Molnar: "Lots of changes in this cycle as well, with hundreds of commits from over 30 contributors. Most of the activity was on the tooling side. Higher level changes: - New 'perf kvm' analysis tool, from Xiao Guangrong. - New 'perf trace' system-wide tracing tool - uprobes fixes + cleanups from Oleg Nesterov. - Lots of patches to make perf build on Android out of box, from Irina Tirdea - Extend ftrace function tracing utility to be more dynamic for its users. It allows for data passing to the callback functions, as well as reading regs as if a breakpoint were to trigger at function entry. The main goal of this patch series was to allow kprobes to use ftrace as an optimized probe point when a probe is placed on an ftrace nop. With lots of help from Masami Hiramatsu, and going through lots of iterations, we finally came up with a good solution. - Add cpumask for uncore pmu, use it in 'stat', from Yan, Zheng. - Various tracing updates from Steve Rostedt - Clean up and improve 'perf sched' performance by elliminating lots of needless calls to libtraceevent. - Event group parsing support, from Jiri Olsa - UI/gtk refactorings and improvements from Namhyung Kim - Add support for non-tracepoint events in perf script python, from Feng Tang - Add --symbols to 'script', similar to the one in 'report', from Feng Tang. Infrastructure enhancements and fixes: - Convert the trace builtins to use the growing evsel/evlist tracepoint infrastructure, removing several open coded constructs like switch like series of strcmp to dispatch events, etc. Basically what had already been showcased in 'perf sched'. - Add evsel constructor for tracepoints, that uses libtraceevent just to parse the /format events file, use it in a new 'perf test' to make sure the libtraceevent format parsing regressions can be more readily caught. - Some strange errors were happening in some builds, but not on the next, reported by several people, problem was some parser related files, generated during the build, didn't had proper make deps, fix from Eric Sandeen. - Introduce struct and cache information about the environment where a perf.data file was captured, from Namhyung Kim. - Fix handling of unresolved samples when --symbols is used in 'report', from Feng Tang. - Add union member access support to 'probe', from Hyeoncheol Lee. - Fixups to die() removal, from Namhyung Kim. - Render fixes for the TUI, from Namhyung Kim. - Don't enable annotation in non symbolic view, from Namhyung Kim. - Fix pipe mode in 'report', from Namhyung Kim. - Move related stats code from stat to util/, will be used by the 'stat' kvm tool, from Xiao Guangrong. - Remove die()/exit() calls from several tools. - Resolve vdso callchains, from Jiri Olsa - Don't pass const char pointers to basename, so that we can unconditionally use libgen.h and thus avoid ifdef BIONIC lines, from David Ahern - Refactor hist formatting so that it can be reused with the GTK browser, From Namhyung Kim - Fix build for another rbtree.c change, from Adrian Hunter. - Make 'perf diff' command work with evsel hists, from Jiri Olsa. - Use the only field_sep var that is set up: symbol_conf.field_sep, fix from Jiri Olsa. - .gitignore compiled python binaries, from Namhyung Kim. - Get rid of die() in more libtraceevent places, from Namhyung Kim. - Rename libtraceevent 'private' struct member to 'priv' so that it works in C++, from Steven Rostedt - Remove lots of exit()/die() calls from tools so that the main perf exit routine can take place, from David Ahern - Fix x86 build on x86-64, from David Ahern. - {int,str,rb}list fixes from Suzuki K Poulose - perf.data header fixes from Namhyung Kim - Allow user to indicate objdump path, needed in cross environments, from Maciek Borzecki - Fix hardware cache event name generation, fix from Jiri Olsa - Add round trip test for sw, hw and cache event names, catching the problem Jiri fixed, after Jiri's patch, the test passes successfully. - Clean target should do clean for lib/traceevent too, fix from David Ahern - Check the right variable for allocation failure, fix from Namhyung Kim - Set up evsel->tp_format regardless of evsel->name being set already, fix from Namhyung Kim - Oprofile fixes from Robert Richter. - Remove perf_event_attr needless version inflation, from Jiri Olsa - Introduce libtraceevent strerror like error reporting facility, from Namhyung Kim - Add pmu mappings to perf.data header and use event names from cmd line, from Robert Richter - Fix include order for bison/flex-generated C files, from Ben Hutchings - Build fixes and documentation corrections from David Ahern - Assorted cleanups from Robert Richter - Let O= makes handle relative paths, from Steven Rostedt - perf script python fixes, from Feng Tang. - Initial bash completion support, from Frederic Weisbecker - Allow building without libelf, from Namhyung Kim. - Support DWARF CFI based unwind to have callchains when %bp based unwinding is not possible, from Jiri Olsa. - Symbol resolution fixes, while fixing support PPC64 files with an .opt ELF section was the end goal, several fixes for code that handles all architectures and cleanups are included, from Cody Schafer. - Assorted fixes for Documentation and build in 32 bit, from Robert Richter - Cache the libtraceevent event_format associated to each evsel early, so that we avoid relookups, i.e. calling pevent_find_event repeatedly when processing tracepoint events. [ This is to reduce the surface contact with libtraceevents and make clear what is that the perf tools needs from that lib: so far parsing the common and per event fields. ] - Don't stop the build if the audit libraries are not installed, fix from Namhyung Kim. - Fix bfd.h/libbfd detection with recent binutils, from Markus Trippelsdorf. - Improve warning message when libunwind devel packages not present, from Jiri Olsa" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (282 commits) perf trace: Add aliases for some syscalls perf probe: Print an enum type variable in "enum variable-name" format when showing accessible variables perf tools: Check libaudit availability for perf-trace builtin perf hists: Add missing period_* fields when collapsing a hist entry perf trace: New tool perf evsel: Export the event_format constructor perf evsel: Introduce rawptr() method perf tools: Use perf_evsel__newtp in the event parser perf evsel: The tracepoint constructor should store sys:name perf evlist: Introduce set_filter() method perf evlist: Renane set_filters method to apply_filters perf test: Add test to check we correctly parse and match syscall open parms perf evsel: Handle endianity in intval method perf evsel: Know if byte swap is needed perf tools: Allow handling a NULL cpu_map as meaning "all cpus" perf evsel: Improve tracepoint constructor setup tools lib traceevent: Fix error path on pevent_parse_event perf test: Fix build failure trace: Move trace event enable from fs_initcall to core_initcall tracing: Add an option for disabling markers ...
2012-09-23KVM: x86: Fix guest debug across vcpu INIT resetJan Kiszka
If we reset a vcpu on INIT, we so far overwrote dr7 as provided by KVM_SET_GUEST_DEBUG, and we also cleared switch_db_regs unconditionally. Fix this by saving the dr7 used for guest debugging and calculating the effective register value as well as switch_db_regs on any potential change. This will change to focus of the set_guest_debug vendor op to update_dp_bp_intercept. Found while trying to stop on start_secondary. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-23KVM: Add resampling irqfds for level triggered interruptsAlex Williamson
To emulate level triggered interrupts, add a resample option to KVM_IRQFD. When specified, a new resamplefd is provided that notifies the user when the irqchip has been resampled by the VM. This may, for instance, indicate an EOI. Also in this mode, posting of an interrupt through an irqfd only asserts the interrupt. On resampling, the interrupt is automatically de-asserted prior to user notification. This enables level triggered interrupts to be posted and re-enabled from vfio with no userspace intervention. All resampling irqfds can make use of a single irq source ID, so we reserve a new one for this interface. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-21x86, kvm: fix kvm's usage of kernel_fpu_begin/end()Suresh Siddha
Preemption is disabled between kernel_fpu_begin/end() and as such it is not a good idea to use these routines in kvm_load/put_guest_fpu() which can be very far apart. kvm_load/put_guest_fpu() routines are already called with preemption disabled and KVM already uses the preempt notifier to save the guest fpu state using kvm_put_guest_fpu(). So introduce __kernel_fpu_begin/end() routines which don't touch preemption and use them instead of kernel_fpu_begin/end() for KVM's use model of saving/restoring guest FPU state. Also with this change (and with eagerFPU model), fix the host cr0.TS vm-exit state in the case of VMX. For eagerFPU case, host cr0.TS is always clear. So no need to worry about it. For the traditional lazyFPU restore case, change the cr0.TS bit for the host state during vm-exit to be always clear and cr0.TS bit is set in the __vmx_load_host_state() when the FPU (guest FPU or the host task's FPU) state is not active. This ensures that the host/guest FPU state is properly saved, restored during context-switch and with interrupts (using irq_fpu_usable()) not stomping on the active FPU state. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1348164109.26695.338.camel@sbsiddha-desk.sc.intel.com Cc: Avi Kivity <avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-09-21KVM: x86: Export svm/vmx exit code and vector code to userspaceXiao Guangrong
Exporting KVM exit information to userspace to be consumed by perf. Signed-off-by: Dong Hao <haodong@linux.vnet.ibm.com> [ Dong Hao <haodong@linux.vnet.ibm.com>: rebase it on acme's git tree ] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: kvm@vger.kernel.org Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1347870675-31495-2-git-send-email-haodong@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-20KVM: optimize apic interrupt deliveryGleb Natapov
Most interrupt are delivered to only one vcpu. Use pre-build tables to find interrupt destination instead of looping through all vcpus. In case of logical mode loop only through vcpus in a logical cluster irq is sent to. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20KVM: MMU: Eliminate pointless temporary 'ac'Avi Kivity
'ac' essentially reconstructs the 'access' variable we already have, except for the PFERR_PRESENT_MASK and PFERR_RSVD_MASK. As these are not used by callees, just use 'access' directly. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20KVM: MMU: Avoid access/dirty update loop if all is wellAvi Kivity
Keep track of accessed/dirty bits; if they are all set, do not enter the accessed/dirty update loop. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20KVM: MMU: Eliminate eperm temporaryAvi Kivity
'eperm' is no longer used in the walker loop, so we can eliminate it. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20KVM: MMU: Optimize is_last_gpte()Avi Kivity
Instead of branchy code depending on level, gpte.ps, and mmu configuration, prepare everything in a bitmap during mode changes and look it up during runtime. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20KVM: MMU: Simplify walk_addr_generic() loopAvi Kivity
The page table walk is coded as an infinite loop, with a special case on the last pte. Code it as an ordinary loop with a termination condition on the last pte (large page or walk length exhausted), and put the last pte handling code after the loop where it belongs. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20KVM: MMU: Optimize pte permission checksAvi Kivity
walk_addr_generic() permission checks are a maze of branchy code, which is performed four times per lookup. It depends on the type of access, efer.nxe, cr0.wp, cr4.smep, and in the near future, cr4.smap. Optimize this away by precalculating all variants and storing them in a bitmap. The bitmap is recalculated when rarely-changing variables change (cr0, cr4) and is indexed by the often-changing variables (page fault error code, pte access permissions). The permission check is moved to the end of the loop, otherwise an SMEP fault could be reported as a false positive, when PDE.U=1 but PTE.U=0. Noted by Xiao Guangrong. The result is short, branch-free code. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20KVM: MMU: Update accessed and dirty bits after guest pagetable walkAvi Kivity
While unspecified, the behaviour of Intel processors is to first perform the page table walk, then, if the walk was successful, to atomically update the accessed and dirty bits of walked paging elements. While we are not required to follow this exactly, doing so will allow us to perform the access permissions check after the walk is complete, rather than after each walk step. (the tricky case is SMEP: a zero in any pte's U bit makes the referenced page a supervisor page, so we can't fault on a one bit during the walk itself). Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20KVM: MMU: Move gpte_access() out of paging_tmpl.hAvi Kivity
We no longer rely on paging_tmpl.h defines; so we can move the function to mmu.c. Rely on zero extension to 64 bits to get the correct nx behaviour. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20KVM: MMU: Optimize gpte_access() slightlyAvi Kivity
If nx is disabled, then is gpte[63] is set we will hit a reserved bit set fault before checking permissions; so we can ignore the setting of efer.nxe. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20KVM: MMU: Push clean gpte write protection out of gpte_access()Avi Kivity
gpte_access() computes the access permissions of a guest pte and also write-protects clean gptes. This is wrong when we are servicing a write fault (since we'll be setting the dirty bit momentarily) but correct when instantiating a speculative spte, or when servicing a read fault (since we'll want to trap a following write in order to set the dirty bit). It doesn't seem to hurt in practice, but in order to make the code readable, push the write protection out of gpte_access() and into a new protect_clean_gpte() which is called explicitly when needed. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-19arch/x86: Remove unecessary semicolonsPeter Senna Tschudin
Found by http://coccinelle.lip6.fr/ Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com> Cc: avi@redhat.com Cc: mtosatti@redhat.com Cc: a.p.zijlstra@chello.nl Cc: rusty@rustcorp.com.au Cc: masami.hiramatsu.pt@hitachi.com Cc: suresh.b.siddha@intel.com Cc: joerg.roedel@amd.com Cc: agordeev@redhat.com Cc: yinghai@kernel.org Cc: bhelgaas@google.com Cc: liuj97@gmail.com Link: http://lkml.kernel.org/r/1347986174-30287-7-git-send-email-peter.senna@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-18x86, kvm: use kernel_fpu_begin/end() in kvm_load/put_guest_fpu()Suresh Siddha
kvm's guest fpu save/restore should be wrapped around kernel_fpu_begin/end(). This will avoid for example taking a DNA in kvm_load_guest_fpu() when it tries to load the fpu immediately after doing unlazy_fpu() on the host side. More importantly this will prevent the host process fpu from being corrupted. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1345842782-24175-4-git-send-email-suresh.b.siddha@intel.com Cc: Avi Kivity <avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-09-17KVM: make processes waiting on vcpu mutex killableMichael S. Tsirkin
vcpu mutex can be held for unlimited time so taking it with mutex_lock on an ioctl is wrong: one process could be passed a vcpu fd and call this ioctl on the vcpu used by another process, it will then be unkillable until the owner exits. Call mutex_lock_killable instead and return status. Note: mutex_lock_interruptible would be even nicer, but I am not sure all users are prepared to handle EINTR from these ioctls. They might misinterpret it as an error. Cleanup paths expect a vcpu that can't be used by any userspace so this will always succeed - catch bugs by calling BUG_ON. Catch callers that don't check return state by adding __must_check. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-09-17KVM: SVM: Make use of asm.hAvi Kivity
Use macros for bitness-insensitive register names, instead of rolling our own. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-09-17KVM: VMX: Make use of asm.hAvi Kivity
Use macros for bitness-insensitive register names, instead of rolling our own. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-09-17KVM: VMX: Make lto-friendlyAvi Kivity
LTO (link-time optimization) doesn't like local labels to be referred to from a different function, since the two functions may be built in separate compilation units. Use an external variable instead. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-09-12KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()Takuya Yoshikawa
find_highest_vector() and count_vectors(): - Instead of using magic values, define and use proper macros. find_highest_vector(): - Remove likely() which is there only for historical reasons and not doing correct branch predictions anymore. Using such heuristics to optimize this function is not worth it now. Let CPUs predict things instead. - Stop checking word[0] separately. This was only needed for doing likely() optimization. - Use for loop, not while, to iterate over the register array to make the code clearer. Note that we actually confirmed that the likely() did wrong predictions by inserting debug code. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-09-10KVM: fix error paths for failed gfn_to_page() callsXiao Guangrong
This bug was triggered: [ 4220.198458] BUG: unable to handle kernel paging request at fffffffffffffffe [ 4220.203907] IP: [<ffffffff81104d85>] put_page+0xf/0x34 ...... [ 4220.237326] Call Trace: [ 4220.237361] [<ffffffffa03830d0>] kvm_arch_destroy_vm+0xf9/0x101 [kvm] [ 4220.237382] [<ffffffffa036fe53>] kvm_put_kvm+0xcc/0x127 [kvm] [ 4220.237401] [<ffffffffa03702bc>] kvm_vcpu_release+0x18/0x1c [kvm] [ 4220.237407] [<ffffffff81145425>] __fput+0x111/0x1ed [ 4220.237411] [<ffffffff8114550f>] ____fput+0xe/0x10 [ 4220.237418] [<ffffffff81063511>] task_work_run+0x5d/0x88 [ 4220.237424] [<ffffffff8104c3f7>] do_exit+0x2bf/0x7ca The test case: printf(fmt, ##args); \ exit(-1);} while (0) static int create_vm(void) { int sys_fd, vm_fd; sys_fd = open("/dev/kvm", O_RDWR); if (sys_fd < 0) die("open /dev/kvm fail.\n"); vm_fd = ioctl(sys_fd, KVM_CREATE_VM, 0); if (vm_fd < 0) die("KVM_CREATE_VM fail.\n"); return vm_fd; } static int create_vcpu(int vm_fd) { int vcpu_fd; vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0); if (vcpu_fd < 0) die("KVM_CREATE_VCPU ioctl.\n"); printf("Create vcpu.\n"); return vcpu_fd; } static void *vcpu_thread(void *arg) { int vm_fd = (int)(long)arg; create_vcpu(vm_fd); return NULL; } int main(int argc, char *argv[]) { pthread_t thread; int vm_fd; (void)argc; (void)argv; vm_fd = create_vm(); pthread_create(&thread, NULL, vcpu_thread, (void *)(long)vm_fd); printf("Exit.\n"); return 0; } It caused by release kvm->arch.ept_identity_map_addr which is the error page. The parent thread can send KILL signal to the vcpu thread when it was exiting which stops faulting pages and potentially allocating memory. So gfn_to_pfn/gfn_to_page may fail at this time Fixed by checking the page before it is used Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-10KVM: MMU: remove unnecessary checkXiao Guangrong
Checking the return of kvm_mmu_get_page is unnecessary since it is guaranteed by memory cache Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-10KVM: Depend on HIGH_RES_TIMERSLiu, Jinsong
KVM lapic timer and tsc deadline timer based on hrtimer, setting a leftmost node to rb tree and then do hrtimer reprogram. If hrtimer not configured as high resolution, hrtimer_enqueue_reprogram do nothing and then make kvm lapic timer and tsc deadline timer fail. Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-09KVM: x86: Check INVPCID feature bit in EBX of leaf 7Ren, Yongjie
Checks and operations on the INVPCID feature bit should use EBX of CPUID leaf 7 instead of ECX. Signed-off-by: Junjie Mao <junjie.mao@intel.com> Signed-off-by: Yongjie Ren <yongjien.ren@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-06KVM: use symbolic constant for nr interruptsMichael S. Tsirkin
interrupt_bitmap is KVM_NR_INTERRUPTS bits in size, so just use that instead of hard-coded constants and math. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-06KVM: emulator: optimize "rep ins" handlingGleb Natapov
Optimize "rep ins" by allowing emulator to write back more than one datum at a time. Introduce new operand type OP_MEM_STR which tells writeback() that dst contains pointer to an array that should be written back as opposite to just one data element. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-06KVM: emulator: string_addr_inc() cleanupGleb Natapov
Remove unneeded segment argument. Address structure already has correct segment which was put there during decode. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-06KVM: emulator: make x86 emulation modes enum instead of definesGleb Natapov
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-06KVM: Provide userspace IO exit completion callbackGleb Natapov
Current code assumes that IO exit was due to instruction emulation and handles execution back to emulator directly. This patch adds new userspace IO exit completion callback that can be set by any other code that caused IO exit to userspace. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>