summaryrefslogtreecommitdiffstats
path: root/arch/x86
AgeCommit message (Collapse)Author
2012-07-16apic: add apic_set_eoi_write for PV useMichael S. Tsirkin
KVM PV EOI optimization overrides eoi_write apic op with its own version. Add an API for this to avoid meddling with core x86 apic driver data structures directly. For KVM use, we don't need any guarantees about when the switch to the new op will take place, so it could in theory use this API after SMP init, but it currently doesn't, and restricting callers to early init makes it clear that it's safe as it won't race with actual APIC driver use. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-14vsyscall_64: add missing ifdef CONFIG_SECCOMPWill Drewry
vsyscall_seccomp introduced a dependency on __secure_computing. On configurations with CONFIG_SECCOMP disabled, compilation will fail. Reported-by: feng xiangjun <fengxj325@gmail.com> Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-13x86/vsyscall: allow seccomp filter in vsyscall=emulateWill Drewry
If a seccomp filter program is installed, older static binaries and distributions with older libc implementations (glibc 2.13 and earlier) that rely on vsyscall use will be terminated regardless of the filter program policy when executing time, gettimeofday, or getcpu. This is only the case when vsyscall emulation is in use (vsyscall=emulate is the default). This patch emulates system call entry inside a vsyscall=emulate by populating regs->ax and regs->orig_ax with the system call number prior to calling into seccomp such that all seccomp-dependencies function normally. Additionally, system call return behavior is emulated in line with other vsyscall entrypoints for the trace/trap cases. [ v2: fixed ip and sp on SECCOMP_RET_TRAP/TRACE (thanks to luto@mit.edu) ] Reported-and-tested-by: Owen Kibel <qmewlo@gmail.com> Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-12olpc-xo15-sci: Use struct dev_pm_ops for power managementRafael J. Wysocki
Make the OLPC XO15 SCI driver define its resume callback through a struct dev_pm_ops object rather than by using a legacy PM hook in struct acpi_device_ops. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Randy Dunlap <rdunlap@xenotime.net>
2012-07-12KVM: VMX: Implement PCID/INVPCID for guests with EPTMao, Junjie
This patch handles PCID/INVPCID for guests. Process-context identifiers (PCIDs) are a facility by which a logical processor may cache information for multiple linear-address spaces so that the processor may retain cached information when software switches to a different linear address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual Volume 3A for details. For guests with EPT, the PCID feature is enabled and INVPCID behaves as running natively. For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD. Signed-off-by: Junjie Mao <junjie.mao@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11Merge branch 'mce-ripvfix' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce Merge memory fault handling fix from Tony Luck. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-11x86/mce: Fix siginfo_t->si_addr value for non-recoverable memory faultsTony Luck
In commit dad1743e5993f1 ("x86/mce: Only restart instruction after machine check recovery if it is safe") we fixed mce_notify_process() to force a signal to the current process if it was not restartable (RIPV bit not set in MCG_STATUS). But doing it here means that the process doesn't get told the virtual address of the fault via siginfo_t->si_addr. This would prevent application level recovery from the fault. Make a new MF_MUST_KILL flag bit for memory_failure() et al. to use so that we will provide the right information with the signal. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Cc: stable@kernel.org # 3.4+
2012-07-11KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform checkPrarit Bhargava
While debugging I noticed that unlike all the other hypervisor code in the kernel, kvm does not have an entry for x86_hyper which is used in detect_hypervisor_platform() which results in a nice printk in the syslog. This is only really a stub function but it does make kvm more consistent with the other hypervisors. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marcelo Tostatti <mtosatti@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: fix kvm_mmu_pagetable_walk tracepointXiao Guangrong
The P bit of page fault error code is missed in this tracepoint, fix it by passing the full error code Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: trace fast page faultXiao Guangrong
To see what happen on this path and help us to optimize it Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: fast path of handling guest page faultXiao Guangrong
If the the present bit of page fault error code is set, it indicates the shadow page is populated on all levels, it means what we do is only modify the access bit which can be done out of mmu-lock Currently, in order to simplify the code, we only fix the page fault caused by write-protect on the fast path Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: introduce SPTE_MMU_WRITEABLE bitXiao Guangrong
This bit indicates whether the spte can be writable on MMU, that means the corresponding gpte is writable and the corresponding gfn is not protected by shadow page protection In the later path, SPTE_MMU_WRITEABLE will indicates whether the spte can be locklessly updated Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: fold tlb flush judgement into mmu_spte_updateXiao Guangrong
mmu_spte_update() is the common function, we can easily audit the path Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: VMX: export PFEC.P bit on eptXiao Guangrong
Export the present bit of page fault error code, the later patch will use it Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: cleanup spte_write_protectXiao Guangrong
Use __drop_large_spte to cleanup this function and comment spte_write_protect Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: abstract spte write-protectXiao Guangrong
Introduce a common function to abstract spte write-protect to cleanup the code Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: MMU: return bool in __rmap_write_protectXiao Guangrong
The reture value of __rmap_write_protect is either 1 or 0, use true/false instead of these Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11Merge tag 'v3.5-rc6' into x86/mceIngo Molnar
Merge Linux 3.5-rc6 before merging more code. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-11crypto: twofish-avx - remove useless instructionJohannes Goetzfried
The register %rdx is written, but never read till the end of the encryption routine. Therefore let's delete the useless instruction. Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-07-11crypto: aesni-intel - fix wrong kfree pointerMilan Broz
kfree(new_key_mem) in rfc4106_set_key() should be called on malloced pointer, not on aligned one, otherwise it can cause invalid pointer on free. (Seen at least once when running tcrypt tests with debug kernel.) Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-07-10x86/mm/mtrr: Slightly simplify print_mtrr_state()Jan Beulich
high_width can be easily calculated in a single expression when making use of __ffs64(). Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/4FF71053020000780008E1B5@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-10x86/mm/mtrr: Fix alignment determination in range_to_mtrr()Jan Beulich
With the variable operated on being of "unsigned long" type, neither ffs() nor fls() are suitable to use on them, as those truncate their arguments to 32 bits. Using __ffs() and __fls() respectively at once eliminates the need to subtract 1 from their results. Additionally, with the alignment value subsequently used as a shift count, it must be enforced to be less than BITS_PER_LONG (and on 64-bit there's no need for it to be any smaller). Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/4FF70D54020000780008E179@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-09x86: CONFIG_CC_STACKPROTECTOR=y is no longer experimentalJean Delvare
This feature has been around for over 5 years now, and has no CONFIG_EXPERIMENTAL dependency anymore, so remove the '(EXPERIMENTAL)' tag from the help text as well. Signed-off-by: Jean Delvare <jdelvare@suse.de> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Link: http://lkml.kernel.org/r/1341583705.4655.18.camel@amber.site Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-09KVM: VMX: Emulate invalid guest state by defaultAvi Kivity
Our emulation should be complete enough that we can emulate guests while they are in big real mode, or in a mode transition that is not virtualizable without unrestricted guest support. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: implement LTRAvi Kivity
Opcode 0F 00 /3. Encountered during Windows XP secondary processor bringup. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: make loading TR set the busy bitAvi Kivity
Guest software doesn't actually depend on it, but vmx will refuse us entry if we don't. Set the bit in both the cached segment and memory, just to be nice. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: make read_segment_descriptor() return the addressAvi Kivity
Some operations want to modify the descriptor later on, so save the address for future use. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: emulate LLDTAvi Kivity
Opcode 0F 00 /2. Used by isolinux durign the protected mode transition. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: emulate BSWAPAvi Kivity
Opcodes 0F C8 - 0F CF. Used by the SeaBIOS cdrom code (though not in big real mode). Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: VMX: Improve error reporting during invalid guest state emulationAvi Kivity
If instruction emulation fails, report it properly to userspace. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: VMX: Stop invalid guest state emulation on pending eventAvi Kivity
Process the event, possibly injecting an interrupt, before continuing. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: implement ENTERAvi Kivity
Opcode C8. Only ENTER with lexical nesting depth 0 is implemented, since others are very rare. We'll fail emulation if nonzero lexical depth is used so data is not corrupted. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: split push logic from push opcode emulationAvi Kivity
This allows us to reuse the code without populating ctxt->src and overriding ctxt->op_bytes. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: fix byte-sized MOVZX/MOVSXAvi Kivity
Commit 2adb5ad9fe1 removed ByteOp from MOVZX/MOVSX, replacing them by SrcMem8, but neglected to fix the dependency in the emulation code on ByteOp. This caused the instruction not to have any effect in some circumstances. Fix by replacing the check for ByteOp with the equivalent src.op_bytes == 1. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: emulate LAHFAvi Kivity
Opcode 9F. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: VMX: Continue emulating after batch exhaustedAvi Kivity
If we return early from an invalid guest state emulation loop, make sure we return to it later if the guest state is still invalid. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: VMX: Fix interrupt exit condition during emulationAvi Kivity
Checking EFLAGS.IF is incorrect as we might be in interrupt shadow. If that is the case, the main loop will notice that and not inject the interrupt, causing an endless loop. Fix by using vmx_interrupt_allowed() to check if we can inject an interrupt instead. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: emulate SGDT/SIDTAvi Kivity
Opcodes 0F 01 /0 and 0F 01 /1 Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: Fix SS default ESP/EBP based addressingAvi Kivity
We correctly default to SS when BP is used as a base in 16-bit address mode, but we don't do that for 32-bit mode. Fix by adjusting the default to SS when either ESP or EBP is used as the base register. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: initialize memopAvi Kivity
memop is not initialized; this can lead to a two-byte operation following a 4-byte operation to see garbage values. Usually truncation fixes things fot us later on, but at least in one case (call abs) it doesn't. Fix by moving memop to the auto-initialized field area. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: emulate LEAVEAvi Kivity
Opcode c9; used by some variants of Windows during boot, in big real mode. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: VMX: Limit iterations with emulator_invalid_guest_stateAvi Kivity
Otherwise, if the guest ends up looping, we never exit the srcu critical section, which causes synchronize_srcu() to hang. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: VMX: Relax check on unusable segmentAvi Kivity
Some userspace (e.g. QEMU 1.1) munge the d and g bits of segment descriptors, causing us not to recognize them as unusable segments with emulate_invalid_guest_state=1. Relax the check by testing for segment not present (a non-present segment cannot be usable). Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: fix LIDT/LGDT in long modeAvi Kivity
The operand size for these instructions is 8 bytes in long mode, even without a REX prefix. Set it explicitly. Triggered while booting Linux with emulate_invalid_guest_state=1. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: allow loading null SS in long modeAvi Kivity
Null SS is valid in long mode; allow loading it. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: emulate cpuidAvi Kivity
Opcode 0F A2. Used by Linux during the mode change trampoline while in a state that is not virtualizable on vmx without unrestricted_guest, so we need to emulate it is emulate_invalid_guest_state=1. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: change ->get_cpuid() accessor to use the x86 semanticsAvi Kivity
Instead of getting an exact leaf, follow the spec and fall back to the last main leaf instead. This lets us easily emulate the cpuid instruction in the emulator. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: Split cpuid register access from computationAvi Kivity
Introduce kvm_cpuid() to perform the leaf limit check and calculate register values, and let kvm_emulate_cpuid() just handle reading and writing the registers from/to the vcpu. This allows us to reuse kvm_cpuid() in a context where directly reading and writing registers is not desired. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: VMX: Return correct CPL during transition to protected modeAvi Kivity
In protected mode, the CPL is defined as the lower two bits of CS, as set by the last far jump. But during the transition to protected mode, there is no last far jump, so we need to return zero (the inherited real mode CPL). Fix by reading CPL from the cache during the transition. This isn't 100% correct since we don't set the CPL cache on a far jump, but since protected mode transition will always jump to a segment with RPL=0, it will always work. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: MMU: Force cr3 reload with two dimensional paging on mov cr3 emulationAvi Kivity
Currently the MMU's ->new_cr3() callback does nothing when guest paging is disabled or when two-dimentional paging (e.g. EPT on Intel) is active. This means that an emulated write to cr3 can be lost; kvm_set_cr3() will write vcpu-arch.cr3, but the GUEST_CR3 field in the VMCS will retain its old value and this is what the guest sees. This bug did not have any effect until now because: - with unrestricted guest, or with svm, we never emulate a mov cr3 instruction - without unrestricted guest, and with paging enabled, we also never emulate a mov cr3 instruction - without unrestricted guest, but with paging disabled, the guest's cr3 is ignored until the guest enables paging; at this point the value from arch.cr3 is loaded correctly my the mov cr0 instruction which turns on paging However, the patchset that enables big real mode causes us to emulate mov cr3 instructions in protected mode sometimes (when guest state is not virtualizable by vmx); this mov cr3 is effectively ignored and will crash the guest. The fix is to make nonpaging_new_cr3() call mmu_free_roots() to force a cr3 reload. This is awkward because now all the new_cr3 callbacks to the same thing, and because mmu_free_roots() is somewhat of an overkill; but fixing that is more complicated and will be done after this minimal fix. Observed in the Window XP 32-bit installer while bringing up secondary vcpus. Signed-off-by: Avi Kivity <avi@redhat.com>