summaryrefslogtreecommitdiffstats
path: root/arch/x86
AgeCommit message (Collapse)Author
2013-04-22KVM: nVMX: Introduce vmread and vmwrite bitmapsAbel Gordon
Prepare vmread and vmwrite bitmaps according to a pre-specified list of fields. These lists are intended to specifiy most frequent accessed fields so we can minimize the number of fields that are copied from/to the software controlled VMCS12 format to/from to processor-specific shadow vmcs. The lists were built measuring the VMCS fields access rate after L2 Ubuntu 12.04 booted when it was running on top of L1 KVM, also Ubuntu 12.04. Note that during boot there were additional fields which were frequently modified but they were not added to these lists because after boot these fields were not longer accessed by L1. Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-04-22KVM: nVMX: Detect shadow-vmcs capabilityAbel Gordon
Add logic required to detect if shadow-vmcs is supported by the processor. Introduce a new kernel module parameter to specify if L0 should use shadow vmcs (or not) to run L1. Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-04-22KVM: nVMX: Shadow-vmcs control fields/bitsAbel Gordon
Add definitions for all the vmcs control fields/bits required to enable vmcs-shadowing Signed-off-by: Abel Gordon <abelg@il.ibm.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-04-22lguest: map Switcher below fixmap.Rusty Russell
Now we've adjusted all the code, we can simply set switcher_addr to wherever it needs to go below the fixmaps, rather than asserting that it should be so. With large NR_CPUS and PAE, people were hitting the "mapping switcher would thwack fixmap" message. Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-04-22lguest: assume Switcher text is a single page.Rusty Russell
ie. SHARED_SWITCHER_PAGES == 1. It is well under a page, and it's a minor simplification: it's nice to have *one* simplification in a patch series! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-04-22lguest: prepare to make SWITCHER_ADDR a variable.Rusty Russell
We currently use the whole top PGD entry for the switcher, but that's hitting the fixmap in some configurations (mainly, large NR_CPUS). Introduce a variable, currently set to the constant. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-04-21Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix offcore_rsp valid mask for SNB/IVB perf: Treat attr.config as u64 in perf_swevent_init()
2013-04-21perf/x86/amd: Remove old-style NB counter support from perf_event_amd.cJacob Shin
Support for NB counters, MSRs 0xc0010240 ~ 0xc0010247, got moved to perf_event_amd_uncore.c in the following commit: c43ca5091a37 perf/x86/amd: Add support for AMD NB and L2I "uncore" counters AMD Family 10h NB events (events 0xe0 ~ 0xff, on MSRs 0xc001000 ~ 0xc001007) will still continue to be handled by perf_event_amd.c Signed-off-by: Jacob Shin <jacob.shin@amd.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jacob Shin <jacob.shin@amd.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Link: http://lkml.kernel.org/r/1366046483-1765-2-git-send-email-jacob.shin@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-21perf/x86: Check all MSRs before passing hw checkGeorge Dunlap
check_hw_exists() has a number of checks which go to two exit paths: msr_fail and bios_fail. Checks classified as msr_fail will cause check_hw_exists() to return false, causing the PMU not to be used; bios_fail checks will only cause a warning to be printed, but will return true. The problem is that if there are both msr failures and bios failures, and the routine hits a bios_fail check first, it will exit early and return true, not finishing the rest of the msr checks. If those msrs are in fact broken, it will cause them to be used erroneously. In the case of a Xen PV VM, the guest OS has read access to all the MSRs, but write access is white-listed to supported features. Writes to unsupported MSRs have no effect. The PMU MSRs are not (typically) supported, because they are expensive to save and restore on a VM context switch. One of the "msr_fail" checks is supposed to detect this circumstance (ether for Xen or KVM) and disable the harware PMU. However, on one of my AMD boxen, there is (apparently) a broken BIOS which triggers one of the bios_fail checks. In particular, MSR_K7_EVNTSEL0 has the ARCH_PERFMON_EVENTSEL_ENABLE bit set. The guest kernel detects this because it has read access to all MSRs, and causes it to skip the rest of the checks and try to use the non-existent hardware PMU. This minimally causes a lot of useless instruction emulation and Xen console spam; it may cause other issues with the watchdog as well. This changset causes check_hw_exists() to go through all of the msr checks, failing and returning false if any of them fail. This makes sure that a guest running under Xen without a virtual PMU will detect that there is no functioning PMU and not attempt to use it. This problem affects kernels as far back as 3.2, and should thus be considered for backport. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Link: http://lkml.kernel.org/r/1365000388-32448-1-git-send-email-george.dunlap@eu.citrix.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-21perf/x86/amd: Add support for AMD NB and L2I "uncore" countersJacob Shin
Add support for AMD Family 15h [and above] northbridge performance counters. MSRs 0xc0010240 ~ 0xc0010247 are shared across all cores that share a common northbridge. Add support for AMD Family 16h L2 performance counters. MSRs 0xc0010230 ~ 0xc0010237 are shared across all cores that share a common L2 cache. We do not enable counter overflow interrupts. Sampling mode and per-thread events are not supported. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Stephane Eranian <eranian@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130419213428.GA8229@jshin-Toonie Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-21perf/x86/intel: Add Ivy Bridge-EP uncore supportYan, Zheng
The uncore subsystem in Ivy Bridge-EP is similar to Sandy Bridge-EP. There are some differences in config register encoding and pci device IDs. The Ivy Bridge-EP uncore also supports a few new events. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: peterz@infradead.org Cc: eranian@google.com Cc: ak@linux.intel.com Link: http://lkml.kernel.org/r/1366113067-3262-4-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-21perf/x86/intel: Fix SNB-EP CBO and PCU uncore PMU filter managementYan, Zheng
The existing code assumes all Cbox and PCU events are using filter, but actually the filter is event specific. Furthermore the filter is sub-divided into multiple fields which are used by different events. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: peterz@infradead.org Cc: ak@linux.intel.com Link: http://lkml.kernel.org/r/1366113067-3262-3-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Reported-by: Stephane Eranian <eranian@google.com>
2013-04-21perf/x86: Avoid kfree() in CPU_{STARTING,DYING}Yan, Zheng
On -rt kfree() can schedule, but CPU_{STARTING,DYING} should be atomic. So use a list to defer kfree until CPU_{ONLINE,DEAD}. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: peterz@infradead.org Cc: eranian@google.com Cc: ak@linux.intel.com Link: http://lkml.kernel.org/r/1366113067-3262-2-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-21Merge branch 'perf/urgent' into perf/coreIngo Molnar
Conflicts: arch/x86/kernel/cpu/perf_event_intel.c Merge in the latest fixes before applying new patches, resolve the conflict. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-20Merge branch 'x86-kdump-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull kdump fixes from Peter Anvin: "The kexec/kdump people have found several problems with the support for loading over 4 GiB that was introduced in this merge cycle. This is partly due to a number of design problems inherent in the way the various pieces of kdump fit together (it is pretty horrifically manual in many places.) After a *lot* of iterations this is the patchset that was agreed upon, but of course it is now very late in the cycle. However, because it changes both the syntax and semantics of the crashkernel option, it would be desirable to avoid a stable release with the broken interfaces." I'm not happy with the timing, since originally the plan was to release the final 3.9 tomorrow. But apparently I'm doing an -rc8 instead... * 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kexec: use Crash kernel for Crash kernel low x86, kdump: Change crashkernel_high/low= to crashkernel=,high/low x86, kdump: Retore crashkernel= to allocate under 896M x86, kdump: Set crashkernel_low automatically
2013-04-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "Three groups of fixes: 1. Make sure we don't execute the early microcode patching if family < 6, since it would touch MSRs which don't exist on those families, causing crashes. 2. The Xen partial emulation of HyperV can be dealt with more gracefully than just disabling the driver. 3. More EFI variable space magic. In particular, variables hidden from runtime code need to be taken into account too." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: Verify the family before dispatching microcode patching x86, hyperv: Handle Xen emulation of Hyper-V more gracefully x86,efi: Implement efi_no_storage_paranoia parameter efi: Export efi_query_variable_store() for efivars.ko x86/Kconfig: Make EFI select UCS2_STRING efi: Distinguish between "remaining space" and actually used space efi: Pass boot services variable info to runtime code Move utf16 functions to kernel core and rename x86,efi: Check max_size only if it is non-zero. x86, efivars: firmware bug workarounds should be in platform code
2013-04-20Merge remote-tracking branch 'efi/chainsaw' into x86/efiH. Peter Anvin
Resolved Conflicts: drivers/firmware/efivars.c fs/efivarsfs/file.c Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-19Merge remote-tracking branch 'efi/urgent' into x86/urgentH. Peter Anvin
Matt Fleming (1): x86, efivars: firmware bug workarounds should be in platform code Matthew Garrett (3): Move utf16 functions to kernel core and rename efi: Pass boot services variable info to runtime code efi: Distinguish between "remaining space" and actually used space Richard Weinberger (2): x86,efi: Check max_size only if it is non-zero. x86,efi: Implement efi_no_storage_paranoia parameter Sergey Vlasov (2): x86/Kconfig: Make EFI select UCS2_STRING efi: Export efi_query_variable_store() for efivars.ko Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-19x86, microcode: Verify the family before dispatching microcode patchingH. Peter Anvin
For each CPU vendor that implements CPU microcode patching, there will be a minimum family for which this is implemented. Verify this minimum level of support. This can be done in the dispatch function or early in the application functions. Doing the latter turned out to be somewhat awkward because of the ineviable split between the BSP and the AP paths, and rather than pushing deep into the application functions, do this in the dispatch function. Reported-by: "Bryan O'Donoghue" <bryan.odonoghue.lkml@nexus-software.ie> Suggested-by: Borislav Petkov <bp@alien8.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1366392183-4149-1-git-send-email-bryan.odonoghue.lkml@nexus-software.ie
2013-04-19iommu: Fix compile warnings with forward declarationsJoerg Roedel
The irq_remapping.h file for x86 does not include all necessary forward declarations for the data structures used. This causes compile warnings, so fix it. Signed-off-by: Joerg Roedel <joro@8bytes.org>
2013-04-19Merge tag 'edac_amd_f16h' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull AMD F16h support for amd64_edac from Borislav Petkov. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-19amd64_edac: Add Family 16h supportAravind Gopalakrishnan
Add code to handle DRAM ECC errors decoding for Fam16h. Tested on Fam16h with ECC turned on using the mce_amd_inj facility and works fine. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Boris: cleanups and clarifications ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-04-18x86, hyperv: Handle Xen emulation of Hyper-V more gracefullyK. Y. Srinivasan
Install the Hyper-V specific interrupt handler only when needed. This would permit us to get rid of the Xen check. Note that when the vmbus drivers invokes the call to register its handler, we are sure to be running on Hyper-V. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Link: http://lkml.kernel.org/r/1366299886-6399-1-git-send-email-kys@microsoft.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-18iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsetsNeil Horman
A few years back intel published a spec update: http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf For the 5520 and 5500 chipsets which contained an errata (specificially errata 53), which noted that these chipsets can't properly do interrupt remapping, and as a result the recommend that interrupt remapping be disabled in bios. While many vendors have a bios update to do exactly that, not all do, and of course not all users update their bios to a level that corrects the problem. As a result, occasionally interrupts can arrive at a cpu even after affinity for that interrupt has be moved, leading to lost or spurrious interrupts (usually characterized by the message: kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) There have been several incidents recently of people seeing this error, and investigation has shown that they have system for which their BIOS level is such that this feature was not properly turned off. As such, it would be good to give them a reminder that their systems are vulnurable to this problem. For details of those that reported the problem, please see: https://bugzilla.redhat.com/show_bug.cgi?id=887006 [ Joerg: Removed CONFIG_IRQ_REMAP ifdef from early-quirks.c ] Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Prarit Bhargava <prarit@redhat.com> CC: Don Zickus <dzickus@redhat.com> CC: Don Dutile <ddutile@redhat.com> CC: Bjorn Helgaas <bhelgaas@google.com> CC: Asit Mallick <asit.k.mallick@intel.com> CC: David Woodhouse <dwmw2@infradead.org> CC: linux-pci@vger.kernel.org CC: Joerg Roedel <joro@8bytes.org> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2013-04-17KVM: x86: Fix posted interrupt with CONFIG_SMP=nZhang, Yang Z
->send_IPI_mask is not defined on UP. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-17tools/power turbostat: display C8, C9, C10 residencyKristen Carlson Accardi
Display residency in the new C-states, C8, C9, C10. C8, C9, C10 are present on some: "Fourth Generation Intel(R) Core(TM) Processors", which are based on Intel(R) microarchitecture code name Haswell. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2013-04-17x86, kdump: Change crashkernel_high/low= to crashkernel=,high/lowYinghai Lu
Per hpa, use crashkernel=X,high crashkernel=Y,low instead of crashkernel_hign=X crashkernel_low=Y. As that could be extensible. -v2: according to Vivek, change delimiter to ; -v3: let hign and low only handle simple form and it conforms to description in kernel-parameters.txt still keep crashkernel=X override any crashkernel=X,high crashkernel=Y,low -v4: update get_last_crashkernel returning and add more strict checking in parse_crashkernel_simple() found by HATAYAMA. -v5: Change delimiter back to , according to HPA. also separate parse_suffix from parse_simper according to vivek. so we can avoid @pos in that path. -v6: Tight the checking about crashkernel=X,highblahblah,high found by HTYAYAMA. Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1366089828-19692-5-git-send-email-yinghai@kernel.org Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-17x86, kdump: Retore crashkernel= to allocate under 896MYinghai Lu
Vivek found old kexec-tools does not work new kernel anymore. So change back crashkernel= back to old behavoir, and add crashkernel_high= to let user decide if buffer could be above 4G, and also new kexec-tools will be needed. -v2: let crashkernel=X override crashkernel_high= update description about _high will be ignored by crashkernel=X -v3: update description about kernel-parameters.txt according to Vivek. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1366089828-19692-4-git-send-email-yinghai@kernel.org Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-17x86, kdump: Set crashkernel_low automaticallyYinghai Lu
Chao said that kdump does does work well on his system on 3.8 without extra parameter, even iommu does not work with kdump. And now have to append crashkernel_low=Y in first kernel to make kdump work. We have now modified crashkernel=X to allocate memory beyong 4G (if available) and do not allocate low range for crashkernel if the user does not specify that with crashkernel_low=Y. This causes regression if iommu is not enabled. Without iommu, swiotlb needs to be setup in first 4G and there is no low memory available to second kernel. Set crashkernel_low automatically if the user does not specify that. For system that does support IOMMU with kdump properly, user could specify crashkernel_low=0 to save that 72M low ram. -v3: add swiotlb_size() according to Konrad. -v4: add comments what 8M is for according to hpa. also update more crashkernel_low= in kernel-parameters.txt -v5: update changelog according to Vivek. -v6: Change description about swiotlb referring according to HATAYAMA. Reported-by: WANG Chao <chaowang@redhat.com> Tested-by: WANG Chao <chaowang@redhat.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1366089828-19692-2-git-send-email-yinghai@kernel.org Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-17x86,efi: Implement efi_no_storage_paranoia parameterRichard Weinberger
Using this parameter one can disable the storage_size/2 check if he is really sure that the UEFI does sane gc and fulfills the spec. This parameter is useful if a devices uses more than 50% of the storage by default. The Intel DQSW67 desktop board is such a sucker for exmaple. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-04-17idle: Remove GENERIC_IDLE_LOOP config switchThomas Gleixner
All archs are converted over. Remove the config switch and the fallback code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-16x86, relocs: Refactor the relocs tool to merge 32- and 64-bit ELFH. Peter Anvin
Refactor the relocs tool so that the same tool can handle 32- and 64-bit ELF. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/1365797627-20874-5-git-send-email-keescook@chromium.org
2013-04-16x86, relocs: Build separate 32/64-bit toolsKees Cook
Since the ELF structures and access macros change size based on 32 vs 64 bits, build a separate 32-bit relocs tool (for handling realmode and 32-bit relocations), and a 64-bit relocs tool (for handling 64-bit kernel relocations). Signed-off-by: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/1365797627-20874-5-git-send-email-keescook@chromium.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-16x86, relocs: Add 64-bit ELF support to relocs toolKees Cook
This adds the ability to process relocations from the 64-bit kernel ELF, if built with ELF_BITS=64 defined. The special case for the percpu area is handled, along with some other symbols specific to the 64-bit kernel. Based on work by Neill Clift and Michael Davidson. Signed-off-by: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/1365797627-20874-4-git-send-email-keescook@chromium.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-16x86, relocs: Consolidate processing logicKees Cook
Instead of counting and then processing relocations, do it in a single pass. This splits the processing logic into separate functions for realmode and 32-bit (and paves the way for 64-bit). Also extracts helper functions when emitting relocations. Based on work by Neill Clift and Michael Davidson. Signed-off-by: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/1365797627-20874-3-git-send-email-keescook@chromium.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-16x86, relocs: Generalize ELF structure namesKees Cook
In preparation for making the reloc tool operate on 64-bit relocations, generalize the structure names for easy recompilation via #defines. Based on work by Neill Clift and Michael Davidson. Signed-off-by: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/1365797627-20874-2-git-send-email-keescook@chromium.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-16KVM: VMX: Fix check guest state validity if a guest is in VM86 modeGleb Natapov
If guest vcpu is in VM86 mode the vcpu state should be checked as if in real mode. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-16KVM: nVMX: check vmcs12 for valid activity statePaolo Bonzini
KVM does not use the activity state VMCS field, and does not support it in nested VMX either (the corresponding bits in the misc VMX feature MSR are zero). Fail entry if the activity state is set to anything but "active". Since the value will always be the same for L1 and L2, we do not need to read and write the corresponding VMCS field on L1/L2 transitions, either. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-16xen/smp: Unifiy some of the PVs and PVHVM offline CPU pathKonrad Rzeszutek Wilk
The "xen_cpu_die" and "xen_hvm_cpu_die" are very similar. Lets coalesce them. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-04-16xen/smp/pvhvm: Don't initialize IRQ_WORKER as we are using the native one.Konrad Rzeszutek Wilk
There is no need to use the PV version of the IRQ_WORKER mechanism as under PVHVM we are using the native version. The native version is using the SMP API. They just sit around unused: 69: 0 0 xen-percpu-ipi irqwork0 83: 0 0 xen-percpu-ipi irqwork1 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-04-16xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVMKonrad Rzeszutek Wilk
See git commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23 (xen: disable PV spinlocks on HVM) for details. But we did not disable it everywhere - which means that when we boot as PVHVM we end up allocating per-CPU irq line for spinlock. This fixes that. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-04-16xen/spinlock: Check against default value of -1 for IRQ line.Konrad Rzeszutek Wilk
The default (uninitialized) value of the IRQ line is -1. Check if we already have allocated an spinlock interrupt line and if somebody is trying to do it again. Also set it to -1 when we offline the CPU. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-04-16xen/time: Add default value of -1 for IRQ and check for that.Konrad Rzeszutek Wilk
If the timer interrupt has been de-init or is just now being initialized, the default value of -1 should be preset as interrupt line. Check for that and if something is odd WARN us. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-04-16xen/time: Fix kasprintf splat when allocating timer%d IRQ line.Konrad Rzeszutek Wilk
When we online the CPU, we get this splat: smpboot: Booting Node 0 Processor 1 APIC 0x2 installing Xen timer for CPU 1 BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1 Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1 Call Trace: [<ffffffff810c1fea>] __might_sleep+0xda/0x100 [<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0 [<ffffffff81303758>] ? kasprintf+0x38/0x40 [<ffffffff813036eb>] kvasprintf+0x5b/0x90 [<ffffffff81303758>] kasprintf+0x38/0x40 [<ffffffff81044510>] xen_setup_timer+0x30/0xb0 [<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30 [<ffffffff81666d0a>] start_secondary+0x19c/0x1a8 The solution to that is use kasprintf in the CPU hotplug path that 'online's the CPU. That is, do it in in xen_hvm_cpu_notify, and remove the call to in xen_hvm_setup_cpu_clockevents. Unfortunatly the later is not a good idea as the bootup path does not use xen_hvm_cpu_notify so we would end up never allocating timer%d interrupt lines when booting. As such add the check for atomic() to continue. CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-04-16KVM: VMX: Use posted interrupt to deliver virtual interruptYang Zhang
If posted interrupt is avaliable, then uses it to inject virtual interrupt to guest. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-16KVM: VMX: Add the deliver posted interrupt algorithmYang Zhang
Only deliver the posted interrupt when target vcpu is running and there is no previous interrupt pending in pir. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-16KVM: Set TMR when programming ioapic entryYang Zhang
We already know the trigger mode of a given interrupt when programming the ioapice entry. So it's not necessary to set it in each interrupt delivery. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-16KVM: Call common update function when ioapic entry changed.Yang Zhang
Both TMR and EOI exit bitmap need to be updated when ioapic changed or vcpu's id/ldr/dfr changed. So use common function instead eoi exit bitmap specific function. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-16KVM: VMX: Check the posted interrupt capabilityYang Zhang
Detect the posted interrupt feature. If it exists, then set it in vmcs_config. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-16KVM: VMX: Register a new IPI for posted interruptYang Zhang
Posted Interrupt feature requires a special IPI to deliver posted interrupt to guest. And it should has a high priority so the interrupt will not be blocked by others. Normally, the posted interrupt will be consumed by vcpu if target vcpu is running and transparent to OS. But in some cases, the interrupt will arrive when target vcpu is scheduled out. And host will see it. So we need to register a dump handler to handle it. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>