path: root/arch/x86
2013-06-12  x86: Fix typo in kexec register clearing  (Kees Cook)

Fixes a typo in register clearing code. Thanks to PaX Team for fixing this originally, and James Troup for pointing it out.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130605184718.GA8396@www.outflux.net
Cc: <stable@vger.kernel.org> v2.6.30+
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-12  x86, relocs: Move __vvar_page from S_ABS to S_REL  (Kees Cook)

The __vvar_page relocation should actually be listed in S_REL instead of S_ABS. Oddly, this didn't always cause things to break, presumably because there are no users for relocation information on 64 bits yet.

[ hpa: Not for stable - new code in 3.10 ]

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130611185652.GA23674@www.outflux.net
Reported-by: Michael Davidson <md@google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-12  KVM: x86: handle idiv overflow at kvm_write_tsc  (Marcelo Tosatti)

It's possible that idivl overflows (due to a large delta stored in usdiff, a valid scenario). Create an exception handler to catch the overflow exception (division by zero is protected by the vcpu->arch.virtual_tsc_khz check), and interpret it accordingly (the delta is larger than USEC_PER_SEC).

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=969644

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
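For illustration, here is a minimal user-space sketch of the overflow the exception handler guards against; the function and variable names are invented for this example and are not the KVM code:

  #include <stdint.h>
  #include <stdio.h>

  /*
   * On 32-bit x86, "divl" divides a 64-bit dividend by a 32-bit divisor
   * and raises #DE when the quotient does not fit in 32 bits; the kernel
   * fix catches that fault. User space can only check the bound up front.
   */
  static int delta_exceeds_32bit_usecs(uint64_t delta_cycles, uint32_t tsc_khz)
  {
          if (tsc_khz == 0)
                  return 1;       /* no rate known; treat as too large */

          /* delta_cycles / tsc_khz is the delta in milliseconds; 1000x
           * that must still fit in 32 bits for the division to succeed */
          return delta_cycles / tsc_khz > UINT32_MAX / 1000;
  }

  int main(void)
  {
          /* a huge TSC delta at 2.6 GHz: the usec value overflows 32 bits */
          printf("%d\n", delta_exceeds_32bit_usecs(1ULL << 52, 2600000));
          return 0;
  }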
2013-06-11  idle: Add the stack canary init to cpu_startup_entry()  (Thomas Gleixner)

Moving x86 to the generic idle implementation (commit 7d1a9417 "x86: Use generic idle loop") wrecked the stack protector. I stupidly missed that boot_init_stack_canary() must be inlined from a function which never returns, but I put that call into arch_cpu_idle_prepare(), which of course returns.

I pondered playing tricks with arch_cpu_idle_prepare() first, but then I noticed that the other archs which have implemented the stack protector (ARM and SH) do not initialize the canary for the non-boot cpus. So I decided to move the boot_init_stack_canary() call into cpu_startup_entry(), ifdeffed with CONFIG_X86 for now.

This #ifdef is just a temporary measure, as I don't want to inflict the boot_init_stack_canary() call on ARM and SH that late in the cycle. I'll queue a patch for 3.11 which removes the #ifdef if the ARM/SH maintainers have no objection.

Reported-by: Wouter van Kesteren <woutershep@gmail.com>
Cc: x86@kernel.org
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
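The shape of the resulting entry point, sketched from the changelog (an illustration, not necessarily the exact committed code):

  /* kernel/cpu/idle.c -- sketch of the fix described above */
  void cpu_startup_entry(enum cpuhp_state state)
  {
  #ifdef CONFIG_X86
          /*
           * This function never returns, so the canary survives for the
           * CPU's lifetime; arch_cpu_idle_prepare() returns and therefore
           * cannot host this call (see changelog above).
           */
          boot_init_stack_canary();
  #endif
          arch_cpu_idle_prepare();
          cpu_idle_loop();
  }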
2013-06-11  x86, efi: retry ExitBootServices() on failure  (Zach Bobroff)

ExitBootServices is absolutely supposed to return a failure if any ExitBootServices event handler changes the memory map. Basically the get_map loop should run again if ExitBootServices returns an error the first time. I would say it would be fair that if ExitBootServices gives an error the second time then Linux would be fine in returning control back to BIOS.

The second change is the following line:

  again:
          size += sizeof(*mem_map) * 2;

Originally you were incrementing it by the size of one memory map entry. The issue here is all related to the low_alloc routine you are using. In this routine you are making allocations to get the memory map itself. Doing this allocation or allocations can affect the memory map by more than one record.

[ mfleming - changelog, code style ]

Signed-off-by: Zach Bobroff <zacharyb@ami.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
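A sketch of the retry pattern this yields in the boot stub; the helper names (get_map, low_free) and the exact control flow are illustrative, not quoted from the patch:

  retried = false;
  again:
          size += sizeof(*mem_map) * 2;   /* allocations below can add records */
          status = get_map(&mem_map, &size, &key);
          if (status != EFI_SUCCESS)
                  goto fail;

          status = efi_call_phys2(sys_table->boottime->exit_boot_services,
                                  handle, key);
          if (status == EFI_INVALID_PARAMETER && !retried) {
                  /*
                   * An event handler changed the memory map between get_map
                   * and ExitBootServices: free the stale map and try once
                   * more. A second failure is passed back to the firmware.
                   */
                  retried = true;
                  low_free(size, (unsigned long)mem_map);
                  goto again;
          }
          if (status != EFI_SUCCESS)
                  goto fail;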
2013-06-11  efi: Convert runtime services function ptrs  (Borislav Petkov)

... to void * like the boot services and lose all the void * casts. No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-06-10  Modify UEFI anti-bricking code  (Matthew Garrett)

This patch reworks the UEFI anti-bricking code, including an effective reversion of cc5a080c and 31ff2f20. It turns out that calling QueryVariableInfo() from boot services results in some firmware implementations jumping to physical addresses even after entering virtual mode, so until we have 1:1 mappings for UEFI runtime space this isn't going to work so well.

Reverting these gets us back to the situation where we'd refuse to create variables on some systems because they classify deleted variables as "used" until the firmware triggers a garbage collection run, which they won't do until they reach a lower threshold. This results in it being impossible to install a bootloader, which is unhelpful.

Feedback from Samsung indicates that the firmware doesn't need more than 5KB of storage space for its own purposes, so that seems like a reasonable threshold. However, there's still no guarantee that a platform will attempt garbage collection merely because it drops below this threshold. It seems that this is often only triggered if an attempt to write generates a genuine EFI_OUT_OF_RESOURCES error. We can force that by attempting to create a variable larger than the remaining space. This should fail, but if it somehow succeeds we can then immediately delete it.

I've tested this on the UEFI machines I have available, but I don't have a Samsung and so can't verify that it avoids the bricking problem.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
[ dummy variable cleanup ]
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
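A sketch of the "oversized dummy variable" trick described above; the names (EFI_MIN_RESERVE, efi_dummy_name, EFI_DUMMY_GUID, dummy_data) are placeholders for this example, and the real efi_query_variable_store() may differ in detail:

  if (remaining_size - size < EFI_MIN_RESERVE) {
          efi_status_t status;

          /*
           * Ask for more space than the firmware says is free. The
           * expected EFI_OUT_OF_RESOURCES error is what pushes many
           * implementations into a garbage-collection run on reboot.
           */
          status = efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
                                    attributes, remaining_size + 1024,
                                    dummy_data);
          if (status == EFI_SUCCESS) {
                  /* It fit after all; delete it again immediately. */
                  efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
                                   attributes, 0, NULL);
          }
  }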
2013-06-10  Merge tag 'stable/for-linus-3.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen  (Linus Torvalds)

Pull xen fixes from Konrad Rzeszutek Wilk:
 "Two bug-fixes for regressions:

   - xen/tmem stopped working after a certain combination of modprobe/swapon was used

   - cpu online/offlining would trigger WARN_ON."

* tag 'stable/for-linus-3.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/tmem: Don't over-write tmem_frontswap_poolid after tmem_frontswap_init set it.
  xen/smp: Fixup NOHZ per cpu data when onlining an offline CPU.
2013-06-10  xen/time: Free onlined per-cpu data structure if we want to online it again.  (Konrad Rzeszutek Wilk)

If the per-cpu time data structure has been onlined already and we are trying to online it again, then free the previous copy before blindly over-writing it. A developer naturally should not call this function multiple times, but guard against it just in case.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-10  xen/time: Check that the per_cpu data structure has data before freeing.  (Konrad Rzeszutek Wilk)

We don't check whether the per_cpu data structure has already been freed in the past. This adds the check, and if it has been freed we just continue on without double-freeing.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-10  xen/time: Don't leak interrupt name when offlining.  (Konrad Rzeszutek Wilk)

When the user does:

  echo 0 > /sys/devices/system/cpu/cpu1/online
  echo 1 > /sys/devices/system/cpu/cpu1/online

kmemleak reports:

  kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

One of the leaks is from xen/time:

  unreferenced object 0xffff88003fa51280 (size 32):
    comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
    hex dump (first 32 bytes):
      74 69 6d 65 72 31 00 00 00 00 00 00 00 00 00 00  timer1..........
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
      [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
      [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
      [<ffffffff812fe228>] kasprintf+0x38/0x40
      [<ffffffff81041ec1>] xen_setup_timer+0x51/0xf0
      [<ffffffff8166339f>] xen_cpu_up+0x5f/0x3e8
      [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
      [<ffffffff8166bd48>] cpu_up+0xd9/0xec
      [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
      [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
      [<ffffffff8165ce39>] kernel_init+0x9/0xf0
      [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
      [<ffffffffffffffff>] 0xffffffffffffffff

This patch fixes it by stashing away the 'name' in the per-cpu data structure and freeing it when offlining the CPU.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-10  xen/time: Encapsulate the struct clock_event_device in another structure.  (Konrad Rzeszutek Wilk)

We don't do any code movement. We just encapsulate the struct clock_event_device in a new structure which contains said structure and a char *name pointer. The 'name' will be used in 'xen/time: Don't leak interrupt name when offlining'.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
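The wrapper then looks roughly like this; a sketch based on the changelog, with field names assumed rather than quoted from the patch:

  /* Wrap the clock_event_device so per-cpu timer state can grow. */
  struct xen_clock_event_device {
          struct clock_event_device evt;
          char *name;     /* kasprintf()'d "timer%d"; freed on offline */
  };

  static DEFINE_PER_CPU(struct xen_clock_event_device, xen_clock_events);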
2013-06-10  xen/spinlock: Don't leak interrupt name when offlining.  (Konrad Rzeszutek Wilk)

When the user does:

  echo 0 > /sys/devices/system/cpu/cpu1/online
  echo 1 > /sys/devices/system/cpu/cpu1/online

kmemleak reports:

  kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

  unreferenced object 0xffff88003fa51260 (size 32):
    comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
    hex dump (first 32 bytes):
      73 70 69 6e 6c 6f 63 6b 31 00 00 00 00 00 00 00  spinlock1.......
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
      [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
      [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
      [<ffffffff812fe228>] kasprintf+0x38/0x40
      [<ffffffff81663789>] xen_init_lock_cpu+0x61/0xbe
      [<ffffffff816633a6>] xen_cpu_up+0x66/0x3e8
      [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
      [<ffffffff8166bd48>] cpu_up+0xd9/0xec
      [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
      [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
      [<ffffffff8165ce39>] kernel_init+0x9/0xf0
      [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
      [<ffffffffffffffff>] 0xffffffffffffffff

Instead of doing it like the "xen/smp: Don't leak interrupt name when offlining" patch did (which has a per-cpu structure containing both the IRQ number and the char *), we use a per-cpu pointer to a char. The reason is that the "__this_cpu_read(lock_kicker_irq);" macro blows up with "__bad_size_call_parameter()", as the size of the returned structure is not within the parameters of what it expects and optimizes for.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
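In sketch form (a close paraphrase of the approach, not a verbatim excerpt), the spinlock code keeps the name in its own per-cpu char * so __this_cpu_read() still sees a word-sized variable:

  static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
  static DEFINE_PER_CPU(char *, irq_name);

  void xen_init_lock_cpu(int cpu)
  {
          char *name = kasprintf(GFP_KERNEL, "spinlock%d", cpu);

          /* ... bind_ipi_to_irqhandler(..., name, ...) as before ... */
          per_cpu(irq_name, cpu) = name;  /* stash it so offline can free it */
  }

  void xen_uninit_lock_cpu(int cpu)
  {
          unbind_from_irqhandler(per_cpu(lock_kicker_irq, cpu), NULL);
          per_cpu(lock_kicker_irq, cpu) = -1;
          kfree(per_cpu(irq_name, cpu));
          per_cpu(irq_name, cpu) = NULL;
  }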
2013-06-10  xen/smp: Don't leak interrupt name when offlining.  (Konrad Rzeszutek Wilk)

When the user does:

  echo 0 > /sys/devices/system/cpu/cpu1/online
  echo 1 > /sys/devices/system/cpu/cpu1/online

kmemleak reports:

  kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

  unreferenced object 0xffff88003fa51240 (size 32):
    comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
    hex dump (first 32 bytes):
      72 65 73 63 68 65 64 31 00 00 00 00 00 00 00 00  resched1........
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
      [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
      [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
      [<ffffffff812fe228>] kasprintf+0x38/0x40
      [<ffffffff81047ed1>] xen_smp_intr_init+0x41/0x2c0
      [<ffffffff816636d3>] xen_cpu_up+0x393/0x3e8
      [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
      [<ffffffff8166bd48>] cpu_up+0xd9/0xec
      [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
      [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
      [<ffffffff8165ce39>] kernel_init+0x9/0xf0
      [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
      [<ffffffffffffffff>] 0xffffffffffffffff

This patch fixes some of it by using the 'struct xen_common_irq->name' field to stash away the char so that it can be freed when the interrupt line is destroyed.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-10  xen/smp: Set the per-cpu IRQ number to a valid default.  (Konrad Rzeszutek Wilk)

When we free it we want to make sure to set it to a default value of -1 so that we don't double-free it (in case somebody calls us twice).

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-10  xen/smp: Introduce a common structure to contain the IRQ name and interrupt line.  (Konrad Rzeszutek Wilk)

This patch adds a new structure to contain the common two things that each of the per-cpu interrupts need:

 - an interrupt number,
 - and the name of the interrupt (to be added in 'xen/smp: Don't leak interrupt name when offlining').

This allows us to carry the tuple of the per-cpu interrupt data structure and expand it as we need in the future.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
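A sketch of the container (field names follow the changelog; the initializer reflects the '-1 default' patch above, and the exact layout may differ):

  struct xen_common_irq {
          int irq;        /* -1 when not bound, so a double free is a no-op */
          char *name;     /* kasprintf()'d IRQ name; freed on offline */
  };

  static DEFINE_PER_CPU(struct xen_common_irq, xen_resched_irq) = { .irq = -1 };
  static DEFINE_PER_CPU(struct xen_common_irq, xen_callfunc_irq) = { .irq = -1 };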
2013-06-10  xen/smp: Coalesce the free_irq calls in one function.  (Konrad Rzeszutek Wilk)

There are two functions that do a bunch of 'free_irq' calls on the per_cpu IRQs. Instead of having duplicate code, just move it to one function. This is just code movement.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-06  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller)

Merge 'net' into 'net-next' to get the MSG_CMSG_COMPAT regression fix.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-06  Merge tag 'pci-v3.10-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci  (Linus Torvalds)

Pull PCI fixes from Bjorn Helgaas:
 "This fixes a crash when booting a 32-bit kernel via the EFI boot stub.

  PCI ROM from EFI
    x86/PCI: Map PCI setup data with ioremap() so it can be in highmem"

* tag 'pci-v3.10-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/PCI: Map PCI setup data with ioremap() so it can be in highmem
2013-06-06  x86: Get rid of ->hard_math and all the FPU asm fu  (H. Peter Anvin)

Reimplement FPU detection code in C and drop the old, not-so-recommended detection method in asm. Move all the relevant stuff into i387.c where it conceptually belongs. Finally drop cpuinfo_x86.hard_math.

[ hpa: huge thanks to Borislav for taking my original concept patch and productizing it ]

[ Boris, note to self: do not use static_cpu_has before alternatives! ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1367244262-29511-2-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/1365436666-9837-2-git-send-email-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
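The C-based probe works roughly like this; a sketch in the spirit of the new detection code, not a verbatim quote of the patch:

  /*
   * With CR0.TS/EM clear, FNINIT followed by FNSTSW/FNSTCW must yield a
   * zero status word and the documented default control word if an x87
   * unit is actually present.
   */
  static void fpu_detect(struct cpuinfo_x86 *c)
  {
          unsigned long cr0;
          u16 fsw = 0xffff, fcw = 0xffff;

          cr0 = read_cr0();
          cr0 &= ~(X86_CR0_TS | X86_CR0_EM);
          write_cr0(cr0);

          asm volatile("fninit ; fnstsw %0 ; fnstcw %1"
                       : "+m" (fsw), "+m" (fcw));

          if (fsw == 0 && (fcw & 0x103f) == 0x003f)
                  set_cpu_cap(c, X86_FEATURE_FPU);
          else
                  clear_cpu_cap(c, X86_FEATURE_FPU);
  }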
2013-06-06  UEFI: Don't pass boot services regions to SetVirtualAddressMap()  (Matthew Garrett)

We need to map boot services regions during startup in order to avoid firmware bugs, but we shouldn't be passing those regions to SetVirtualAddressMap(). Ensure that we're only passing regions that are marked as being mapped at runtime.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
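The filtering amounts to something like the following sketch (illustrative; the new_memmap bookkeeping is simplified): only descriptors with EFI_MEMORY_RUNTIME set are copied into the map handed to SetVirtualAddressMap().

  for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
          efi_memory_desc_t *md = p;

          /* boot-services regions stay mapped as a firmware-bug
           * workaround, but are not passed on to the firmware */
          if (!(md->attribute & EFI_MEMORY_RUNTIME))
                  continue;

          memcpy(new_memmap + (count * memmap.desc_size), md,
                 memmap.desc_size);
          count++;
  }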
2013-06-05  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller)

Merge 'net' bug fixes into 'net-next' as we have patches that will build on top of them.

This merge commit includes a change from Emil Goode (emilgoode@gmail.com) that fixes a warning that would have been introduced by this merge. Specifically it fixes the pingv6_ops method ipv6_chk_addr() to add a "const" to the "struct net_device *dev" argument and likewise update the dummy_ipv6_chk_addr() declaration.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-05  x86, microcode, amd: Allow multiple families' bin files appended together  (Jacob Shin)

Add support for parsing through multiple families' microcode patch container binary files appended together when early loading. This is already supported on Intel.

Reported-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1370463236-2115-3-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-05  x86, microcode, amd: Make find_ucode_in_initrd() __init  (Jacob Shin)

Change find_ucode_in_initrd() to __init and only let the BSP call it during cold boot. This is the right thing to do because only the BSP will see the initrd loaded by the boot loader. APs will offset into initrd_start to find the microcode patch binary.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1370463236-2115-2-git-send-email-jacob.shin@amd.com
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-05  x86/PCI: Map PCI setup data with ioremap() so it can be in highmem  (Matt Fleming)

f9a37be0f0 ("x86: Use PCI setup data") added support for using PCI ROM images from setup_data. This used phys_to_virt(), which is not valid for highmem addresses, and can cause a crash when booting a 32-bit kernel via the EFI boot stub.

pcibios_add_device() assumes that the physical addresses stored in setup_data are accessible via the direct kernel mapping, and that calling phys_to_virt() is valid. This isn't guaranteed to be true on x86, where the direct mapping range is much smaller than on x86-64. Calling phys_to_virt() on a highmem address results in the following:

  BUG: unable to handle kernel paging request at 39a3c198
  IP: [<c262be0f>] pcibios_add_device+0x2f/0x90
  ...
  Call Trace:
   [<c2370c73>] pci_device_add+0xe3/0x130
   [<c274640b>] pci_scan_single_device+0x8b/0xb0
   [<c2370d08>] pci_scan_slot+0x48/0x100
   [<c2371904>] pci_scan_child_bus+0x24/0xc0
   [<c262a7b0>] pci_acpi_scan_root+0x2c0/0x490
   [<c23b7203>] acpi_pci_root_add+0x312/0x42f
  ...

The solution is to use ioremap() instead of phys_to_virt() to map the setup data into the kernel address space.

[bhelgaas: changelog]
Tested-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: stable@vger.kernel.org # v3.8+
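A sketch of the fixed lookup (simplified from what pcibios_add_device() would do; the SETUP_PCI handling is condensed here):

  u64 pa_data = boot_params.hdr.setup_data;

  while (pa_data) {
          /* ioremap() reaches highmem physical addresses that the
           * 32-bit direct mapping (and thus phys_to_virt()) cannot */
          struct setup_data *data = ioremap(pa_data, sizeof(*data));

          if (!data)
                  break;

          if (data->type == SETUP_PCI) {
                  /* ... match the device and attach the ROM image ... */
          }

          pa_data = data->next;
          iounmap(data);
  }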
2013-06-05  x86, mce: Fix "braodcast" typo  (Mathias Krause)

Fix the typo in MCJ_IRQ_BRAODCAST.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-06-05  KVM: MMU: reduce KVM_REQ_MMU_RELOAD when root page is zapped  (Gleb Natapov)

Quote Gleb's mail:

 | why don't we check for sp->role.invalid in
 | kvm_mmu_prepare_zap_page before calling kvm_reload_remote_mmus()?

and

 | Actually we can add check for is_obsolete_sp() there too since
 | kvm_mmu_invalidate_all_pages() already calls kvm_reload_remote_mmus()
 | after incrementing mmu_valid_gen.

[ Xiao: add some comments and the check of is_obsolete_sp() ]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05  KVM: MMU: reclaim the zapped-obsolete page first  (Xiao Guangrong)

As Marcelo pointed out:

 | (retention of large number of pages while zapping)
 | can be fatal, it can lead to OOM and host crash

We introduce a list, kvm->arch.zapped_obsolete_pages, to link all the pages which are deleted from the mmu cache but not actually freed. When page reclaiming is needed, we always zap this kind of pages first.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05  KVM: MMU: collapse TLB flushes when zap all pages  (Xiao Guangrong)

kvm_zap_obsolete_pages uses a lock-break technique to zap pages, and it flushes the TLB every time it breaks the lock.

We can reload the MMU on all vcpus after updating the generation number, so that the obsolete pages are not used on any vcpu. After that, we do not need to flush the TLB when obsolete pages are zapped.

It will call kvm_mmu_prepare_zap_page many times and use one kvm_mmu_commit_zap_page to collapse the TLB flushes. The side effect is that obsolete pages are unlinked from the active_list but left on the hash-list, so we add a comment around the hash list walker.

Note: kvm_mmu_commit_zap_page is still needed before freeing the pages, since other vcpus may be doing lockless shadow page walks.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05  KVM: MMU: zap pages in batch  (Xiao Guangrong)

Zap at least 10 pages before releasing the mmu-lock, to reduce the overhead caused by re-taking the lock.

After the patch, kvm_zap_obsolete_pages can make forward progress anyway, so update the comments.

[ It improves the case by 0.6% ~ 1% when doing a kernel build while reading the PCI ROM. ]

Note: I am not sure that "10" is the best speculative value; I just guessed that '10' keeps a vcpu from spending a long time in kvm_zap_obsolete_pages without making the mmu-lock too hungry.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
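A sketch combining the batching with the zapped_obsolete_pages list from the commit above; helper signatures and constants are assumed for illustration, not quoted from the patch:

  #define BATCH_ZAP_PAGES 10

  static void kvm_zap_obsolete_pages(struct kvm *kvm)
  {
          struct kvm_mmu_page *sp, *node;
          int batch = 0;

  restart:
          list_for_each_entry_safe_reverse(sp, node,
                        &kvm->arch.active_mmu_pages, link) {
                  if (!is_obsolete_sp(kvm, sp))
                          break;

                  /* only consider a lock break once a whole batch is done */
                  if (batch >= BATCH_ZAP_PAGES &&
                      cond_resched_lock(&kvm->mmu_lock)) {
                          batch = 0;
                          goto restart;
                  }

                  batch += kvm_mmu_prepare_zap_page(kvm, sp,
                                  &kvm->arch.zapped_obsolete_pages);
          }

          kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
  }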
2013-06-05  KVM: MMU: do not reuse the obsolete page  (Xiao Guangrong)

The obsolete page will be zapped soon, so do not reuse it, to reduce future page faults.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05  KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages  (Xiao Guangrong)

It is useful for debugging and development.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05  KVM: MMU: show mmu_valid_gen in shadow page related tracepoints  (Xiao Guangrong)

Show sp->mmu_valid_gen.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05  KVM: x86: use the fast way to invalidate all pages  (Xiao Guangrong)

Replace kvm_mmu_zap_all with kvm_mmu_invalidate_zap_all_pages.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05  KVM: MMU: fast invalidate all pages  (Xiao Guangrong)

The current kvm_mmu_zap_all is really slow: it holds the mmu-lock to walk and zap all shadow pages one by one, and it also needs to zap all the guest pages' rmaps and all the shadow pages' parent spte lists. Things become particularly bad if the guest uses more memory or vcpus. It is not good for scalability.

In this patch, we introduce a faster way to invalidate all shadow pages. KVM maintains a global mmu invalid generation number, stored in kvm->arch.mmu_valid_gen, and every shadow page stores the current global generation number into sp->mmu_valid_gen when it is created.

When KVM needs to zap all shadow page sptes, it simply increases the global generation number and then reloads the root shadow pages on all vcpus. A vcpu will create a new shadow page table according to the current generation number; this ensures the old pages are not used any more. The obsolete pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen) are then zapped using the lock-break technique.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
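A sketch of the generation-number scheme (illustrative; field and helper names follow the changelog):

  static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
  {
          return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
  }

  void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
  {
          spin_lock(&kvm->mmu_lock);
          kvm->arch.mmu_valid_gen++;      /* every existing sp is now obsolete */

          /* force all vcpus to drop their roots and rebuild from the new
           * generation before the old pages are torn down */
          kvm_reload_remote_mmus(kvm);

          kvm_zap_obsolete_pages(kvm);    /* lock-break zapping */
          spin_unlock(&kvm->mmu_lock);
  }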
2013-06-05  KVM: MMU: drop unnecessary kvm_reload_remote_mmus  (Xiao Guangrong)

Keeping the mmu and TLBs consistent is the responsibility of kvm_mmu_zap_all. The reload is also unnecessary after zapping all mmio sptes, since no mmio spte exists on a root shadow page and it cannot be cached into the TLB.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05  KVM: x86: drop calling kvm_mmu_zap_all in emulator_fix_hypercall  (Xiao Guangrong)

Quote Gleb's mail:

 | Back then kvm->lock protected memslot access so code like:
 |
 | mutex_lock(&vcpu->kvm->lock);
 | kvm_mmu_zap_all(vcpu->kvm);
 | mutex_unlock(&vcpu->kvm->lock);
 |
 | which is what 7aa81cc0 does was enough to guaranty that no vcpu will
 | run while code is patched. This is no longer the case and
 | mutex_lock(&vcpu->kvm->lock); is gone from that code path long time ago,
 | so now kvm_mmu_zap_all() there is useless and the code is incorrect.

So we drop it; it will be fixed properly later.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05  Merge branch 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds)

Pull kvm bugfixes from Gleb Natapov:
 "The bulk of the fixes is in the MIPS KVM kernel<->userspace ABI. MIPS KVM is new for 3.10 and some problems were found with the current ABI. It is better to fix them now and not ship a kernel with a broken one."

* 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Fix race in apic->pending_events processing
  KVM: fix sil/dil/bpl/spl in the mod/rm fields
  KVM: Emulate multibyte NOP
  ARM: KVM: be more thorough when invalidating TLBs
  ARM: KVM: prevent NULL pointer dereferences with KVM VCPU ioctl
  mips/kvm: Use ENOIOCTLCMD to indicate unimplemented ioctls.
  mips/kvm: Fix ABI by moving manipulation of CP0 registers to KVM_{G,S}ET_ONE_REG
  mips/kvm: Use ARRAY_SIZE() instead of hardcoded constants in kvm_arch_vcpu_ioctl_{s,g}et_regs
  mips/kvm: Fix name of gpr field in struct kvm_regs.
  mips/kvm: Fix ABI for use of 64-bit registers.
  mips/kvm: Fix ABI for use of FPU.
2013-06-04  x86, cleanups: Remove extra tab in __flush_tlb_one()  (Michael Wang)

Remove the extra tab in __flush_tlb_one().

CC: Alex Shi <alex.shi@intel.com>
CC: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/51AD8902.60603@linux.vnet.ibm.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-04  xen/smp: Fixup NOHZ per cpu data when onlining an offline CPU.  (Konrad Rzeszutek Wilk)

xen_play_dead is an undead function. When the vCPU is told to offline, it ends up calling xen_play_dead, wherein it calls the VCPUOP_down hypercall which offlines the vCPU. However, when the vCPU is onlined back, it resumes execution right after the VCPUOP_down hypercall.

That was OK (albeit the API for play_dead assumes that the CPU stays dead and never returns), but with commit 4b0c0f294 (tick: Cleanup NOHZ per cpu data on cpu down) that is no longer safe, as said commit resets ts->inidle, which was set at the start of the cpu_idle loop. The net effect is that we get this warning:

  Broke affinity for irq 16
  installing Xen timer for CPU 1
  cpu 1 spinlock event irq 48
  ------------[ cut here ]------------
  WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0()
  Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc3upstream-00068-gdcdbe33 #1
  Hardware name: BIOSTAR Group N61PB-M2S/N61PB-M2S, BIOS 6.00 PG 09/03/2009
   ffffffff8193b448 ffff880039da5e60 ffffffff816707c8 ffff880039da5ea0
   ffffffff8108ce8b ffff880039da4010 ffff88003fa8e500 ffff880039da4010
   0000000000000001 ffff880039da4000 ffff880039da4010 ffff880039da5eb0
  Call Trace:
   [<ffffffff816707c8>] dump_stack+0x19/0x1b
   [<ffffffff8108ce8b>] warn_slowpath_common+0x6b/0xa0
   [<ffffffff8108ced5>] warn_slowpath_null+0x15/0x20
   [<ffffffff810e4745>] tick_nohz_idle_exit+0x195/0x1b0
   [<ffffffff810da755>] cpu_startup_entry+0x205/0x250
   [<ffffffff81661070>] cpu_bringup_and_idle+0x13/0x15
  ---[ end trace 915c8c486004dda1 ]---

because ts->inidle is set to zero. Thomas suggested that we just add a workaround to call tick_nohz_idle_enter before returning from xen_play_dead(), and that is what this patch does and it fixes the issue. We also tag this for stable, because commit 4b0c0f294 is in the stable tree.

CC: stable@vger.kernel.org
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
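Sketched out (helper names are assumed from the surrounding xen/smp code, so treat this as an illustration), the workaround sits at the point where the hypercall "returns from the dead":

  static void xen_play_dead(void)
  {
          play_dead_common();
          HYPERVISOR_vcpu_op(VCPUOP_down, smp_processor_id(), NULL);

          /* execution resumes here when the vCPU is onlined again */
          cpu_bringup();

          /*
           * commit 4b0c0f294 clears ts->inidle on cpu down; re-enter the
           * NOHZ idle state so the tick_nohz_idle_exit() in the idle
           * loop we return into does not trigger the WARN_ON.
           */
          tick_nohz_idle_enter();
  }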
2013-06-03  Finally eradicate CONFIG_HOTPLUG  (Stephen Rothwell)

Ever since commit 45f035ab9b8f ("CONFIG_HOTPLUG should be always on"), it has been basically impossible to build a kernel with CONFIG_HOTPLUG turned off. Remove all the remaining references to it.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-03  KVM: Fix race in apic->pending_events processing  (Gleb Natapov)

apic->pending_events processing has a race that may cause INIT and SIPI processing to be reordered:

  vcpu0:                                 vcpu1:
                                         set INIT
  test_and_clear_bit(KVM_APIC_INIT)
  process INIT
                                         set INIT
                                         set SIPI
  test_and_clear_bit(KVM_APIC_SIPI)
  process SIPI

At the end, INIT is left pending in pending_events. The following patch fixes this by latching pending events before processing them.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
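A sketch of the latch (close in spirit to the fix; the surrounding checks are condensed and should be treated as illustrative):

  void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
  {
          struct kvm_lapic *apic = vcpu->arch.apic;
          unsigned long pe;

          if (!apic->pending_events)
                  return;

          /* atomically grab and clear everything pending right now;
           * events that arrive after this point are seen on the next
           * call, so INIT can no longer slip back in behind SIPI */
          pe = xchg(&apic->pending_events, 0);

          if (test_bit(KVM_APIC_INIT, &pe)) {
                  /* ... process INIT ... */
          }
          if (test_bit(KVM_APIC_SIPI, &pe) &&
              vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
                  /* ... process SIPI ... */
          }
  }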
2013-06-03  KVM: fix sil/dil/bpl/spl in the mod/rm fields  (Paolo Bonzini)

The x86-64 extended low-byte registers were fetched correctly from reg, but not from mod/rm. This fixes another bug in the boot of RHEL5.9 64-bit, but it is still not enough.

Cc: <stable@vger.kernel.org> # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-03  KVM: Emulate multibyte NOP  (Paolo Bonzini)

This is encountered when booting RHEL5.9 64-bit. There is another bug after this one that is not a simple emulation failure, but this one lets the boot proceed a bit.

Cc: <stable@vger.kernel.org> # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-31  x86, microcode, amd: Fix warnings and errors with CONFIG_MICROCODE=m  (Jacob Shin)

Fix section mismatch warnings on microcode_amd_early. A compile error occurs when CONFIG_MICROCODE=m; change it so that early loading depends on microcode_core.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20130531150241.GA12006@jshin-Toonie
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-31  x86: Fix adjust_range_size_mask calling position  (Yinghai Lu)

Commit 8d57470d ("x86, mm: setup page table in top-down") causes a kernel panic while setting mem=2G:

  [mem 0x00000000-0x000fffff] page 4k
  [mem 0x7fe00000-0x7fffffff] page 1G
  [mem 0x7c000000-0x7fdfffff] page 1G
  [mem 0x00100000-0x001fffff] page 4k
  [mem 0x00200000-0x7bffffff] page 2M

The last entry is not what we want; we should have

  [mem 0x00200000-0x3fffffff] page 2M
  [mem 0x40000000-0x7bffffff] page 1G

Actually we merge the continuous ranges with the same page size too early. In this case, before merging we have

  [mem 0x00200000-0x3fffffff] page 2M
  [mem 0x40000000-0x7bffffff] page 2M

and after merging them we get

  [mem 0x00200000-0x7bffffff] page 2M

even though we could use a 1G page to map [mem 0x40000000-0x7bffffff]. That causes a problem, because we already map

  [mem 0x7fe00000-0x7fffffff] page 1G
  [mem 0x7c000000-0x7fdfffff] page 1G

with 1G pages, i.e. [0x40000000-0x7fffffff] is mapped with 1G pages already. During phys_pud_init() for [0x40000000-0x7bffffff], it will not reuse that existing pud page, but allocate a new one and then try to map it with 2M pages instead, as page_size_mask does not include PG_LEVEL_1G. At the end, [0x7c000000-0x7fffffff] is left unmapped, since the loop in phys_pmd_init stops mapping at 0x7bffffff. That is the right behavior: it maps the exact range with the exact page size that we ask for, and we would have to explicitly call it to map [0x7c000000-0x7fffffff] before or after mapping [0x40000000-0x7bffffff].

Anyway, we need to make sure each range's page_size_mask is correct and consistent after split_mem_range. Fix that by calling adjust_range_size_mask before merging ranges with the same page size.

-v2: update change log.
-v3: add more explanation of why [0x7c000000-0x7fffffff] is not mapped, and why it causes a panic.

Bisected-by: "Xie, ChanglongX" <changlongx.xie@intel.com>
Bisected-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reported-and-tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1370015587-20835-1-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-31  x86/mce: Remove check for CONFIG_X86_MCE_P4THERMAL  (Paul Bolle)

The Kconfig symbol X86_MCE_P4THERMAL was removed in v2.6.32. Remove a useless check for its macro, as it will now always evaluate to false.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Link: http://lkml.kernel.org/r/1369853850.23034.28.camel@x61.thuisdomein
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31  sched/x86: Construct all sibling maps if smt  (Andrew Jones)

Commit 316ad248307fb ("sched/x86: Rewrite set_cpu_sibling_map()") broke the construction of sibling maps, which also broke the booted_cores accounting.

Before the rewrite, if smt was present, then each map was updated for each smt sibling. After the rewrite only cpu_sibling_mask gets updated, as the llc and core maps depend on 'has_mc = x86_max_cores > 1' instead. This leads to problems with topologies like the following (qemu -smp sockets=2,cores=1,threads=2):

  processor   : 0
  physical id : 0
  siblings    : 1    <= should be 2
  core id     : 0
  cpu cores   : 1

  processor   : 1
  physical id : 0
  siblings    : 1    <= should be 2
  core id     : 0
  cpu cores   : 0    <= should be 1

  processor   : 2
  physical id : 1
  siblings    : 1    <= should be 2
  core id     : 0
  cpu cores   : 1

  processor   : 3
  physical id : 1
  siblings    : 1    <= should be 2
  core id     : 0
  cpu cores   : 0    <= should be 1

This patch restores the former construction by defining has_mc as (has_smt || x86_max_cores > 1). This should be fine, as there were no (has_smt && !has_mc) conditions in the context. Also rename has_mc to has_mp now that it's not just for cores.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: a.p.zijlstra@chello.nl
Cc: fenghua.yu@intel.com
Link: http://lkml.kernel.org/r/1369831695-11970-1-git-send-email-drjones@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
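A sketch of the restored condition (illustrative; it follows the changelog's description of set_cpu_sibling_map(), not the exact hunk):

  void set_cpu_sibling_map(int cpu)
  {
          bool has_smt = smp_num_siblings > 1;
          /* was: has_mc = x86_max_cores > 1, which skipped the llc/core
           * map updates on smt-only topologies */
          bool has_mp = has_smt || boot_cpu_data.x86_max_cores > 1;

          /* ... update cpu_sibling_mask, and, when has_mp, the llc and
           * core maps for every sibling so booted_cores stays accurate ... */
  }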
2013-05-31  x86: __force_order doesn't need to be an actual variable  (Jan Beulich)

It being static causes over a dozen instances to be scattered across the kernel image, with none of them ever being referenced in any way. Making the variable extern without ever defining it works as well: all we need is to have the compiler think the variable is being accessed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/51A610B802000078000D99A0@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
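The idiom in question, sketched (the CR accessor shown is representative; the extern declaration is the point):

  /* Declared but never defined anywhere: the "=m" output below only
   * has to convince the compiler that the asm touches memory, which
   * keeps it from reordering the control-register accesses. */
  extern unsigned long __force_order;

  static inline unsigned long native_read_cr3(void)
  {
          unsigned long val;

          asm volatile("mov %%cr3,%0" : "=r" (val), "=m" (__force_order));
          return val;
  }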
2013-05-31  ix86: Don't waste fixmap entries  (Jan Beulich)

The vsyscall-related pvclock entries can only ever be used on x86-64, and hence they shouldn't even get allocated for 32-bit kernels (all the more so since it is there that address space is relatively precious).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/51A60F1F02000078000D997C@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>