summaryrefslogtreecommitdiffstats
path: root/arch/x86
AgeCommit message (Collapse)Author
2013-06-05KVM: MMU: fast invalidate all pagesXiao Guangrong
The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to walk and zap all shadow pages one by one, also it need to zap all guest page's rmap and all shadow page's parent spte list. Particularly, things become worse if guest uses more memory or vcpus. It is not good for scalability In this patch, we introduce a faster way to invalidate all shadow pages. KVM maintains a global mmu invalid generation-number which is stored in kvm->arch.mmu_valid_gen and every shadow page stores the current global generation-number into sp->mmu_valid_gen when it is created When KVM need zap all shadow pages sptes, it just simply increase the global generation-number then reload root shadow pages on all vcpus. Vcpu will create a new shadow page table according to current kvm's generation-number. It ensures the old pages are not used any more. Then the obsolete pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen) are zapped by using lock-break technique Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05KVM: MMU: drop unnecessary kvm_reload_remote_mmusXiao Guangrong
It is the responsibility of kvm_mmu_zap_all that keeps the consistent of mmu and tlbs. And it is also unnecessary after zap all mmio sptes since no mmio spte exists on root shadow page and it can not be cached into tlb Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05KVM: x86: drop calling kvm_mmu_zap_all in emulator_fix_hypercallXiao Guangrong
Quote Gleb's mail: | Back then kvm->lock protected memslot access so code like: | | mutex_lock(&vcpu->kvm->lock); | kvm_mmu_zap_all(vcpu->kvm); | mutex_unlock(&vcpu->kvm->lock); | | which is what 7aa81cc0 does was enough to guaranty that no vcpu will | run while code is patched. This is no longer the case and | mutex_lock(&vcpu->kvm->lock); is gone from that code path long time ago, | so now kvm_mmu_zap_all() there is useless and the code is incorrect. So we drop it and it will be fixed later Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-05Merge branch 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm bugfixes from Gleb Natapov: "The bulk of the fixes is in MIPS KVM kernel<->userspace ABI. MIPS KVM is new for 3.10 and some problems were found with current ABI. It is better to fix them now and do not have a kernel with broken one" * 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Fix race in apic->pending_events processing KVM: fix sil/dil/bpl/spl in the mod/rm fields KVM: Emulate multibyte NOP ARM: KVM: be more thorough when invalidating TLBs ARM: KVM: prevent NULL pointer dereferences with KVM VCPU ioctl mips/kvm: Use ENOIOCTLCMD to indicate unimplemented ioctls. mips/kvm: Fix ABI by moving manipulation of CP0 registers to KVM_{G,S}ET_ONE_REG mips/kvm: Use ARRAY_SIZE() instead of hardcoded constants in kvm_arch_vcpu_ioctl_{s,g}et_regs mips/kvm: Fix name of gpr field in struct kvm_regs. mips/kvm: Fix ABI for use of 64-bit registers. mips/kvm: Fix ABI for use of FPU.
2013-06-04x86, cleanups: Remove extra tab in __flush_tlb_one()Michael Wang
Remove the extra tab in __flush_tlb_one(). CC: Alex Shi <alex.shi@intel.com> CC: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/51AD8902.60603@linux.vnet.ibm.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-04xen/smp: Fixup NOHZ per cpu data when onlining an offline CPU.Konrad Rzeszutek Wilk
The xen_play_dead is an undead function. When the vCPU is told to offline it ends up calling xen_play_dead wherin it calls the VCPUOP_down hypercall which offlines the vCPU. However, when the vCPU is onlined back, it resumes execution right after VCPUOP_down hypercall. That was OK (albeit the API for play_dead assumes that the CPU stays dead and never returns) but with commit 4b0c0f294 (tick: Cleanup NOHZ per cpu data on cpu down) that is no longer safe as said commit resets the ts->inidle which at the start of the cpu_idle loop was set. The net effect is that we get this warn: Broke affinity for irq 16 installing Xen timer for CPU 1 cpu 1 spinlock event irq 48 ------------[ cut here ]------------ WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0() Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc3upstream-00068-gdcdbe33 #1 Hardware name: BIOSTAR Group N61PB-M2S/N61PB-M2S, BIOS 6.00 PG 09/03/2009 ffffffff8193b448 ffff880039da5e60 ffffffff816707c8 ffff880039da5ea0 ffffffff8108ce8b ffff880039da4010 ffff88003fa8e500 ffff880039da4010 0000000000000001 ffff880039da4000 ffff880039da4010 ffff880039da5eb0 Call Trace: [<ffffffff816707c8>] dump_stack+0x19/0x1b [<ffffffff8108ce8b>] warn_slowpath_common+0x6b/0xa0 [<ffffffff8108ced5>] warn_slowpath_null+0x15/0x20 [<ffffffff810e4745>] tick_nohz_idle_exit+0x195/0x1b0 [<ffffffff810da755>] cpu_startup_entry+0x205/0x250 [<ffffffff81661070>] cpu_bringup_and_idle+0x13/0x15 ---[ end trace 915c8c486004dda1 ]--- b/c ts_inidle is set to zero. Thomas suggested that we just add a workaround to call tick_nohz_idle_enter before returning from xen_play_dead() - and that is what this patch does and fixes the issue. We also add the stable part b/c git commit 4b0c0f294 is on the stable tree. CC: stable@vger.kernel.org Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-06-03Finally eradicate CONFIG_HOTPLUGStephen Rothwell
Ever since commit 45f035ab9b8f ("CONFIG_HOTPLUG should be always on"), it has been basically impossible to build a kernel with CONFIG_HOTPLUG turned off. Remove all the remaining references to it. Cc: Russell King <linux@arm.linux.org.uk> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-03KVM: Fix race in apic->pending_events processingGleb Natapov
apic->pending_events processing has a race that may cause INIT and SIPI processing to be reordered: vpu0: vcpu1: set INIT test_and_clear_bit(KVM_APIC_INIT) process INIT set INIT set SIPI test_and_clear_bit(KVM_APIC_SIPI) process SIPI At the end INIT is left pending in pending_events. The following patch fixes this by latching pending event before processing them. Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-03KVM: fix sil/dil/bpl/spl in the mod/rm fieldsPaolo Bonzini
The x86-64 extended low-byte registers were fetched correctly from reg, but not from mod/rm. This fixes another bug in the boot of RHEL5.9 64-bit, but it is still not enough. Cc: <stable@vger.kernel.org> # 3.9 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-03KVM: Emulate multibyte NOPPaolo Bonzini
This is encountered when booting RHEL5.9 64-bit. There is another bug after this one that is not a simple emulation failure, but this one lets the boot proceed a bit. Cc: <stable@vger.kernel.org> # 3.9 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-31x86, microcode, amd: Fix warnings and errors on with CONFIG_MICROCODE=mJacob Shin
Fix section mismatch warnings on microcode_amd_early. Compile error occurs when CONFIG_MICROCODE=m, change so that early loading depends on microcode_core. Reported-by: Yinghai Lu <yinghai@kernel.org> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/20130531150241.GA12006@jshin-Toonie Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-31x86: Fix adjust_range_size_mask calling positionYinghai Lu
Commit 8d57470d x86, mm: setup page table in top-down causes a kernel panic while setting mem=2G. [mem 0x00000000-0x000fffff] page 4k [mem 0x7fe00000-0x7fffffff] page 1G [mem 0x7c000000-0x7fdfffff] page 1G [mem 0x00100000-0x001fffff] page 4k [mem 0x00200000-0x7bffffff] page 2M for last entry is not what we want, we should have [mem 0x00200000-0x3fffffff] page 2M [mem 0x40000000-0x7bffffff] page 1G Actually we merge the continuous ranges with same page size too early. in this case, before merging we have [mem 0x00200000-0x3fffffff] page 2M [mem 0x40000000-0x7bffffff] page 2M after merging them, will get [mem 0x00200000-0x7bffffff] page 2M even we can use 1G page to map [mem 0x40000000-0x7bffffff] that will cause problem, because we already map [mem 0x7fe00000-0x7fffffff] page 1G [mem 0x7c000000-0x7fdfffff] page 1G with 1G page, aka [0x40000000-0x7fffffff] is mapped with 1G page already. During phys_pud_init() for [0x40000000-0x7bffffff], it will not reuse existing that pud page, and allocate new one then try to use 2M page to map it instead, as page_size_mask does not include PG_LEVEL_1G. At end will have [7c000000-0x7fffffff] not mapped, loop in phys_pmd_init stop mapping at 0x7bffffff. That is right behavoir, it maps exact range with exact page size that we ask, and we should explicitly call it to map [7c000000-0x7fffffff] before or after mapping 0x40000000-0x7bffffff. Anyway we need to make sure ranges' page_size_mask correct and consistent after split_mem_range for each range. Fix that by calling adjust_range_size_mask before merging range with same page size. -v2: update change log. -v3: add more explanation why [7c000000-0x7fffffff] is not mapped, and it causes panic. Bisected-by: "Xie, ChanglongX" <changlongx.xie@intel.com> Bisected-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reported-and-tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1370015587-20835-1-git-send-email-yinghai@kernel.org Cc: <stable@vger.kernel.org> v3.9 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-31x86/mce: Remove check for CONFIG_X86_MCE_P4THERMALPaul Bolle
The Kconfig symbol X86_MCE_P4THERMAL was removed in v2.6.32. Remove a useless check for its macro, as it will now always evaluate to false. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Link: http://lkml.kernel.org/r/1369853850.23034.28.camel@x61.thuisdomein Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31sched/x86: Construct all sibling maps if smtAndrew Jones
Commit 316ad248307fb ("sched/x86: Rewrite set_cpu_sibling_map()") broke the construction of sibling maps, which also broke the booted_cores accounting. Before the rewrite, if smt was present, then each map was updated for each smt sibling. After the rewrite only cpu_sibling_mask gets updated, as the llc and core maps depend on 'has_mc = x86_max_cores > 1' instead. This leads to problems with topologies like the following (qemu -smp sockets=2,cores=1,threads=2) processor : 0 physical id : 0 siblings : 1 <= should be 2 core id : 0 cpu cores : 1 processor : 1 physical id : 0 siblings : 1 <= should be 2 core id : 0 cpu cores : 0 <= should be 1 processor : 2 physical id : 1 siblings : 1 <= should be 2 core id : 0 cpu cores : 1 processor : 3 physical id : 1 siblings : 1 <= should be 2 core id : 0 cpu cores : 0 <= should be 1 This patch restores the former construction by defining has_mc as (has_smt || x86_max_cores > 1). This should be fine as there were no (has_smt && !has_mc) conditions in the context. Aso rename has_mc to has_mp now that it's not just for cores. Signed-off-by: Andrew Jones <drjones@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: a.p.zijlstra@chello.nl Cc: fenghua.yu@intel.com Link: http://lkml.kernel.org/r/1369831695-11970-1-git-send-email-drjones@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31x86: __force_order doesn't need to be an actual variableJan Beulich
It being static causes over a dozen instances to be scattered across the kernel image, with non of them ever being referenced in any way. Making the variable extern without ever defining it works as well - all we need is to have the compiler think the variable is being accessed. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/51A610B802000078000D99A0@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31ix86: Don't waste fixmap entriesJan Beulich
The vsyscall related pvclock entries can only ever be used on x86-64, and hence they shouldn't even get allocated for 32-bit kernels (the more that it is there where address space is relatively precious). Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Link: http://lkml.kernel.org/r/51A60F1F02000078000D997C@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-30x86, microcode, amd: Early microcode patch loading support for AMDJacob Shin
Add early microcode patch loading support for AMD. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1369940959-2077-5-git-send-email-jacob.shin@amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30x86, microcode, amd: Refactor functions to prepare for early loadingJacob Shin
In preparation work for early loading, refactor some common functions that will be shared, and move some struct defines to a common header file. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1369940959-2077-4-git-send-email-jacob.shin@amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30x86, microcode: Vendor abstract out save_microcode_in_initrd()Jacob Shin
Currently save_microcode_in_initrd() is declared in vendor neutural microcode.h file, but defined in vendor specific microcode_intel_early.c file. Vendor abstract it out to microcode_core_early.c with a wrapper function. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1369940959-2077-3-git-send-email-jacob.shin@amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30x86, microcode, intel: Correct typo in printkBorislav Petkov
User-visible so correct it. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1369940959-2077-2-git-send-email-jacob.shin@amd.com Signed-off-by: Jacob Shin <jacob.shin@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-31Add arch_phys_wc_{add, del} to manipulate WC MTRRs if neededAndy Lutomirski
Several drivers currently use mtrr_add through various #ifdef guards and/or drm wrappers. The vast majority of them want to add WC MTRRs on x86 systems and don't actually need the MTRR if PAT (i.e. ioremap_wc, etc) are working. arch_phys_wc_add and arch_phys_wc_del are new functions, available on all architectures and configurations, that add WC MTRRs on x86 if needed (and handle errors) and do nothing at all otherwise. They're also easier to use than mtrr_add and mtrr_del, so the call sites can be simplified. As an added benefit, this will avoid wasting MTRRs and possibly warning pointlessly on PAT-supporting systems. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-05-31Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: - Three EFI-related fixes - Two early memory initialization fixes - build fix for older binutils - fix for an eager FPU performance regression -- currently we don't allow the use of the FPU at interrupt time *at all* in eager mode, which is clearly wrong. * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Allow FPU to be used at interrupt time even with eagerfpu x86, crc32-pclmul: Fix build with older binutils x86-64, init: Fix a possible wraparound bug in switchover in head_64.S x86, range: fix missing merge during add range x86, efi: initial the local variable of DataSize to zero efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse efivarfs: Never return ENOENT from firmware again
2013-05-30x86: Allow FPU to be used at interrupt time even with eagerfpuPekka Riikonen
With the addition of eagerfpu the irq_fpu_usable() now returns false negatives especially in the case of ksoftirqd and interrupted idle task, two common cases for FPU use for example in networking/crypto. With eagerfpu=off FPU use is possible in those contexts. This is because of the eagerfpu check in interrupted_kernel_fpu_idle(): ... * For now, with eagerfpu we will return interrupted kernel FPU * state as not-idle. TBD: Ideally we can change the return value * to something like __thread_has_fpu(current). But we need to * be careful of doing __thread_clear_has_fpu() before saving * the FPU etc for supporting nested uses etc. For now, take * the simple route! ... if (use_eager_fpu()) return 0; As eagerfpu is automatically "on" on those CPUs that also have the features like AES-NI this patch changes the eagerfpu check to return 1 in case the kernel_fpu_begin() has not been said yet. Once it has been the __thread_has_fpu() will start returning 0. Notice that with eagerfpu the __thread_has_fpu is always true initially. FPU use is thus always possible no matter what task is under us, unless the state has already been saved with kernel_fpu_begin(). [ hpa: this is a performance regression, not a correctness regression, but since it can be quite serious on CPUs which need encryption at interrupt time I am marking this for urgent/stable. ] Signed-off-by: Pekka Riikonen <priikone@iki.fi> Link: http://lkml.kernel.org/r/alpine.GSO.2.00.1305131356320.18@git.silcnet.org Cc: <stable@vger.kernel.org> v3.7+ Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-30x86, crc32-pclmul: Fix build with older binutilsJan Beulich
binutils prior to 2.18 (e.g. the ones found on SLE10) don't support assembling PEXTRD, so a macro based approach like the one for PCLMULQDQ in the same file should be used. This requires making the helper macros capable of recognizing 32-bit general purpose register operands. [ hpa: tagging for stable as it is a low risk build fix ] Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/51A6142A02000078000D99D8@nat28.tlf.novell.com Cc: Alexander Boyko <alexander_boyko@xyratex.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Huang Ying <ying.huang@intel.com> Cc: <stable@vger.kernel.org> v3.9 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-31Merge tag 'stable/for-linus-3.10-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen fixes from Konrad Rzeszutek Wilk: - Use proper error paths - Clean up APIC IPI usage (incorrect arguments) - Delay XenBus frontend resume is backend (xenstored) is not running - Fix build error with various combinations of CONFIG_ * tag 'stable/for-linus-3.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xenbus_client.c: correct exit path for xenbus_map_ring_valloc_hvm xen-pciback: more uses of cached MSI-X capability offset xen: Clean up apic ipi interface xenbus: save xenstore local status for later use xenbus: delay xenbus frontend resume if xenstored is not running xmem/tmem: fix 'undefined variable' build error.
2013-05-30x86/UV: Add GRU distributed mode mappingsDimitri Sivanich
GRU hardware will support an optional distributed mode that will allow per-node address mapping of local GRU space, as opposed to mapping all GRU hardware to the same contiguous high space. If GRU distributed mode is selected, setup per-node page table mappings. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Cc: Alexander Gordeev <agordeev@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Russ Anderson <rja@sgi.com> Cc: Mike Travis <travis@sgi.com> Link: http://lkml.kernel.org/r/20130529155609.GB22917@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-29x86: Fix vrtc_get_time/set_mmss to use new timespec interfaceJohn Stultz
The patch "x86: Increase precision of x86_platform.get/set_wallclock" changed the x86 platform set_wallclock/get_wallclock interfaces to use nsec granular timespecs instead of a second granular interface. However, that patch missed converting the vrtc code, so this patch converts those functions to use timespecs. Many thanks to the kbuild test robot for finding this! Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-05-29xen: Clean up apic ipi interfaceStefan Bader
Commit f447d56d36af18c5104ff29dcb1327c0c0ac3634 introduced the implementation of the PV apic ipi interface. But there were some odd things (it seems none of which cause really any issue but maybe they should be cleaned up anyway): - xen_send_IPI_mask_allbutself (and by that xen_send_IPI_allbutself) ignore the passed in vector and only use the CALL_FUNCTION_SINGLE vector. While xen_send_IPI_all and xen_send_IPI_mask use the vector. - physflat_send_IPI_allbutself is declared unnecessarily. It is never used. This patch tries to clean up those things. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-05-28x86-64, init: Fix a possible wraparound bug in switchover in head_64.SZhang Yanfei
In head_64.S, a switchover has been used to handle kernel crossing 1G, 512G boundaries. And commit 8170e6bed465b4b0c7687f93e9948aca4358a33b x86, 64bit: Use a #PF handler to materialize early mappings on demand said: During the switchover in head_64.S, before #PF handler is available, we use three pages to handle kernel crossing 1G, 512G boundaries with sharing page by playing games with page aliasing: the same page is mapped twice in the higher-level tables with appropriate wraparound. But from the switchover code, when we set up the PUD table: 114 addq $4096, %rdx 115 movq %rdi, %rax 116 shrq $PUD_SHIFT, %rax 117 andl $(PTRS_PER_PUD-1), %eax 118 movq %rdx, (4096+0)(%rbx,%rax,8) 119 movq %rdx, (4096+8)(%rbx,%rax,8) It seems line 119 has a potential bug there. For example, if the kernel is loaded at physical address 511G+1008M, that is 000000000 111111111 111111000 000000000000000000000 and the kernel _end is 512G+2M, that is 000000001 000000000 000000001 000000000000000000000 So in this example, when using the 2nd page to setup PUD (line 114~119), rax is 511. In line 118, we put rdx which is the address of the PMD page (the 3rd page) into entry 511 of the PUD table. But in line 119, the entry we calculate from (4096+8)(%rbx,%rax,8) has exceeded the PUD page. IMO, the entry in line 119 should be wraparound into entry 0 of the PUD table. The patch fixes the bug. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Link: http://lkml.kernel.org/r/5191DE5A.3020302@cn.fujitsu.com Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: <stable@vger.kernel.org> v3.9 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-28x86: Increase precision of x86_platform.get/set_wallclock()David Vrabel
All the virtualized platforms (KVM, lguest and Xen) have persistent wallclocks that have more than one second of precision. read_persistent_wallclock() and update_persistent_wallclock() allow for nanosecond precision but their implementation on x86 with x86_platform.get/set_wallclock() only allows for one second precision. This means guests may see a wallclock time that is off by up to 1 second. Make set_wallclock() and get_wallclock() take a struct timespec parameter (which allows for nanosecond precision) so KVM and Xen guests may start with a more accurate wallclock time and a Xen dom0 can maintain a more accurate wallclock for guests. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-05-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto fixes from Herbert Xu: "This push fixes a crash in the new sha256_ssse3 driver as well as a DMA setup/teardown bug in caam" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: sha256_ssse3 - fix stack corruption with SSSE3 and AVX implementations crypto: caam - fix inconsistent assoc dma mapping direction
2013-05-28x86/PCI: Increase info->res_num before checking pci_use_crsYijing Wang
We should increase info->res_num before we checking pci_use_crs return when pci=nocrs set. No functional change, since we don't use res_num and res_offset[] in the "!pci_use_crs" case anyway, but this makes the code read better. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Jiang Liu <liuj97@gmail.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Feng Tang <feng.tang@intel.com>
2013-05-28x86, platform, kvm, kconfig: Turn existing .config's into KVM-capable configsBorislav Petkov
Add an config file snippet which enables additional options useful for running the kernel in a kvm guest. When you execute 'make kvmconfig' it merges those options with an already existing user config before you build the kernel. Based on an patch from the external lkvm tree. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: penberg@kernel.org Cc: levinsasha928@gmail.com Cc: mtosatti@redhat.com Cc: fengguang.wu@intel.com Link: http://lkml.kernel.org/r/20130522144638.GB15085@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28update AMD powerflags commentsThorsten Glaser
from http://www.flounder.com/cpuid80000007amd.gif and http://support.amd.com/us/Embedded_TechDocs/25481.pdf Signed-off-by: Thorsten Glaser <t.glaser@tarent.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-28x86/mm: Drop unneeded include <asm/*pgtable, page*_types.h>Zhang Yanfei
arch/x86/boot/compressed/head_64.S includes <asm/pgtable_types.h> and <asm/page_types.h> but it doesn't look like it needs them. So remove them. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Link: http://lkml.kernel.org/r/5191FAE2.4020403@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28x86_64: Correct phys_addr in cleanup_highmap commentZhang Yanfei
For x86_64, we have phys_base, which means the delta between the the address kernel is actually running at and the address kernel is compiled to run at. Not phys_addr so correct it. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Link: http://lkml.kernel.org/r/5192F9BF.2000802@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28crypto: sha256_ssse3 - add sha224 supportJussi Kivilinna
Add sha224 implementation to sha256_ssse3 module. This also fixes sha256_ssse3 module autoloading issue when 'sha224' is used before 'sha256'. Previously in such case, just sha256_generic was loaded and not sha256_ssse3 (since it did not provide sha224). Now if 'sha256' was used after 'sha224' usage, sha256_ssse3 would remain unloaded. Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-05-28crypto: sha512_ssse3 - add sha384 supportJussi Kivilinna
Add sha384 implementation to sha512_ssse3 module. This also fixes sha512_ssse3 module autoloading issue when 'sha384' is used before 'sha512'. Previously in such case, just sha512_generic was loaded and not sha512_ssse3 (since it did not provide sha384). Now if 'sha512' was used after 'sha384' usage, sha512_ssse3 would remain unloaded. For example, this happens with tcrypt testing module since it tests 'sha384' before 'sha512'. Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-05-28x86: uaccess s/might_sleep/might_fault/Michael S. Tsirkin
The only reason uaccess routines might sleep is if they fault. Make this explicit. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1369577426-26721-9-git-send-email-mst@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28perf/x86/amd: Rework AMD PMU init codePeter Zijlstra
Josh reported that his QEMU is a bad hardware emulator and trips a WARN in the AMD PMU init code. He requested the WARN be turned into a pr_err() or similar. While there, rework the code a little. Reported-by: Josh Boyer <jwboyer@redhat.com> Acked-by: Robert Richter <rric@kernel.org> Acked-by: Jacob Shin <jacob.shin@amd.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130521110537.GG26912@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28perf/x86: Check branch sampling priv level in generic codeStephane Eranian
This patch moves commit 7cc23cd to the generic code: perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL The check is now implemented in generic code instead of x86 specific code. That way we do not have to repeat the test in each arch supporting branch sampling. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/20130521105337.GA2879@quad Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28perf/x86/intel: Prevent some shift wrapping bugs in the Intel uncore driverDan Carpenter
We're trying to use 64 bit masks but the shifts wrap so we can't use the high 32 bits. I've fixed this by changing several types to unsigned long long. This is a static checker fix. The one change which is clearly needed is "mask = 0xff << (idx * 8);" where the author obviously intended to use all 64 bits. The other changes are mostly to silence my static checker. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20130518183452.GA14587@elgon.mountain Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28x86/signals: Merge EFLAGS bit clearing into a single statementJiri Olsa
Merging EFLAGS bit clearing into a single statement, to ensure EFLAGS bits are being cleared in a single instruction. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Stephane Eranian <eranian@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1367421944-19082-4-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28x86/signals: Clear RF EFLAGS bit for signal handlerJiri Olsa
Clearing RF EFLAGS bit for signal handler. The reason is that this flag is set by debug exception code to prevent the recursive exception entry. Leaving it set for signal handler might prevent debug exception of the signal handler itself. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Stephane Eranian <eranian@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1367421944-19082-3-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28x86/signals: Propagate RF EFLAGS bit through the signal restore callJiri Olsa
While porting Vince's perf overflow tests I found perf event breakpoint overflow does not work properly. I found the x86 RF EFLAG bit not being set when returning from debug exception after triggering signal handler. Which is exactly what you get when you set perf breakpoint overflow SIGIO handler. This patch and the next two patches fix the underlying bugs. This patch adds the RF EFLAGS bit to be restored on return from signal from the original register context before the signal was entered. This will prevent the RF flag to disappear when returning from exception due to the signal handler being executed. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Stephane Eranian <eranian@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1367421944-19082-2-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28crypto: sha256_ssse3 - fix stack corruption with SSSE3 and AVX implementationsJussi Kivilinna
The _XFER stack element size was set too small, 8 bytes, when it needs to be 16 bytes. As _XFER is the last stack element used by these implementations, the 16 byte stores with 'movdqa' corrupt the stack where the value of register %r12 is temporarily stored. As these implementations align the stack pointer to 16 bytes, this corruption did not happen every time. Patch corrects this issue. Reported-by: Julian Wollrath <jwollrath@web.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Tested-by: Julian Wollrath <jwollrath@web.de> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Merge net into net-next because some upcoming net-next changes build on top of bug fixes that went into net. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-24crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a ↵Tim Chen
crypto transform Glue code that plugs the PCLMULQDQ accelerated CRC T10 DIF hash into the crypto framework. The config CRYPTO_CRCT10DIF_PCLMUL should be turned on to enable the feature. The crc_t10dif crypto library function will use this faster algorithm when crct10dif_pclmul module is loaded. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-05-23Merge tag 'pci-v3.10-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Here are some more fixes for v3.10. The Moorestown update broke Intel Medfield devices, so I reverted it. The acpiphp change fixes a regression: we broke hotplug notifications to host bridges when we split acpiphp into the host-bridge related part and the endpoint-related part. Moorestown Revert "x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0" Hotplug PCI: acpiphp: Re-enumerate devices when host bridge receives Bus Check" * tag 'pci-v3.10-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: Revert "x86/pci/mrst: Use configuration mechanism 1 for 00:00.0, 00:02.0, 00:03.0" PCI: acpiphp: Re-enumerate devices when host bridge receives Bus Check
2013-05-22Merge tag 'efi-urgent' into x86/urgentH. Peter Anvin
* Avoid confusing the user by returning -EIO instead of -ENOENT in efivarfs if an EFI variable gets deleted from under us and return EOF when reading from a zero-length file - Lingzhu Xiang * Fix an oops in efivar_update_sysfs_entries() caused by reusing (and therefore corrupting) a kzalloc() allocation - Seiji Aguchi * Initialise the DataSize argument to GetVariable() otherwise it will not be updated with the actual size of the variable on return. Discovered on a Acer Aspire V3 BIOS - Lee, Chun-Yi Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>