asmadeus/linux.git - The linux kernel

Age	Commit message (Collapse)	Author
2014-01-02	KVM: nVMX: Unconditionally uninit the MMU on nested vmexit	Jan Kiszka
	Three reasons for doing this: 1. arch.walk_mmu points to arch.mmu anyway in case nested EPT wasn't in use. 2. this aligns VMX with SVM. But 3. is most important: nested_cpu_has_ept(vmcs12) queries the VMCS page, and if one guest VCPU manipulates the page of another VCPU in L2, we may be fooled to skip over the nested_ept_uninit_mmu_context, leaving mmu in nested state. That can crash the host later on if nested_ept_get_cr3 is invoked while L1 already left vmxon and nested.current_vmcs12 became NULL therefore. Cc: stable@kernel.org Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-12-31	crypto: aesni - fix build on x86 (32bit)	Andy Shevchenko
	It seems commit d764593a "crypto: aesni - AVX and AVX2 version of AESNI-GCM encode and decode" breaks a build on x86_32 since it's designed only for x86_64. This patch makes a compilation unit conditional to CONFIG_64BIT and functions usage to CONFIG_X86_64. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-12-30	KVM: x86: Fix APIC map calculation after re-enabling	Jan Kiszka
	Update arch.apic_base before triggering recalculate_apic_map. Otherwise the recalculation will work against the previous state of the APIC and will fail to build the correct map when an APIC is hardware-enabled again. This fixes a regression of 1e08ec4a13. Cc: stable@vger.kernel.org Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-12-29	Merge branch 'x86-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "There is a small EFI fix and a big power regression fix in this batch. My queue also had a fix for downing a CPU when there are insufficient number of IRQ vectors available, but I'm holding that one for now due to recent bug reports" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Don't select EFI from certain special ACPI drivers x86 idle: Repair large-server 50-watt idle-power regression
2013-12-29	x86/efi: Delete superfluous global variables	Matt Fleming
	There's no need to save the runtime map details in global variables, the values are only required to pass to efi_runtime_map_setup(). And because 'nr_efi_runtime_map' isn't needed, get_nr_runtime_map() can be deleted along with 'efi_data_len'. Cc: Dave Young <dyoung@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-29	x86: Reserve setup_data ranges late after parsing memmap cmdline	Dave Young
	Currently e820_reserve_setup_data() is called before parsing early params, it works in normal case. But for memmap=exactmap, the final memory ranges are created after parsing memmap= cmdline params, so the previous e820_reserve_setup_data() has no effect. For example, setup_data ranges will still be marked as normal system ram, thus when later sysfs driver ioremap them kernel will warn about mapping normal ram. This patch fix it by moving the e820_reserve_setup_data() callback after parsing early params so they can be set as reserved ranges and later ioremap will be fine with it. Signed-off-by: Dave Young <dyoung@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-29	x86: Export x86 boot_params to sysfs	Dave Young
	kexec-tools use boot_params for getting the 1st kernel hardware_subarch, the kexec kernel EFI runtime support also needs to read the old efi_info from boot_params. Currently it exists in debugfs which is not a good place for such infomation. Per HPA, we should avoid "sploit debugfs". In this patch /sys/kernel/boot_params are exported, also the setup_data is exported as a subdirectory. kexec-tools is using debugfs for hardware_subarch for a long time now so we're not removing it yet. Structure is like below: /sys/kernel/boot_params \|__ data /* boot_params in binary/ \|__ setup_data \| \|__ 0 / the first setup_data node / \| \| \|__ data / setup_data node 0 in binary/ \| \| \|__ type / setup_data type of setup_data node 0, hex string / [snip] \|__ version / boot protocal version (in hex, "0x" prefixed)*/ Signed-off-by: Dave Young <dyoung@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-29	x86: Add xloadflags bit for EFI runtime support on kexec	Dave Young
	Old kexec-tools can not load new kernels. The reason is kexec-tools does not fill efi_info in x86 setup header previously, thus EFI failed to initialize. In new kexec-tools it will by default to fill efi_info and pass other EFI required infomation to 2nd kernel so kexec kernel EFI initialization can succeed finally. To prevent from breaking userspace, add a new xloadflags bit so kexec-tools can check the flag and switch to old logic. Signed-off-by: Dave Young <dyoung@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-29	x86/efi: Pass necessary EFI data for kexec via setup_data	Dave Young
	Add a new setup_data type SETUP_EFI for kexec use. Passing the saved fw_vendor, runtime, config tables and EFI runtime mappings. When entering virtual mode, directly mapping the EFI runtime regions which we passed in previously. And skip the step to call SetVirtualAddressMap(). Specially for HP z420 workstation we need save the smbios physical address. The kernel boot sequence proceeds in the following order. Step 2 requires efi.smbios to be the physical address. However, I found that on HP z420 EFI system table has a virtual address of SMBIOS in step 1. Hence, we need set it back to the physical address with the smbios in efi_setup_data. (When it is still the physical address, it simply sets the same value.) 1. efi_init() - Set efi.smbios from EFI system table 2. dmi_scan_machine() - Temporary map efi.smbios to access SMBIOS table 3. efi_enter_virtual_mode() - Map EFI ranges Tested on ovmf+qemu, lenovo thinkpad, a dell laptop and an HP z420 workstation. Signed-off-by: Dave Young <dyoung@redhat.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-27	x86: Slightly tweak the access_ok() C variant for better code	H. Peter Anvin
	gcc can under very specific circumstances realize that the code sequence: foo += bar; if (foo < bar) ... ... is equivalent to a carry out from the addition. Tweak the implementation of access_ok() (specifically __chk_range_not_ok()) to make it more likely that gcc will make that connection. It isn't fool-proof (sometimes gcc seems to think it can make better code with lea, and ends up with a second comparison), still, but it seems to be able to connect the two more frequently this way. Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/CA%2B55aFzPBdbfKovMT8Edr4SmE2_=%2BOKJFac9XW2awegogTkVTA@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-12-27	x86: Replace assembly access_ok() with a C variant	Linus Torvalds
	It turns out that the assembly variant doesn't actually produce that good code, presumably partly because it creates a long dependency chain with no scheduling, and partly because we cannot get a flags result out of gcc (which could be fixed with asm goto, but it turns out not to be worth it.) The C code allows gcc to schedule and generate multiple (easily predictable) branches, and as a side benefit we can really optimize the case where the size is constant. Link: http://lkml.kernel.org/r/CA%2B55aFzPBdbfKovMT8Edr4SmE2_=%2BOKJFac9XW2awegogTkVTA@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-12-24	Merge 3.13-rc5 into staging-next	Greg Kroah-Hartman
	We want these fixes here to handle some merge issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-21	efi: Export EFI runtime memory mapping to sysfs	Dave Young
	kexec kernel will need exactly same mapping for EFI runtime memory ranges. Thus here export the runtime ranges mapping to sysfs, kexec-tools will assemble them and pass to 2nd kernel via setup_data. Introducing a new directory /sys/firmware/efi/runtime-map just like /sys/firmware/memmap. Containing below attribute in each file of that directory: attribute num_pages phys_addr type virt_addr Signed-off-by: Dave Young <dyoung@redhat.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-21	efi: Export more EFI table variables to sysfs	Dave Young
	Export fw_vendor, runtime and config table physical addresses to /sys/firmware/efi/{fw_vendor,runtime,config_table} because kexec kernels need them. From EFI spec these 3 variables will be updated to virtual address after entering virtual mode. But kernel startup code will need the physical address. Signed-off-by: Dave Young <dyoung@redhat.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-21	x86/efi: Cleanup efi_enter_virtual_mode() function	Dave Young
	Add two small functions: efi_merge_regions() and efi_map_regions(), efi_enter_virtual_mode() calls them instead of embedding two long for loop. Signed-off-by: Dave Young <dyoung@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-21	x86/efi: Fix off-by-one bug in EFI Boot Services reservation	Dave Young
	Current code check boot service region with kernel text region by: start+size >= __pa_symbol(_text) The end of the above region should be start + size - 1 instead. I see this problem in ovmf + Fedora 19 grub boot: text start: 1000000 md start: 800000 md size: 800000 Signed-off-by: Dave Young <dyoung@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Toshi Kani <toshi.kani@hp.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Cc: stable@vger.kernel.org Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-21	x86/efi: Add a wrapper function efi_map_region_fixed()	Dave Young
	Kexec kernel will use saved runtime virtual mapping, so add a new function efi_map_region_fixed() for directly mapping a md to md->virt. The md is passed in from 1st kernel, the virtual addr is saved in md->virt_addr. Signed-off-by: Dave Young <dyoung@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-21	x86/efi: Remove unused variables in __map_region()	Dave Young
	variables size and end is useless in this function, thus remove them. Reported-by: Toshi Kani <toshi.kani@hp.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dave Young <dyoung@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-12-21	ACPI, APEI, GHES: Do not report only correctable errors with SCI	Chen, Gong
	Currently SCI is employed to handle corrected errors - memory corrected errors, more specifically but in fact SCI still can be used to handle any errors, e.g. uncorrected or even fatal ones if enabled by the BIOS. Enable logging for those kinds of errors too. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1385363701-12387-1-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit message, rename function arg. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-20	x86, x32: Use __kernel_long_t/__kernel_ulong_t in x86-64 stat.h	H.J. Lu
	Both x32 and x86-64 use the same stat system call interface. But x32 long is 32-bit. This patch changes x86 uapi <asm/stat.h> to use __kernel_long_t/__kernel_ulong_t in x86-64 stat. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Link: http://lkml.kernel.org/r/CAMe9rOquPtWEro0GQ=Z95pZJ=c7GGkSHynjN4FbiB4p445x-Ng@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-12-20	KVM: MMU: handle invalid root_hpa at __direct_map	Marcelo Tosatti
	It is possible for __direct_map to be called on invalid root_hpa (-1), two examples: 1) try_async_pf -> can_do_async_pf -> vmx_interrupt_allowed -> nested_vmx_vmexit 2) vmx_handle_exit -> vmx_interrupt_allowed -> nested_vmx_vmexit Then to load_vmcs12_host_state and kvm_mmu_reset_context. Check for this possibility, let fault exception be regenerated. BZ: https://bugzilla.redhat.com/show_bug.cgi?id=924916 Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-12-20	KVM: VMX: Do not skip the instruction if handle_dr injects a fault	Jan Kiszka
	If kvm_get_dr or kvm_set_dr reports that it raised a fault, we must not advance the instruction pointer. Otherwise the exception will hit the wrong instruction. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-12-20	crypto: aesni - AVX and AVX2 version of AESNI-GCM encode and decode	Tim Chen
	We have added AVX and AVX2 routines that optimize AESNI-GCM encode/decode. These routines are optimized for encrypt and decrypt of large buffers. In tests we have seen up to 6% speedup for 1K, 11% speedup for 2K and 18% speedup for 8K buffer over the existing SSE version. These routines should provide even better speedup for future Intel x86_64 cpus. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-12-20	crypto: arch - use crypto_memneq instead of memcmp	Daniel Borkmann
	Replace remaining occurences (just as we did in crypto/) under arch/*/crypto/ that make use of memcmp() for comparing keys or authentication tags for usage with crypto_memneq(). It can simply be used as a drop-in replacement for the normal memcmp(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: James Yonan <james@openvpn.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-12-20	stackprotector: Unify the HAVE_CC_STACKPROTECTOR logic between architectures	Kees Cook
	Instead of duplicating the CC_STACKPROTECTOR Kconfig and Makefile logic in each architecture, switch to using HAVE_CC_STACKPROTECTOR and keep everything in one place. This retains the x86-specific bug verification scripts. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: James Hogan <james.hogan@imgtec.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@linux-mips.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/1387481759-14535-2-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-19	x86, idle: Add memory barriers around clflush in mwait_play_dead()	H. Peter Anvin
	For consistency with mwait_idle_with_hints(). Not sure they help, but they really won't hurt... Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/CA%2B55aFzGxcML7j8CEvQPYzh0W81uVoAAVmGctMOUZ7CZ1yYd2A@mail.gmail.com
2013-12-19	x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers	H. Peter Anvin
	Use static_cpu_has() to conditionalize the CLFLUSH workaround, and add memory barriers around it since the documentation is explicit that CLFLUSH is only ordered with respect to MFENCE. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/CA%2B55aFzGxcML7j8CEvQPYzh0W81uVoAAVmGctMOUZ7CZ1yYd2A@mail.gmail.com
2013-12-19	x86, acpi, idle: Restructure the mwait idle routines	Peter Zijlstra
	People seem to delight in writing wrong and broken mwait idle routines; collapse the lot. This leaves mwait_play_dead() the sole remaining user of __mwait() and new __mwait() users are probably doing it wrong. Also remove __sti_mwait() as its unused. Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jacob Jun Pan <jacob.jun.pan@linux.intel.com> Cc: Mike Galbraith <bitbucket@online.de> Cc: Len Brown <lenb@kernel.org> Cc: Rui Zhang <rui.zhang@intel.com> Acked-by: Rafael Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131212141654.616820819@infradead.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-12-19	x86 idle: Repair large-server 50-watt idle-power regression	Len Brown
	Linux 3.10 changed the timing of how thread_info->flags is touched: x86: Use generic idle loop (7d1a941731fabf27e5fb6edbebb79fe856edb4e5) This caused Intel NHM-EX and WSM-EX servers to experience a large number of immediate MONITOR/MWAIT break wakeups, which caused cpuidle to demote from deep C-states to shallow C-states, which caused these platforms to experience a significant increase in idle power. Note that this issue was already present before the commit above, however, it wasn't seen often enough to be noticed in power measurements. Here we extend an errata workaround from the Core2 EX "Dunnington" to extend to NHM-EX and WSM-EX, to prevent these immediate returns from MWAIT, reducing idle power on these platforms. While only acpi_idle ran on Dunnington, intel_idle may also run on these two newer systems. As of today, there are no other models that are known to need this tweak. Link: http://lkml.kernel.org/r/CAJvTdK=%2BaNN66mYpCGgbHGCHhYQAKx-vB0kJSWjVpsNb_hOAtQ@mail.gmail.com Signed-off-by: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/baff264285f6e585df757d58b17788feabc68918.1387403066.git.len.brown@intel.com Cc: <stable@vger.kernel.org> # 3.12.x, 3.11.x, 3.10.x Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-12-19	Merge branch 'master' into for-next	Jiri Kosina
	Sync with Linus' tree to be able to apply fixes on top of newer things in tree (efi-stub). Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-12-19	x86/mm/numa: Fix 32-bit kernel NUMA boot	Lans Zhang
	When booting a 32-bit x86 kernel on a NUMA machine, node data cannot be allocated from local node if the account of memory for node 0 covers the low memory space entirely: [ 0.000000] Initmem setup node 0 [mem 0x00000000-0x83fffffff] [ 0.000000] NODE_DATA [mem 0x367ed000-0x367edfff] [ 0.000000] Initmem setup node 1 [mem 0x840000000-0xfffffffff] [ 0.000000] Cannot find 4096 bytes in node 1 [ 0.000000] 64664MB HIGHMEM available. [ 0.000000] 871MB LOWMEM available. To fix this issue, node data is allowed to be allocated from other nodes if the memory of local node is still not mapped. The expected result looks like this: [ 0.000000] Initmem setup node 0 [mem 0x00000000-0x83fffffff] [ 0.000000] NODE_DATA [mem 0x367ed000-0x367edfff] [ 0.000000] Initmem setup node 1 [mem 0x840000000-0xfffffffff] [ 0.000000] NODE_DATA [mem 0x367ec000-0x367ecfff] [ 0.000000] NODE_DATA(1) on node 0 [ 0.000000] 64664MB HIGHMEM available. [ 0.000000] 871MB LOWMEM available. Signed-off-by: Lans Zhang <jia.zhang@windriver.com> Cc: <andi@firstfloor.org> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1386303510-18574-1-git-send-email-jia.zhang@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-19	Merge tag 'v3.13-rc4' into x86/mm	Ingo Molnar
	Pick up the latest fixes before applying new patches. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-18	mm: fix TLB flush race between migration, and change_protection_range	Rik van Riel
	There are a few subtle races, between change_protection_range (used by mprotect and change_prot_numa) on one side, and NUMA page migration and compaction on the other side. The basic race is that there is a time window between when the PTE gets made non-present (PROT_NONE or NUMA), and the TLB is flushed. During that time, a CPU may continue writing to the page. This is fine most of the time, however compaction or the NUMA migration code may come in, and migrate the page away. When that happens, the CPU may continue writing, through the cached translation, to what is no longer the current memory location of the process. This only affects x86, which has a somewhat optimistic pte_accessible. All other architectures appear to be safe, and will either always flush, or flush whenever there is a valid mapping, even with no permissions (SPARC). The basic race looks like this: CPU A CPU B CPU C load TLB entry make entry PTE/PMD_NUMA fault on entry read/write old page start migrating page change PTE/PMD to new page read/write old page [] flush TLB reload TLB from new entry read/write new page lose data [] the old page may belong to a new user at this point! The obvious fix is to flush remote TLB entries, by making sure that pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may still be accessible if there is a TLB flush pending for the mm. This should fix both NUMA migration and compaction. [mgorman@suse.de: fix build] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18	mm: numa: serialise parallel get_user_page against THP migration	Mel Gorman
	Base pages are unmapped and flushed from cache and TLB during normal page migration and replaced with a migration entry that causes any parallel NUMA hinting fault or gup to block until migration completes. THP does not unmap pages due to a lack of support for migration entries at a PMD level. This allows races with get_user_pages and get_user_pages_fast which commit 3f926ab945b6 ("mm: Close races between THP migration and PMD numa clearing") made worse by introducing a pmd_clear_flush(). This patch forces get_user_page (fast and normal) on a pmd_numa page to go through the slow get_user_page path where it will serialise against THP migration and properly account for the NUMA hinting fault. On the migration side the page table lock is taken for each PTE update. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18	x86, realmode: Pointer walk cleanups, pull out invariant use of __pa()	H. Peter Anvin
	The pointer arithmetic in this function was really bizarre, where in fact all we really wanted was a simple pointer array walk. Use the much more idiomatic construction for that (*ptr++). Factor an invariant use of __pa() out of the relocation loop. At least on 64 bits it seems gcc isn't capable of doing that automatically. Change the scope of a couple of variables to make it extra obvious that they are extremely local temp variables. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-rd908t9c8kvcojdabtmm94mb@git.kernel.org
2013-12-18	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller
	Conflicts: drivers/net/ethernet/intel/i40e/i40e_main.c drivers/net/macvtap.c Both minor merge hassles, simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18	KVM: nVMX: Support direct APIC access from L2	Jan Kiszka
	It's a pathological case, but still a valid one: If L1 disables APIC virtualization and also allows L2 to directly write to the APIC page, we have to forcibly enable APIC virtualization while in L2 if the in-kernel APIC is in use. This allows to run the direct interrupt test case in the vmx unit test without x2APIC. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-12-17	Merge branch 'sched-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Three fixes for scheduler crashes, each triggers in relatively rare, hardware environment dependent situations" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Rework sched_fair time accounting math64: Add mul_u64_u32_shr() sched: Remove PREEMPT_NEED_RESCHED from generic code sched: Initialize power_orig for overlapping groups
2013-12-17	Merge branch 'perf-urgent-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Ingo Molnar: "An x86/intel event constraint fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix constraint table end marker bug
2013-12-17	lib: introduce arch optimized hash library	Francesco Fusco
	We introduce a new hashing library that is meant to be used in the contexts where speed is more important than uniformity of the hashed values. The hash library leverages architecture specific implementation to achieve high performance and fall backs to jhash() for the generic case. On Intel-based x86 architectures, the library can exploit the crc32l instruction, part of the Intel SSE4.2 instruction set, if the instruction is supported by the processor. This implementation is twice as fast as the jhash() implementation on an i7 processor. Additional architectures, such as Arm64 provide instructions for accelerating the computation of CRC, so they could be added as well in follow-up work. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Thomas Graf <tgraf@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17	perf/x86: enable Haswell Celeron RAPL support	Stephane Eranian
	Enable RAPL support for Haswell Celeron (model 69). Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: ak@linux.intel.com Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: vincent.weaver@maine.edu Cc: maria.n.dimakopoulou@gmail.com Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1387225224-27799-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-16	x86: replace futex_atomic_cmpxchg_inatomic() with user_atomic_cmpxchg_inatomic	Qiaowei Ren
	futex_atomic_cmpxchg_inatomic() is simply the 32-bit implementation of user_atomic_cmpxchg_inatomic(), which in turn is simply a generalization of the original code in futex_atomic_cmpxchg_inatomic(). Use the newly generalized user_atomic_cmpxchg_inatomic() as the futex implementation, too. [ hpa: retain the inline in futex.h rather than changing it to a macro ] Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Link: http://lkml.kernel.org/r/1387002303-6620-2-git-send-email-qiaowei.ren@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
2013-12-16	x86: add user_atomic_cmpxchg_inatomic at uaccess.h	Qiaowei Ren
	This patch adds user_atomic_cmpxchg_inatomic() to use CMPXCHG instruction against a user space address. This generalizes the already existing futex_atomic_cmpxchg_inatomic() so it can be used in other contexts. This will be used in the upcoming support for Intel MPX (Memory Protection Extensions.) [ hpa: replaced #ifdef inside a macro with IS_ENABLED() ] Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Link: http://lkml.kernel.org/r/1387002303-6620-1-git-send-email-qiaowei.ren@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
2013-12-16	Merge tag 'v3.13-rc4' into perf/core	Ingo Molnar
	Merge Linux 3.13-rc4, to refresh this branch with the latest fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-16	Merge tag 'ras_for_3.14' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras Pull RAS updates from Borislav Petkov: * Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant, from Gong Chen. * PCIe AER tracepoint severity levels fix, from Rui Wang. * Error path correction for the mce device init, from Levente Kurusa. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-15	Merge branch 'x86/urgent' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "This is a pretty small batch: The biggest single change is to stop using EFI time services on 32-bit platforms. This matches our current behavior on 64-bit platforms as we already had ruled them out there as being too unreliable. Turns out that affects 32-bit platforms, too. One NULL pointer fix for SGI UV. Two minor build fixes, one of which only affects icc and the other which affects icc and future versions or nonstandard default settings of gcc" * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, efi: Don't use (U)EFI time services on 32 bit x86, build, icc: Remove uninitialized_var() from compiler-intel.h x86/UV: Fix NULL pointer dereference in uv_flush_tlb_others() if the 'nobau' boot option is used x86, build: Pass in additional -mno-mmx, -mno-sse options
2013-12-13	Merge branch 'pci/yijing-dev_is_pci' into next	Bjorn Helgaas
	* pci/yijing-dev_is_pci: alpha/PCI: Use dev_is_pci() to identify PCI devices arm/PCI: Use dev_is_pci() to identify PCI devices arm/PCI: Use dev_is_pci() to identify PCI devices parisc/PCI: Use dev_is_pci() to identify PCI devices sparc/PCI: Use dev_is_pci() to identify PCI devices ia64/PCI: Use dev_is_pci() to identify PCI devices x86/PCI: Use dev_is_pci() to identify PCI devices PCI: Use dev_is_pci() to identify PCI devices
2013-12-13	PCI: Drop "irq" param from *_restore_msi_irqs()	DuanZhenzhong
	Change x86_msi.restore_msi_irqs(struct pci_dev dev, int irq) to x86_msi.restore_msi_irqs(struct pci_dev dev). restore_msi_irqs() restores multiple MSI-X IRQs, so param 'int irq' is unneeded. This makes code more consistent between vm and bare metal. Dom0 MSI-X restore code can also be optimized as XEN only has a hypercall to restore all MSI-X vectors at one time. Tested-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-12-13	KVM: x86: Add comment on vcpu_enter_guest()'s return value	Takuya Yoshikawa
	Giving proper names to the 0 and 1 was once suggested. But since 0 is returned to the userspace, giving it another name can introduce extra confusion. This patch just explains the meanings instead. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-12-13	KVM: Use cond_resched() directly and remove useless kvm_resched()	Takuya Yoshikawa
	Since the commit 15ad7146 ("KVM: Use the scheduler preemption notifiers to make kvm preemptible"), the remaining stuff in this function is a simple cond_resched() call with an extra need_resched() check which was there to avoid dropping VCPUs unnecessarily. Now it is meaningless. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>