summaryrefslogtreecommitdiffstats
path: root/arch/x86/mm
AgeCommit message (Collapse)Author
2011-01-07Merge branch 'linus' into x86/apic-cleanupsIngo Molnar
Conflicts: arch/x86/include/asm/io_apic.h Merge reason: Resolve the conflict, update to a more recent -rc base Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-06Merge branch 'x86-security-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-security-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: module: Move RO/NX module protection to after ftrace module update x86: Resume trampoline must be executable x86: Add RO/NX protection for loadable kernel modules x86: Add NX protection for kernel data x86: Fix improper large page preservation
2011-01-06Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix included-by file reference comments x86, cpu: Only CPU features determine NX capabilities x86, cpu: Call verify_cpu during 32bit CPU startup x86, cpu: Clear XD_DISABLED flag on Intel to regain NX x86, cpu: Rename verify_cpu_64.S to verify_cpu.S
2011-01-06Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix APIC ID sizing bug on larger systems, clean up MAX_APICS confusion x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation x86, acpi: Add MAX_LOCAL_APIC for 32bit x86: io_apic: Split setup_ioapic_ids_from_mpc() x86: io_apic: Fix CONFIG_X86_IO_APIC=n breakage x86: apic: Move probe_nr_irqs_gsi() into ioapic_init_mappings() x86: Allow platforms to force enable apic
2011-01-06Merge branch 'x86-amd-nb-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, cacheinfo: Cleanup L3 cache index disable support x86, amd-nb: Cleanup AMD northbridge caching code x86, amd-nb: Complete the rename of AMD NB and related code
2011-01-04Merge commit 'v2.6.37-rc8' into x86/apicIngo Molnar
Conflicts: arch/x86/include/asm/io_apic.h Merge reason: move to a fresh -rc, resolve the conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-23x86, acpi: Parse all SRAT cpu entries even above the cpu number limitationYinghai Lu
Recent Intel new system have different order in MADT, aka will list all thread0 at first, then all thread1. But SRAT table still old order, it will list cpus in one socket all together. If the user have compiled limited NR_CPUS or boot with nr_cpus=, could have missed to put some cpus apic id to node mapping into apicid_to_node[]. for example for 4 sockets system with 64 cpus with nr_cpus=32 will get crash... [ 9.106288] Total of 32 processors activated (136190.88 BogoMIPS). [ 9.235021] divide error: 0000 [#1] SMP [ 9.235315] last sysfs file: [ 9.235481] CPU 1 [ 9.235592] Modules linked in: [ 9.245398] [ 9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274 /Sun Fire x4800 [ 9.265415] RIP: 0010:[<ffffffff81075a8f>] [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623 ... [ 9.645938] RIP [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623 [ 9.665356] RSP <ffff88103f8d1c40> [ 9.665568] ---[ end trace 2296156d35fdfc87 ]--- So let just parse all cpu entries in SRAT. Also add apicid checking with MAX_LOCAL_APIC, in case We could out of boundaries of apicid_to_node[]. it fixes following bug too. https://bugzilla.kernel.org/show_bug.cgi?id=22662 -v2: expand to 32bit according to hpa need to add MAX_LOCAL_APIC for 32bit Reported-and-Tested-by: Wu Fengguang <fengguang.wu@intel.com> Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Tested-by: Myron Stowe <myron.stowe@hp.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D0AD486.9020704@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-12-23Merge commit 'v2.6.37-rc7' into x86/securityIngo Molnar
2010-12-09x86, apic: Remove early_init_lapic_mapping()Yinghai Lu
It is almost the same as smp_register_lapic_addr(). We just need to let smp_read_mpc() call smp_register_lapic_addr() when early==1. Add the apic_printk to smp_register_lapic_address() Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> LKML-Reference: <4CFDF681.3030509@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-12-07Merge commit 'v2.6.37-rc5' into perf/coreIngo Molnar
Merge reason: Pick up the latest -rc. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-22x86: Resume trampoline must be executableLin Ming
commit 5bd5a452(x86: Add NX protection for kernel data) marked the trampoline area NX - which unsurprisingly breaks resume and cpu hotplug. Revert the portion of that commit, which touches the trampoline. Originally-from: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <1290410581.2405.24.camel@minggr.sh.intel.com> Cc: Matthieu Castet <castet.matthieu@free.fr> Cc: Siarhei Liakh <sliakh.lkml@gmail.com> Cc: Xuxian Jiang <jiang@cs.ncsu.edu> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Andi Kleen <andi@firstfloor.org> Tested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-18Merge branch 'perf/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core
2010-11-18x86, amd-nb: Complete the rename of AMD NB and related codeHans Rosenfeld
Not only the naming of the files was confusing, it was even more so for the function and variable names. Renamed the K8 NB and NUMA stuff that is also used on other AMD platforms. This also renames the CONFIG_K8_NUMA option to CONFIG_AMD_NUMA and the related file k8topology_64.c to amdtopology_64.c. No functional changes intended. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-18x86: Eliminate bp argument from the stack tracing routinesSoeren Sandmann Pedersen
The various stack tracing routines take a 'bp' argument in which the caller is supposed to provide the base pointer to use, or 0 if doesn't have one. Since bp is garbage whenever CONFIG_FRAME_POINTER is not defined, this means all callers in principle should either always pass 0, or be conditional on CONFIG_FRAME_POINTER. However, there are only really three use cases for stack tracing: (a) Trace the current task, including IRQ stack if any (b) Trace the current task, but skip IRQ stack (c) Trace some other task In all cases, if CONFIG_FRAME_POINTER is not defined, bp should just be 0. If it _is_ defined, then - in case (a) bp should be gotten directly from the CPU's register, so the caller should pass NULL for regs, - in case (b) the caller should should pass the IRQ registers to dump_trace(), - in case (c) bp should be gotten from the top of the task's stack, so the caller should pass NULL for regs. Hence, the bp argument is not necessary because the combination of task and regs is sufficient to determine an appropriate value for bp. This patch introduces a new inline function stack_frame(task, regs) that computes the desired bp. This function is then called from the two versions of dump_stack(). Signed-off-by: Soren Sandmann <ssp@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@infradead.org>, Cc: Frederic Weisbecker <fweisbec@gmail.com>, Cc: Arnaldo Carvalho de Melo <acme@redhat.com>, LKML-Reference: <m3oc9rop28.fsf@dhcp-100-3-82.bos.redhat.com>> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-11-18x86: Add NX protection for kernel dataMatthieu Castet
This patch expands functionality of CONFIG_DEBUG_RODATA to set main (static) kernel data area as NX. The following steps are taken to achieve this: 1. Linker script is adjusted so .text always starts and ends on a page bound 2. Linker script is adjusted so .rodata always start and end on a page boundary 3. NX is set for all pages from _etext through _end in mark_rodata_ro. 4. free_init_pages() sets released memory NX in arch/x86/mm/init.c 5. bios rom is set to x when pcibios is used. The results of patch application may be observed in the diff of kernel page table dumps: pcibios: -- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400 ++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400 0x00000000-0xc0000000 3G pmd ---[ Kernel Mapping ]--- -0xc0000000-0xc0100000 1M RW GLB x pte +0xc0000000-0xc00a0000 640K RW GLB NX pte +0xc00a0000-0xc0100000 384K RW GLB x pte -0xc0100000-0xc03d7000 2908K ro GLB x pte +0xc0100000-0xc0318000 2144K ro GLB x pte +0xc0318000-0xc03d7000 764K ro GLB NX pte -0xc03d7000-0xc0600000 2212K RW GLB x pte +0xc03d7000-0xc0600000 2212K RW GLB NX pte 0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd 0xf7a00000-0xf7bfe000 2040K RW GLB NX pte 0xf7bfe000-0xf7c00000 8K pte No pcibios: -- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400 ++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400 0x00000000-0xc0000000 3G pmd ---[ Kernel Mapping ]--- -0xc0000000-0xc0100000 1M RW GLB x pte +0xc0000000-0xc0100000 1M RW GLB NX pte -0xc0100000-0xc03d7000 2908K ro GLB x pte +0xc0100000-0xc0318000 2144K ro GLB x pte +0xc0318000-0xc03d7000 764K ro GLB NX pte -0xc03d7000-0xc0600000 2212K RW GLB x pte +0xc03d7000-0xc0600000 2212K RW GLB NX pte 0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd 0xf7a00000-0xf7bfe000 2040K RW GLB NX pte 0xf7bfe000-0xf7c00000 8K pte The patch has been originally developed for Linux 2.6.34-rc2 x86 by Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>. -v1: initial patch for 2.6.30 -v2: patch for 2.6.31-rc7 -v3: moved all code into arch/x86, adjusted credits -v4: fixed ifdef, removed credits from CREDITS -v5: fixed an address calculation bug in mark_nxdata_nx() -v6: added acked-by and PT dump diff to commit log -v7: minor adjustments for -tip -v8: rework with the merge of "Set first MB as RW+NX" Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu> Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr> Cc: Arjan van de Ven <arjan@infradead.org> Cc: James Morris <jmorris@namei.org> Cc: Andi Kleen <ak@muc.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CE2F82E.60601@free.fr> [ minor cleanliness edits ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18x86: Fix improper large page preservationmatthieu castet
This patch fixes a bug in try_preserve_large_page() which may result in improper large page preservation and improper application of page attributes to the memory area outside of the original change request. More specifically, the problem manifests itself when set_memory_*() is called for several pages at the beginning of the large page and try_preserve_large_page() erroneously concludes that the change can be applied to whole large page. The fix consists of 3 parts: 1. Addition of "required" protection attributes in static_protections(), so .data and .bss can be guaranteed to stay "RW" 2. static_protections() is now called for every small page within large page to determine compatibility of new protection attributes (instead of just small pages within the requested range). 3. Large page can be preserved only if attribute change is large-page-aligned and covers whole large page. -v1: Try_preserve_large_page() patch for Linux 2.6.34-rc2 -v2: Replaced pfn check with address check for kernel rw-data Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu> Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: James Morris <jmorris@namei.org> Cc: Andi Kleen <ak@muc.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CE2F7F3.8030809@free.fr> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18x86: Use online node real index in calulate_tbl_offset()Yinghai Lu
Found a NUMA system that doesn't have RAM installed at the first socket which hangs while executing init scripts. bisected it to: | commit 932967202182743c01a2eee4bdfa2c42697bc586 | Author: Shaohua Li <shaohua.li@intel.com> | Date: Wed Oct 20 11:07:03 2010 +0800 | | x86: Spread tlb flush vector between nodes It turns out when first socket is not online it could have cpus on node1 tlb_offset set to bigger than NUM_INVALIDATE_TLB_VECTORS. That could affect systems like 4 sockets, but socket 2 doesn't have installed, sockets 3 will get too big tlb_offset. Need to use real online node idx. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Shaohua Li <shaohua.li@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CDEDE59.40603@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-10x86, cpu: Only CPU features determine NX capabilitiesKees Cook
Fix the NX feature boot warning when NX is missing to correctly reflect that BIOSes cannot disable NX now. Signed-off-by: Kees Cook <kees.cook@canonical.com> LKML-Reference: <1289414154-7829-5-git-send-email-kees.cook@canonical.com> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-11-01x86, mm: Fix section mismatch in tlb.cRakib Mullick
Mark tlb_cpuhp_notify as __cpuinit. It's basically a callback function, which is called from __cpuinit init_smp_flash(). So - it's safe. We were warned by the following warning: WARNING: arch/x86/mm/built-in.o(.text+0x356d): Section mismatch in reference from the function tlb_cpuhp_notify() to the function .cpuinit.text:calculate_tlb_offset() The function tlb_cpuhp_notify() references the function __cpuinit calculate_tlb_offset(). This is often because tlb_cpuhp_notify lacks a __cpuinit annotation or the annotation of calculate_tlb_offset is wrong. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <AANLkTinWQRG=HA9uB3ad0KAqRRTinL6L_4iKgF84coph@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-29Merge branches 'x86-fixes-for-linus' and 'x86-uv-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, alternative: Call stop_machine_text_poke() on all cpus x86-32: Restore irq stacks NUMA-aware allocations x86, memblock: Fix early_node_mem with big reserved region. * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, uv: More Westmere support on SGI UV x86, uv: Enable Westmere support on SGI UV
2010-10-28x86, memblock: Fix early_node_mem with big reserved region.Yinghai Lu
Xen can reserve huge amounts of memory for pre-ballooning, but that still shows as RAM in the e820 memory map. early_node_mem could not find range because of start/end adjusting, and will go through the fallback path. However, the fallback patch is still using memblock_x86_find_range_node(), and it is partially top-down because it go through active_range entries from low to high. Let's use memblock_find_in_range instead memblock_x86_find_range_node. So get real top down in fallback path. We may still need to make memblock_x86_find_range_node to do overall top_down work. Reported-by: Jeremy Fitzhardinge <jeremy@goop.org> Tested-by: Jeremy Fitzhardinge <jeremy@goop.org> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CC9A9C9.8020700@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-27Remove duplicate includes from many filesZimny Lech
Signed-off-by: Zimny Lech <napohybelskurwysynom2010@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27mm: fix race in kunmap_atomic()Peter Zijlstra
Christoph reported a nice splat which illustrated a race in the new stack based kmap_atomic implementation. The problem is that we pop our stack slot before we're completely done resetting its state -- in particular clearing the PTE (sometimes that's CONFIG_DEBUG_HIGHMEM). If an interrupt happens before we actually clear the PTE used for the last slot, that interrupt can reuse the slot in a dirty state, which triggers a BUG in kmap_atomic(). Fix this by introducing kmap_atomic_idx() which reports the current slot index without actually releasing it and use that to find the PTE and delay the _pop() until after we're completely done. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Christoph Hellwig <hch@infradead.org> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26x86: access_error API cleanupMichel Lespinasse
access_error() already takes error_code as an argument, so there is no need for an additional write flag. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Ying Han <yinghan@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26mm: retry page fault when blocking on disk transferMichel Lespinasse
This change reduces mmap_sem hold times that are caused by waiting for disk transfers when accessing file mapped VMAs. It introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call site wants mmap_sem to be released if blocking on a pending disk transfer. In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and do_page_fault() will then re-acquire mmap_sem and retry the page fault. It is expected that the retry will hit the same page which will now be cached, and thus it will complete with a low mmap_sem hold time. Tests: - microbenchmark: thread A mmaps a large file and does random read accesses to the mmaped area - achieves about 55 iterations/s. Thread B does mmap/munmap in a loop at a separate location - achieves 55 iterations/s before, 15000 iterations/s after. - We are seeing related effects in some applications in house, which show significant performance regressions when running without this change. [akpm@linux-foundation.org: fix warning & crash] Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Ying Han <yinghan@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26mm: stack based kmap_atomic()Peter Zijlstra
Keep the current interface but ignore the KM_type and use a stack based approach. The advantage is that we get rid of crappy code like: #define __KM_PTE \ (in_nmi() ? KM_NMI_PTE : \ in_irq() ? KM_IRQ_PTE : \ KM_PTE0) and in general can stop worrying about what context we're in and what kmap slots might be appropriate for that. The downside is that FRV kmap_atomic() gets more expensive. For now we use a CPP trick suggested by Andrew: #define kmap_atomic(page, args...) __kmap_atomic(page) to avoid having to touch all kmap_atomic() users in a single patch. [ not compiled on: - mn10300: the arch doesn't actually build with highmem to begin with ] [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix up drivers/gpu/drm/i915/intel_overlay.c] Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Airlie <airlied@linux.ie> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26Merge branch 'hwpoison' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (22 commits) Add _addr_lsb field to ia64 siginfo Fix migration.c compilation on s390 HWPOISON: Remove retry loop for try_to_unmap HWPOISON: Turn addr_valid from bitfield into char HWPOISON: Disable DEBUG by default HWPOISON: Convert pr_debugs to pr_info HWPOISON: Improve comments in memory-failure.c x86: HWPOISON: Report correct address granuality for huge hwpoison faults Encode huge page size for VM_FAULT_HWPOISON errors Fix build error with !CONFIG_MIGRATION hugepage: move is_hugepage_on_freelist inside ifdef to avoid warning Clean up __page_set_anon_rmap HWPOISON, hugetlb: fix unpoison for hugepage HWPOISON, hugetlb: soft offlining for hugepage HWPOSION, hugetlb: recover from free hugepage error when !MF_COUNT_INCREASED hugetlb: move refcounting in hugepage allocation inside hugetlb_lock HWPOISON, hugetlb: add free check to dequeue_hwpoison_huge_page() hugetlb: hugepage migration core hugetlb: redefine hugepage copy functions hugetlb: add allocate function for hugepage migration ...
2010-10-22Merge branch 'x86-trampoline-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, mm: Add an initial page table for core bootstrapping
2010-10-22Merge branch 'hwpoison-hugepages' into hwpoisonAndi Kleen
Conflicts: mm/memory-failure.c
2010-10-21Merge branch 'core-memblock-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits) x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S xen: Cope with unmapped pages when initializing kernel pagetable memblock, bootmem: Round pfn properly for memory and reserved regions memblock: Annotate memblock functions with __init_memblock memblock: Allow memblock_init to be called early memblock/arm: Fix memblock_region_is_memory() typo x86, memblock: Remove __memblock_x86_find_in_range_size() memblock: Fix wraparound in find_region() x86-32, memblock: Make add_highpages honor early reserved ranges x86, memblock: Fix crashkernel allocation arm, memblock: Fix the sparsemem build memblock: Fix section mismatch warnings powerpc, memblock: Fix memblock API change fallout memblock, microblaze: Fix memblock API change fallout x86: Remove old bootmem code x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve x86: Remove not used early_res code x86, memblock: Replace e820_/_early string with memblock_ x86: Use memblock to replace early_res x86, memblock: Use memblock_debug to control debug message print out ... Fix up trivial conflicts in arch/x86/kernel/setup.c and kernel/Makefile
2010-10-21Merge branch 'x86-vmware-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, paravirt: Remove alloc_pmd_clone hook, only used by VMI x86, vmware: Remove deprecated VMI kernel support Fix up trivial #include conflict in arch/x86/kernel/smpboot.c
2010-10-21Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, percpu: Correct the ordering of the percpu readmostly section x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 || HIGHMEM64G x86: Spread tlb flush vector between nodes percpu: Introduce a read-mostly percpu API x86, mm: Fix incorrect data type in vmalloc_sync_all() x86, mm: Hold mm->page_table_lock while doing vmalloc_sync x86, mm: Fix bogus whitespace in sync_global_pgds() x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation x86, mm: Add RESERVE_BRK_ARRAY() helper mm, x86: Saving vmcore with non-lazy freeing of vmas x86, kdump: Change copy_oldmem_page() to use cached addressing x86, mm: fix uninitialized addr in kernel_physical_mapping_init() x86, kmemcheck: Remove double test x86, mm: Make spurious_fault check explicitly check the PRESENT bit x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions x86, mm: Avoid unnecessary TLB flush
2010-10-21Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Remove stale pmtimer_64.c x86, cleanups: Use clear_page/copy_page rather than memset/memcpy x86: Remove unnecessary #ifdef ACPI/X86_IO_ACPI x86, cleanup: Remove obsolete boot_cpu_id variable
2010-10-21Merge branch 'x86-amd-nb-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, amd_nb: Enable GART support for AMD family 0x15 CPUs x86, amd: Use compute unit information to determine thread siblings x86, amd: Extract compute unit information for AMD CPUs x86, amd: Add support for CPUID topology extension of AMD CPUs x86, nmi: Support NMI watchdog on newer AMD CPU families x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB x86, k8-gart: Decouple handling of garts and northbridges x86, cacheinfo: Fix dependency of AMD L3 CID x86, kvm: add new AMD SVM feature bits x86, cpu: Fix allowed CPUID bits for KVM guests x86, cpu: Update AMD CPUID feature bits x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit x86, AMD: Remove needless CPU family check (for L3 cache info) x86, tsc: Remove CPU frequency calibration on AMD
2010-10-21Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (163 commits) tracing: Fix compile issue for trace_sched_wakeup.c [S390] hardirq: remove pointless header file includes [IA64] Move local_softirq_pending() definition perf, powerpc: Fix power_pmu_event_init to not use event->ctx ftrace: Remove recursion between recordmcount and scripts/mod/empty jump_label: Add COND_STMT(), reducer wrappery perf: Optimize sw events perf: Use jump_labels to optimize the scheduler hooks jump_label: Add atomic_t interface jump_label: Use more consistent naming perf, hw_breakpoint: Fix crash in hw_breakpoint creation perf: Find task before event alloc perf: Fix task refcount bugs perf: Fix group moving irq_work: Add generic hardirq context callbacks perf_events: Fix transaction recovery in group_sched_in() perf_events: Fix bogus AMD64 generic TLB events perf_events: Fix bogus context time tracking tracing: Remove parent recording in latency tracer graph options tracing: Use one prologue for the preempt irqs off tracer function tracers ...
2010-10-20x86: Spread tlb flush vector between nodesShaohua Li
Currently flush tlb vector allocation is based on below equation: sender = smp_processor_id() % 8 This isn't optimal, CPUs from different node can have the same vector, this causes a lot of lock contention. Instead, we can assign the same vectors to CPUs from the same node, while different node has different vectors. This has below advantages: a. if there is lock contention, the lock contention is between CPUs from one node. This should be much cheaper than the contention between nodes. b. completely avoid lock contention between nodes. This especially benefits kswapd, which is the biggest user of tlb flush, since kswapd sets its affinity to specific node. In my test, this could reduce > 20% CPU overhead in extreme case.The test machine has 4 nodes and each node has 16 CPUs. I then bind each node's kswapd to the first CPU of the node. I run a workload with 4 sequential mmap file read thread. The files are empty sparse file. This workload will trigger a lot of page reclaim and tlbflush. The kswapd bind is to easy trigger the extreme tlb flush lock contention because otherwise kswapd keeps migrating between CPUs of a node and I can't get stable result. Sure in real workload, we can't always see so big tlb flush lock contention, but it's possible. [ hpa: folded in fix from Eric Dumazet to use this_cpu_read() ] Signed-off-by: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <1287544023.4571.8.camel@sli10-conroe.sh.intel.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-20x86-32, mm: Add an initial page table for core bootstrappingBorislav Petkov
This patch adds an initial page table with low mappings used exclusively for booting APs/resuming after ACPI suspend/machine restart. After this, there's no need to add low mappings to swapper_pg_dir and zap them later or create own swsusp PGD page solely for ACPI sleep needs - we have initial_page_table for that. Signed-off-by: Borislav Petkov <bp@alien8.de> LKML-Reference: <20101020070526.GA9588@liondog.tnic> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-20Merge branch 'x86/cleanups' into x86/trampolineH. Peter Anvin
2010-10-20Merge branch 'x86/vmware' into x86/trampolineH. Peter Anvin
2010-10-20x86, mm: Fix incorrect data type in vmalloc_sync_all()Borislav Petkov
arch/x86/mm/fault.c: In function 'vmalloc_sync_all': arch/x86/mm/fault.c:238: warning: assignment makes integer from pointer without a cast introduced by 617d34d9e5d8326ec8f188c616aa06ac59d083fe. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20101020103642.GA3135@kryptos.osrc.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-19x86, mm: Hold mm->page_table_lock while doing vmalloc_syncJeremy Fitzhardinge
Take mm->page_table_lock while syncing the vmalloc region. This prevents a race with the Xen pagetable pin/unpin code, which expects that the page_table_lock is already held. If this race occurs, then Xen can see an inconsistent page type (a page can either be read/write or a pagetable page, and pin/unpin converts it between them), which will cause either the pin or the set_p[gm]d to fail; either will crash the kernel. vmalloc_sync_all() should be called rarely, so this extra use of page_table_lock should not interfere with its normal users. The mm pointer is stashed in the pgd page's index field, as that won't be otherwise used for pgds. Reported-by: Ian Campbell <ian.cambell@eu.citrix.com> Originally-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4CB88A4C.1080305@goop.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-19x86, mm: Fix bogus whitespace in sync_global_pgds()Jeremy Fitzhardinge
Whitespace cleanup only. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-14x86: Barf when vmalloc and kmemcheck faults happen in NMIFrederic Weisbecker
In x86, faults exit by executing the iret instruction, which then reenables NMIs if we faulted in NMI context. Then if a fault happens in NMI, another NMI can nest after the fault exits. But we don't yet support nested NMIs because we have only one NMI stack. To prevent from that, check that vmalloc and kmemcheck faults don't happen in this context. Most of the other kernel faults in NMIs can be more easily spotted by finding explicit copy_from,to_user() calls on review. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
2010-10-13xen: Cope with unmapped pages when initializing kernel pagetableJeremy Fitzhardinge
Xen requires that all pages containing pagetable entries to be mapped read-only. If pages used for the initial pagetable are already mapped then we can change the mapping to RO. However, if they are initially unmapped, we need to make sure that when they are later mapped, they are also mapped RO. We do this by knowing that the kernel pagetable memory is pre-allocated in the range e820_table_start - e820_table_end, so any pfn within this range should be mapped read-only. However, the pagetable setup code early_ioremaps the pages to write their entries, so we must make sure that mappings created in the early_ioremap fixmap area are mapped RW. (Those mappings are removed before the pages are presented to Xen as pagetable pages.) Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> LKML-Reference: <4CB63A80.8060702@goop.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-11Merge branch 'x86/urgent' into core/memblockH. Peter Anvin
Reason for merge: Forward-port urgent change to arch/x86/mm/srat_64.c to the memblock tree. Resolved Conflicts: arch/x86/mm/srat_64.c Originally-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-11x86, numa: For each node, register the memory blocks actually usedYinghai Lu
Russ reported SGI UV is broken recently. He said: | The SRAT table shows that memory range is spread over two nodes. | | SRAT: Node 0 PXM 0 100000000-800000000 | SRAT: Node 1 PXM 1 800000000-1000000000 | SRAT: Node 0 PXM 0 1000000000-1080000000 | |Previously, the kernel early_node_map[] would show three entries |with the proper node. | |[ 0.000000] 0: 0x00100000 -> 0x00800000 |[ 0.000000] 1: 0x00800000 -> 0x01000000 |[ 0.000000] 0: 0x01000000 -> 0x01080000 | |The problem is recent community kernel early_node_map[] shows |only two entries with the node 0 entry overlapping the node 1 |entry. | | 0: 0x00100000 -> 0x01080000 | 1: 0x00800000 -> 0x01000000 After looking at the changelog, Found out that it has been broken for a while by following commit |commit 8716273caef7f55f39fe4fc6c69c5f9f197f41f1 |Author: David Rientjes <rientjes@google.com> |Date: Fri Sep 25 15:20:04 2009 -0700 | | x86: Export srat physical topology Before that commit, register_active_regions() is called for every SRAT memory entry right away. Use nodememblk_range[] instead of nodes[] in order to make sure we capture the actual memory blocks registered with each node. nodes[] contains an extended range which spans all memory regions associated with a node, but that does not mean that all the memory in between are included. Reported-by: Russ Anderson <rja@sgi.com> Tested-by: Russ Anderson <rja@sgi.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CB27BDF.5000800@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> 2.6.33 .34 .35 .36 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-08x86: HWPOISON: Report correct address granuality for huge hwpoison faultsAndi Kleen
An earlier patch fixed the hwpoison fault handling to encode the huge page size in the fault code of the page fault handler. This is needed to report this information in SIGBUS to user space. This is a straight forward patch to pass this information through to the signal handling in the x86 specific fault.c Cc: x86@kernel.org Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: fengguang.wu@intel.com Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08Merge commit 'v2.6.36-rc7' into core/memblockIngo Molnar
Merge reason: Update from -rc3 to -rc7. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-05x86, memblock: Remove __memblock_x86_find_in_range_size()Yinghai Lu
Fold it into memblock_x86_find_in_range(), and change bad_addr_size() to check_reserve_memblock(). So whole memblock_x86_find_in_range_size() code is more readable. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CAA4DEC.4000401@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-10-05x86-32, memblock: Make add_highpages honor early reserved rangesYinghai Lu
Originally the only early reserved range that is overlapped with high pages is "KVA RAM", but we already do remove that from the active ranges. However, It turns out Xen could have that kind of overlapping to support memory ballooning.x So we need to make add_highpage_with_active_regions() to subtract memblock reserved just like low ram; this is the proper design anyway. In this patch, refactering get_freel_all_memory_range() to make it can be used by add_highpage_with_active_regions(). Also we don't need to remove "KVA RAM" from active ranges. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CABB183.1040607@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>