summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/mm
AgeCommit message (Collapse)Author
2009-10-14powerpc/mm: Fix hang accessing top of vmalloc spaceBenjamin Herrenschmidt
On pSeries, we always force the IO space to be mapped using 4K pages even with a 64K base page size to cope with some limitations in the HV interface to some devices. However, the SLB miss handler code to discriminate between vmalloc and ioremap space uses a CPU feature section such that the code is nop'ed out when the processor support large pages non-cachable mappings. Thus, we end up always using the ioremap page size for vmalloc segments on such processors, causing a discrepency between the segment and the hash table, and thus a hang continously hashing the page. It works for the first segment of the vmalloc space since that segment is "bolted" in by C code correctly, and thankfully we almost never use the vmalloc space beyond the first segment, but the new percpu code made the bug happen. This fixes it by removing the feature section from the assembly, we now always do the comparison between vmalloc and ioremap. Signed-off-by; Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc/8xx: Fix regression introduced by cache coherency rewriteRex Feany
After upgrading to the latest kernel on my mpc875 userspace started running incredibly slow (hours to get to a shell, even!). I tracked it down to commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36, that patch removed a work-around for the 8xx. Adding it back makes my problem go away. Signed-off-by: Rex Feany <rfeany@mrv.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc/mm: Remove duplicated #includeHuang Weiyi
Remove duplicated #include('s) in arch/powerpc/mm/tlb_low_64e.S Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-23kcore: use registerd physmem informationKAMEZAWA Hiroyuki
For /proc/kcore, each arch registers its memory range by kclist_add(). In usual, - range of physical memory - range of vmalloc area - text, etc... are registered but "range of physical memory" has some troubles. It doesn't updated at memory hotplug and it tend to include unnecessary memory holes. Now, /proc/iomem (kernel/resource.c) includes required physical memory range information and it's properly updated at memory hotplug. Then, it's good to avoid using its own code(duplicating information) and to rebuild kclist for physical memory based on /proc/iomem. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23walk system ram rangeKAMEZAWA Hiroyuki
Originally, walk_memory_resource() was introduced to traverse all memory of "System RAM" for detecting memory hotplug/unplug range. For doing so, flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for memory hotplug. But for using other purpose, /proc/kcore, this may includes some firmware area marked as IORESOURCE_BUSY | IORESOUCE_MEM. This patch makes the check strict to find out busy "System RAM". Note: PPC64 keeps their own walk_memory_resouce(), which walk through ppc64's lmb informaton. Because old kclist_add() is called per lmb, this patch makes no difference in behavior, finally. And this patch removes CONFIG_MEMORY_HOTPLUG check from this function. Because pfn_valid() just show "there is memmap or not* and cannot be used for "there is physical memory or not", this function is useful in generic to scan physical memory range. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Américo Wang <xiyou.wangcong@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23kcore: register vmalloc area in generic wayKAMEZAWA Hiroyuki
For /proc/kcore, vmalloc areas are registered per arch. But, all of them registers same range of [VMALLOC_START...VMALLOC_END) This patch unifies them. By this. archs which have no kclist_add() hooks can see vmalloc area correctly. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23kcore: add kclist typesKAMEZAWA Hiroyuki
Presently, kclist_add() only eats start address and size as its arguments. Considering to make kclist dynamically reconfigulable, it's necessary to know which kclists are for System RAM and which are not. This patch add kclist types as KCORE_RAM KCORE_VMALLOC KCORE_TEXT KCORE_OTHER This "type" is used in a patch following this for detecting KCORE_RAM. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22arches: drop superfluous casts in nr_free_pages() callersGeert Uytterhoeven
Commit 96177299416dbccb73b54e6b344260154a445375 ("Drop free_pages()") modified nr_free_pages() to return 'unsigned long' instead of 'unsigned int'. This made the casts to 'unsigned long' in most callers superfluous, so remove them. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Mikael Starvik <starvik@axis.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Chris Zankel <zankel@tensilica.com> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-21perf: Do the big rename: Performance Counters -> Performance EventsIngo Molnar
Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N | sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits) powerpc/nvram: Enable use Generic NVRAM driver for different size chips powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline powerpc/ps3: Workaround for flash memory I/O error powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead powerpc/perf_counters: Reduce stack usage of power_check_constraints powerpc: Fix bug where perf_counters breaks oprofile powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops powerpc/irq: Improve nanodoc powerpc: Fix some late PowerMac G5 with PCIe ATI graphics powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT powerpc/book3e: Add missing page sizes powerpc/pseries: Fix to handle slb resize across migration powerpc/powermac: Thermal control turns system off too eagerly powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan() powerpc/405ex: support cuImage via included dtb powerpc/405ex: provide necessary fixup function to support cuImage powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support. powerpc/44x: Update Arches defconfig powerpc/44x: Update Arches dts ... Fix up conflicts in drivers/char/agp/uninorth-agp.c
2009-09-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits) powerpc64: convert to dynamic percpu allocator sparc64: use embedding percpu first chunk allocator percpu: kill lpage first chunk allocator x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA percpu: update embedding first chunk allocator to handle sparse units percpu: use group information to allocate vmap areas sparsely vmalloc: implement pcpu_get_vm_areas() vmalloc: separate out insert_vmalloc_vm() percpu: add chunk->base_addr percpu: add pcpu_unit_offsets[] percpu: introduce pcpu_alloc_info and pcpu_group_info percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward percpu: add @align to pcpu_fc_alloc_fn_t percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() percpu: drop @static_size from first chunk allocators percpu: generalize first chunk allocator selection percpu: build first chunk allocators selectively percpu: rename 4k first chunk allocator to page percpu: improve boot messages percpu: fix pcpu_reclaim() locking ... Fix trivial conflict as by Tejun Heo in kernel/sched.c
2009-09-02powerpc/pseries: Fix to handle slb resize across migrationBrian King
The SLB can change sizes across a live migration, which was not being handled, resulting in possible machine crashes during migration if migrating to a machine which has a smaller max SLB size than the source machine. Fix this by first reducing the SLB size to the minimum possible value, which is 32, prior to migration. Then during the device tree update which occurs after migration, we make the call to ensure the SLB gets updated. Also add the slb_size to the lparcfg output so that the migration tools can check to make sure the kernel has this capability before allowing migration in scenarios where the SLB size will change. BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid breaking ppc32 build Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-28powerpc/mm: Add MMU features for TLB reservation & Paired MAS registersKumar Gala
Support for TLB reservation (or TLB Write Conditional) and Paired MAS registers are optional for a processor implementation so we handle them via MMU feature sections. We currently only used paired MAS registers to access the full RPN + perm bits that are kept in MAS7||MAS3. We assume that if an implementation has hardware page table at this time it also implements in TLB reservations. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-27Merge commit 'kumar/next' into nextBenjamin Herrenschmidt
2009-08-27powerpc/mm: Cleanup handling of execute permissionBenjamin Herrenschmidt
This is an attempt at cleaning up a bit the way we handle execute permission on powerpc. _PAGE_HWEXEC is gone, _PAGE_EXEC is now only defined by CPUs that can do something with it, and the myriad of #ifdef's in the I$/D$ coherency code is reduced to 2 cases that hopefully should cover everything. The logic on BookE is a little bit different than what it was though not by much. Since now, _PAGE_EXEC will be set by the generic code for executable pages, we need to filter out if they are unclean and recover it. However, I don't expect the code to be more bloated than it already was in that area due to that change. I could boast that this brings proper enforcing of per-page execute permissions to all BookE and 40x but in fact, we've had that now for some time as a side effect of my previous rework in that area (and I didn't even know it :-) We would only enable execute permission if the page was cache clean and we would only cache clean it if we took and exec fault. Since we now enforce that the later only work if VM_EXEC is part of the VMA flags, we de-fact already enforce per-page execute permissions... Unless I missed something Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-24powerpc/booke: Move MMUCSR definition into mmu-book3e.hKumar Gala
The MMUCSR is now defined as part of the Book-3E architecture so we can move it into mmu-book3e.h and add some of the additional bits defined by the architecture specs. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-08-20Merge commit 'paulus-perf/master' into nextBenjamin Herrenschmidt
2009-08-20powerpc/mm: Fix assert_pte_locked to work properly on uniprocessorKumar Gala
Since the pte_lockptr is a spinlock it gets optimized away on uniprocessor builds so using spin_is_locked is not correct. We can use assert_spin_locked instead and get the proper behavior between UP and SMP builds. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc/fsl-booke: read buffer overflowRoel Kluin
cam[tlbcam_index] is checked before tlbcam_index < ARRAY_SIZE(cam) Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc/mm: Fix switch_mmu_context to iterate of the proper list of cpusKumar Gala
Introduced a temporary variable into our iterating over the list cpus that are threads on the same core. For some reason Ben forgot how for loops work. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc: Remaining 64-bit Book3E supportBenjamin Herrenschmidt
This contains all the bits that didn't fit in previous patches :-) This includes the actual exception handlers assembly, the changes to the kernel entry, other misc bits and wiring it all up in Kconfig. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc/mm: Add support for SPARSEMEM_VMEMMAP on 64-bit Book3EBenjamin Herrenschmidt
The base TLB support didn't include support for SPARSEMEM_VMEMMAP, though we did carve out some virtual space for it, the necessary support code wasn't there. This implements it by using 16M pages for now, though the page size could easily be changed at runtime if necessary. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc: Add TLB management code for 64-bit Book3EBenjamin Herrenschmidt
This adds the TLB miss handler assembly, the low level TLB flush routines along with the necessary hook for dealing with our virtual page tables or indirect TLB entries that need to be flushes when PTE pages are freed. There is currently no support for hugetlbfs Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc/mm: Move around mmu_gathers definition on 64-bitBenjamin Herrenschmidt
The definition for the global structure mmu_gathers, used by generic code, is currently defined in multiple places not including anything used by 64-bit Book3E. This changes it by moving to one place common to all processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc: Add memory management headers for new 64-bit BookEBenjamin Herrenschmidt
This adds the PTE and pgtable format definitions, along with changes to the kernel memory map and other definitions related to implementing support for 64-bit Book3E. This also shields some asm-offset bits that are currently only relevant on 32-bit We also move the definition of the "linux" page size constants to the common mmu.h file and add a few sizes that are relevant to embedded processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc/mm: Rework & cleanup page table freeing code pathBenjamin Herrenschmidt
That patch used to just add a hook to page table flushing but pulling that string brought out a whole bunch of issues, so it now does that and more: - We now make the RCU batching of page freeing SMP only, as I believe it was intended initially. We make a few more things compile to nothing on !CONFIG_SMP - Some macros are turned into functions, though that forced me to out of line a few stuffs due to unsolvable include depenencies, however it's probably better that way anyway, it's not -that- critical code path. - 32-bit didn't call pte_free_finish() on tlb_flush() which means that it wouldn't push out the batch to RCU for delayed freeing when a bunch of page tables have been freed, they would just stay in there until the batch gets full. 64-bit BookE will use that hook to maintain the virtually linear page tables or the indirect entries in the TLB when using the HW loader. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc/mm: Make low level TLB flush ops on BookE take additional argsBenjamin Herrenschmidt
We need to pass down whether the page is direct or indirect and we'll need to pass the page size to _tlbil_va and _tlbivax_bcast We also add a new low level _tlbil_pid_noind() which does a TLB flush by PID but avoids flushing indirect entries if possible This implements those new prototypes but defines them with inlines or macros so that no additional arguments are actually passed on current processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc/mm: Add support for early ioremap on non-hash 64-bit processorsBenjamin Herrenschmidt
This adds some code to do early ioremap's using page tables instead of bolting entries in the hash table. This will be used by the upcoming 64-bits BookE port. The patch also changes the test for early vs. late ioremap to use slab_is_available() instead of our old hackish mem_init_done. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc/mm: Add HW threads support to no_hash TLB managementBenjamin Herrenschmidt
The current "no hash" MMU context management code is written with the assumption that one CPU == one TLB. This is not the case on implementations that support HW multithreading, where several linux CPUs can share the same TLB. This adds some basic support for this to our context management and our TLB flushing code. It also cleans up the optional debugging output a bit Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc: Use names rather than numbers for SPRGs (v2)Benjamin Herrenschmidt
The kernel uses SPRG registers for various purposes, typically in low level assembly code as scratch registers or to hold per-cpu global infos such as the PACA or the current thread_info pointer. We want to be able to easily shuffle the usage of those registers as some implementations have specific constraints realted to some of them, for example, some have userspace readable aliases, etc.. and the current choice isn't always the best. This patch should not change any code generation, and replaces the usage of SPRN_SPRGn everywhere in the kernel with a named replacement and adds documentation next to the definition of the names as to what those are used for on each processor family. The only parts that still use the original numbers are bits of KVM or suspend/resume code that just blindly needs to save/restore all the SPRGs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc: Preload application text segment instead of TASK_UNMAPPED_BASEAnton Blanchard
TASK_UNMAPPED_BASE is not used with the new top down mmap layout. We can reuse this preload slot by loading in the segment at 0x10000000, where almost all PowerPC binaries are linked at. On a microbenchmark that bounces a token between two 64bit processes over pipes and calls gettimeofday each iteration (to access the VDSO), both the 32bit and 64bit context switch rate improves (tested on a 4GHz POWER6): 32bit: 273k/sec -> 283k/sec 64bit: 277k/sec -> 284k/sec Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc: Rearrange SLB preload codeAnton Blanchard
With the new top down layout it is likely that the pc and stack will be in the same segment, because the pc is most likely in a library allocated via a top down mmap. Right now we bail out early if these segments match. Rearrange the SLB preload code to sanity check all SLB preload addresses are not in the kernel, then check all addresses for conflicts. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-18powerpc: Allow perf_counters to access user memory at interrupt timePaul Mackerras
This provides a mechanism to allow the perf_counters code to access user memory in a PMU interrupt routine. Such an access can cause various kinds of interrupt: SLB miss, MMU hash table miss, segment table miss, or TLB miss, depending on the processor. This commit only deals with 64-bit classic/server processors, which use an MMU hash table. 32-bit processors are already able to access user memory at interrupt time. Since we don't soft-disable on 32-bit, we avoid the possibility of reentering hash_page or the TLB miss handlers, since they run with interrupts disabled. On 64-bit processors, an SLB miss interrupt on a user address will update the slb_cache and slb_cache_ptr fields in the paca. This is OK except in the case where a PMU interrupt occurs in switch_slb, which also accesses those fields. To prevent this, we hard-disable interrupts in switch_slb. Interrupts are already soft-disabled at this point, and will get hard-enabled when they get soft-enabled later. This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice, and to make sure that it clears the slb_cache_ptr when called from other callers than switch_slb, the existing routine is renamed to __slb_flush_and_rebolt, which is called by switch_slb and the new version of slb_flush_and_rebolt. Similarly, switch_stab (used on POWER3 and RS64 processors) gets a hard_irq_disable() to protect the per-cpu variables used there and in ste_allocate. If a MMU hashtable miss interrupt occurs, normally we would call hash_page to look up the Linux PTE for the address and create a HPTE. However, hash_page is fairly complex and takes some locks, so to avoid the possibility of deadlock, we check the preemption count to see if we are in a (pseudo-)NMI handler, and if so, we don't call hash_page but instead treat it like a bad access that will get reported up through the exception table mechanism. An interrupt whose handler runs even though the interrupt occurred when soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI handler, which should use nmi_enter()/nmi_exit() rather than irq_enter()/irq_exit(). Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-08-14Merge branch 'percpu-for-linus' into percpu-for-nextTejun Heo
Conflicts: arch/sparc/kernel/smp_64.c arch/x86/kernel/cpu/perf_counter.c arch/x86/kernel/setup_percpu.c drivers/cpufreq/cpufreq_ondemand.c mm/percpu.c Conflicts in core and arch percpu codes are mostly from commit ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all the first chunk allocators into mm/percpu.c, the changes are moved from arch code to mm/percpu.c. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-07-29powerpc/mm: Fix SMP issue with MMU context handling codeKumar Gala
In switch_mmu_context() if we call steal_context_smp() to get a context to use we shouldn't fall through and than call steal_context_up(). Doing so can be problematic in that the 'mm' that steal_context_up() ends up using will not get marked dirty in the stale_map[] for other CPUs that might have used that mm. Thus we could end up with stale TLB entries in the other CPUs that can cause all kinda of havoc. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-07-27mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()Benjamin Herrenschmidt
mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() Upcoming paches to support the new 64-bit "BookE" powerpc architecture will need to have the virtual address corresponding to PTE page when freeing it, due to the way the HW table walker works. Basically, the TLB can be loaded with "large" pages that cover the whole virtual space (well, sort-of, half of it actually) represented by a PTE page, and which contain an "indirect" bit indicating that this TLB entry RPN points to an array of PTEs from which the TLB can then create direct entries. Thus, in order to invalidate those when PTE pages are deleted, we need the virtual address to pass to tlbilx or tlbivax instructions. The old trick of sticking it somewhere in the PTE page struct page sucks too much, the address is almost readily available in all call sites and almost everybody implemets these as macros, so we may as well add the argument everywhere. I added it to the pmd and pud variants for consistency. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV] Acked-by: Nick Piggin <npiggin@suse.de> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-08powerpc: Use pr_devel() in do_dcache_icache_coherency()Michael Ellerman
pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 2036 368 8 2412 96c arch/powerpc/mm/pgtable.o size after: text data bss dec hex filename 1677 248 8 1933 78d arch/powerpc/mm/pgtable.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08powerpc: Use pr_devel() in arch/powerpc/mm/gup.cMichael Ellerman
pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 3252 384 0 3636 e34 arch/powerpc/mm/gup.o size after: text data bss dec hex filename 2576 96 0 2672 a70 arch/powerpc/mm/gup.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08powerpc: Cleanup & use pr_devel() in arch/powerpc/mm/slb.cMichael Ellerman
pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 3261 416 4 3681 e61 arch/powerpc/mm/slb.o size after: text data bss dec hex filename 2861 248 4 3113 c29 arch/powerpc/mm/slb.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08powerpc: Use pr_devel() in arch/powerpc/mm/mmu_context_nohash.cMichael Ellerman
pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 1508 48 28 1584 630 powerpc/mm/mmu_context_nohash.o size after: text data bss dec hex filename 1088 0 28 1116 45c powerpc/mm/mmu_context_nohash.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08powerpc: Remove unnecessary semicolonsJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Geoff Levand <geoffrey.levand@am.sony.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-04Merge branch 'master' into for-nextTejun Heo
Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix changes. As alpha in percpu tree uses 'weak' attribute instead of inline assembly, there's no need for __used attribute. Conflicts: arch/alpha/include/asm/percpu.h arch/mn10300/kernel/vmlinux.lds.S include/linux/percpu-defs.h
2009-06-26powerpc/mm: Make k(un)map_atomic out of lineBenjamin Herrenschmidt
Those functions are way too big to be inline, besides, kmap_atomic() wants to call debug_kmap_atomic() which isn't exported for modules and causes module link failures. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-24percpu: cleanup percpu array definitionsTejun Heo
Currently, the following three different ways to define percpu arrays are in use. 1. DEFINE_PER_CPU(elem_type[array_len], array_name); 2. DEFINE_PER_CPU(elem_type, array_name[array_len]); 3. DEFINE_PER_CPU(elem_type, array_name)[array_len]; Unify to #1 which correctly separates the roles of the two parameters and thus allows more flexibility in the way percpu variables are defined. [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tony Luck <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: linux-mm@kvack.org Cc: Christoph Lameter <cl@linux-foundation.org> Cc: David S. Miller <davem@davemloft.net>
2009-06-21Move FAULT_FLAG_xyz into handle_mm_fault() callersLinus Torvalds
This allows the callers to now pass down the full set of FAULT_FLAG_xyz flags to handle_mm_fault(). All callers have been (mechanically) converted to the new calling convention, there's almost certainly room for architectures to clean up their code and then add FAULT_FLAG_RETRY when that support is added. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16powerpc: Add configurable -Werror for arch/powerpcMichael Ellerman
Add the option to build the code under arch/powerpc with -Werror. The intention is to make it harder for people to inadvertantly introduce warnings in the arch/powerpc code. It needs to be configurable so that if a warning is introduced, people can easily work around it while it's being fixed. The option is a negative, ie. don't enable -Werror, so that it will be turned on for allyes and allmodconfig builds. The default is n, in the hope that developers will build with -Werror, that will probably lead to some build breaks, I am prepared to be flamed. It's not enabled for math-emu, which is a steaming pile of warnings. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-15Merge commit 'origin/master' into nextBenjamin Herrenschmidt
2009-06-12trivial: spelling fix in ppc code commentsSankar P
Fixes a trivial spelling error in powerpc code comments. Signed-off-by: Sankar P <sankar.curiosity@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12Merge commit 'origin/master' into nextBenjamin Herrenschmidt
Manual merge of: arch/powerpc/kernel/asm-offsets.c
2009-06-11perf_counter: Standardize event namesPeter Zijlstra
Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>