summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/mm/mem.c
AgeCommit message (Collapse)Author
2011-11-06Merge branch 'modsplit-Oct31_2011' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h
2011-10-31powerpc: various straight conversions from module.h --> export.hPaul Gortmaker
All these files were including module.h just for the basic EXPORT_SYMBOL infrastructure. We can shift them off to the export.h header which is a way smaller footprint and thus realize some compile time gains. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-09-20powerpc: Fix oops when echoing bad values to /sys/devices/system/memory/probeAnton Blanchard
If we echo an address the hypervisor doesn't like to /sys/devices/system/memory/probe we oops the box: # echo 0x10000000000 > /sys/devices/system/memory/probe kernel BUG at arch/powerpc/mm/hash_utils_64.c:541! The backtrace is: create_section_mapping arch_add_memory add_memory memory_probe_store sysdev_class_store sysfs_write_file vfs_write SyS_write In create_section_mapping we BUG if htab_bolt_mapping returned an error. A better approach is to return an error which will propagate back to userspace. Rerunning the test with this patch applied: # echo 0x10000000000 > /sys/devices/system/memory/probe -bash: echo: write error: Invalid argument Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-20powerpc: Hugetlb for BookEBecky Bruce
Enable hugepages on Freescale BookE processors. This allows the kernel to use huge TLB entries to map pages, which can greatly reduce the number of TLB misses and the amount of TLB thrashing experienced by applications with large memory footprints. Care should be taken when using this on FSL processors, as the number of large TLB entries supported by the core is low (16-64) on current processors. The supported set of hugepage sizes include 4m, 16m, 64m, 256m, and 1g. Page sizes larger than the max zone size are called "gigantic" pages and must be allocated on the command line (and cannot be deallocated). This is currently only fully implemented for Freescale 32-bit BookE processors, but there is some infrastructure in the code for 64-bit BooKE. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-07-19powerpc/mm: Fix output of total_ram.Tony Breeds
On 32bit platforms that support >= 4GB memory total_ram was truncated. This creates a confusing printk: Top of RAM: 0x100000000, Total RAM: 0x0 Fix that: Top of RAM: 0x100000000, Total RAM: 0x100000000 Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-07-08powerpc: Create next_tlbcam_idx percpu variable for FSL_BOOKEBecky Bruce
This is used to round-robin TLBCAM entries. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2011-06-30powerpc: Add printk companion for ppc_md.progressDave Carroll
This patch adds a printk companion to replace the udbg progress function when initmem is freed. Suggested-by: Milton Miller <miltonm@bga.com> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Dave Carroll <dcarroll@astekcorp.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-30powerpc: Move free_initmem to common codeDave Carroll
The free_initmem function is basically duplicated in mm/init_32, and init_64, and is moved to the common 32/64-bit mm/mem.c. All other sections except init were removed in v2.6.15 by 6c45ab992e4299c869fb26427944a8f8ea177024 (powerpc: Remove section free() and linker script bits), and therefore the bulk of the executed code is identical. This patch also removes updating ppc_md.progress to NULL in the powermac late_initcall. Suggested-by: Milton Miller <miltonm@bga.com> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Dave Carroll <dcarroll@astekcorp.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-30Merge remote branch 'origin/master' into nextBenjamin Herrenschmidt
2011-06-29powerpc: mem_init should call memblock_is_reserved with phys_addr_tBecky Bruce
This has been broken for a while but hasn't been an issue until now because nobody was reserving regions at high addresses. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-09powerpc: Force page alignment for initrd reserved memoryBenjamin Herrenschmidt
When using 64K pages with a separate cpio rootfs, U-Boot will align the rootfs on a 4K page boundary. When the memory is reserved, and subsequent early memblock_alloc is called, it will allocate memory between the 64K page alignment and reserved memory. When the reserved memory is subsequently freed, it is done so by pages, causing the early memblock_alloc requests to be re-used, which in my case, caused the device-tree to be clobbered. This patch forces the reserved memory for initrd to be kernel page aligned, and will move the device tree if it overlaps with the range extension of initrd. This patch will also consolidate the identical function free_initrd_mem() from mm/init_32.c, init_64.c to mm/mem.c, and adds the same range extension when freeing initrd. free_initrd_mem() is also moved to the __init section. Many thanks to Milton Miller for his input on this patch. [BenH: Fixed build without CONFIG_BLK_DEV_INITRD] Signed-off-by: Dave Carroll <dcarroll@astekcorp.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-03-31Fix common misspellingsLucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2010-10-12memblock, bootmem: Round pfn properly for memory and reserved regionsYinghai Lu
We need to round memory regions correctly -- specifically, we need to round reserved region in the more expansive direction (lower limit down, upper limit up) whereas usable memory regions need to be rounded in the more restrictive direction (lower limit up, upper limit down). This introduces two set of inlines: memblock_region_memory_base_pfn() memblock_region_memory_end_pfn() memblock_region_reserved_base_pfn() memblock_region_reserved_end_pfn() Although they are antisymmetric (and therefore are technically duplicates) the use of the different inlines explicitly documents the programmer's intention. The lack of proper rounding caused a bug on ARM, which was then found to also affect other architectures. Reported-by: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CB4CDFD.4020105@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-05memblock: Remove memblock_type.size and add memblock.memory_size insteadBenjamin Herrenschmidt
Right now, both the "memory" and "reserved" memblock_type structures have a "size" member. It represents the calculated memory size in the former case and is unused in the latter. This moves it out to the main memblock structure instead Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-04memblock/powerpc: Use new accessorsBenjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-04memblock: Rename memblock_region to memblock_type and memblock_property to ↵Benjamin Herrenschmidt
memblock_region Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-14lmb: rename to memblockYinghai Lu
via following scripts FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/lmb/memblock/g' \ -e 's/LMB/MEMBLOCK/g' \ $FILES for N in $(find . -name lmb.[ch]); do M=$(echo $N | sed 's/lmb/memblock/g') mv $N $M done and remove some wrong change like lmbench and dlmb etc. also move memblock.c from lib/ to mm/ Suggested-by: Ingo Molnar <mingo@elte.hu> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-19powerpc: Fix swiotlb to respect the boot optionFUJITA Tomonori
powerpc initializes swiotlb before parsing the kernel boot options so swiotlb options (e.g. specifying the swiotlb buffer size) are ignored. Any time before freeing bootmem works for swiotlb so this patch moves powerpc's swiotlb initialization after parsing the kernel boot options, mem_init (as x86 does). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Tested-by: Becky Bruce <beckyb@kernel.crashing.org> Tested-by: Albert Herranz <albert_herranz@yahoo.es> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-20MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itselfRussell King
On VIVT ARM, when we have multiple shared mappings of the same file in the same MM, we need to ensure that we have coherency across all copies. We do this via make_coherent() by making the pages uncacheable. This used to work fine, until we allowed highmem with highpte - we now have a page table which is mapped as required, and is not available for modification via update_mmu_cache(). Ralf Beache suggested getting rid of the PTE value passed to update_mmu_cache(): On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables to construct a pointer to the pte again. Passing a pte_t * is much more elegant. Maybe we might even replace the pte argument with the pte_t? Ben Herrenschmidt would also like the pte pointer for PowerPC: Passing the ptep in there is exactly what I want. I want that -instead- of the PTE value, because I have issue on some ppc cases, for I$/D$ coherency, where set_pte_at() may decide to mask out the _PAGE_EXEC. So, pass in the mapped page table pointer into update_mmu_cache(), and remove the PTE value, updating all implementations and call sites to suit. Includes a fix from Stephen Rothwell: sparc: fix fallout from update_mmu_cache API change Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-10-30powerpc/mm: Bring hugepage PTE accessor functions back into sync with normal ↵David Gibson
accessors The hugepage arch code provides a number of hook functions/macros which mirror the functionality of various normal page pte access functions. Various changes in the normal page accessors (in particular BenH's recent changes to the handling of lazy icache flushing and PAGE_EXEC) have caused the hugepage versions to get out of sync with the originals. In some cases, this is a bug, at least on some MMU types. One of the reasons that some hooks were not identical to the normal page versions, is that the fact we're dealing with a hugepage needed to be passed down do use the correct dcache-icache flush function. This patch makes the main flush_dcache_icache_page() function hugepage aware (by checking for the PageCompound flag). That in turn means we can make set_huge_pte_at() just a call to set_pte_at() bringing it back into sync. As a bonus, this lets us remove the hash_huge_page_do_lazy_icache() function, replacing it with a call to the hash_page_do_lazy_icache() function it was based on. Some other hugepage pte access hooks - huge_ptep_get_and_clear() and huge_ptep_clear_flush() - are not so easily unified, but this patch at least brings them back into sync with the current versions of the corresponding normal page functions. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-23walk system ram rangeKAMEZAWA Hiroyuki
Originally, walk_memory_resource() was introduced to traverse all memory of "System RAM" for detecting memory hotplug/unplug range. For doing so, flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for memory hotplug. But for using other purpose, /proc/kcore, this may includes some firmware area marked as IORESOURCE_BUSY | IORESOUCE_MEM. This patch makes the check strict to find out busy "System RAM". Note: PPC64 keeps their own walk_memory_resouce(), which walk through ppc64's lmb informaton. Because old kclist_add() is called per lmb, this patch makes no difference in behavior, finally. And this patch removes CONFIG_MEMORY_HOTPLUG check from this function. Because pfn_valid() just show "there is memmap or not* and cannot be used for "there is physical memory or not", this function is useful in generic to scan physical memory range. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Américo Wang <xiyou.wangcong@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22arches: drop superfluous casts in nr_free_pages() callersGeert Uytterhoeven
Commit 96177299416dbccb73b54e6b344260154a445375 ("Drop free_pages()") modified nr_free_pages() to return 'unsigned long' instead of 'unsigned int'. This made the casts to 'unsigned long' in most callers superfluous, so remove them. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Mikael Starvik <starvik@axis.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Chris Zankel <zankel@tensilica.com> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-27powerpc: Fix up dma_alloc_coherent() on platforms without cache coherency.Benjamin Herrenschmidt
The implementation we just revived has issues, such as using a Kconfig-defined virtual address area in kernel space that nothing actually carves out (and thus will overlap whatever is there), or having some dependencies on being self contained in a single PTE page which adds unnecessary constraints on the kernel virtual address space. This fixes it by using more classic PTE accessors and automatically locating the area for consistent memory, carving an appropriate hole in the kernel virtual address space, leaving only the size of that area as a Kconfig option. It also brings some dma-mask related fixes from the ARM implementation which was almost identical initially but grew its own fixes. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-27powerpc: Minor cleanups of kernel virt address space definitionsBenjamin Herrenschmidt
Make FIXADDR_TOP a compile time constant and cleanup a couple of definitions relative to the layout of the kernel address space on ppc32. We also print out that layout at boot time for debugging purposes. This is a pre-requisite for properly fixing non-coherent DMA allocactions. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-15powerpc: Allow mem=x cmdline to work with 4G+Becky Bruce
We're currently choking on mem=4g (and above) due to memory_limit being specified as an unsigned long. Make memory_limit phys_addr_t to fix this. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-11powerpc/mm: Rework I$/D$ coherency (v3)Benjamin Herrenschmidt
This patch reworks the way we do I and D cache coherency on PowerPC. The "old" way was split in 3 different parts depending on the processor type: - Hash with per-page exec support (64-bit and >= POWER4 only) does it at hashing time, by preventing exec on unclean pages and cleaning pages on exec faults. - Everything without per-page exec support (32-bit hash, 8xx, and 64-bit < POWER4) does it for all page going to user space in update_mmu_cache(). - Embedded with per-page exec support does it from do_page_fault() on exec faults, in a way similar to what the hash code does. That leads to confusion, and bugs. For example, the method using update_mmu_cache() is racy on SMP where another processor can see the new PTE and hash it in before we have cleaned the cache, and then blow trying to execute. This is hard to hit but I think it has bitten us in the past. Also, it's inefficient for embedded where we always end up having to do at least one more page fault. This reworks the whole thing by moving the cache sync into two main call sites, though we keep different behaviours depending on the HW capability. The call sites are set_pte_at() which is now made out of line, and ptep_set_access_flags() which joins the former in pgtable.c The base idea for Embedded with per-page exec support, is that we now do the flush at set_pte_at() time when coming from an exec fault, which allows us to avoid the double fault problem completely (we can even improve the situation more by implementing TLB preload in update_mmu_cache() but that's for later). If for some reason we didn't do it there and we try to execute, we'll hit the page fault, which will do a minor fault, which will hit ptep_set_access_flags() to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed, we just make this guys also perform the I/D cache sync for exec faults now. This second path is the catch all for things that weren't cleaned at set_pte_at() time. For cpus without per-pag exec support, we always do the sync at set_pte_at(), thus guaranteeing that when the PTE is visible to other processors, the cache is clean. For the 64-bit hash with per-page exec support case, we keep the old mechanism for now. I'll look into changing it later, once I've reworked a bit how we use _PAGE_EXEC. This is also a first step for adding _PAGE_EXEC support for embedded platforms Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-06mm: show node to memory section relationship with symlinks in sysfsGary Hade
Show node to memory section relationship with symlinks in sysfs Add /sys/devices/system/node/nodeX/memoryY symlinks for all the memory sections located on nodeX. For example: /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 indicates that memory section 135 resides on node1. Also revises documentation to cover this change as well as updating Documentation/ABI/testing/sysfs-devices-memory to include descriptions of memory hotremove files 'phys_device', 'phys_index', and 'state' that were previously not described there. In addition to it always being a good policy to provide users with the maximum possible amount of physical location information for resources that can be hot-added and/or hot-removed, the following are some (but likely not all) of the user benefits provided by this change. Immediate: - Provides information needed to determine the specific node on which a defective DIMM is located. This will reduce system downtime when the node or defective DIMM is swapped out. - Prevents unintended onlining of a memory section that was previously offlined due to a defective DIMM. This could happen during node hot-add when the user or node hot-add assist script onlines _all_ offlined sections due to user or script inability to identify the specific memory sections located on the hot-added node. The consequences of reintroducing the defective memory could be ugly. - Provides information needed to vary the amount and distribution of memory on specific nodes for testing or debugging purposes. Future: - Will provide information needed to identify the memory sections that need to be offlined prior to physical removal of a specific node. Symlink creation during boot was tested on 2-node x86_64, 2-node ppc64, and 2-node ia64 systems. Symlink creation during physical memory hot-add tested on a 2-node x86_64 system. Signed-off-by: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-21powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDEDBenjamin Herrenschmidt
Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in in the hash code based on some CPU feature bit. We also manipulate _PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places. This changes the logic so that instead, the PTE now contains _PAGE_COHERENT for all normal RAM pages thay have I = 0 on platforms that need it. The hash code clears it if the feature bit is not set. It also adds some clean accessors to setup various valid combinations of access flags and change various bits of code to use them instead. This should help having the PTE actually containing the bit combinations that we really want. I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead set it explicitely from the TLB miss. I will ultimately remove it completely as it appears that it might not be needed after all but in the meantime, having it in the TLB miss makes things a lot easier. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21powerpc/mm: Add SMP support to no-hash TLB handlingBenjamin Herrenschmidt
This commit moves the whole no-hash TLB handling out of line into a new tlb_nohash.c file, and implements some basic SMP support using IPIs and/or broadcast tlbivax instructions. Note that I'm using local invalidations for D->I cache coherency. At worst, if another processor is trying to execute the same and has the old entry in its TLB, it will just take a fault and re-do the TLB flush locally (it won't re-do the cache flush in any case). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-10-20mm: cleanup to make remove_memory() arch-neutralBadari Pulavarty
There is nothing architecture specific about remove_memory(). remove_memory() function is common for all architectures which support hotplug memory remove. Instead of duplicating it in every architecture, collapse them into arch neutral function. [akpm@linux-foundation.org: fix the export] Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-07powerpc: Avoid integer overflow in page_is_ram()Roland Dreier
Commit 8b150478 ("ppc: make phys_mem_access_prot() work with pfns instead of addresses") fixed page_is_ram() in arch/ppc to avoid overflow for addresses above 4G on 32-bit kernels. However arch/powerpc's page_is_ram() is missing the same fix -- it computes a physical address by doing pfn << PAGE_SHIFT, which overflows if pfn corresponds to a page above 4G. In particular this causes pages above 4G to be mapped with the wrong caching attribute; for example many ppc440-based SoCs have PCI space above 4G, and mmap()ing MMIO space may end up with a mapping that has caching enabled. Fix this by working with the pfn and avoiding the conversion to physical address that causes the overflow. This patch compares the pfn to max_pfn, which is a semantic change from the old code -- that code compared the physical address to high_memory, which corresponds to max_low_pfn. However, I think that was is another bug, since highmem pages are still RAM. Reported-by: vb <vb@vsbe.com> Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-08-04powerpc: Fix compiler warning in arch/powerpc/mm/mem.cTony Breeds
Explicitly cast to unsigned long long, rather than u64. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-26powerpc: use generic show_mem()Johannes Weiner
Remove arch-specific show_mem() in favor of the generic version. This also removes the following redundant information display: - pages in swapcache, printed by show_swap_cache_info() where show_mem() calls show_free_areas(), which calls show_swap_cache_info(). Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-09powerpc: Fix problems with 32bit PPC's running with >= 4GB of RAMStefan Roese
This patch enables 32bit PPC's (with 36bit physical address space, e.g. IBM/AMCC PPC44x) to run with >= 4GB of RAM. Mostly its just replacing types (unsigned long -> phys_addr_t). Tested on an AMCC Katmai with 4GB of DDR2. Signed-off-by: Stefan Roese <sr@denx.de> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2008-07-03powerpc: Fix building of arch/powerpc/mm/mem.o when MEMORY_HOTPLUG=y and ↵Tony Breeds
SPARSEMEM=n Currently the kernel fails to build with the above config options with: CC arch/powerpc/mm/mem.o arch/powerpc/mm/mem.c: In function 'arch_add_memory': arch/powerpc/mm/mem.c:130: error: implicit declaration of function 'create_section_mapping' This explicitly includes asm/sparsemem.h in arch/powerpc/mm/mem.c and moves the guards in include/asm-powerpc/sparsemem.h to protect the SPARSEMEM specific portions only. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-06-09[POWERPC] Make walk_memory_resource available with MEMORY_HOTPLUG=nNathan Lynch
The ehea driver was recently changed[1] to use walk_memory_resource() to detect the system's memory layout. However, walk_memory_resource() is available only when memory hotplug is enabled. So CONFIG_EHEA was made to depend on MEMORY_HOTPLUG [2], but it is inappropriate for a network driver to have such a dependency. Make the declaration of walk_memory_resource() and its powerpc implementation (ehea is powerpc-specific) unconditionally available. [1] 48cfb14f8b89d4d5b3df6c16f08b258686fb12ad "ehea: Add DLPAR memory remove support" [2] fb7b6ca2b6b7c23b52be143bdd5f55a23b9780c8 "ehea: Add dependency to Kconfig" Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29[POWERPC] Provide walk_memory_resource() for powerpcBadari Pulavarty
Provide walk_memory_resource() for 64-bit powerpc. PowerPC maintains logical memory region mapping in the lmb.memory structure. Walk through these structures and do the callbacks for the contiguous chunks. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-28hotplug-memory: make online_page() commonJeremy Fitzhardinge
All architectures use an effectively identical definition of online_page(), so just make it common code. x86-64, ia64, powerpc and sh are actually identical; x86-32 is slightly different. x86-32's differences arise because it puts its hotplug pages in the highmem zone. We can handle this in the generic code by inspecting the page to see if its in highmem, and update the totalhigh_pages count appropriately. This leaves init_32.c:free_new_highpage with a single caller, so I folded it into add_one_highpage_init. I also removed an incorrect comment referring to the NUMA case; any NUMA details have already been dealt with by the time online_page() is called. [akpm@linux-foundation.org: fix indenting] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Reviewed-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com> Tested-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Christoph Lameter <clameter@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-24[POWERPC] Port fixmap from x86 and use for kmap_atomicKumar Gala
The fixmap code from x86 allows us to have compile time virtual addresses that we change the physical addresses of at run time. This is useful for applications like kmap_atomic, PCI config that is done via direct memory map, kexec/kdump. We got ride of CONFIG_HIGHMEM_START as we can now determine a more optimal location for PKMAP_BASE based on where the fixmap addresses start and working back from there. Additionally, the kmap code in asm-powerpc/highmem.h always had debug enabled. Moved to using CONFIG_DEBUG_HIGHMEM to determine if we should have the extra debug checking. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-24[POWERPC] 85xx: Add support for relocatable kernel (and booting at non-zero)Kumar Gala
Added support to allow an 85xx kernel to be run from a non-zero physical address (useful for cooperative asymmetric multiprocessing situations and kdump). The support can be configured at compile time by setting CONFIG_PAGE_OFFSET, CONFIG_KERNEL_START, and CONFIG_PHYSICAL_START as desired. Alternatively, the kernel build can set CONFIG_RELOCATABLE. Setting this config option causes the kernel to determine at runtime the physical addresses of CONFIG_PAGE_OFFSET and CONFIG_KERNEL_START. If CONFIG_RELOCATABLE is set, then CONFIG_PHYSICAL_START has no meaning. However, CONFIG_PHYSICAL_START will always be used to set the LOAD program header physical address field in the resulting ELF image. Currently we are limited to running at a physical address that is a multiple of 256M. This is due to how we map TLBs to cover lowmem. This should be fixed to allow 64M or maybe even 16M alignment in the future. It is considered an error to try and run a kernel at a non-aligned physical address. All the magic for this support is accomplished by proper initialization of the kernel memory subsystem and use of ARCH_PFN_OFFSET. The use of ARCH_PFN_OFFSET only affects normal memory and not IO mappings. ioremap uses map_page and isn't affected by ARCH_PFN_OFFSET. /dev/mem continues to allow access to any physical address in the system regardless of how CONFIG_PHYSICAL_START is set. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-17[POWERPC] Introduce lowmem_end_addr to distinguish from total_lowmemKumar Gala
total_lowmem represents the amount of low memory, not the physical address that low memory ends at. If the start of memory is at 0 it happens that total_lowmem can be used as both the size and the address that lowmem ends at (or more specifically one byte beyond the end). To make the code a bit more clear and deal with the case when the start of memory isn't at physical 0, we introduce lowmem_end_addr that represents one byte beyond the last physical address in the lowmem region. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-01[POWERPC] Remove redundant display of free swap space in show_mem()Johannes Weiner
show_mem() has no need to print the amount of free swap space manually because show_free_areas() does this already and is called by the former. The two outputs only differ in text formatting: printk("Free swap = %lukB\n", ...); printk("Free swap: %6ldkB\n", ...); Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-01[POWERPC] arch_add_memory() cannot be __devinitGeert Uytterhoeven
WARNING: vmlinux.o(.text+0xb41b0): Section mismatch in reference from the function .add_memory() to the function .devinit.text:.arch_add_memory() The function .add_memory() references the function __devinit .arch_add_memory(). This is often because .add_memory lacks a __devinit annotation or the annotation of .arch_add_memory is wrong. arch_add_memory() is also not __devinit on other architectures Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-02-13[LIB]: Make PowerPC LMB code generic so sparc64 can use it too.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-08[POWERPC] Add arch-specific walk_memory_remove() for 64-bit powerpcBadari Pulavarty
walk_memory_resource() verifies if there are holes in a given memory range, by checking against /proc/iomem. On x86/ia64 system memory is represented in /proc/iomem. On powerpc, we don't show system memory as IO resource in /proc/iomem - instead it's maintained in /proc/device-tree. This provides a way for an architecture to provide its own walk_memory_resource() function. On powerpc, the memory region is small (16MB), contiguous and non-overlapping. So extra checking against the device-tree is not needed. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@gate.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-02-08[POWERPC] Add remove_memory() for 64-bit powerpcBadari Pulavarty
Supply remove_memory() function for 64-bit powerpc. This is still not quite complete as it needs to do some more arch-specific stuff, which will be added in a later patch. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-02-07Merge branch 'for-2.6.25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (69 commits) [POWERPC] Add SPE registers to core dumps [POWERPC] Use regset code for compat PTRACE_*REGS* calls [POWERPC] Use generic compat_sys_ptrace [POWERPC] Use generic compat_ptrace_request [POWERPC] Use generic ptrace peekdata/pokedata [POWERPC] Use regset code for PTRACE_*REGS* requests [POWERPC] Switch to generic compat_binfmt_elf code [POWERPC] Switch to using user_regset-based core dumps [POWERPC] Add user_regset compat support [POWERPC] Add user_regset_view definitions [POWERPC] Use user_regset accessors for GPRs [POWERPC] ptrace accessors for special regs MSR and TRAP [POWERPC] Use user_regset accessors for SPE regs [POWERPC] Use user_regset accessors for altivec regs [POWERPC] Use user_regset accessors for FP regs [POWERPC] mpc52xx: fix compile error introduce when rebasing patch [POWERPC] 4xx: PCIe indirect DCR spinlock fix. [POWERPC] Add missing native dcr dcr_ind_lock spinlock [POWERPC] 4xx: Fix offset value on Warp board [POWERPC] 4xx: Add 440EPx Sequoia ehci dts entry ...
2008-02-07Introduce flags for reserve_bootmem()Bernhard Walle
This patchset adds a flags variable to reserve_bootmem() and uses the BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions between crashkernel area and already used memory. This patch: Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE. If that flag is set, the function returns with -EBUSY if the memory already has been reserved in the past. This is to avoid conflicts. Because that code runs before SMP initialisation, there's no race condition inside reserve_bootmem_core(). [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix powerpc build] Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: <linux-arch@vger.kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06[POWERPC] update_mmu_cache: Don't cache-flush non-readable pagesScott Wood
Currently, update_mmu_cache will crash if given a no-access PTE. There's no need to synchronize dcache/icache unless it's an exec mapping -- however, due to the existence of older glibc versions that execute out of a read-but-no-exec page, readability is tested instead. This assumes no exec-only mappings; if such mappings become supported, they will need to go through the kmap_atomic() version of dcache/icache synchronization. This fixes a bug reported by some users where the kernel would crash while dumping core on a threaded program. Signed-off-by: Scott Wood <scottwood@freescale.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>