summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/lib
AgeCommit message (Collapse)Author
2010-07-08powerpc: Fix feature-fixup tests for gcc 4.5Stephen Rothwell
The feature-fixup test declare some extern void variables and then take their addresses. Fix this by declaring them as extern u8 instead. Fixes these warnings (treated as errors): CC arch/powerpc/lib/feature-fixups.o cc1: warnings being treated as errors arch/powerpc/lib/feature-fixups.c: In function 'test_cpu_macros': arch/powerpc/lib/feature-fixups.c:293:23: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:294:9: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:297:2: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:297:2: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c: In function 'test_fw_macros': arch/powerpc/lib/feature-fixups.c:306:23: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:307:9: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:310:2: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:310:2: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c: In function 'test_lwsync_macros': arch/powerpc/lib/feature-fixups.c:321:23: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:322:9: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:326:3: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:326:3: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:329:3: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:329:3: error: taking address of expression of type 'void' Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-08powerpc: Fix module building for gcc 4.5 and 64 bitStephen Rothwell
Gcc 4.5 is now generating out of line register save and restore in the function prefix and postfix when we use -Os. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21powerpc: Fix string library functionsAndreas Schwab
The powerpc strncmp implementation does not correctly handle a zero length, despite the claim in 0119536cd314ef95553604208c25bc35581f7f0a (Add hand-coded assembly strcmp). Additionally, all the length arguments are size_t, not int, so use PPC_LCMPI and eq instead of cmpwi and le throughout. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-04-07powerpc: Fix handling of strncmp with zero lenJeff Mahoney
Commit 0119536c, which added the assembly version of strncmp to powerpc, mentions that it adds two instructions to the version from boot/string.S to allow it to handle len=0. Unfortunately, it doesn't always return 0 when that is the case. The length is passed in r5, but the return value is passed back in r3. In certain cases, this will happen to work. Otherwise it will pass back the address of the first string as the return value. This patch lifts the len <= 0 handling code from memcpy to handle that case. Reported by: Christian_Sellars@symantec.com Signed-off-by: Jeff Mahoney <jeffm@suse.com> CC: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-02-26powerpc: Fix lwsync feature fixup vs. modules on 64-bitBenjamin Herrenschmidt
Anton's commit enabling the use of the lwsync fixup mechanism on 64-bit breaks modules. The lwsync fixup section uses .long instead of the FTR_ENTRY_OFFSET macro used by other fixups sections, and thus will generate 32-bit relocations that our module loader cannot resolve. This changes it to use the same type as other feature sections. Note however that we might want to consider using 32-bit for all the feature fixup offsets and add support for R_PPC_REL32 to module_64.c instead as that would reduce the size of the kernel image. I'll leave that as an exercise for the reader for now... Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-17powerpc: Improve 64bit copy_tofrom_userAnton Blanchard
Here is a patch from Paul Mackerras that improves the ppc64 copy_tofrom_user. The loop now does 32 bytes at a time and as well as pairing loads and stores. A quick test case that reads 8kB over and over shows the improvement: POWER6: 53% faster POWER7: 51% faster #define _XOPEN_SOURCE 500 #include <stdlib.h> #include <stdio.h> #include <unistd.h> #include <fcntl.h> #include <sys/types.h> #include <sys/stat.h> #define BUFSIZE (8 * 1024) #define ITERATIONS 10000000 int main() { char tmpfile[] = "/tmp/copy_to_user_testXXXXXX"; int fd; char *buf[BUFSIZE]; unsigned long i; fd = mkstemp(tmpfile); if (fd < 0) { perror("open"); exit(1); } if (write(fd, buf, BUFSIZE) != BUFSIZE) { perror("open"); exit(1); } for (i = 0; i < 10000000; i++) { if (pread(fd, buf, BUFSIZE, 0) != BUFSIZE) { perror("pread"); exit(1); } } unlink(tmpfile); return 0; } Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-17powerpc: Pair loads and stores in copy_4k_pageAnton Blanchard
A number of our chips like loads and stores to be paired. A small kernel module testcase shows the improvement of pairing loads and stores in copy_4k_page: POWER6: +9% POWER7: +1.5% #include <linux/module.h> #include <linux/mm.h> #define ITERATIONS 10000000 static int __init copypage_init(void) { struct timespec before, after; unsigned long i; struct page *destpage, *srcpage; char *dest, *src; destpage = alloc_page(GFP_KERNEL); srcpage = alloc_page(GFP_KERNEL); dest = page_address(destpage); src = page_address(srcpage); getnstimeofday(&before); for (i = 0; i < ITERATIONS; i++) copy_4K_page(dest, src); getnstimeofday(&after); free_page((unsigned long)dest); free_page((unsigned long)src); printk(KERN_DEBUG "copy_4K_page loop took %lu ns\n", (after.tv_sec - before.tv_sec) * NSEC_PER_SEC + (after.tv_nsec - before.tv_nsec)); return 0; } static void __exit copypage_exit(void) { } module_init(copypage_init) module_exit(copypage_exit) MODULE_LICENSE("GPL"); MODULE_AUTHOR("Anton Blanchard"); Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-17powerpc: Fix lwsync patching code on 64bitAnton Blanchard
do_lwsync_fixups doesn't work on 64bit, we end up writing lwsyncs to the wrong addresses: 0:mon> di c0000001000bfacc c0000001000bfacc 7c2004ac lwsync Since the lwsync section has negative offsets we need to use a signed int pointer so we sign extend the value. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-14locking: Convert raw_rwlock to arch_rwlockThomas Gleixner
Not strictly necessary for -rt as -rt does not have non sleeping rwlocks, but it's odd to not have a consistent naming convention. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org
2009-12-14locking: Convert __raw_spin* functions to arch_spin*Thomas Gleixner
Name space cleanup. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org
2009-12-14locking: Convert raw_spinlock to arch_spinlockThomas Gleixner
The raw_spin* namespace was taken by lockdep for the architecture specific implementations. raw_spin_* would be the ideal name space for the spinlocks which are not converted to sleeping locks in preempt-rt. Linus suggested to convert the raw_ to arch_ locks and cleanup the name space instead of using an artifical name like core_spin, atomic_spin or whatever No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org
2009-12-09Merge commit 'origin/master' into nextBenjamin Herrenschmidt
Conflicts: include/linux/kvm.h
2009-12-09powerpc/8xx: Start using dcbX instructions in various copy routinesJoakim Tjernlund
Now that 8xx can fixup dcbX instructions, start using them where possible like every other PowerPc arch do. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-28powerpc: perf_event: Cleanup copy_page output by hiding setup symbolAnton Blanchard
A lot of hits in "setup" doesn't make much sense, so hide this symbol and allow all the hits to end up in copy_4k_page. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-06-16powerpc: Add configurable -Werror for arch/powerpcMichael Ellerman
Add the option to build the code under arch/powerpc with -Werror. The intention is to make it harder for people to inadvertantly introduce warnings in the arch/powerpc code. It needs to be configurable so that if a warning is introduced, people can easily work around it while it's being fixed. The option is a negative, ie. don't enable -Werror, so that it will be turned on for allyes and allmodconfig builds. The default is n, in the hope that developers will build with -Werror, that will probably lead to some build breaks, I am prepared to be flamed. It's not enabled for math-emu, which is a steaming pile of warnings. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-27powerpc: Move dma-noncoherent.c from arch/powerpc/lib to arch/powerpc/mmBenjamin Herrenschmidt
(pre-requisite to make the next patches more palatable) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-27Revert "powerpc: Rework dma-noncoherent to use generic vmalloc layer"Benjamin Herrenschmidt
This reverts commit 33f00dcedb0e22cdb156a23632814fc580fcfcf8. While it was a good idea to try to use the mm/vmalloc.c allocator instead of our own (in fact, ours is itself a dup on an old variant of the vmalloc one), unfortunately, the approach is terminally busted since dma_alloc_coherent() can be called at interrupt time or in atomic contexts and there's little chances we'll make the code in mm/vmalloc.c cope with\ that :-( Until we can get the generic code to forbid that idiocy and fix all drivers abusing it, we pretty much have no choice but revert to our custom virtual space allocator. There's also a problem with SMP safety since freeing such mapping would require an IPI which cannot be done at interrupt time. However, right now, I don't think we support any platform that is both SMP and has non-coherent DMA (don't laugh, I know such things do exist !) so we can sort that out later. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-11Merge commit 'origin/master' into nextBenjamin Herrenschmidt
2009-02-26powerpc: Fix 64bit __copy_tofrom_user() regressionMark Nelson
This fixes a regression introduced by commit a4e22f02f5b6518c1484faea1f88d81802b9feac ("powerpc: Update 64bit __copy_tofrom_user() using CPU_FTR_UNALIGNED_LD_STD"). The same bug that existed in the 64bit memcpy() also exists here so fix it here too. The fix is the same as that applied to memcpy() with the addition of fixes for the exception handling code required for __copy_tofrom_user(). This stops us reading beyond the end of the source region we were told to copy. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-26powerpc: Fix 64bit memcpy() regressionMark Nelson
This fixes a regression introduced by commit 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556 ("powerpc: Update 64bit memcpy() using CPU_FTR_UNALIGNED_LD_STD"). This commit allowed CPUs that have the CPU_FTR_UNALIGNED_LD_STD CPU feature bit present to do the memcpy() with unaligned load doubles. But, along with this came a bug where our final load double would read bytes beyond a page boundary and into the next (unmapped) page. This was caught by enabling CONFIG_DEBUG_PAGEALLOC, The fix was to read only the number of bytes that we need to store rather than reading a full 8-byte doubleword and storing only a portion of that. In order to minimise the amount of existing code touched we use the original do_tail for the src_unaligned case. Below is an example of the regression, as reported by Sachin Sant: Unable to handle kernel paging request for data at address 0xc00000003f380000 Faulting instruction address: 0xc000000000039574 cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020] pc: c000000000039574: .memcpy+0x74/0x244 lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3] sp: c00000003baf32a0 msr: 8000000000009032 dar: c00000003f380000 dsisr: 40000000 current = 0xc00000003e54b010 paca = 0xc000000000a53680 pid = 1840, comm = readahead enter ? for help [link register ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3] [c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3] (unreliab le) [c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3] [c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c [c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678 [c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68 [c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c [c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120 [c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3] [c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260 [c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010 [c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4 [c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4 [c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8 [c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874 [c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140 [c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38 [c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23powerpc: Rework dma-noncoherent to use generic vmalloc layerIlya Yanok
This patch rewrites consistent dma allocations support to use vmalloc layer to allocate virtual memory space from vmalloc pool and get rid of CONFIG_CONSISTENT_{START,SIZE}. This greatly simplifies the code by effectively removing a custom allocator we had for virtual space. Signed-off-by: Ilya Yanok <yanok@emcraft.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-23powerpc: Unify opcode definitions and supportKumar Gala
Create a new header that becomes a single location for defining PowerPC opcodes used by code that is either generationg instructions at runtime (fixups, debug, etc.), emulating instructions, or just compiling instructions old assemblers don't know about. We currently don't handle the floating point emulation or alignment decode as both are better handled by the specific decode support they already have. Added support for the new dcbzl, dcbal, msgsnd, tlbilx, & wait instructions since older assemblers don't know about them. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-10powerpc: Don't emulate mr. instructionsAnanth N Mavinakayanahalli
Currently emulate_step() emulates mr. instructions without updating cr0 and this can be disastrous. Don't emulate mr. This bug has been around for a while, but I am not sure if its a worthy -stable candidate. I'll leave it to Ben do decide. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-12-28Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits) powerpc/44x: Support 16K/64K base page sizes on 44x powerpc: Force memory size to be a multiple of PAGE_SIZE powerpc/32: Wire up the trampoline code for kdump powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M powerpc/32: Allow __ioremap on RAM addresses for kdump kernel powerpc/32: Setup OF properties for kdump powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs() powerpc: Prepare xmon_save_regs for use with kdump powerpc: Remove default kexec/crash_kernel ops assignments powerpc: Make default kexec/crash_kernel ops implicit powerpc: Setup OF properties for ppc32 kexec powerpc/pseries: Fix cpu hotplug powerpc: Fix KVM build on ppc440 powerpc/cell: add QPACE as a separate Cell platform powerpc/cell: fix build breakage with CONFIG_SPUFS disabled powerpc/mpc5200: fix error paths in PSC UART probe function powerpc/mpc5200: add rts/cts handling in PSC UART driver powerpc/mpc5200: Make PSC UART driver update serial errors counters powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver ... Fix trivial conflict in drivers/char/Makefile as per Paul's directions
2008-12-21powerpc: Rename struct vm_region to avoid conflict with NOMMUDavid Howells
Rename PowerPC's struct vm_region so that I can introduce my own global version for NOMMU. It's feasible that the PowerPC version may wish to use my global one instead. The NOMMU vm_region struct defines areas of the physical memory map that are under mmap. This may include chunks of RAM or regions of memory mapped devices, such as flash. It is also used to retain copies of file content so that shareable private memory mappings of files can be made. As such, it may be compatible with what is described in the banner comment for PowerPC's vm_region struct. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-19Merge branches 'tracing/ftrace', 'tracing/ring-buffer' and 'tracing/urgent' ↵Ingo Molnar
into tracing/core Conflicts: include/linux/ftrace.h
2008-12-18Merge branch 'linux-2.6' into nextPaul Mackerras
2008-12-17powerpc: Fix corruption error in rh_alloc_fixed()Guillaume Knispel
There is an error in rh_alloc_fixed() of the Remote Heap code: If there is at least one free block blk won't be NULL at the end of the search loop, so -ENOMEM won't be returned and the else branch of "if (bs == s || be == e)" will be taken, corrupting the management structures. Signed-off-by: Guillaume Knispel <gknispel@proformatique.com> Acked-by: Timur Tabi <timur@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-11-28powerpc/ppc32: static ftrace fixes for PPC32Steven Rostedt
Impact: fix for PowerPC 32 code There were some early init code that was not safe for static ftrace to boot on my PowerBook. This code must only use relative addressing, and static mcount performs a compare of the ftrace_trace_function pointer, and gets that with an absolute address. In the early init boot up code, this will cause a fault. This patch removes tracing from the files containing the offending functions. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-19powerpc: Update 64bit __copy_tofrom_user() using CPU_FTR_UNALIGNED_LD_STDMark Nelson
In exactly the same way that we updated memcpy() with new feature sections in commit 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556 ("powerpc: Update 64bit memcpy() using CPU_FTR_UNALIGNED_LD_STD"), we do the same thing here for __copy_tofrom_user(). Once again this is purely a performance tweak for Cell and Power6 - this has no effect on all the other 64bit powerpc chips. We can make these same changes to __copy_tofrom_user() because the basic copy algorithm is the same as in memcpy() - this version just has all the exception handling logic needed when copying to or from userspace as well as a special case for copying whole 4K pages that are page aligned. CPU_FTR_UNALIGNED_LD_STD CPU was added in commit 4ec577a28980a0790df3c3dfe9c81f6e2222acfb ("powerpc: Add new CPU feature: CPU_FTR_UNALIGNED_LD_STD"). We also make the same simple one line change from cmpldi r1,... to cmpldi cr1,... for consistency. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-19powerpc: Remove superfluous WARN_ON() from dma-noncoherent.cHollis Blanchard
I can't tell why this WARN_ON exists, and there's no comment explaining it. Whether the pmd is present or not, pte_alloc_kernel() seems to handle both cases. Booting a 440 kernel with 64K PAGE_SIZE triggers the warning, but boot successfully completes and I see no problems beyond that. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-05powerpc: Update 64bit memcpy() using CPU_FTR_UNALIGNED_LD_STDMark Nelson
Update memcpy() to add two new feature sections: one for aligning the destination before copying and one for copying using aligned load and store doubles. These new feature sections will only affect Power6 and Cell because the CPU feature bit was only added to these two processors. Power6 gets its best performance in memcpy() when aligning neither the source nor the destination, while Cell gets its best performance when just the destination is aligned. But in order to save on CPU feature bits we can use the previously added CPU_FTR_CP_USE_DCBTZ feature bit to differentiate between Power6 and Cell (because CPU_FTR_CP_USE_DCBTZ was added to Cell but not Power6). The first feature section acts to nop out the branch that takes us to the code that aligns us to an eight byte boundary for the destination. We only want to nop out this branch on Power6. So the ALT_FTR_SECTION_END() for this feature section creates a test mask of the two feature bits ORed together and provides an expected result of just CPU_FTR_UNALIGNED_LD_STD, thus we nop out the branch if we're on a CPU that has CPU_FTR_UNALIGNED_LD_STD set and CPU_FTR_CP_USE_DCBTZ unset. For the second feature section added, if we're on a CPU that has the CPU_FTR_UNALIGNED_LD_STD bit set then we don't want to do the copy with aligned loads and stores (and the appropriate shifting left and right instructions), so we want to nop out the branch to .Lsrc_unaligned. The andi. used for this branch is moved to just above the branch because this allows us to nop out both instructions with just one feature section which gives us better performance and doesn't hurt readability which two separate feature sections did. Moving the andi. to just above the branch doesn't have any noticeable negative effect on the remaining 64bit processors (the ones that didn't have this feature bit added). On Cell this simple modification results in an improvement to measured memcpy() bandwidth of up to 50% in the hot cache case and up to 15% in the cold cache case. On Power6 we get memory bandwidth results that are up to three times faster in the hot cache case and up to 50% faster in the cold cache case. Commit 2a9294369bd020db89bfdf78b84c3615b39a5c84 ("powerpc: Add new CPU feature: CPU_FTR_CP_USE_DCBTZ") was where CPU_FTR_CP_USE_DCBTZ was added. To say that Cell gets its best performance in memcpy() with just the destination aligned is true but only for the reason that the indirect shift and rotate instructions, sld and srd, are microcoded on Cell. This means that either the destination or the source can be aligned, but not both, and seeing as we get better performance with the destination aligned we choose this option. While we're at it make a one line change from cmpldi r1,... to cmpldi cr1,... for consistency. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-10-14powerpc: Fix DMA offset for non-coherent DMABenjamin Herrenschmidt
After Becky's work we can almost have different DMA offsets between on-chip devices and PCI. Almost because there's a problem with the non-coherent DMA code that basically ignores the programmed offset to use the global one for everything. This fixes it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-09-15powerpc: New copy_4K_page()Mark Nelson
This new copy_4K_page() function was originally tuned for the best performance on the Cell processor, but after testing on more 64bit powerpc chips it was found that with a small modification it either matched the performance offered by the current mainline version or bettered it by a small amount. It was found that on a Cell-based QS22 blade the amount of system time measured when compiling a 2.6.26 pseries_defconfig decreased by 4%. Using the same test, a 4-way 970MP machine saw a decrease of 2% in system time. No noticeable change was seen on Power4, Power5 or Power6. The 4096 byte page is copied in thirty-two 128 byte strides. An initial setup loop executes dcbt instructions for the whole source page and dcbz instructions for the whole destination page. To do this, the cache line size is retrieved from ppc64_caches. A new CPU feature bit, CPU_FTR_CP_USE_DCBTZ, (introduced in the previous patch) is used to make the modification to this new copy routine - on Power4, 970 and Cell the feature bit is set so the setup loop is executed, but on all other 64bit chips the setup loop is nop'ed out. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-08-04powerpc: Remove use of CONFIG_PPC_MERGEKumar Gala
Now that arch/ppc is gone and CONFIG_PPC_MERGE is always set, remove the dead code associated with !CONFIG_PPC_MERGE from arch/powerpc and include/asm-powerpc. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-24PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architecturesAndrea Righi
On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit boundary. For example: u64 val = PAGE_ALIGN(size); always returns a value < 4GB even if size is greater than 4GB. The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for example): #define PAGE_SHIFT 12 #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE-1)) ... #define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK) The "~" is performed on a 32-bit value, so everything in "and" with PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary. Using the ALIGN() macro seems to be the right way, because it uses typeof(addr) for the mask. Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in include/linux/mm.h. See also lkml discussion: http://lkml.org/lkml/2008/6/11/237 [akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c] [akpm@linux-foundation.org: fix v850] [akpm@linux-foundation.org: fix powerpc] [akpm@linux-foundation.org: fix arm] [akpm@linux-foundation.org: fix mips] [akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c] [akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c] [akpm@linux-foundation.org: fix powerpc] Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-22powerpc: Use PPC_LONG and PPC_LONG_ALIGN in lib/string.SMichael Ellerman
Replace ifdef clutter with the PPC_LONG and PPC_LONG_ALIGN macros for readability. No change to the generated code. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-22powerpc: Use WARN_ON(1) instead of __WARN()Michael Ellerman
__WARN() is not defined for all configs, use WARN_ON(1) instead. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-03powerpc: Fixup lwsync at runtimeKumar Gala
To allow for a single kernel image on e500 v1/v2/mc we need to fixup lwsync at runtime. On e500v1/v2 lwsync causes an illop so we need to patch up the code. We default to 'sync' since that is always safe and if the cpu is capable we will replace 'sync' with 'lwsync'. We introduce CPU_FTR_LWSYNC as a way to determine at runtime if this is needed. This flag could be moved elsewhere since we dont really use it for the normal CPU_FTR purpose. Finally we only store the relative offset in the fixup section to keep it as small as possible rather than using a full fixup_entry. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-03powerpc: Fix building of feature-fixup tests on ppc32Kumar Gala
We need to use PPC_LCMPI otherwise we get compile errors like: arch/powerpc/lib/feature-fixups-test.S: Assembler messages: arch/powerpc/lib/feature-fixups-test.S:142: Error: Unrecognized opcode: `cmpdi' arch/powerpc/lib/feature-fixups-test.S:149: Error: Unrecognized opcode: `cmpdi' arch/powerpc/lib/feature-fixups-test.S:164: Error: Unrecognized opcode: `cmpdi' Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01powerpc: Prevent memory corruption due to cache invalidation of unaligned ↵Andrew Lewis
DMA buffer On PowerPC processors with non-coherent cache architectures the DMA subsystem calls invalidate_dcache_range() before performing a DMA read operation. If the address and length of the DMA buffer are not aligned to a cache-line boundary this can result in memory outside of the DMA buffer being invalidated in the cache. If this memory has an uncommitted store then the data will be lost and a subsequent read of that address will result in an old value being returned from main memory. Only when the DMA buffer starts on a cache-line boundary and is an exact mutiple of the cache-line size can invalidate_dcache_range() be called, otherwise flush_dcache_range() must be called. flush_dcache_range() will first flush uncommitted writes, and then invalidate the cache. Signed-off-by: Andrew Lewis <andrew-lewis at netspace.net.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01powerpc: Add self-tests of the feature fixup codeMichael Ellerman
This commit adds tests of the feature fixup code, they are run during boot if CONFIG_FTR_FIXUP_SELFTEST=y. Some of the tests manually invoke the patching routines to check their behaviour, and others use the macros and so are patched during the normal patching done during boot. Because we have two sets of macros with different names, we use a macro to generate the test of the macros, very niiiice. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01powerpc: Add logic to patch alternative feature sectionsMichael Ellerman
This commit adds the logic to patch alternative sections. This is fairly straightforward, except for branches. Relative branches that jump from inside the else section to outside of it need to be translated as they're moved, otherwise they will jump to the wrong location. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01powerpc: Introduce infrastructure for feature sections with alternativesMichael Ellerman
The current feature section logic only supports nop'ing out code, this means if you want to choose at runtime between instruction sequences, one or both cases will have to execute the nop'ed out contents of the other section, eg: BEGIN_FTR_SECTION or 1,1,1 END_FTR_SECTION_IFSET(FOO) BEGIN_FTR_SECTION or 2,2,2 END_FTR_SECTION_IFCLR(FOO) and the resulting code will be either, or 1,1,1 nop or, nop or 2,2,2 For small code segments this is fine, but for larger code blocks and in performance criticial code segments, it would be nice to avoid the nops. This commit starts to implement logic to allow the following: BEGIN_FTR_SECTION or 1,1,1 FTR_SECTION_ELSE or 2,2,2 ALT_FTR_SECTION_END_IFSET(FOO) and the resulting code will be: or 1,1,1 or, or 2,2,2 We achieve this by extending the existing FTR macros. The current feature section semantic just becomes a special case, ie. if the else case is empty we nop out the default case. The key limitation is that the size of the else case must be less than or equal to the size of the default case. If the else case is smaller the remainder of the section is nop'ed. We let the linker put the else case code in with the rest of the text, so that relative branches from the else case are more likley to link, this has the disadvantage that we can't free the unused else cases. This commit introduces the required macro and linker script changes, but does not enable the patching of the alternative sections. We also need to update two hand-made section entries in reg.h and timex.h Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01powerpc: Split out do_feature_fixups() from cputable.cMichael Ellerman
The logic to patch CPU feature sections lives in cputable.c, but these days it's used for CPU features as well as firmware features. Move it into it's own file for neatness and as preparation for some additions. While we're moving the code, we pull the loop body logic into a separate routine, and remove a comment which doesn't apply anymore. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01powerpc: Add tests of the code patching routinesMichael Ellerman
Add tests of the existing code patching routines, as well as the new routines added in the last commit. The self-tests are run late in boot when CONFIG_CODE_PATCHING_SELFTEST=y, which depends on DEBUG_KERNEL=y. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01powerpc: Add new code patching routinesMichael Ellerman
This commit adds some new routines for patching code, which will be used in a following commit. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01powerpc: Make create_branch() return errors if the branch target is too largeMichael Ellerman
If you pass a target value to create_branch() which is more than 32MB - 4, or - 32MB away from the branch site, then it's impossible to create an immediate branch. The current code doesn't check, which will lead to us creating a branch to somewhere else - which is bad. For code that cares to check we return 0, which is easy to check for, and for code that doesn't at least we'll be creating an illegal instruction, rather than a branch to some random address. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-07-01powerpc: Allow create_branch() to return errorsMichael Ellerman
Currently create_branch() creates a branch instruction for you, and patches it into the call site. In some circumstances it would be nice to be able to create the instruction and patch it later, and also some code might want to check for errors in the branch creation before doing the patching. A future commit will change create_branch() to check for errors. For callers that don't care, replace create_branch() with patch_branch(), which just creates the branch and patches it directly. While we're touching all the callers, change to using unsigned int *, as this seems to match usage better. That allows (and requires) us to remove the volatile in the definition of vector in powermac/smp.c and mpc86xx_smp.c, that's correct because now that we're passing vector as an unsigned int * the compiler knows that it's value might change across the patch_branch() call. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Jon Loeliger <jdl@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>