From 17968fbbd19f1bb281ee4eb2548764ac5664c4ec Mon Sep 17 00:00:00 2001 From: Anton Blanchard Date: Sun, 27 May 2012 19:54:03 +0000 Subject: powerpc: 64bit optimised __clear_user I noticed __clear_user high up in a profile of one of my RAID stress tests. The testcase was doing a dd from /dev/zero which ends up calling __clear_user. __clear_user is basically a loop with a single 4 byte store which is horribly slow. We can do much better by aligning the desination and doing 32 bytes of 8 byte stores in a loop. The following testcase was used to verify the patch: http://ozlabs.org/~anton/junkcode/stress_clear_user.c To show the improvement in performance I ran a dd from /dev/zero to /dev/null on a POWER7 box: Before: # dd if=/dev/zero of=/dev/null bs=1M count=10000 10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s After: # time dd if=/dev/zero of=/dev/null bs=1M count=10000 10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s Over 5x faster. Signed-off-by: Anton Blanchard Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/lib/string.S | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch/powerpc/lib/string.S') diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S index 093d6316435..1b5a0a09d60 100644 --- a/arch/powerpc/lib/string.S +++ b/arch/powerpc/lib/string.S @@ -119,6 +119,7 @@ _GLOBAL(memchr) 2: li r3,0 blr +#ifdef CONFIG_PPC32 _GLOBAL(__clear_user) addi r6,r3,-4 li r3,0 @@ -160,3 +161,4 @@ _GLOBAL(__clear_user) PPC_LONG 1b,91b PPC_LONG 8b,92b .text +#endif -- cgit v1.2.3-70-g09d2