summaryrefslogtreecommitdiffstats
path: root/arch/x86/lib
AgeCommit message (Collapse)Author
2012-03-22Merge branch 'x86-atomic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/atomic changes from Ingo Molnar. * 'x86-atomic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: atomic64 assembly improvements x86: Adjust asm constraints in atomic64 wrappers
2012-03-22Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/asm changes from Ingo Molnar * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Include probe_roms.h in probe_roms.c x86/32: Print control and debug registers for kerenel context x86: Tighten dependencies of CPU_SUP_*_32 x86/numa: Improve internode cache alignment x86: Fix the NMI nesting comments x86-64: Improve insn scheduling in SAVE_ARGS_IRQ x86-64: Fix CFI annotations for NMI nesting code bitops: Add missing parentheses to new get_order macro bitops: Optimise get_order() bitops: Adjust the comment on get_order() to describe the size==0 case x86/spinlocks: Eliminate TICKET_MASK x86-64: Handle byte-wise tail copying in memcpy() without a loop x86-64: Fix memcpy() to support sizes of 4Gb and above x86-64: Fix memset() to support sizes of 4Gb and above x86-64: Slightly shorten copy_page()
2012-03-21Merge branch 'kmap_atomic' of git://github.com/congwang/linuxLinus Torvalds
Pull kmap_atomic cleanup from Cong Wang. It's been in -next for a long time, and it gets rid of the (no longer used) second argument to k[un]map_atomic(). Fix up a few trivial conflicts in various drivers, and do an "evil merge" to catch some new uses that have come in since Cong's tree. * 'kmap_atomic' of git://github.com/congwang/linux: (59 commits) feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename] drbd: remove the second argument of k[un]map_atomic() zcache: remove the second argument of k[un]map_atomic() gma500: remove the second argument of k[un]map_atomic() dm: remove the second argument of k[un]map_atomic() tomoyo: remove the second argument of k[un]map_atomic() sunrpc: remove the second argument of k[un]map_atomic() rds: remove the second argument of k[un]map_atomic() net: remove the second argument of k[un]map_atomic() mm: remove the second argument of k[un]map_atomic() lib: remove the second argument of k[un]map_atomic() power: remove the second argument of k[un]map_atomic() kdb: remove the second argument of k[un]map_atomic() udf: remove the second argument of k[un]map_atomic() ubifs: remove the second argument of k[un]map_atomic() squashfs: remove the second argument of k[un]map_atomic() reiserfs: remove the second argument of k[un]map_atomic() ocfs2: remove the second argument of k[un]map_atomic() ntfs: remove the second argument of k[un]map_atomic() ...
2012-03-20x86: remove the second argument of k[un]map_atomic()Cong Wang
Acked-by: Avi Kivity <avi@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-12Merge branch 'perf/urgent' into perf/coreIngo Molnar
Merge reason: We are going to queue up a dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-09x86: Derandom delay_tsc for 64 bitThomas Gleixner
Commit f0fbf0abc093 ("x86: integrate delay functions") converted delay_tsc() into a random delay generator for 64 bit. The reason is that it merged the mostly identical versions of delay_32.c and delay_64.c. Though the subtle difference of the result was: static void delay_tsc(unsigned long loops) { - unsigned bclock, now; + unsigned long bclock, now; Now the function uses rdtscl() which returns the lower 32bit of the TSC. On 32bit that's not problematic as unsigned long is 32bit. On 64 bit this fails when the lower 32bit are close to wrap around when bclock is read, because the following check if ((now - bclock) >= loops) break; evaluated to true on 64bit for e.g. bclock = 0xffffffff and now = 0 because the unsigned long (now - bclock) of these values results in 0xffffffff00000001 which is definitely larger than the loops value. That explains Tvortkos observation: "Because I am seeing udelay(500) (_occasionally_) being short, and that by delaying for some duration between 0us (yep) and 491us." Make those variables explicitely u32 again, so this works for both 32 and 64 bit. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # >= 2.6.27 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-28Merge branch 'linus' into x86/asmIngo Molnar
Sync up the latest NMI fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-11x86: Fix to decode grouped AVX with VEX pp bitsMasami Hiramatsu
Fix to decode grouped AVX with VEX pp bits which should be handled as same as last-prefixes. This fixes below warnings in posttest with CONFIG_CRYPTO_SHA1_SSSE3=y. Warning: arch/x86/tools/test_get_len found difference at <sha1_transform_avx>:ffffffff810d5fc0 Warning: ffffffff810d6069: c5 f9 73 de 04 vpsrldq $0x4,%xmm6,%xmm0 Warning: objdump says 5 bytes, but insn_get_length() says 4 ... With this change, test_get_len can decode it correctly. $ arch/x86/tools/test_get_len -v -y ffffffff810d6069: c5 f9 73 de 04 vpsrldq $0x4,%xmm6,%xmm0 Succeed: decoded and checked 1 instructions Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20120210053340.30429.73410.stgit@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-26x86-64: Handle byte-wise tail copying in memcpy() without a loopJan Beulich
While hard to measure, reducing the number of possibly/likely mis-predicted branches can generally be expected to be slightly better. Other than apparent at the first glance, this also doesn't grow the function size (the alignment gap to the next function just gets smaller). Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/4F218584020000780006F422@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-26x86-64: Fix memcpy() to support sizes of 4Gb and aboveJan Beulich
While currently there doesn't appear to be any reachable in-tree case where such large memory blocks may be passed to memcpy(), we already had hit the problem in our Xen kernels. Just like done recently for mmeset(), rather than working around it, prevent others from falling into the same trap by fixing this long standing limitation. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/4F21846F020000780006F3FA@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-26x86-64: Fix memset() to support sizes of 4Gb and aboveJan Beulich
While currently there doesn't appear to be any reachable in-tree case where such large memory blocks may be passed to memset() (alloc_bootmem() being the primary non-reachable one, as it gets called with suitably large sizes in FLATMEM configurations), we have recently hit the problem a second time in our Xen kernels. Rather than working around it a second time, prevent others from falling into the same trap by fixing this long standing limitation. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/4F05D992020000780006AA09@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-20x86: atomic64 assembly improvementsJan Beulich
In the "xchg" implementation, %ebx and %ecx don't need to be copied into %eax and %edx respectively (this is only necessary when desiring to only read the stored value). In the "add_unless" implementation, swapping the use of %ecx and %esi for passing arguments allows %esi to become an input only (i.e. permitting the register to be re-used to address the same object without reload). In "{add,sub}_return", doing the initial read64 through the passed in %ecx decreases a register dependency. In "inc_not_zero", a branch can be eliminated by or-ing together the two halves of the current (64-bit) value, and code size can be further reduced by adjusting the arithmetic slightly. v2: Undo the folding of "xchg" and "set". Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/4F19A2BC020000780006E0DC@nat28.tlf.novell.com Cc: Luca Barbieri <luca@luca-barbieri.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-01-20x86: Adjust asm constraints in atomic64 wrappersJan Beulich
Eric pointed out overly restrictive constraints in atomic64_set(), but there are issues throughout the file. In the cited case, %ebx and %ecx are inputs only (don't get changed by either of the two low level implementations). This was also the case elsewhere. Further in many cases early-clobber indicators were missing. Finally, the previous implementation rolled a custom alternative instruction macro from scratch, rather than using alternative_call() (which was introduced with the commit that the description of the change in question actually refers to). Adjusting has the benefit of not hiding referenced symbols from the compiler, which however requires them to be declared not just in the exporting source file (which, as a desirable side effect, in turn allows that exporting file to become a real 5-line stub). This patch does not eliminate the overly restrictive memory clobbers, however: Doing so would occasionally make the compiler set up a second register for accessing the memory object (to satisfy the added "m" constraint), and it's not clear which of the two non-optimal alternatives is better. v2: Re-do the declaration and exporting of the internal symbols. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/4F19A2A5020000780006E0D9@nat28.tlf.novell.com Cc: Luca Barbieri <luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-01-19Merge branches 'sched-urgent-for-linus', 'perf-urgent-for-linus' and ↵Linus Torvalds
'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/accounting, proc: Fix /proc/stat interrupts sum * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracepoints/module: Fix disabling tracepoints with taint CRAP or OOT x86/kprobes: Add arch/x86/tools/insn_sanity to .gitignore x86/kprobes: Fix typo transferred from Intel manual * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, syscall: Need __ARCH_WANT_SYS_IPC for 32 bits x86, tsc: Fix SMI induced variation in quick_pit_calibrate() x86, opcode: ANDN and Group 17 in x86-opcode-map.txt x86/kconfig: Move the ZONE_DMA entry under a menu x86/UV2: Add accounting for BAU strong nacks x86/UV2: Ack BAU interrupt earlier x86/UV2: Remove stale no-resources test for UV2 BAU x86/UV2: Work around BAU bug x86/UV2: Fix BAU destination timeout initialization x86/UV2: Fix new UV2 hardware by using native UV2 broadcast mode x86: Get rid of dubious one-bit signed bitfield
2012-01-17x86, opcode: ANDN and Group 17 in x86-opcode-map.txtUlrich Drepper
The Intel documentation at http://software.intel.com/file/36945 shows the ANDN opcode and Group 17 with encoding f2 and f3 encoding respectively. The current version of x86-opcode-map.txt shows them with f3 and f4. Unless someone can point to documentation which shows the currently used encoding the following patch be applied. Signed-off-by: Ulrich Drepper <drepper@gmail.com> Link: http://lkml.kernel.org/r/CAOPLpQdq5SuVo9=023CYhbFLAX9rONyjmYq7jJkqc5xwctW5eA@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-01-16x86/kprobes: Fix typo transferred from Intel manualUlrich Drepper
The arch/x86/lib/x86-opcode-map.txt file [used by the kprobes instruction decoder] contains the line: af: SCAS/W/D/Q rAX,Xv This is what the Intel manuals show, but it's not correct. The 'X' stands for: Memory addressed by the DS:rSI register pair (for example, MOVS, CMPS, OUTS, or LODS). On the other hand 'Y' means (also see the ae byte entry for SCASB): Memory addressed by the ES:rDI register pair (for example, MOVS, CMPS, INS, STOS, or SCAS). Signed-off-by: Ulrich Drepper <drepper@gmail.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/CAOPLpQfytPyDEBF1Hbkpo7ovUerEsstVGxBr%3DEpDL-BKEMaqLA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-06Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86: Fix atomic64_xxx_cx8() functions x86: Fix and improve cmpxchg_double{,_local}() x86_64, asm: Optimise fls(), ffs() and fls64() x86, bitops: Move fls64.h inside __KERNEL__ x86: Fix and improve percpu_cmpxchg{8,16}b_double() x86: Report cpb and eff_freq_ro flags correctly x86/i386: Use less assembly in strlen(), speed things up a bit x86: Use the same node_distance for 32 and 64-bit x86: Fix rflags in FAKE_STACK_FRAME x86: Clean up and extend do_int3() x86: Call do_notify_resume() with interrupts enabled x86/div64: Add a micro-optimization shortcut if base is power of two x86-64: Cleanup some assembly entry points x86-64: Slightly shorten line system call entry and exit paths x86-64: Reduce amount of redundant code generated for invalidate_interruptNN x86-64: Slightly shorten int_ret_from_sys_call x86, efi: Convert efi_phys_get_time() args to physical addresses x86: Default to vsyscall=emulate x86-64: Set siginfo and context on vsyscall emulation faults x86: consolidate xchg and xadd macros ...
2012-01-06x86-64: Slightly shorten copy_page()Jan Beulich
%r13 got saved and restored without ever getting touched, so there's no need to do so. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/4F05D9F9020000780006AA0D@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-12x86/i386: Use less assembly in strlen(), speed things up a bitAlexey Dobriyan
Current i386 strlen() hardcodes NOT/DEC sequence. DEC is mentioned to be suboptimal on Core2. So, put only REPNE SCASB sequence in assembly, compiler can do the rest. The difference in generated code is like below (MCORE2=y): <strlen>: push %edi mov $0xffffffff,%ecx mov %eax,%edi xor %eax,%eax repnz scas %es:(%edi),%al not %ecx - dec %ecx - mov %ecx,%eax + lea -0x1(%ecx),%eax pop %edi ret Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Beulich <JBeulich@suse.com> Link: http://lkml.kernel.org/r/20111211181319.GA17097@p183.telecom.by Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Update instruction decoder to support new AVX formatsMasami Hiramatsu
Since new Intel software developers manual introduces new format for AVX instruction set (including AVX2), it is important to update x86-opcode-map.txt to fit those changes. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20111205120557.15475.13236.stgit@cloud Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05x86: Fix instruction decoder to handle grouped AVX instructionsMasami Hiramatsu
For reducing memory usage of attribute table, x86 instruction decoder puts "Group" attribute only on "no-last-prefix" attribute table (same as vex_p == 0 case). Thus, the decoder should look no-last-prefix table first, and then only if it is not a group, move on to "with-last-prefix" table (vex_p != 0). However, current implementation, inat_get_avx_attribute() looks with-last-prefix directly. So, when decoding a grouped AVX instruction, the decoder fails to find correct group because there is no "Group" attribute on the table. This ends up with the mis-decoding of instructions, as Ingo reported in http://thread.gmane.org/gmane.linux.kernel/1214103 This patch fixes it to check no-last-prefix table first even if that is an AVX instruction, and get an attribute from "with last-prefix" table only if that is not a group. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20111205120539.15475.91428.stgit@cloud Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-10x86: Fix insn decoder for longer instructionMasami Hiramatsu
Fix x86 insn decoder for hardening against invalid length instructions. This adds length checkings for each byte-read site and if it exceeds MAX_INSN_SIZE, returns immediately. This can happen when decoding user-space binary. Caller can check whether it happened by checking insn.*.got member is set or not. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: acme@redhat.com Cc: ming.m.lin@intel.com Cc: robert.richter@amd.com Cc: ravitillo@lbl.gov Cc: yrl.pp-manager.tt@hitachi.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20111007133155.10933.58577.stgit@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-26atomic: use <linux/atomic.h>Arun Sharma
This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-22Merge branch 'x86-vdso-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64, vdso: Do not allocate memory for the vDSO clocksource: Change __ARCH_HAS_CLOCKSOURCE_DATA to a CONFIG option x86, vdso: Drop now wrong comment Document the vDSO and add a reference parser ia64: Replace clocksource.fsys_mmio with generic arch data x86-64: Move vread_tsc and vread_hpet into the vDSO clocksource: Replace vread with generic arch data x86-64: Add --no-undefined to vDSO build x86-64: Allow alternative patching in the vDSO x86: Make alternative instruction pointers relative x86-64: Improve vsyscall emulation CS and RIP handling x86-64: Emulate legacy vsyscalls x86-64: Fill unused parts of the vsyscall page with 0xcc x86-64: Remove vsyscall number 3 (venosys) x86-64: Map the HPET NX x86-64: Remove kernel.vsyscall64 sysctl x86-64: Give vvars their own page x86-64: Document some of entry_64.S x86-64: Fix alignment of jiffies variable
2011-07-22Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix write lock scalability 64-bit issue x86: Unify rwsem assembly implementation x86: Unify rwlock assembly implementation x86, asm: Fix binutils 2.16 issue with __USER32_CS x86, asm: Cleanup thunk_64.S x86, asm: Flip RESTORE_ARGS arguments logic x86, asm: Flip SAVE_ARGS arguments logic x86, asm: Thin down SAVE/RESTORE_* asm macros
2011-07-21x86, perf: Make copy_from_user_nmi() a library functionRobert Richter
copy_from_user_nmi() is used in oprofile and perf. Moving it to other library functions like copy_from_user(). As this is x86 code for 32 and 64 bits, create a new file usercopy.c for unified code. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110607172413.GJ20052@erda.amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21x86: Fix write lock scalability 64-bit issueJan Beulich
With the write lock path simply subtracting RW_LOCK_BIAS there is, on large systems, the theoretical possibility of overflowing the 32-bit value that was used so far (namely if 128 or more CPUs manage to do the subtraction, but don't get to do the inverse addition in the failure path quickly enough). A first measure is to modify RW_LOCK_BIAS itself - with the new value chosen, it is good for up to 2048 CPUs each allowed to nest over 2048 times on the read path without causing an issue. Quite possibly it would even be sufficient to adjust the bias a little further, assuming that allowing for significantly less nesting would suffice. However, as the original value chosen allowed for even more nesting levels, to support more than 2048 CPUs (possible currently only for 64-bit kernels) the lock itself gets widened to 64 bits. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4E258E0D020000780004E3F0@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21x86: Unify rwsem assembly implementationJan Beulich
Rather than having two functionally identical implementations for 32- and 64-bit configurations, use the previously extended assembly abstractions to fold the rwsem two implementations into a shared one. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4E258DF3020000780004E3ED@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21x86: Unify rwlock assembly implementationJan Beulich
Rather than having two functionally identical implementations for 32- and 64-bit configurations, extend the existing assembly abstractions enough to fold the two rwlock implementations into a shared one. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4E258DD7020000780004E3EA@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-13x86: Make alternative instruction pointers relativeAndy Lutomirski
This save a few bytes on x86-64 and means that future patches can apply alternatives to unrelocated code. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/ff64a6b9a1a3860ca4a7b8b6dc7b4754f9491cd7.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-06-03x86, asm: Cleanup thunk_64.SBorislav Petkov
Drop thunk_ra macro in favor of an additional argument to the thunk macro since their bodies are almost identical. Do a whitespace scrubbing and use CFI-aware macros for full annotation. Signed-off-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1306873314-32523-5-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-19Merge branches 'x86-apic-for-linus', 'x86-asm-for-linus' and ↵Linus Torvalds
'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, apic: Print verbose error interrupt reason on apic=debug * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Demacro CONFIG_PARAVIRT cpu accessors * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix mrst sparse complaints x86: Fix spelling error in the memcpy() source code comment x86, mpparse: Remove unnecessary variable
2011-05-18x86, 64-bit: Fix copy_[to/from]_user() checks for the userspace address limitJiri Olsa
As reported in BZ #30352: https://bugzilla.kernel.org/show_bug.cgi?id=30352 there's a kernel bug related to reading the last allowed page on x86_64. The _copy_to_user() and _copy_from_user() functions use the following check for address limit: if (buf + size >= limit) fail(); while it should be more permissive: if (buf + size > limit) fail(); That's because the size represents the number of bytes being read/write from/to buf address AND including the buf address. So the copy function will actually never touch the limit address even if "buf + size == limit". Following program fails to use the last page as buffer due to the wrong limit check: #include <sys/mman.h> #include <sys/socket.h> #include <assert.h> #define PAGE_SIZE (4096) #define LAST_PAGE ((void*)(0x7fffffffe000)) int main() { int fds[2], err; void * ptr = mmap(LAST_PAGE, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); assert(ptr == LAST_PAGE); err = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds); assert(err == 0); err = send(fds[0], ptr, PAGE_SIZE, 0); perror("send"); assert(err == PAGE_SIZE); err = recv(fds[1], ptr, PAGE_SIZE, MSG_WAITALL); perror("recv"); assert(err == PAGE_SIZE); return 0; } The other place checking the addr limit is the access_ok() function, which is working properly. There's just a misleading comment for the __range_not_ok() macro - which this patch fixes as well. The last page of the user-space address range is a guard page and Brian Gerst observed that the guard page itself due to an erratum on K8 cpus (#121 Sequential Execution Across Non-Canonical Boundary Causes Processor Hang). However, the test code is using the last valid page before the guard page. The bug is that the last byte before the guard page can't be read because of the off-by-one error. The guard page is left in place. This bug would normally not show up because the last page is part of the process stack and never accessed via syscalls. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Brian Gerst <brgerst@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/r/1305210630-7136-1-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-17x86, mem: memset_64.S: Optimize memset by enhanced REP MOVSB/STOSBFenghua Yu
Support memset() with enhanced rep stosb. On processors supporting enhanced REP MOVSB/STOSB, the alternative memset_c_e function using enhanced rep stosb overrides the fast string alternative memset_c and the original function. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1305671358-14478-10-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17x86, mem: memmove_64.S: Optimize memmove by enhanced REP MOVSB/STOSBFenghua Yu
Support memmove() by enhanced rep movsb. On processors supporting enhanced REP MOVSB/STOSB, the alternative memmove() function using enhanced rep movsb overrides the original function. The patch doesn't change the backward memmove case to use enhanced rep movsb. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1305671358-14478-9-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17x86, mem: memcpy_64.S: Optimize memcpy by enhanced REP MOVSB/STOSBFenghua Yu
Support memcpy() with enhanced rep movsb. On processors supporting enhanced rep movsb, the alternative memcpy() function using enhanced rep movsb overrides the original function and the fast string function. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1305671358-14478-8-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17x86, mem: copy_user_64.S: Support copy_to/from_user by enhanced REP MOVSB/STOSBFenghua Yu
Support copy_to_user/copy_from_user() by enhanced REP MOVSB/STOSB. On processors supporting enhanced REP MOVSB/STOSB, the alternative copy_user_enhanced_fast_string function using enhanced rep movsb overrides the original function and the fast string function. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1305671358-14478-7-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17x86, mem: clear_page_64.S: Support clear_page() with enhanced REP MOVSB/STOSBFenghua Yu
Intel processors are adding enhancements to REP MOVSB/STOSB and the use of REP MOVSB/STOSB for optimal memcpy/memset or similar functions is recommended. Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/ STOSB). Support clear_page() with rep stosb for processor supporting enhanced REP MOVSB /STOSB. On processors supporting enhanced REP MOVSB/STOSB, the alternative clear_page_c_e function using enhanced REP STOSB overrides the original function and the fast string function. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1305671358-14478-6-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-07Merge commit 'v2.6.39-rc6' into x86/cleanupsIngo Molnar
Merge reason: move to a (much) newer upstream base. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-01x86: Fix spelling error in the memcpy() source code commentBart Van Assche
Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: http://lkml.kernel.org/r/201105011409.21629.bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-27percpu: Omit segment prefix in the UP case for cmpxchg_doubleChristoph Lameter
Omit the segment prefix in the UP case. GS is not used then and we will generate segfaults if cmpxchg16b is used otherwise. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-18Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Flush TLB if PGD entry is changed in i386 PAE mode x86, dumpstack: Correct stack dump info when frame pointer is available x86: Clean up csum-copy_64.S a bit x86: Fix common misspellings x86: Fix misspelling and align params x86: Use PentiumPro-optimized partial_csum() on VIA C7
2011-03-18x86: Clean up csum-copy_64.S a bitIngo Molnar
The many stray whitespaces and other uncleanlinesses made this code almost unreadable to me - so fix those. No changes to the code. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-18x86: Fix common misspellingsLucas De Marchi
They were generated by 'codespell' and then manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: trivial@kernel.org LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-16Merge branch 'for-2.6.39' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu, x86: Add arch-specific this_cpu_cmpxchg_double() support percpu: Generic support for this_cpu_cmpxchg_double() alpha: use L1_CACHE_BYTES for cacheline size in the linker script percpu: align percpu readmostly subsection to cacheline Fix up trivial conflict in arch/x86/kernel/vmlinux.lds.S due to the percpu alignment having changed ("x86: Reduce back the alignment of the per-CPU data section")
2011-03-15Merge branch 'x86-mem-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64, mem: Convert memmove() to assembly file and fix return value bug
2011-03-02x86: Fix a bogus unwind annotation in lib/semaphore_32.SJan Beulich
'simple' would have required specifying current frame address and return address location manually, but that's obviously not the case (and not necessary) here. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4D6D1082020000780003454C@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-28x86: Remove unused bits from lib/thunk_*.SJan Beulich
Some of the items removed were apparently never used, others simply didn't get removed with their last user. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4D6BD3A002000078000341F1@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-28x86: Use {push,pop}_cfi in more placesJan Beulich
Cleaning up and shortening code... Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4D6BD35002000078000341DA@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-28x86-64: Add CFI annotations to lib/rwsem_64.SJan Beulich
These weren't part of the initial commit of this code. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4D6BCDFF02000078000341B0@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>