path: root/arch/x86/kernel/entry_64.S
Age | Commit message | Author
2011-01-13 | Merge branch 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds
* 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
  KVM: Initialize fpu state in preemptible context
  KVM: VMX: when entering real mode align segment base to 16 bytes
  KVM: MMU: handle 'map_writable' in set_spte() function
  KVM: MMU: audit: allow audit more guests at the same time
  KVM: Fetch guest cr3 from hardware on demand
  KVM: Replace reads of vcpu->arch.cr3 by an accessor
  KVM: MMU: only write protect mappings at pagetable level
  KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear()
  KVM: MMU: Initialize base_role for tdp mmus
  KVM: VMX: Optimize atomic EFER load
  KVM: VMX: Add definitions for more vm entry/exit control bits
  KVM: SVM: copy instruction bytes from VMCB
  KVM: SVM: implement enhanced INVLPG intercept
  KVM: SVM: enhance mov DR intercept handler
  KVM: SVM: enhance MOV CR intercept handler
  KVM: SVM: add new SVM feature bit names
  KVM: cleanup emulate_instruction
  KVM: move complete_insn_gp() into x86.c
  KVM: x86: fix CR8 handling
  KVM guest: Fix kvm clock initialization when it's configured out
  ...
2011-01-12 | KVM: Handle async PF in a guest. | Gleb Natapov
When async PF capability is detected, hook up a special page fault handler that handles async page fault events and passes all other page faults on to the regular page fault handler. Also add async PF handling to nested SVM emulation. An async PF always generates an exit to L1, where the vcpu thread is scheduled out until the page is available. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-07 | x86: Save rbp in pt_regs on irq entry | Frederic Weisbecker
From the x86_64 low level interrupt handlers, the frame pointer is saved right after the partial pt_regs frame. rbp is not supposed to be part of the irq partial saved registers, but saving it only requires extending the pt_regs frame by 8 bytes, plus a tiny stack offset fixup on irq exit. This slightly changes the semantics of get_irq_entry(), which is supposed to provide only the values of the caller-saved registers and the CPU-saved frame. However, it is a win for unwinders, which can then walk through stack frames on top of get_irq_regs() snapshots. A noticeable impact is that callchains for the perf cpu-clock and task-clock events now work on x86_64. So let's save rbp into the irq pt_regs.

As a result, with:

    perf record -e cpu-clock perf bench sched messaging
    perf report --stdio

Before:

    20.94%  perf  [kernel.kallsyms]  [k] lock_acquire
             |
             --- lock_acquire
                |
                |--44.01%-- __write_nocancel
                |
                |--43.18%-- __read
                |
                |--6.08%-- fork
                |          create_worker
                |
                |--0.88%-- _dl_fixup
                |
                |--0.65%-- do_lookup_x
                |
                |--0.53%-- __GI___libc_read
                 --4.67%-- [...]

After:

    19.23%  perf  [kernel.kallsyms]  [k] __lock_acquire
             |
             --- __lock_acquire
                |
                |--97.74%-- lock_acquire
                |          |
                |          |--21.82%-- _raw_spin_lock
                |          |          |
                |          |          |--37.26%-- unix_stream_recvmsg
                |          |          |          sock_aio_read
                |          |          |          do_sync_read
                |          |          |          vfs_read
                |          |          |          sys_read
                |          |          |          system_call
                |          |          |          __read
                |          |          |
                |          |          |--24.09%-- unix_stream_sendmsg
                |          |          |          sock_aio_write
                |          |          |          do_sync_write
                |          |          |          vfs_write
                |          |          |          sys_write
                |          |          |          system_call
                |          |          |          __write_nocancel

v2: Fix cfi annotations.

Reported-by: Soeren Sandmann Pedersen <sandmann@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: Jan Beulich <JBeulich@novell.com>
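As a rough illustration of why a saved rbp helps unwinders, the following user-space Python sketch walks a chain of frame pointers starting from a register snapshot. It is not kernel code: the `walk_frames` helper, the dictionary standing in for stack memory, and the addresses are all assumptions made for the example; only the function names are borrowed from the callchain above.

```python
def walk_frames(memory, regs, max_depth=16):
    """Return the callchain reachable from a register snapshot.

    `memory` maps a frame-pointer value to (saved caller bp, return address),
    which is a simplified stand-in for reading the stack.
    """
    callchain = [regs["ip"]]
    bp = regs.get("bp")          # None when the snapshot has no frame pointer
    while bp in memory and len(callchain) < max_depth:
        saved_bp, ret_addr = memory[bp]
        callchain.append(ret_addr)
        bp = saved_bp            # follow the link to the caller's frame
    return callchain


# Hypothetical stack with three nested frames, keyed by frame-pointer value.
MEMORY = {
    0x1000: (0x2000, "unix_stream_recvmsg"),
    0x2000: (0x3000, "sock_aio_read"),
    0x3000: (None, "do_sync_read"),
}

# Snapshot without bp: only the interrupted instruction is known.
partial = walk_frames(MEMORY, {"ip": "_raw_spin_lock"})

# Snapshot with bp: the whole chain of callers can be followed.
full = walk_frames(MEMORY, {"ip": "_raw_spin_lock", "bp": 0x1000})
```

In this model the snapshot without `bp` yields a one-entry callchain, which loosely mirrors the shallow "Before" report.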
2010-11-18 | x86/kprobes: Prevent kprobes to probe on save_args() | Masami Hiramatsu
Prevent kprobes from probing save_args(), since this function is called from the breakpoint exception handler. Probing it would cause an infinite loop in breakpoint handling. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> LKML-Reference: <20101118101655.2779.2816.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-23 | Merge branches 'softirq-for-linus', 'x86-debug-for-linus', 'x86-numa-for-linus', 'x86-quirks-for-linus', 'x86-setup-for-linus', 'x86-uv-for-linus' and 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'softirq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  softirqs: Make wakeup_softirqd static
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, asm: Restore parentheses around one pushl_cfi argument
  x86, asm: Fix ancient-GAS workaround
  x86, asm: Fix CFI macro invocations to deal with shortcomings in gas
* 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA
* 'x86-quirks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: HPET force enable for CX700 / VIA Epia LT
* 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, setup: Use string copy operation to optimze copy in kernel compression
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, UV: Use allocated buffer in tlb_uv.c:tunables_read()
* 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers.
2010-10-21 | Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Remove pr_<level> uses of KERN_<level>
  therm_throt.c: Trivial printk message fix for a unsuitable abbreviation of 'thermal'
  x86: Use {push,pop}{l,q}_cfi in more places
  i386: Add unwind directives to syscall ptregs stubs
  x86-64: Use symbolics instead of raw numbers in entry_64.S
  x86-64: Adjust frame type at paranoid_exit:
  x86-64: Fix unwind annotations in syscall stubs
2010-10-19 | x86, asm: Fix CFI macro invocations to deal with shortcomings in gas | Jan Beulich
gas prior to (perhaps) 2.16.90 has problems with passing non-parenthesized expressions containing spaces to macros. Spaces, however, get inserted by cpp between any macro expanding to a number and a subsequent + or -. For the +, current x86 gas then removes the space again (future gas may not do so), but for the - the space gets retained and is then considered a separator between macro arguments. Fix the respective definitions for both the - and + cases, so that they neither contain spaces nor make cpp insert any (the latter by adding seemingly redundant parentheses). Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4CBDBEBA020000780001E05A@vpn.id2.novell.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-18 | irq_work: Add generic hardirq context callbacks | Peter Zijlstra
Provide a mechanism that allows running code in IRQ context. It is most useful for NMI code that needs to interact with the rest of the system, such as waking up a task to drain buffers. Perf currently has such a mechanism, so extract that and provide it as a generic feature, independent of perf, so that others may also benefit. The IRQ context callback is generated through self-IPIs where possible, or on architectures like powerpc the decrementer (the built-in timer facility) is set to generate an interrupt immediately. Architectures that don't have anything like this have to make do with a callback from the timer tick. These architectures can call irq_work_run() at the tail of any IRQ handlers that might enqueue such work (like the perf IRQ handler) to avoid undue latencies in processing the work. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [ various fixes ] Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1287036094.7768.291.camel@yhuang-dev> Signed-off-by: Ingo Molnar <mingo@elte.hu>
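The deferral pattern described in this message can be sketched as a small single-threaded Python model. This is only an illustration under stated assumptions: the `IrqWork` and `IrqWorkQueue` classes and their method names are hypothetical, and the self-IPI, decrementer, and lock-free details of the kernel implementation are not modeled.

```python
from collections import deque


class IrqWork:
    """A callback plus a 'pending' flag so it is queued at most once."""

    def __init__(self, func):
        self.func = func
        self.pending = False


class IrqWorkQueue:
    """Toy model: work is queued from a restricted context, run later."""

    def __init__(self):
        self._queue = deque()

    def queue(self, work):
        # Called from the restricted (for example NMI-like) context.
        if work.pending:
            return False          # already queued, nothing to do
        work.pending = True
        self._queue.append(work)
        return True

    def run(self):
        # Called later from the (simulated) IRQ context.
        ran = 0
        while self._queue:
            work = self._queue.popleft()
            work.pending = False  # clear first so the callback may re-queue
            work.func()
            ran += 1
        return ran


log = []
queue = IrqWorkQueue()
wake = IrqWork(lambda: log.append("wake up buffer drainer"))

first = queue.queue(wake)    # accepted
second = queue.queue(wake)   # rejected: still pending
ran = queue.run()            # callback runs once
```

The pending flag is the design point worth noting: the restricted context only flips a flag and appends an entry, and everything that may need to interact with the rest of the system happens in `run()`.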
2010-09-03 | x86: Use {push,pop}{l,q}_cfi in more places | Jan Beulich
... plus additionally introduce {push,pop}f{l,q}_cfi. All in the hope that the code becomes more readable this way (it gets quite a bit smaller in any case). Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBDA40200007800013FAF@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 | x86-64: Use symbolics instead of raw numbers in entry_64.S | Jan Beulich
... making the code a little less fragile. Also use pushq_cfi instead of raw CFI annotations in two more places, and add two missing annotations after stack pointer adjustments which got modified here anyway. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBACF0200007800013F6A@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 | x86-64: Adjust frame type at paranoid_exit: | Jan Beulich
As this isn't an exception or interrupt entry point, it doesn't have any of the hardware-provided frame layouts active. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBAA80200007800013F67@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 | x86-64: Fix unwind annotations in syscall stubs | Jan Beulich
With the return address removed from the stack, these should really refer to their caller's register state. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBA3D0200007800013F61@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-13 | Mark arguments to certain syscalls as being const | David Howells
Mark arguments to certain system calls as being const where they should be but aren't. The list includes:
 (*) The filename arguments of various stat syscalls, execve(), various utimes syscalls and some mount syscalls.
 (*) The filename arguments of some syscall helpers relating to the above.
 (*) The buffer argument of various write syscalls.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-06 | Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  um, x86: Cast to (u64 *) inside set_64bit()
  x86-32, asm: Directly access per-cpu GDT
  x86-64, asm: Directly access per-cpu IST
  x86, asm: Merge cmpxchg_486_u64() and cmpxchg8b_emu()
  x86, asm: Move cmpxchg emulation code to arch/x86/lib
  x86, asm: Clean up and simplify <asm/cmpxchg.h>
  x86, asm: Clean up and simplify set_64bit()
  x86: Add memory modify constraints to xchg() and cmpxchg()
  x86-64: Simplify loading initial_gs
  x86: Use symbolic MSR names
  x86: Remove redundant K6 MSRs
2010-08-04 | Merge branch 'upstream/pvhvm' into upstream/xen | Jeremy Fitzhardinge
* upstream/pvhvm:
  Introduce CONFIG_XEN_PVHVM compile option
  blkfront: do not create a PV cdrom device if xen_hvm_guest
  support multiple .discard.* sections to avoid section type conflicts
  xen/pvhvm: fix build problem when !CONFIG_XEN
  xenfs: enable for HVM domains too
  x86: Call HVMOP_pagetable_dying on exit_mmap.
  x86: Unplug emulated disks and nics.
  x86: Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock.
  xen: Fix find_unbound_irq in presence of ioapic irqs.
  xen: Add suspend/resume support for PV on HVM guests.
  xen: Xen PCI platform device driver.
  x86/xen: event channels delivery on HVM.
  x86: early PV on HVM features initialization.
  xen: Add support for HVM hypercalls.
Conflicts:
  arch/x86/xen/enlighten.c
  arch/x86/xen/time.c
2010-08-01 | x86-64, asm: Directly access per-cpu IST | Brian Gerst
Use a direct per-cpu reference for the IST instead of using a scratch register. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1280594903-6341-1-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-07-22 | x86/xen: event channels delivery on HVM. | Sheng Yang
Set the callback to receive evtchns from Xen, using the callback vector delivery mechanism. The traditional way of receiving event channel notifications from Xen is via the interrupts from the platform PCI device. The callback vector is a newer alternative that allows us to receive notifications on any vcpu and doesn't need any PCI support: we allocate a vector exclusively to receive events; in the vector handler we don't need to interact with the vlapic, and therefore we avoid a VMEXIT. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-21 | x86: auditsyscall: fix fastpath return value after reschedule | Roland McGrath
In the CONFIG_AUDITSYSCALL fast-path for x86 64-bit system calls, we can pass a bad return value and/or error indication for the system call to audit_syscall_exit(). This happens when TIF_NEED_RESCHED was set as the system call returned, so we went out to schedule() and came back to the exit-audit fast-path. The fix is to reload the user return value register from the pt_regs before using it for audit_syscall_exit(). Both the 32-bit kernel's fast path and the 64-bit kernel's 32-bit system call fast paths work slightly differently, so that they always leave the fast path entirely to reschedule and don't return there, so they don't have the analogous bugs. Reported-by: Alexander Viro <aviro@redhat.com> Signed-off-by: Roland McGrath <roland@redhat.com>
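The bug class described here (a scratch register going stale across a call to schedule()) can be modeled in a few lines of Python. This is a deliberately simplified sketch, not the assembly fast path: the `exit_fastpath` function, its parameters, and the `CLOBBERED` marker value are assumptions made for the example.

```python
CLOBBERED = 0xDEAD  # stand-in for whatever schedule() leaves in the register


def exit_fastpath(pt_regs, need_resched, reload_from_pt_regs):
    """Return the value that would be handed to the exit-audit hook."""
    ax = pt_regs["ax"]            # return value as left by the system call
    if need_resched:
        ax = CLOBBERED            # model: the register is not preserved
        if reload_from_pt_regs:
            ax = pt_regs["ax"]    # the fix: reload the saved user return value
    return ax


regs = {"ax": -2}                 # hypothetical saved return value

buggy = exit_fastpath(regs, need_resched=True, reload_from_pt_regs=False)
fixed = exit_fastpath(regs, need_resched=True, reload_from_pt_regs=True)
plain = exit_fastpath(regs, need_resched=False, reload_from_pt_regs=False)
```

Only the path that reschedules without reloading reports the wrong value, which matches the message's point that the bug needs TIF_NEED_RESCHED to be set as the system call returns.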
2009-12-16 | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix kprobes build with non-gawk awk
  x86: Split swiotlb initialization into two stages
  x86: Regex support and known-movable symbols for relocs, fix _end
  x86, msr: Remove incorrect, duplicated code in the MSR driver
  x86: Merge kernel_thread()
  x86: Sync 32/64-bit kernel_thread
  x86, 32-bit: Use same regs as 64-bit for kernel_thread_helper
  x86, 64-bit: Use user_mode() to determine new stack pointer in copy_thread()
  x86, 64-bit: Move kernel_thread to C
  x86-64, paravirt: Call set_iopl_mask() on 64 bits
  x86-32: Avoid pipeline serialization in PTREGSCALL1 and 2
  x86: Merge sys_clone
  x86, 32-bit: Convert sys_vm86 & sys_vm86old
  x86: Merge sys_sigaltstack
  x86: Merge sys_execve
  x86: Merge sys_iopl
  x86-32: Add new pt_regs stubs
  cpumask: Use modern cpumask style in arch/x86/kernel/cpu/mcheck/mce-inject.c
2009-12-11 | Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (57 commits)
  x86, perf events: Check if we have APIC enabled
  perf_event: Fix variable initialization in other codepaths
  perf kmem: Fix unused argument build warning
  perf symbols: perf_header__read_build_ids() offset'n'size should be u64
  perf symbols: dsos__read_build_ids() should read both user and kernel buildids
  perf tools: Align long options which have no short forms
  perf kmem: Show usage if no option is specified
  sched: Mark sched_clock() as notrace
  perf sched: Add max delay time snapshot
  perf tools: Correct size given to memset
  perf_event: Fix perf_swevent_hrtimer() variable initialization
  perf sched: Fix for getting task's execution time
  tracing/kprobes: Fix field creation's bad error handling
  perf_event: Cleanup for cpu_clock_perf_event_update()
  perf_event: Allocate children's perf_event_ctxp at the right time
  perf_event: Clean up __perf_event_init_context()
  hw-breakpoints: Modify breakpoints without unregistering them
  perf probe: Update perf-probe document
  perf probe: Support --del option
  trace-kprobe: Support delete probe syntax
  ...
2009-12-10 | x86, 64-bit: Move kernel_thread to C | Brian Gerst
Prepare for merging with 32-bit. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1260380084-3707-2-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-08 | Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: UV RTC: Always enable RTC clocksource
  x86: UV RTC: Rename generic_interrupt to x86_platform_ipi
  x86: UV RTC: Clean up error handling
  x86: UV RTC: Add clocksource only boot option
  x86: UV RTC: Fix early expiry handling
2009-12-06 | x86: Fixup wrong debug exception frame link in stacktraces | Frederic Weisbecker
While dumping a stacktrace, the end of the exception stack won't link the frame pointer to the previous stack. The interrupted stack will then be considered as unreliable and ignored by perf, as the frame pointer is unreliable itself. This happens because we overwrite the frame pointer that links to the interrupted frame with the address of the exception stack. This is done in order to reserve space inside. But rbp has been chosen here only because it is not a scratch register, so that the address of the exception stack remains in rbp after calling do_debug(); we can then release the exception stack space without the need to retrieve its address again. But we can pick another non-scratch register to do that, so that we preserve the link to the interrupted stack frame in the stacktraces. Just randomly choose r12. All registers are saved just before and restored just after calling do_debug(). And r12 is not used in the middle, which makes it a perfect candidate.

Example:

    perf record -g -a -c 1 -f -e mem:$(tasklist_lock_addr):rw

Before:

    44.18%  [k] _raw_read_lock
             |
             |
             --- |--6.31%-- waitid
                |
                |--4.26%-- writev
                |
                |--3.63%-- __select
                |
                |--3.15%-- __waitpid
                |          |
                |          |--28.57%-- 0x8b52e00000139f
                |          |
                |          |--28.57%-- 0x8b52e0000013c6
                |          |
                |          |--14.29%-- 0x7fde786dc000
                |          |
                |          |--14.29%-- 0x62696c2f7273752f
                |          |
                |           --14.29%-- 0x1ea9df800000000
                |
                |--3.00%-- __poll

After:

    43.94%  [k] _raw_read_lock
             |
             --- _read_lock
                |
                |--60.53%-- send_sigio
                |          __kill_fasync
                |          kill_fasync
                |          evdev_pass_event
                |          evdev_event
                |          input_pass_event
                |          input_handle_event
                |          input_event
                |          synaptics_process_byte
                |          psmouse_handle_byte
                |          psmouse_interrupt
                |          serio_interrupt
                |          i8042_interrupt
                |          handle_IRQ_event
                |          handle_edge_irq
                |          handle_irq
                |          __irqentry_text_start
                |          ret_from_intr
                |          |
                |          |--30.43%-- __select
                |          |
                |          |--17.39%-- 0x454f15
                |          |
                |          |--13.04%-- __read
                |          |
                |          |--13.04%-- vread_hpet
                |          |
                |          |--13.04%-- _xcb_lock_io
                |          |
                |           --13.04%-- 0x7f630878ce87

Note: it does not only affect perf events but also other stacktraces in x86-64. They were considered as unreliable once we quit the debug stack frame.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "K. Prasad" <prasad@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
2009-12-05 | Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  include/linux/compiler-gcc4.h: Fix build bug - gcc-4.0.2 doesn't understand __builtin_object_size
  x86/alternatives: No need for alternatives-asm.h to re-invent stuff already in asm.h
  x86/alternatives: Check replacementlen <= instrlen at build time
  x86, 64-bit: Set data segments to null after switching to 64-bit mode
  x86: Clean up the loadsegment() macro
  x86: Optimize loadsegment()
  x86: Add missing might_fault() checks to copy_{to,from}_user()
  x86-64: __copy_from_user_inatomic() adjustments
  x86: Remove unused thread_return label from switch_to()
  x86, 64-bit: Fix bstep_iret jump
  x86: Don't use the strict copy checks when branch profiling is in use
  x86, 64-bit: Move K8 B step iret fixup to fault entry asm
  x86: Generate cmpxchg build failures
  x86: Add a Kconfig option to turn the copy_from_user warnings into errors
  x86: Turn the copy_from_user check into an (optional) compile time warning
  x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
  x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()
2009-11-03 | x86, 64-bit: Fix bstep_iret jump | Brian Gerst
This jump should be unconditional. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1257274925-15713-1-git-send-email-brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-23 | Merge branch 'perf/core' into perf/probes | Ingo Molnar
Conflicts: tools/perf/Makefile Merge reason: - fix the conflict - pick up the pr_*() infrastructure to queue up dependent patch Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 | x86: UV RTC: Rename generic_interrupt to x86_platform_ipi | Dimitri Sivanich
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> LKML-Reference: <20091014142257.GE11048@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 | function-graph/x86: Replace unbalanced ret with jmp | Steven Rostedt
The function graph tracer replaces the return address with a hook to trace the exit of the function call. This hook will finish by returning to the real location the function should return to. But the current implementation uses a ret to jump to the real return location. This causes an imbalance between calls and rets. That is, the original function does a call, the ret goes to the handler, and then the handler does a ret without a matching call. Although the function graph tracer itself still breaks the branch predictor by replacing the original ret, by using a second ret and causing an imbalance, it breaks the predictor even more. This patch replaces the ret with a jmp to keep the calls and rets balanced. I tested this on one box and it showed a 1.7% increase in performance. Another box only showed a small 0.3% increase. But no box that I tested this on showed a decrease in performance by making this change. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091013203425.042034383@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
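The call/ret imbalance can be illustrated with a toy model of a return-address predictor stack. This is a sketch under strong assumptions: real predictors are considerably more complex, only the innermost function is treated as traced, and the `simulate` function and its address labels are hypothetical names invented for the example.

```python
def simulate(depth, hook_uses_ret):
    """Count mispredicted returns for a call chain `depth` deep.

    The predictor is modeled as a plain stack: every call pushes its return
    address, every ret pops one entry and compares it with the real target.
    """
    predictor = [f"ret_addr_{level}" for level in range(depth)]
    mispredicts = 0

    def do_ret(actual_target):
        nonlocal mispredicts
        predicted = predictor.pop() if predictor else None
        if predicted != actual_target:
            mispredicts += 1

    leaf = depth - 1
    # The traced leaf returns, but its return address was replaced by the hook.
    do_ret("tracer_hook")
    # The hook transfers control to the real return address.
    if hook_uses_ret:
        do_ret(f"ret_addr_{leaf}")   # extra ret with no matching call
    # With a jmp there is no pop, so the predictor stack stays aligned.
    # The remaining (untraced) callers return normally.
    for level in range(leaf - 1, -1, -1):
        do_ret(f"ret_addr_{level}")
    return mispredicts


with_ret = simulate(depth=4, hook_uses_ret=True)
with_jmp = simulate(depth=4, hook_uses_ret=False)
```

In this model the jmp variant still pays for the replaced return address once, while the ret variant shifts the predictor stack by one entry and mispredicts every return further up the chain. The model says nothing about the size of the effect on real hardware; the 1.7% and 0.3% figures above are the author's measurements.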
2009-10-12 | x86, 64-bit: Move K8 B step iret fixup to fault entry asm | Brian Gerst
Move the handling of truncated %rip from an iret fault to the fault entry path. This allows x86-64 to use the standard search_extable() function. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Beulich <jbeulich@novell.com> LKML-Reference: <1255357103-5418-1-git-send-email-brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-23 | Merge commit 'linus/master' into tracing/kprobes | Frederic Weisbecker
Conflicts: kernel/trace/Makefile kernel/trace/trace.h kernel/trace/trace_event_types.h kernel/trace/trace_export.c Merge reason: Sync with latest significant tracing core changes.
2009-09-22 | x86: ptrace: sysret path should reach syscall_trace_leave | Roland McGrath
If TIF_SYSCALL_TRACE or TIF_SINGLESTEP is set while inside a syscall, the path back to user mode should get to syscall_trace_leave. This does happen in most circumstances. The exception to this is on the 64-bit syscall fastpath, when no such flag was set on syscall entry and nothing else has punted it off the fastpath for exit. That one exit fastpath fails to check for _TIF_WORK_SYSCALL_EXIT flags. This makes the behavior inconsistent with what 32-bit tasks see and what the native 32-bit kernel always does, and what 64-bit tasks see in all cases where the iret path is taken anyhow. Perhaps the only example that is affected is a ptrace stop inside do_fork (for PTRACE_O_TRACE{CLONE,FORK,VFORK,VFORKDONE}). Other syscalls with internal ptrace stop points (execve) already take the iret exit path for unrelated reasons. Test cases for both PTRACE_SYSCALL and PTRACE_SINGLESTEP variants are at: http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/syscall-from-clone.c?cvsroot=systemtap http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/step-from-clone.c?cvsroot=systemtap There was no special benefit to the sysret path's special path to call do_notify_resume, because it always takes the iret exit path at the end. So this change just makes the sysret exit path join the iret exit path for all the signals and ptrace cases. The fastpath still applies to the plain syscall-audit and resched cases. Signed-off-by: Roland McGrath <roland@redhat.com> CC: Oleg Nesterov <oleg@redhat.com>
2009-09-21 | perf: Do the big rename: Performance Counters -> Performance Events | Ingo Molnar
Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has outgrown its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in all, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names (in an ABI compatible fashion). The word 'event' is also a bit shorter than 'counter', which makes it slightly more convenient to write/handle as well. Thanks go to Stephane Eranian, who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility are not affected; this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.)

This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
      -e 's/PERF_EVENT_/PERF_RECORD_/g' \
      -e 's/PERF_COUNTER/PERF_EVENT/g' \
      -e 's/perf_counter/perf_event/g' \
      -e 's/nb_counters/nb_events/g' \
      -e 's/swcounter/swevent/g' \
      -e 's/tpcounter_event/tp_event/g' \
      $FILES

    for N in $(find . -name perf_counter.[ch]); do
      M=$(echo $N | sed 's/perf_counter/perf_event/g')
      mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
      -e 's/COUNTER_MASK/REG_MASK/g' \
      -e 's/COUNTER/EVENT/g' \
      -e 's/\<event\>/event_id/g' \
      -e 's/counter/event/g' \
      -e 's/Counter/Event/g' \
      $FILES

... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches: it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch, and some stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal with hardware registers, and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
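One detail of the first sed invocation in this message is that the order of its expressions matters: the existing `PERF_EVENT_` prefix has to be moved out of the way before `PERF_COUNTER` is renamed into it. The Python sketch below reproduces only those six substitutions with `re.sub` to show the effect. It is an illustration, not the script the patch used; the `rename` helper is hypothetical and the sample identifiers are chosen for the example.

```python
import re

# The six substitutions of the first sed invocation, in the same order.
RULES = [
    (r"PERF_EVENT_", "PERF_RECORD_"),
    (r"PERF_COUNTER", "PERF_EVENT"),
    (r"perf_counter", "perf_event"),
    (r"nb_counters", "nb_events"),
    (r"swcounter", "swevent"),
    (r"tpcounter_event", "tp_event"),
]


def rename(text, rules=RULES):
    """Apply each substitution in order, as sed does with multiple -e options."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


old_record = rename("PERF_EVENT_MMAP")            # old record type name
old_ioctl = rename("PERF_COUNTER_IOC_ENABLE")     # old counter-era name

# Swapping the first two rules renames the same identifier twice.
swapped = [RULES[1], RULES[0]] + RULES[2:]
double_renamed = rename("PERF_COUNTER_IOC_ENABLE", rules=swapped)
```

With the original order the two sample names end up in distinct namespaces, while the swapped order collapses them into one, which is presumably why the script lists the `PERF_EVENT_` rule first.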
2009-09-12 | tracing/function-graph: x86_64 stack allocation cleanup | Jiri Olsa
Only 24 bytes need to be reserved on the stack for the function graph tracer on x86_64. Signed-off-by: Jiri Olsa <jolsa@redhat.com> LKML-Reference: <20090729085837.GB4998@jolsa.lab.eng.brq.redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-08-30 | kprobes/x86-64: Fix to move common_interrupt to .kprobes.text | Masami Hiramatsu
Since nmi, debug and int3 return to irq_return inside common_interrupt, probing this function will cause an int3 loop, so it should be marked as __kprobes. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> LKML-Reference: <20090827172325.8246.40000.stgit@localhost.localdomain> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-06-18 | function-graph: add stack frame test | Steven Rostedt
In case gcc does something funny with the stack frames, or the return from function code, we would like to detect that. An arch may implement passing of a variable that is unique to the function and can be saved on entering a function and can be tested when exiting the function. Usually the frame pointer can be used for this purpose. This patch also implements this for x86, where it passes in the stack frame of the parent function and tests that frame on exit. There was a case in x86_32 with optimize for size (-Os) where, for a few functions, gcc would align the stack frame and place a copy of the return address into it. The function graph tracer modified the copy and not the actual return address. On return from the function, it did not go to the tracer hook, but returned to the parent. This broke the function graph tracer, because the return of the parent (where gcc did not do this funky manipulation) returned to the location that the child function was supposed to. This caused strange kernel crashes. This test detected the problem and pointed out where the issue was. This modifies the parameters of one of the functions that the arch specific code calls, so it includes changes to arch code to accommodate the new prototype. Note, I notice that the parisc arch implements its own push_return_trace. This is now a generic function and the ftrace_push_return_trace should be used instead. This patch does not touch that code. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
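The save-on-entry, test-on-exit idea can be sketched with a small Python model of a return-trace stack. The `ReturnTraceStack` class, the `FrameMismatch` exception, and the addresses are hypothetical names for the illustration, and raising an exception is just this sketch's way of flagging the problem; the message does not say how the kernel reacts to a failed test.

```python
class FrameMismatch(Exception):
    """Raised by this sketch when the exit frame differs from the entry frame."""


class ReturnTraceStack:
    def __init__(self):
        self._entries = []

    def push(self, real_ret, func, frame_pointer):
        # On function entry: remember the real return address together with
        # a value unique to this invocation (here, the parent's frame pointer).
        self._entries.append((real_ret, func, frame_pointer))

    def pop(self, frame_pointer):
        # On function exit: check the frame before handing back the address.
        real_ret, func, saved_fp = self._entries.pop()
        if saved_fp != frame_pointer:
            raise FrameMismatch(
                f"{func}: expected frame {saved_fp:#x}, got {frame_pointer:#x}"
            )
        return real_ret


stack = ReturnTraceStack()

# Normal case: the same frame is seen on entry and exit.
stack.push("parent+0x10", "child", 0x7000)
ok = stack.pop(0x7000)

# Failure case: the exit hook fires with a different frame, as in the -Os
# scenario where the parent's return reached the hook instead of the child's.
stack.push("parent+0x10", "child", 0x7000)
try:
    stack.pop(0x7008)
    detected = False
except FrameMismatch:
    detected = True
```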
2009-06-11 | Merge branch 'linus' into x86/mce3 | Ingo Molnar
Conflicts: arch/x86/kernel/cpu/mcheck/mce_64.c arch/x86/kernel/irq.c Merge reason: Resolve the conflicts above. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 | Merge branch 'linus' into perfcounters/core | Ingo Molnar
Conflicts: arch/x86/kernel/irqinit.c arch/x86/kernel/irqinit_64.c arch/x86/kernel/traps.c arch/x86/mm/fault.c include/linux/sched.h kernel/exit.c
2009-06-10 | Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits)
  Revert "x86, bts: reenable ptrace branch trace support"
  tracing: do not translate event helper macros in print format
  ftrace/documentation: fix typo in function grapher name
  tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK
  tracing: add protection around module events unload
  tracing: add trace_seq_vprint interface
  tracing: fix the block trace points print size
  tracing/events: convert block trace points to TRACE_EVENT()
  ring-buffer: fix ret in rb_add_time_stamp
  ring-buffer: pass in lockdep class key for reader_lock
  tracing: add annotation to what type of stack trace is recorded
  tracing: fix multiple use of __print_flags and __print_symbolic
  tracing/events: fix output format of user stack
  tracing/events: fix output format of kernel stack
  tracing/trace_stack: fix the number of entries in the header
  ring-buffer: discard timestamps that are at the start of the buffer
  ring-buffer: try to discard unneeded timestamps
  ring-buffer: fix bug in ring_buffer_discard_commit
  ftrace: do not profile functions when disabled
  tracing: make trace pipe recognize latency format flag
  ...
2009-06-03x86: fix panic with interrupts off (needed for MCE)Andi Kleen
For some time each panic() called with interrupts disabled triggered the !irqs_disabled() WARN_ON in smp_call_function(), producing ugly backtraces and confusing users. This is a common situation with machine checks for example which tend to call panic with interrupts disabled, but will also hit in other situations e.g. panic during early boot. In fact it means that panic cannot be called in many circumstances, which would be bad. This all started with the new fancy queued smp_call_function, which is then used by the shutdown path to shut down the other CPUs. On closer examination it turned out that the fancy RCU smp_call_function() does lots of things not suitable in a panic situation anyways, like allocating memory and relying on complex system state. I originally tried to patch this over by checking for panic there, but it was quite complicated and the original patch was also not very popular. This also didn't fix some of the underlying complexity problems. The new code in post 2.6.29 tries to patch around this by checking for oops_in_progress, but that is not enough to make this fully safe and I don't think that's a real solution because panic has to be reliable. So instead use a dedicated vector to reboot. This makes the reboot code extremely straightforward, which is definitely a big plus in a panic situation where it is important to avoid relying on too much kernel state. The new simple code is also safe to be called from an interrupts-off region because it is very very simple. There can be situations where it is important that panic is reliable. For example on a fatal machine check the panic is needed to get the system up again and running as quickly as possible. So it's important that panic is reliable and all functions it calls are simple. This is why I came up with this simple vector scheme. It's very hard to beat in simplicity. Vectors are not particularly precious anymore since all big systems are using per CPU vectors.
Another possibility would have been to use an NMI similar to kdump, but there is still the problem that NMIs don't work reliably on some systems due to BIOS issues. NMIs would have been able to stop CPUs running with interrupts off too. For the sake of universal reliability I opted for using a non NMI vector for now. I put the reboot vector into the highest priority bucket of the APIC vectors and moved the 64bit UV_BAU message down instead into the next lower priority. [ Impact: bug fix, fixes an old regression ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03x86, mce: implement bootstrapping for machine check wakeupsAndi Kleen
Machine checks support waking up the mcelog daemon quickly. The original wake up code for this was pretty ugly, relying on an idle notifier and a special process flag. The reason it did it this way is that the machine check handler is not subject to normal interrupt locking rules so it's not safe to call wake_up(). Instead it set a process flag and then either did the wakeup in the syscall return or in the idle notifier. This patch adds a new "bootstrapping" method as replacement. The idea is that the handler checks if it's in a state where it is unsafe to call wake_up(). If it's safe it calls it directly. When it's not safe -- that is, it interrupted a critical section with interrupts disabled -- it uses a new "self IPI" to trigger an IPI to its own CPU. This can be done safely because IPI triggers are atomic with some care. The IPI is raised once the interrupts are reenabled and can then safely call wake_up(). When APICs are disabled the event is just queued and will be picked up eventually by the next polling timer. I think that's a reasonable compromise, since it should only happen quite rarely. Contains fixes from Ying Huang. [ solve conflict on irqinit, make it work on 32bit (entry_arch.h) - HS ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
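The three-way decision this message describes can be sketched as a small pure function. This is one reading of the prose, not the patch itself: the type `cpu_state`, the enum `notify_action` and the function `mce_choose_notify` are hypothetical names introduced for illustration, and the sketch assumes the APIC check only matters when a direct wake_up() is unsafe.

```c
#include <stdbool.h>

/* Hypothetical outcomes, mirroring the three cases in the message. */
enum notify_action {
    NOTIFY_WAKE_DIRECT,    /* safe: call wake_up() right away */
    NOTIFY_SELF_IPI,       /* unsafe: defer through an IPI to this CPU */
    NOTIFY_QUEUE_FOR_POLL  /* no APIC: leave it for the polling timer */
};

/* Hypothetical snapshot of the state the handler would inspect. */
struct cpu_state {
    bool irqs_disabled; /* interrupted code ran with interrupts off */
    bool apic_enabled;  /* a self IPI can be sent */
};

/* Pick how the machine check handler should notify the daemon. */
static enum notify_action mce_choose_notify(const struct cpu_state *st)
{
    /* Interrupts were enabled, so no critical section was interrupted. */
    if (!st->irqs_disabled)
        return NOTIFY_WAKE_DIRECT;

    /* The self IPI fires once interrupts are re-enabled. */
    if (st->apic_enabled)
        return NOTIFY_SELF_IPI;

    /* Neither path is available; the event stays queued. */
    return NOTIFY_QUEUE_FOR_POLL;
}
```

Keeping the choice in a side-effect-free function makes each branch easy to check in isolation, which matters for code that runs in a context where normal locking rules do not apply.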
2009-06-03perf_counter/x86: Remove the IRQ (non-NMI) handling bitsYong Wang
Remove the IRQ (non-NMI) handling bits as NMI will be used always. Signed-off-by: Yong Wang <yong.y.wang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090603051255.GA2791@ywang-moblin2.bj.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-28x86, mce: enable MCE_INTEL for 32bit new MCEAndi Kleen
Enable the 64bit MCE_INTEL code (CMCI, thermal interrupts) for 32bit NEW_MCE. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28x86, mce: use a call vector to call the 64bit mce handlerAndi Kleen
Allow calling different machine check handlers from the low level machine check entry vector. This is needed for later when it will be used for 32bit too. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-08xen/x86-64: fix breakpoints and hardware watchpointsJeremy Fitzhardinge
Native x86-64 uses the IST mechanism to run int3 and debug traps on an alternative stack. Xen does not do this, and so the frames were being misinterpreted by the ptrace code. This change special-cases these two exceptions by using Xen variants which run on the normal kernel stack properly. Impact: avoid crash or bad data when IST trap is invoked under Xen Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-05-07Merge branch 'linus' into tracing/coreIngo Molnar
Merge reason: tracing/core was on a .30-rc1 base and was missing out on a handful of tracing fixes present in .30-rc5-almost. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29Merge branch 'linus' into perfcounters/coreIngo Molnar
Merge reason: This branch was on -rc1, refresh it to almost-rc4 to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-18lockdep, x86: account for irqs enabled in paranoid_exitSteven Rostedt
I hit the check_flags error of lockdep: WARNING: at kernel/lockdep.c:2893 check_flags+0x1a7/0x1d0() [...] hardirqs last enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110 hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40 softirqs last enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110 softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0 The check_flags warning of lockdep tells me that lockdep thought interrupts were disabled, but they were really enabled. The numbers in the above parentheses show the order of events: 12566: softirqs last enabled: lock_sock_nested 12567: hardirqs last enabled: local_bh_enable 12568: softirqs last disabled: tcp_prequeue_process 12569: hardirqs last disabled: int3 int3 is a breakpoint! Examining this further, I have CONFIG_NET_TCPPROBE enabled which adds break points into the kernel. The paranoid_exit of the return of int3 does not account for enabling interrupts on return to kernel. This code is a bit tricky since it is also used by the nmi handler (when lockdep is off), and we must be careful about the swapgs. We can not call kernel code after the swapgs has been performed. [ Impact: fix lockdep check_flags warning + self-turn-off ] Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10x86, function-graph: only save return values on x86_64Steven Rostedt
Impact: speed up The return to handler portion of the function graph tracer should only need to save the return values. The caller already saved off the registers that the callee can modify. The returning function already saved the registers it modified. When we call our own trace function it too will save the registers that the callee must restore. There's no reason to save off anything more than the registers used to return the values. Note, I did a complete kernel build with this modification and the function graph tracer running on x86_64. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07perf_counter: x86: self-IPI for pending workPeter Zijlstra
Implement set_perf_counter_pending() with a self-IPI so that it will run ASAP in a usable context. For now use a second IRQ vector, because the primary vector pokes the apic in funny ways that seem to confuse things. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090406094517.724626696@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06Merge branch 'linus' into perfcounters/core-v2Ingo Molnar
Merge reason: we have gathered quite a few conflicts, need to merge upstream Conflicts: arch/powerpc/kernel/Makefile arch/x86/ia32/ia32entry.S arch/x86/include/asm/hardirq.h arch/x86/include/asm/unistd_32.h arch/x86/include/asm/unistd_64.h arch/x86/kernel/cpu/common.c arch/x86/kernel/irq.c arch/x86/kernel/syscall_table_32.S arch/x86/mm/iomap_32.c include/linux/sched.h kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>