Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf changes from Ingo Molnar: "There are lots of improvements, the biggest changes are: Main kernel side changes: - Improve uprobes performance by adding 'pre-filtering' support, by Oleg Nesterov. - Make some POWER7 events available in sysfs, equivalent to what was done on x86, from Sukadev Bhattiprolu. - tracing updates by Steve Rostedt - mostly misc fixes and smaller improvements. - Use perf/event tracing to report PCI Express advanced errors, by Tony Luck. - Enable northbridge performance counters on AMD family 15h, by Jacob Shin. - This tracing commit: tracing: Remove the extra 4 bytes of padding in events changes the ABI. All involved parties (PowerTop in particular) seem to agree that it's safe to do now with the introduction of libtraceevent, but the devil is in the details ... Main tooling side changes: - Add 'event group view', from Namyung Kim: To use it, 'perf record' should group events when recording. And then perf report parses the saved group relation from file header and prints them together if --group option is provided. You can use the 'perf evlist' command to see event group information: $ perf record -e '{ref-cycles,cycles}' noploop 1 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ] $ perf evlist --group {ref-cycles,cycles} With this example, default perf report will show you each event separately. You can use --group option to enable event group view: $ perf report --group ... # group: {ref-cycles,cycles} # ======== # Samples: 7K of event 'anon group { ref-cycles, cycles }' # Event count (approx.): 6876107743 # # Overhead Command Shared Object Symbol # ................ ....... ................. .......................... 99.84% 99.76% noploop noploop [.] main 0.07% 0.00% noploop ld-2.15.so [.] strcmp 0.03% 0.00% noploop [kernel.kallsyms] [k] timerqueue_del 0.03% 0.03% noploop [kernel.kallsyms] [k] sched_clock_cpu 0.02% 0.00% noploop [kernel.kallsyms] [k] account_user_time 0.01% 0.00% noploop [kernel.kallsyms] [k] __alloc_pages_nodemask 0.00% 0.00% noploop [kernel.kallsyms] [k] native_write_msr_safe 0.00% 0.11% noploop [kernel.kallsyms] [k] _raw_spin_lock 0.00% 0.06% noploop [kernel.kallsyms] [k] find_get_page 0.00% 0.02% noploop [kernel.kallsyms] [k] rcu_check_callbacks 0.00% 0.02% noploop [kernel.kallsyms] [k] __current_kernel_time As you can see the Overhead column now contains both of ref-cycles and cycles and header line shows group information also - 'anon group { ref-cycles, cycles }'. The output is sorted by period of group leader first. - Initial GTK+ annotate browser, from Namhyung Kim. - Add option for runtime switching perf data file in perf report, just press 's' and a menu with the valid files found in the current directory will be presented, from Feng Tang. - Add support to display whole group data for raw columns, from Jiri Olsa. - Add per processor socket count aggregation in perf stat, from Stephane Eranian. - Add interval printing in 'perf stat', from Stephane Eranian. - 'perf test' improvements - Add support for wildcards in tracepoint system name, from Jiri Olsa. - Add anonymous huge page recognition, from Joshua Zhu. - perf build-id cache now can show DSOs present in a perf.data file that are not in the cache, to integrate with build-id servers being put in place by organizations such as Fedora. - perf top now shares more of the evsel config/creation routines with 'record', paving the way for further integration like 'top' snapshots, etc. - perf top now supports DWARF callchains. - Fix mmap limitations on 32-bit, fix from David Miller. - 'perf bench numa mem' NUMA performance measurement suite - ... and lots of fixes, performance improvements, cleanups and other improvements I failed to list - see the shortlog and git log for details." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (270 commits) perf/x86/amd: Enable northbridge performance counters on AMD family 15h perf/hwbp: Fix cleanup in case of kzalloc failure perf tools: Fix build with bison 2.3 and older. perf tools: Limit unwind support to x86 archs perf annotate: Make it to be able to skip unannotatable symbols perf gtk/annotate: Fail early if it can't annotate perf gtk/annotate: Show source lines with gray color perf gtk/annotate: Support multiple event annotation perf ui/gtk: Implement basic GTK2 annotation browser perf annotate: Fix warning message on a missing vmlinux perf buildid-cache: Add --update option uprobes/perf: Avoid uprobe_apply() whenever possible uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE uprobes/perf: Teach trace_uprobe/perf code to pre-filter uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's uprobes: Introduce uprobe_apply() perf: Introduce hw_perf_event->tp_target and ->tp_list uprobes/perf: Always increment trace_uprobe->nhit uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe uprobes/tracing: Introduce is_trace_uprobe_enabled() ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2013-02-19 17:49:41 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2013-02-19 17:49:41 -0800
commit: 8f55cea410dbc56114bb71a3742032070c8108d0 (patch)
tree: 59605f0ee961274b22f91add33f5c32459471a83 /kernel/trace/ring_buffer.c
parent: b7133a9a103655cda254987a3c0975fd9d8c443f (diff)
parent: e259514eef764a5286873618e34c560ecb6cff13 (diff)
1 files changed, 81 insertions, 27 deletions
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index ce8514feedc..7244acde77b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3,8 +3,10 @@
  *
  * Copyright (C) 2008 Steven Rostedt <srostedt@redhat.com>
  */
+#include <linux/ftrace_event.h>
 #include <linux/ring_buffer.h>
 #include <linux/trace_clock.h>
+#include <linux/trace_seq.h>
 #include <linux/spinlock.h>
 #include <linux/debugfs.h>
 #include <linux/uaccess.h>
@@ -21,7 +23,6 @@
 #include <linux/fs.h>
 
 #include <asm/local.h>
-#include "trace.h"
 
 static void update_pages_handler(struct work_struct *work);
 
@@ -2432,41 +2433,76 @@ rb_reserve_next_event(struct ring_buffer *buffer,
 
 #ifdef CONFIG_TRACING
 
-#define TRACE_RECURSIVE_DEPTH 16
+/*
+ * The lock and unlock are done within a preempt disable section.
+ * The current_context per_cpu variable can only be modified
+ * by the current task between lock and unlock. But it can
+ * be modified more than once via an interrupt. To pass this
+ * information from the lock to the unlock without having to
+ * access the 'in_interrupt()' functions again (which do show
+ * a bit of overhead in something as critical as function tracing,
+ * we use a bitmask trick.
+ *
+ *  bit 0 =  NMI context
+ *  bit 1 =  IRQ context
+ *  bit 2 =  SoftIRQ context
+ *  bit 3 =  normal context.
+ *
+ * This works because this is the order of contexts that can
+ * preempt other contexts. A SoftIRQ never preempts an IRQ
+ * context.
+ *
+ * When the context is determined, the corresponding bit is
+ * checked and set (if it was set, then a recursion of that context
+ * happened).
+ *
+ * On unlock, we need to clear this bit. To do so, just subtract
+ * 1 from the current_context and AND it to itself.
+ *
+ * (binary)
+ *  101 - 1 = 100
+ *  101 & 100 = 100 (clearing bit zero)
+ *
+ *  1010 - 1 = 1001
+ *  1010 & 1001 = 1000 (clearing bit 1)
+ *
+ * The least significant bit can be cleared this way, and it
+ * just so happens that it is the same bit corresponding to
+ * the current context.
+ */
+static DEFINE_PER_CPU(unsigned int, current_context);
 
-/* Keep this code out of the fast path cache */
-static noinline void trace_recursive_fail(void)
+static __always_inline int trace_recursive_lock(void)
 {
-	/* Disable all tracing before we do anything else */
-	tracing_off_permanent();
-
-	printk_once(KERN_WARNING "Tracing recursion: depth[%ld]:"
-		    "HC[%lu]:SC[%lu]:NMI[%lu]\n",
-		    trace_recursion_buffer(),
-		    hardirq_count() >> HARDIRQ_SHIFT,
-		    softirq_count() >> SOFTIRQ_SHIFT,
-		    in_nmi());
-
-	WARN_ON_ONCE(1);
-}
+	unsigned int val = this_cpu_read(current_context);
+	int bit;
 
-static inline int trace_recursive_lock(void)
-{
-	trace_recursion_inc();
+	if (in_interrupt()) {
+		if (in_nmi())
+			bit = 0;
+		else if (in_irq())
+			bit = 1;
+		else
+			bit = 2;
+	} else
+		bit = 3;
 
-	if (likely(trace_recursion_buffer() < TRACE_RECURSIVE_DEPTH))
-		return 0;
+	if (unlikely(val & (1 << bit)))
+		return 1;
 
-	trace_recursive_fail();
+	val |= (1 << bit);
+	this_cpu_write(current_context, val);
 
-	return -1;
+	return 0;
 }
 
-static inline void trace_recursive_unlock(void)
+static __always_inline void trace_recursive_unlock(void)
 {
-	WARN_ON_ONCE(!trace_recursion_buffer());
+	unsigned int val = this_cpu_read(current_context);
 
-	trace_recursion_dec();
+	val--;
+	val &= this_cpu_read(current_context);
+	this_cpu_write(current_context, val);
 }
 
 #else
@@ -3067,6 +3103,24 @@ ring_buffer_dropped_events_cpu(struct ring_buffer *buffer, int cpu)
 EXPORT_SYMBOL_GPL(ring_buffer_dropped_events_cpu);
 
 /**
+ * ring_buffer_read_events_cpu - get the number of events successfully read
+ * @buffer: The ring buffer
+ * @cpu: The per CPU buffer to get the number of events read
+ */
+unsigned long
+ring_buffer_read_events_cpu(struct ring_buffer *buffer, int cpu)
+{
+	struct ring_buffer_per_cpu *cpu_buffer;
+
+	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+		return 0;
+
+	cpu_buffer = buffer->buffers[cpu];
+	return cpu_buffer->read;
+}
+EXPORT_SYMBOL_GPL(ring_buffer_read_events_cpu);
+
+/**
  * ring_buffer_entries - get the number of entries in a buffer
  * @buffer: The ring buffer
  *
@@ -3425,7 +3479,7 @@ static void rb_advance_iter(struct ring_buffer_iter *iter)
 	/* check for end of page padding */
 	if ((iter->head >= rb_page_size(iter->head_page)) &&
 	    (iter->head_page != cpu_buffer->commit_page))
-		rb_advance_iter(iter);
+		rb_inc_iter(iter);
 }
 
 static int rb_lost_events(struct ring_buffer_per_cpu *cpu_buffer)
author	Linus Torvalds <torvalds@linux-foundation.org>	2013-02-19 17:49:41 -0800
committer	Linus Torvalds <torvalds@linux-foundation.org>	2013-02-19 17:49:41 -0800
commit	8f55cea410dbc56114bb71a3742032070c8108d0 (patch)
tree	59605f0ee961274b22f91add33f5c32459471a83 /kernel/trace/ring_buffer.c
parent	b7133a9a103655cda254987a3c0975fd9d8c443f (diff)
parent	e259514eef764a5286873618e34c560ecb6cff13 (diff)