Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf update from Ingo Molnar: "Lots of changes in this cycle as well, with hundreds of commits from over 30 contributors. Most of the activity was on the tooling side. Higher level changes: - New 'perf kvm' analysis tool, from Xiao Guangrong. - New 'perf trace' system-wide tracing tool - uprobes fixes + cleanups from Oleg Nesterov. - Lots of patches to make perf build on Android out of box, from Irina Tirdea - Extend ftrace function tracing utility to be more dynamic for its users. It allows for data passing to the callback functions, as well as reading regs as if a breakpoint were to trigger at function entry. The main goal of this patch series was to allow kprobes to use ftrace as an optimized probe point when a probe is placed on an ftrace nop. With lots of help from Masami Hiramatsu, and going through lots of iterations, we finally came up with a good solution. - Add cpumask for uncore pmu, use it in 'stat', from Yan, Zheng. - Various tracing updates from Steve Rostedt - Clean up and improve 'perf sched' performance by elliminating lots of needless calls to libtraceevent. - Event group parsing support, from Jiri Olsa - UI/gtk refactorings and improvements from Namhyung Kim - Add support for non-tracepoint events in perf script python, from Feng Tang - Add --symbols to 'script', similar to the one in 'report', from Feng Tang. Infrastructure enhancements and fixes: - Convert the trace builtins to use the growing evsel/evlist tracepoint infrastructure, removing several open coded constructs like switch like series of strcmp to dispatch events, etc. Basically what had already been showcased in 'perf sched'. - Add evsel constructor for tracepoints, that uses libtraceevent just to parse the /format events file, use it in a new 'perf test' to make sure the libtraceevent format parsing regressions can be more readily caught. - Some strange errors were happening in some builds, but not on the next, reported by several people, problem was some parser related files, generated during the build, didn't had proper make deps, fix from Eric Sandeen. - Introduce struct and cache information about the environment where a perf.data file was captured, from Namhyung Kim. - Fix handling of unresolved samples when --symbols is used in 'report', from Feng Tang. - Add union member access support to 'probe', from Hyeoncheol Lee. - Fixups to die() removal, from Namhyung Kim. - Render fixes for the TUI, from Namhyung Kim. - Don't enable annotation in non symbolic view, from Namhyung Kim. - Fix pipe mode in 'report', from Namhyung Kim. - Move related stats code from stat to util/, will be used by the 'stat' kvm tool, from Xiao Guangrong. - Remove die()/exit() calls from several tools. - Resolve vdso callchains, from Jiri Olsa - Don't pass const char pointers to basename, so that we can unconditionally use libgen.h and thus avoid ifdef BIONIC lines, from David Ahern - Refactor hist formatting so that it can be reused with the GTK browser, From Namhyung Kim - Fix build for another rbtree.c change, from Adrian Hunter. - Make 'perf diff' command work with evsel hists, from Jiri Olsa. - Use the only field_sep var that is set up: symbol_conf.field_sep, fix from Jiri Olsa. - .gitignore compiled python binaries, from Namhyung Kim. - Get rid of die() in more libtraceevent places, from Namhyung Kim. - Rename libtraceevent 'private' struct member to 'priv' so that it works in C++, from Steven Rostedt - Remove lots of exit()/die() calls from tools so that the main perf exit routine can take place, from David Ahern - Fix x86 build on x86-64, from David Ahern. - {int,str,rb}list fixes from Suzuki K Poulose - perf.data header fixes from Namhyung Kim - Allow user to indicate objdump path, needed in cross environments, from Maciek Borzecki - Fix hardware cache event name generation, fix from Jiri Olsa - Add round trip test for sw, hw and cache event names, catching the problem Jiri fixed, after Jiri's patch, the test passes successfully. - Clean target should do clean for lib/traceevent too, fix from David Ahern - Check the right variable for allocation failure, fix from Namhyung Kim - Set up evsel->tp_format regardless of evsel->name being set already, fix from Namhyung Kim - Oprofile fixes from Robert Richter. - Remove perf_event_attr needless version inflation, from Jiri Olsa - Introduce libtraceevent strerror like error reporting facility, from Namhyung Kim - Add pmu mappings to perf.data header and use event names from cmd line, from Robert Richter - Fix include order for bison/flex-generated C files, from Ben Hutchings - Build fixes and documentation corrections from David Ahern - Assorted cleanups from Robert Richter - Let O= makes handle relative paths, from Steven Rostedt - perf script python fixes, from Feng Tang. - Initial bash completion support, from Frederic Weisbecker - Allow building without libelf, from Namhyung Kim. - Support DWARF CFI based unwind to have callchains when %bp based unwinding is not possible, from Jiri Olsa. - Symbol resolution fixes, while fixing support PPC64 files with an .opt ELF section was the end goal, several fixes for code that handles all architectures and cleanups are included, from Cody Schafer. - Assorted fixes for Documentation and build in 32 bit, from Robert Richter - Cache the libtraceevent event_format associated to each evsel early, so that we avoid relookups, i.e. calling pevent_find_event repeatedly when processing tracepoint events. [ This is to reduce the surface contact with libtraceevents and make clear what is that the perf tools needs from that lib: so far parsing the common and per event fields. ] - Don't stop the build if the audit libraries are not installed, fix from Namhyung Kim. - Fix bfd.h/libbfd detection with recent binutils, from Markus Trippelsdorf. - Improve warning message when libunwind devel packages not present, from Jiri Olsa" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (282 commits) perf trace: Add aliases for some syscalls perf probe: Print an enum type variable in "enum variable-name" format when showing accessible variables perf tools: Check libaudit availability for perf-trace builtin perf hists: Add missing period_* fields when collapsing a hist entry perf trace: New tool perf evsel: Export the event_format constructor perf evsel: Introduce rawptr() method perf tools: Use perf_evsel__newtp in the event parser perf evsel: The tracepoint constructor should store sys:name perf evlist: Introduce set_filter() method perf evlist: Renane set_filters method to apply_filters perf test: Add test to check we correctly parse and match syscall open parms perf evsel: Handle endianity in intval method perf evsel: Know if byte swap is needed perf tools: Allow handling a NULL cpu_map as meaning "all cpus" perf evsel: Improve tracepoint constructor setup tools lib traceevent: Fix error path on pevent_parse_event perf test: Fix build failure trace: Move trace event enable from fs_initcall to core_initcall tracing: Add an option for disabling markers ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2012-10-01 10:28:49 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> 2012-10-01 10:28:49 -0700
commit: 7e92daaefa68e5ef1e1732e45231e73adbb724e7 (patch)
tree: 8e7f8ac9d82654df4c65939c6682f95510e22977 /tools/perf/builtin-sched.c
parent: 7a68294278ae714ce2632a54f0f46916dca64f56 (diff)
parent: 1d787d37c8ff6612b8151c6dff15bfa7347bcbdf (diff)
1 files changed, 683 insertions, 839 deletions
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 7a9ad2b1ee7..9b9e32eaa80 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -23,31 +23,12 @@
 #include <pthread.h>
 #include <math.h>
 
-static const char		*input_name;
-
-static char			default_sort_order[] = "avg, max, switch, runtime";
-static const char		*sort_order = default_sort_order;
-
-static int			profile_cpu = -1;
-
 #define PR_SET_NAME		15               /* Set process name */
 #define MAX_CPUS		4096
-
-static u64			run_measurement_overhead;
-static u64			sleep_measurement_overhead;
-
 #define COMM_LEN		20
 #define SYM_LEN			129
-
 #define MAX_PID			65536
 
-static unsigned long		nr_tasks;
-
-struct perf_sched {
-	struct perf_tool    tool;
-	struct perf_session *session;
-};
-
 struct sched_atom;
 
 struct task_desc {
@@ -85,44 +66,6 @@ struct sched_atom {
 	struct task_desc	*wakee;
 };
 
-static struct task_desc		*pid_to_task[MAX_PID];
-
-static struct task_desc		**tasks;
-
-static pthread_mutex_t		start_work_mutex = PTHREAD_MUTEX_INITIALIZER;
-static u64			start_time;
-
-static pthread_mutex_t		work_done_wait_mutex = PTHREAD_MUTEX_INITIALIZER;
-
-static unsigned long		nr_run_events;
-static unsigned long		nr_sleep_events;
-static unsigned long		nr_wakeup_events;
-
-static unsigned long		nr_sleep_corrections;
-static unsigned long		nr_run_events_optimized;
-
-static unsigned long		targetless_wakeups;
-static unsigned long		multitarget_wakeups;
-
-static u64			cpu_usage;
-static u64			runavg_cpu_usage;
-static u64			parent_cpu_usage;
-static u64			runavg_parent_cpu_usage;
-
-static unsigned long		nr_runs;
-static u64			sum_runtime;
-static u64			sum_fluct;
-static u64			run_avg;
-
-static unsigned int		replay_repeat = 10;
-static unsigned long		nr_timestamps;
-static unsigned long		nr_unordered_timestamps;
-static unsigned long		nr_state_machine_bugs;
-static unsigned long		nr_context_switch_bugs;
-static unsigned long		nr_events;
-static unsigned long		nr_lost_chunks;
-static unsigned long		nr_lost_events;
-
 #define TASK_STATE_TO_CHAR_STR "RSDTtZX"
 
 enum thread_state {
@@ -154,11 +97,79 @@ struct work_atoms {
 
 typedef int (*sort_fn_t)(struct work_atoms *, struct work_atoms *);
 
-static struct rb_root		atom_root, sorted_atom_root;
+struct perf_sched;
+
+struct trace_sched_handler {
+	int (*switch_event)(struct perf_sched *sched, struct perf_evsel *evsel,
+			    struct perf_sample *sample, struct machine *machine);
 
-static u64			all_runtime;
-static u64			all_count;
+	int (*runtime_event)(struct perf_sched *sched, struct perf_evsel *evsel,
+			     struct perf_sample *sample, struct machine *machine);
 
+	int (*wakeup_event)(struct perf_sched *sched, struct perf_evsel *evsel,
+			    struct perf_sample *sample, struct machine *machine);
+
+	int (*fork_event)(struct perf_sched *sched, struct perf_evsel *evsel,
+			  struct perf_sample *sample);
+
+	int (*migrate_task_event)(struct perf_sched *sched,
+				  struct perf_evsel *evsel,
+				  struct perf_sample *sample,
+				  struct machine *machine);
+};
+
+struct perf_sched {
+	struct perf_tool tool;
+	const char	 *input_name;
+	const char	 *sort_order;
+	unsigned long	 nr_tasks;
+	struct task_desc *pid_to_task[MAX_PID];
+	struct task_desc **tasks;
+	const struct trace_sched_handler *tp_handler;
+	pthread_mutex_t	 start_work_mutex;
+	pthread_mutex_t	 work_done_wait_mutex;
+	int		 profile_cpu;
+/*
+ * Track the current task - that way we can know whether there's any
+ * weird events, such as a task being switched away that is not current.
+ */
+	int		 max_cpu;
+	u32		 curr_pid[MAX_CPUS];
+	struct thread	 *curr_thread[MAX_CPUS];
+	char		 next_shortname1;
+	char		 next_shortname2;
+	unsigned int	 replay_repeat;
+	unsigned long	 nr_run_events;
+	unsigned long	 nr_sleep_events;
+	unsigned long	 nr_wakeup_events;
+	unsigned long	 nr_sleep_corrections;
+	unsigned long	 nr_run_events_optimized;
+	unsigned long	 targetless_wakeups;
+	unsigned long	 multitarget_wakeups;
+	unsigned long	 nr_runs;
+	unsigned long	 nr_timestamps;
+	unsigned long	 nr_unordered_timestamps;
+	unsigned long	 nr_state_machine_bugs;
+	unsigned long	 nr_context_switch_bugs;
+	unsigned long	 nr_events;
+	unsigned long	 nr_lost_chunks;
+	unsigned long	 nr_lost_events;
+	u64		 run_measurement_overhead;
+	u64		 sleep_measurement_overhead;
+	u64		 start_time;
+	u64		 cpu_usage;
+	u64		 runavg_cpu_usage;
+	u64		 parent_cpu_usage;
+	u64		 runavg_parent_cpu_usage;
+	u64		 sum_runtime;
+	u64		 sum_fluct;
+	u64		 run_avg;
+	u64		 all_runtime;
+	u64		 all_count;
+	u64		 cpu_last_switched[MAX_CPUS];
+	struct rb_root	 atom_root, sorted_atom_root;
+	struct list_head sort_list, cmp_pid;
+};
 
 static u64 get_nsecs(void)
 {
@@ -169,13 +180,13 @@ static u64 get_nsecs(void)
 	return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
 }
 
-static void burn_nsecs(u64 nsecs)
+static void burn_nsecs(struct perf_sched *sched, u64 nsecs)
 {
 	u64 T0 = get_nsecs(), T1;
 
 	do {
 		T1 = get_nsecs();
-	} while (T1 + run_measurement_overhead < T0 + nsecs);
+	} while (T1 + sched->run_measurement_overhead < T0 + nsecs);
 }
 
 static void sleep_nsecs(u64 nsecs)
@@ -188,24 +199,24 @@ static void sleep_nsecs(u64 nsecs)
 	nanosleep(&ts, NULL);
 }
 
-static void calibrate_run_measurement_overhead(void)
+static void calibrate_run_measurement_overhead(struct perf_sched *sched)
 {
 	u64 T0, T1, delta, min_delta = 1000000000ULL;
 	int i;
 
 	for (i = 0; i < 10; i++) {
 		T0 = get_nsecs();
-		burn_nsecs(0);
+		burn_nsecs(sched, 0);
 		T1 = get_nsecs();
 		delta = T1-T0;
 		min_delta = min(min_delta, delta);
 	}
-	run_measurement_overhead = min_delta;
+	sched->run_measurement_overhead = min_delta;
 
 	printf("run measurement overhead: %" PRIu64 " nsecs\n", min_delta);
 }
 
-static void calibrate_sleep_measurement_overhead(void)
+static void calibrate_sleep_measurement_overhead(struct perf_sched *sched)
 {
 	u64 T0, T1, delta, min_delta = 1000000000ULL;
 	int i;
@@ -218,7 +229,7 @@ static void calibrate_sleep_measurement_overhead(void)
 		min_delta = min(min_delta, delta);
 	}
 	min_delta -= 10000;
-	sleep_measurement_overhead = min_delta;
+	sched->sleep_measurement_overhead = min_delta;
 
 	printf("sleep measurement overhead: %" PRIu64 " nsecs\n", min_delta);
 }
@@ -251,8 +262,8 @@ static struct sched_atom *last_event(struct task_desc *task)
 	return task->atoms[task->nr_events - 1];
 }
 
-static void
-add_sched_event_run(struct task_desc *task, u64 timestamp, u64 duration)
+static void add_sched_event_run(struct perf_sched *sched, struct task_desc *task,
+				u64 timestamp, u64 duration)
 {
 	struct sched_atom *event, *curr_event = last_event(task);
 
@@ -261,7 +272,7 @@ add_sched_event_run(struct task_desc *task, u64 timestamp, u64 duration)
 	 * to it:
 	 */
 	if (curr_event && curr_event->type == SCHED_EVENT_RUN) {
-		nr_run_events_optimized++;
+		sched->nr_run_events_optimized++;
 		curr_event->duration += duration;
 		return;
 	}
@@ -271,12 +282,11 @@ add_sched_event_run(struct task_desc *task, u64 timestamp, u64 duration)
 	event->type = SCHED_EVENT_RUN;
 	event->duration = duration;
 
-	nr_run_events++;
+	sched->nr_run_events++;
 }
 
-static void
-add_sched_event_wakeup(struct task_desc *task, u64 timestamp,
-		       struct task_desc *wakee)
+static void add_sched_event_wakeup(struct perf_sched *sched, struct task_desc *task,
+				   u64 timestamp, struct task_desc *wakee)
 {
 	struct sched_atom *event, *wakee_event;
 
@@ -286,11 +296,11 @@ add_sched_event_wakeup(struct task_desc *task, u64 timestamp,
 
 	wakee_event = last_event(wakee);
 	if (!wakee_event || wakee_event->type != SCHED_EVENT_SLEEP) {
-		targetless_wakeups++;
+		sched->targetless_wakeups++;
 		return;
 	}
 	if (wakee_event->wait_sem) {
-		multitarget_wakeups++;
+		sched->multitarget_wakeups++;
 		return;
 	}
 
@@ -299,89 +309,89 @@ add_sched_event_wakeup(struct task_desc *task, u64 timestamp,
 	wakee_event->specific_wait = 1;
 	event->wait_sem = wakee_event->wait_sem;
 
-	nr_wakeup_events++;
+	sched->nr_wakeup_events++;
 }
 
-static void
-add_sched_event_sleep(struct task_desc *task, u64 timestamp,
-		      u64 task_state __used)
+static void add_sched_event_sleep(struct perf_sched *sched, struct task_desc *task,
+				  u64 timestamp, u64 task_state __maybe_unused)
 {
 	struct sched_atom *event = get_new_event(task, timestamp);
 
 	event->type = SCHED_EVENT_SLEEP;
 
-	nr_sleep_events++;
+	sched->nr_sleep_events++;
 }
 
-static struct task_desc *register_pid(unsigned long pid, const char *comm)
+static struct task_desc *register_pid(struct perf_sched *sched,
+				      unsigned long pid, const char *comm)
 {
 	struct task_desc *task;
 
 	BUG_ON(pid >= MAX_PID);
 
-	task = pid_to_task[pid];
+	task = sched->pid_to_task[pid];
 
 	if (task)
 		return task;
 
 	task = zalloc(sizeof(*task));
 	task->pid = pid;
-	task->nr = nr_tasks;
+	task->nr = sched->nr_tasks;
 	strcpy(task->comm, comm);
 	/*
 	 * every task starts in sleeping state - this gets ignored
 	 * if there's no wakeup pointing to this sleep state:
 	 */
-	add_sched_event_sleep(task, 0, 0);
+	add_sched_event_sleep(sched, task, 0, 0);
 
-	pid_to_task[pid] = task;
-	nr_tasks++;
-	tasks = realloc(tasks, nr_tasks*sizeof(struct task_task *));
-	BUG_ON(!tasks);
-	tasks[task->nr] = task;
+	sched->pid_to_task[pid] = task;
+	sched->nr_tasks++;
+	sched->tasks = realloc(sched->tasks, sched->nr_tasks * sizeof(struct task_task *));
+	BUG_ON(!sched->tasks);
+	sched->tasks[task->nr] = task;
 
 	if (verbose)
-		printf("registered task #%ld, PID %ld (%s)\n", nr_tasks, pid, comm);
+		printf("registered task #%ld, PID %ld (%s)\n", sched->nr_tasks, pid, comm);
 
 	return task;
 }
 
 
-static void print_task_traces(void)
+static void print_task_traces(struct perf_sched *sched)
 {
 	struct task_desc *task;
 	unsigned long i;
 
-	for (i = 0; i < nr_tasks; i++) {
-		task = tasks[i];
+	for (i = 0; i < sched->nr_tasks; i++) {
+		task = sched->tasks[i];
 		printf("task %6ld (%20s:%10ld), nr_events: %ld\n",
 			task->nr, task->comm, task->pid, task->nr_events);
 	}
 }
 
-static void add_cross_task_wakeups(void)
+static void add_cross_task_wakeups(struct perf_sched *sched)
 {
 	struct task_desc *task1, *task2;
 	unsigned long i, j;
 
-	for (i = 0; i < nr_tasks; i++) {
-		task1 = tasks[i];
+	for (i = 0; i < sched->nr_tasks; i++) {
+		task1 = sched->tasks[i];
 		j = i + 1;
-		if (j == nr_tasks)
+		if (j == sched->nr_tasks)
 			j = 0;
-		task2 = tasks[j];
-		add_sched_event_wakeup(task1, 0, task2);
+		task2 = sched->tasks[j];
+		add_sched_event_wakeup(sched, task1, 0, task2);
 	}
 }
 
-static void
-process_sched_event(struct task_desc *this_task __used, struct sched_atom *atom)
+static void perf_sched__process_event(struct perf_sched *sched,
+				      struct sched_atom *atom)
 {
 	int ret = 0;
 
 	switch (atom->type) {
 		case SCHED_EVENT_RUN:
-			burn_nsecs(atom->duration);
+			burn_nsecs(sched, atom->duration);
 			break;
 		case SCHED_EVENT_SLEEP:
 			if (atom->wait_sem)
@@ -428,8 +438,8 @@ static int self_open_counters(void)
 	fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
 
 	if (fd < 0)
-		die("Error: sys_perf_event_open() syscall returned"
-		    "with %d (%s)\n", fd, strerror(errno));
+		pr_err("Error: sys_perf_event_open() syscall returned "
+		       "with %d (%s)\n", fd, strerror(errno));
 	return fd;
 }
 
@@ -444,31 +454,41 @@ static u64 get_cpu_usage_nsec_self(int fd)
 	return runtime;
 }
 
+struct sched_thread_parms {
+	struct task_desc  *task;
+	struct perf_sched *sched;
+};
+
 static void *thread_func(void *ctx)
 {
-	struct task_desc *this_task = ctx;
+	struct sched_thread_parms *parms = ctx;
+	struct task_desc *this_task = parms->task;
+	struct perf_sched *sched = parms->sched;
 	u64 cpu_usage_0, cpu_usage_1;
 	unsigned long i, ret;
 	char comm2[22];
 	int fd;
 
+	free(parms);
+
 	sprintf(comm2, ":%s", this_task->comm);
 	prctl(PR_SET_NAME, comm2);
 	fd = self_open_counters();
-
+	if (fd < 0)
+		return NULL;
 again:
 	ret = sem_post(&this_task->ready_for_work);
 	BUG_ON(ret);
-	ret = pthread_mutex_lock(&start_work_mutex);
+	ret = pthread_mutex_lock(&sched->start_work_mutex);
 	BUG_ON(ret);
-	ret = pthread_mutex_unlock(&start_work_mutex);
+	ret = pthread_mutex_unlock(&sched->start_work_mutex);
 	BUG_ON(ret);
 
 	cpu_usage_0 = get_cpu_usage_nsec_self(fd);
 
 	for (i = 0; i < this_task->nr_events; i++) {
 		this_task->curr_event = i;
-		process_sched_event(this_task, this_task->atoms[i]);
+		perf_sched__process_event(sched, this_task->atoms[i]);
 	}
 
 	cpu_usage_1 = get_cpu_usage_nsec_self(fd);
@@ -476,15 +496,15 @@ again:
 	ret = sem_post(&this_task->work_done_sem);
 	BUG_ON(ret);
 
-	ret = pthread_mutex_lock(&work_done_wait_mutex);
+	ret = pthread_mutex_lock(&sched->work_done_wait_mutex);
 	BUG_ON(ret);
-	ret = pthread_mutex_unlock(&work_done_wait_mutex);
+	ret = pthread_mutex_unlock(&sched->work_done_wait_mutex);
 	BUG_ON(ret);
 
 	goto again;
 }
 
-static void create_tasks(void)
+static void create_tasks(struct perf_sched *sched)
 {
 	struct task_desc *task;
 	pthread_attr_t attr;
@@ -496,128 +516,129 @@ static void create_tasks(void)
 	err = pthread_attr_setstacksize(&attr,
 			(size_t) max(16 * 1024, PTHREAD_STACK_MIN));
 	BUG_ON(err);
-	err = pthread_mutex_lock(&start_work_mutex);
+	err = pthread_mutex_lock(&sched->start_work_mutex);
 	BUG_ON(err);
-	err = pthread_mutex_lock(&work_done_wait_mutex);
+	err = pthread_mutex_lock(&sched->work_done_wait_mutex);
 	BUG_ON(err);
-	for (i = 0; i < nr_tasks; i++) {
-		task = tasks[i];
+	for (i = 0; i < sched->nr_tasks; i++) {
+		struct sched_thread_parms *parms = malloc(sizeof(*parms));
+		BUG_ON(parms == NULL);
+		parms->task = task = sched->tasks[i];
+		parms->sched = sched;
 		sem_init(&task->sleep_sem, 0, 0);
 		sem_init(&task->ready_for_work, 0, 0);
 		sem_init(&task->work_done_sem, 0, 0);
 		task->curr_event = 0;
-		err = pthread_create(&task->thread, &attr, thread_func, task);
+		err = pthread_create(&task->thread, &attr, thread_func, parms);
 		BUG_ON(err);
 	}
 }
 
-static void wait_for_tasks(void)
+static void wait_for_tasks(struct perf_sched *sched)
 {
 	u64 cpu_usage_0, cpu_usage_1;
 	struct task_desc *task;
 	unsigned long i, ret;
 
-	start_time = get_nsecs();
-	cpu_usage = 0;
-	pthread_mutex_unlock(&work_done_wait_mutex);
+	sched->start_time = get_nsecs();
+	sched->cpu_usage = 0;
+	pthread_mutex_unlock(&sched->work_done_wait_mutex);
 
-	for (i = 0; i < nr_tasks; i++) {
-		task = tasks[i];
+	for (i = 0; i < sched->nr_tasks; i++) {
+		task = sched->tasks[i];
 		ret = sem_wait(&task->ready_for_work);
 		BUG_ON(ret);
 		sem_init(&task->ready_for_work, 0, 0);
 	}
-	ret = pthread_mutex_lock(&work_done_wait_mutex);
+	ret = pthread_mutex_lock(&sched->work_done_wait_mutex);
 	BUG_ON(ret);
 
 	cpu_usage_0 = get_cpu_usage_nsec_parent();
 
-	pthread_mutex_unlock(&start_work_mutex);
+	pthread_mutex_unlock(&sched->start_work_mutex);
 
-	for (i = 0; i < nr_tasks; i++) {
-		task = tasks[i];
+	for (i = 0; i < sched->nr_tasks; i++) {
+		task = sched->tasks[i];
 		ret = sem_wait(&task->work_done_sem);
 		BUG_ON(ret);
 		sem_init(&task->work_done_sem, 0, 0);
-		cpu_usage += task->cpu_usage;
+		sched->cpu_usage += task->cpu_usage;
 		task->cpu_usage = 0;
 	}
 
 	cpu_usage_1 = get_cpu_usage_nsec_parent();
-	if (!runavg_cpu_usage)
-		runavg_cpu_usage = cpu_usage;
-	runavg_cpu_usage = (runavg_cpu_usage*9 + cpu_usage)/10;
+	if (!sched->runavg_cpu_usage)
+		sched->runavg_cpu_usage = sched->cpu_usage;
+	sched->runavg_cpu_usage = (sched->runavg_cpu_usage * 9 + sched->cpu_usage) / 10;
 
-	parent_cpu_usage = cpu_usage_1 - cpu_usage_0;
-	if (!runavg_parent_cpu_usage)
-		runavg_parent_cpu_usage = parent_cpu_usage;
-	runavg_parent_cpu_usage = (runavg_parent_cpu_usage*9 +
-				   parent_cpu_usage)/10;
+	sched->parent_cpu_usage = cpu_usage_1 - cpu_usage_0;
+	if (!sched->runavg_parent_cpu_usage)
+		sched->runavg_parent_cpu_usage = sched->parent_cpu_usage;
+	sched->runavg_parent_cpu_usage = (sched->runavg_parent_cpu_usage * 9 +
+					 sched->parent_cpu_usage)/10;
 
-	ret = pthread_mutex_lock(&start_work_mutex);
+	ret = pthread_mutex_lock(&sched->start_work_mutex);
 	BUG_ON(ret);
 
-	for (i = 0; i < nr_tasks; i++) {
-		task = tasks[i];
+	for (i = 0; i < sched->nr_tasks; i++) {
+		task = sched->tasks[i];
 		sem_init(&task->sleep_sem, 0, 0);
 		task->curr_event = 0;
 	}
 }
 
-static void run_one_test(void)
+static void run_one_test(struct perf_sched *sched)
 {
 	u64 T0, T1, delta, avg_delta, fluct;
 
 	T0 = get_nsecs();
-	wait_for_tasks();
+	wait_for_tasks(sched);
 	T1 = get_nsecs();
 
 	delta = T1 - T0;
-	sum_runtime += delta;
-	nr_runs++;
+	sched->sum_runtime += delta;
+	sched->nr_runs++;
 
-	avg_delta = sum_runtime / nr_runs;
+	avg_delta = sched->sum_runtime / sched->nr_runs;
 	if (delta < avg_delta)
 		fluct = avg_delta - delta;
 	else
 		fluct = delta - avg_delta;
-	sum_fluct += fluct;
-	if (!run_avg)
-		run_avg = delta;
-	run_avg = (run_avg*9 + delta)/10;
+	sched->sum_fluct += fluct;
+	if (!sched->run_avg)
+		sched->run_avg = delta;
+	sched->run_avg = (sched->run_avg * 9 + delta) / 10;
 
-	printf("#%-3ld: %0.3f, ",
-		nr_runs, (double)delta/1000000.0);
+	printf("#%-3ld: %0.3f, ", sched->nr_runs, (double)delta / 1000000.0);
 
-	printf("ravg: %0.2f, ",
-		(double)run_avg/1e6);
+	printf("ravg: %0.2f, ", (double)sched->run_avg / 1e6);
 
 	printf("cpu: %0.2f / %0.2f",
-		(double)cpu_usage/1e6, (double)runavg_cpu_usage/1e6);
+		(double)sched->cpu_usage / 1e6, (double)sched->runavg_cpu_usage / 1e6);
 
 #if 0
 	/*
 	 * rusage statistics done by the parent, these are less
-	 * accurate than the sum_exec_runtime based statistics:
+	 * accurate than the sched->sum_exec_runtime based statistics:
 	 */
 	printf(" [%0.2f / %0.2f]",
-		(double)parent_cpu_usage/1e6,
-		(double)runavg_parent_cpu_usage/1e6);
+		(double)sched->parent_cpu_usage/1e6,
+		(double)sched->runavg_parent_cpu_usage/1e6);
 #endif
 
 	printf("\n");
 
-	if (nr_sleep_corrections)
-		printf(" (%ld sleep corrections)\n", nr_sleep_corrections);
-	nr_sleep_corrections = 0;
+	if (sched->nr_sleep_corrections)
+		printf(" (%ld sleep corrections)\n", sched->nr_sleep_corrections);
+	sched->nr_sleep_corrections = 0;
 }
 
-static void test_calibrations(void)
+static void test_calibrations(struct perf_sched *sched)
 {
 	u64 T0, T1;
 
 	T0 = get_nsecs();
-	burn_nsecs(1e6);
+	burn_nsecs(sched, 1e6);
 	T1 = get_nsecs();
 
 	printf("the run test took %" PRIu64 " nsecs\n", T1 - T0);
@@ -629,236 +650,92 @@ static void test_calibrations(void)
 	printf("the sleep test took %" PRIu64 " nsecs\n", T1 - T0);
 }
 
-#define FILL_FIELD(ptr, field, event, data)	\
-	ptr.field = (typeof(ptr.field)) raw_field_value(event, #field, data)
-
-#define FILL_ARRAY(ptr, array, event, data)			\
-do {								\
-	void *__array = raw_field_ptr(event, #array, data);	\
-	memcpy(ptr.array, __array, sizeof(ptr.array));	\
-} while(0)
-
-#define FILL_COMMON_FIELDS(ptr, event, data)			\
-do {								\
-	FILL_FIELD(ptr, common_type, event, data);		\
-	FILL_FIELD(ptr, common_flags, event, data);		\
-	FILL_FIELD(ptr, common_preempt_count, event, data);	\
-	FILL_FIELD(ptr, common_pid, event, data);		\
-	FILL_FIELD(ptr, common_tgid, event, data);		\
-} while (0)
-
-
-
-struct trace_switch_event {
-	u32 size;
-
-	u16 common_type;
-	u8 common_flags;
-	u8 common_preempt_count;
-	u32 common_pid;
-	u32 common_tgid;
-
-	char prev_comm[16];
-	u32 prev_pid;
-	u32 prev_prio;
-	u64 prev_state;
-	char next_comm[16];
-	u32 next_pid;
-	u32 next_prio;
-};
-
-struct trace_runtime_event {
-	u32 size;
-
-	u16 common_type;
-	u8 common_flags;
-	u8 common_preempt_count;
-	u32 common_pid;
-	u32 common_tgid;
-
-	char comm[16];
-	u32 pid;
-	u64 runtime;
-	u64 vruntime;
-};
-
-struct trace_wakeup_event {
-	u32 size;
-
-	u16 common_type;
-	u8 common_flags;
-	u8 common_preempt_count;
-	u32 common_pid;
-	u32 common_tgid;
-
-	char comm[16];
-	u32 pid;
-
-	u32 prio;
-	u32 success;
-	u32 cpu;
-};
-
-struct trace_fork_event {
-	u32 size;
-
-	u16 common_type;
-	u8 common_flags;
-	u8 common_preempt_count;
-	u32 common_pid;
-	u32 common_tgid;
-
-	char parent_comm[16];
-	u32 parent_pid;
-	char child_comm[16];
-	u32 child_pid;
-};
-
-struct trace_migrate_task_event {
-	u32 size;
-
-	u16 common_type;
-	u8 common_flags;
-	u8 common_preempt_count;
-	u32 common_pid;
-	u32 common_tgid;
-
-	char comm[16];
-	u32 pid;
-
-	u32 prio;
-	u32 cpu;
-};
-
-struct trace_sched_handler {
-	void (*switch_event)(struct trace_switch_event *,
-			     struct machine *,
-			     struct event_format *,
-			     int cpu,
-			     u64 timestamp,
-			     struct thread *thread);
-
-	void (*runtime_event)(struct trace_runtime_event *,
-			      struct machine *,
-			      struct event_format *,
-			      int cpu,
-			      u64 timestamp,
-			      struct thread *thread);
-
-	void (*wakeup_event)(struct trace_wakeup_event *,
-			     struct machine *,
-			     struct event_format *,
-			     int cpu,
-			     u64 timestamp,
-			     struct thread *thread);
-
-	void (*fork_event)(struct trace_fork_event *,
-			   struct event_format *,
-			   int cpu,
-			   u64 timestamp,
-			   struct thread *thread);
-
-	void (*migrate_task_event)(struct trace_migrate_task_event *,
-			   struct machine *machine,
-			   struct event_format *,
-			   int cpu,
-			   u64 timestamp,
-			   struct thread *thread);
-};
-
-
-static void
-replay_wakeup_event(struct trace_wakeup_event *wakeup_event,
-		    struct machine *machine __used,
-		    struct event_format *event,
-		    int cpu __used,
-		    u64 timestamp __used,
-		    struct thread *thread __used)
+static int
+replay_wakeup_event(struct perf_sched *sched,
+		    struct perf_evsel *evsel, struct perf_sample *sample,
+		    struct machine *machine __maybe_unused)
 {
+	const char *comm = perf_evsel__strval(evsel, sample, "comm");
+	const u32 pid	 = perf_evsel__intval(evsel, sample, "pid");
 	struct task_desc *waker, *wakee;
 
 	if (verbose) {
-		printf("sched_wakeup event %p\n", event);
+		printf("sched_wakeup event %p\n", evsel);
 
-		printf(" ... pid %d woke up %s/%d\n",
-			wakeup_event->common_pid,
-			wakeup_event->comm,
-			wakeup_event->pid);
+		printf(" ... pid %d woke up %s/%d\n", sample->tid, comm, pid);
 	}
 
-	waker = register_pid(wakeup_event->common_pid, "<unknown>");
-	wakee = register_pid(wakeup_event->pid, wakeup_event->comm);
+	waker = register_pid(sched, sample->tid, "<unknown>");
+	wakee = register_pid(sched, pid, comm);
 
-	add_sched_event_wakeup(waker, timestamp, wakee);
+	add_sched_event_wakeup(sched, waker, sample->time, wakee);
+	return 0;
 }
 
-static u64 cpu_last_switched[MAX_CPUS];
-
-static void
-replay_switch_event(struct trace_switch_event *switch_event,
-		    struct machine *machine __used,
-		    struct event_format *event,
-		    int cpu,
-		    u64 timestamp,
-		    struct thread *thread __used)
+static int replay_switch_event(struct perf_sched *sched,
+			       struct perf_evsel *evsel,
+			       struct perf_sample *sample,
+			       struct machine *machine __maybe_unused)
 {
-	struct task_desc *prev, __used *next;
-	u64 timestamp0;
+	const char *prev_comm  = perf_evsel__strval(evsel, sample, "prev_comm"),
+		   *next_comm  = perf_evsel__strval(evsel, sample, "next_comm");
+	const u32 prev_pid = perf_evsel__intval(evsel, sample, "prev_pid"),
+		  next_pid = perf_evsel__intval(evsel, sample, "next_pid");
+	const u64 prev_state = perf_evsel__intval(evsel, sample, "prev_state");
+	struct task_desc *prev, __maybe_unused *next;
+	u64 timestamp0, timestamp = sample->time;
+	int cpu = sample->cpu;
 	s64 delta;
 
 	if (verbose)
-		printf("sched_switch event %p\n", event);
+		printf("sched_switch event %p\n", evsel);
 
 	if (cpu >= MAX_CPUS || cpu < 0)
-		return;
+		return 0;
 
-	timestamp0 = cpu_last_switched[cpu];
+	timestamp0 = sched->cpu_last_switched[cpu];
 	if (timestamp0)
 		delta = timestamp - timestamp0;
 	else
 		delta = 0;
 
-	if (delta < 0)
-		die("hm, delta: %" PRIu64 " < 0 ?\n", delta);
-
-	if (verbose) {
-		printf(" ... switch from %s/%d to %s/%d [ran %" PRIu64 " nsecs]\n",
-			switch_event->prev_comm, switch_event->prev_pid,
-			switch_event->next_comm, switch_event->next_pid,
-			delta);
+	if (delta < 0) {
+		pr_err("hm, delta: %" PRIu64 " < 0 ?\n", delta);
+		return -1;
 	}
 
-	prev = register_pid(switch_event->prev_pid, switch_event->prev_comm);
-	next = register_pid(switch_event->next_pid, switch_event->next_comm);
+	pr_debug(" ... switch from %s/%d to %s/%d [ran %" PRIu64 " nsecs]\n",
+		 prev_comm, prev_pid, next_comm, next_pid, delta);
 
-	cpu_last_switched[cpu] = timestamp;
+	prev = register_pid(sched, prev_pid, prev_comm);
+	next = register_pid(sched, next_pid, next_comm);
 
-	add_sched_event_run(prev, timestamp, delta);
-	add_sched_event_sleep(prev, timestamp, switch_event->prev_state);
-}
+	sched->cpu_last_switched[cpu] = timestamp;
 
+	add_sched_event_run(sched, prev, timestamp, delta);
+	add_sched_event_sleep(sched, prev, timestamp, prev_state);
 
-static void
-replay_fork_event(struct trace_fork_event *fork_event,
-		  struct event_format *event,
-		  int cpu __used,
-		  u64 timestamp __used,
-		  struct thread *thread __used)
+	return 0;
+}
+
+static int replay_fork_event(struct perf_sched *sched, struct perf_evsel *evsel,
+			     struct perf_sample *sample)
 {
+	const char *parent_comm = perf_evsel__strval(evsel, sample, "parent_comm"),
+		   *child_comm  = perf_evsel__strval(evsel, sample, "child_comm");
+	const u32 parent_pid  = perf_evsel__intval(evsel, sample, "parent_pid"),
+		  child_pid  = perf_evsel__intval(evsel, sample, "child_pid");
+
 	if (verbose) {
-		printf("sched_fork event %p\n", event);
-		printf("... parent: %s/%d\n", fork_event->parent_comm, fork_event->parent_pid);
-		printf("...  child: %s/%d\n", fork_event->child_comm, fork_event->child_pid);
+		printf("sched_fork event %p\n", evsel);
+		printf("... parent: %s/%d\n", parent_comm, parent_pid);
+		printf("...  child: %s/%d\n", child_comm, child_pid);
 	}
-	register_pid(fork_event->parent_pid, fork_event->parent_comm);
-	register_pid(fork_event->child_pid, fork_event->child_comm);
-}
 
-static struct trace_sched_handler replay_ops  = {
-	.wakeup_event		= replay_wakeup_event,
-	.switch_event		= replay_switch_event,
-	.fork_event		= replay_fork_event,
-};
+	register_pid(sched, parent_pid, parent_comm);
+	register_pid(sched, child_pid, child_comm);
+	return 0;
+}
 
 struct sort_dimension {
 	const char		*name;
@@ -866,8 +743,6 @@ struct sort_dimension {
 	struct list_head	list;
 };
 
-static LIST_HEAD(cmp_pid);
-
 static int
 thread_lat_cmp(struct list_head *list, struct work_atoms *l, struct work_atoms *r)
 {
@@ -936,43 +811,45 @@ __thread_latency_insert(struct rb_root *root, struct work_atoms *data,
 	rb_insert_color(&data->node, root);
 }
 
-static void thread_atoms_insert(struct thread *thread)
+static int thread_atoms_insert(struct perf_sched *sched, struct thread *thread)
 {
 	struct work_atoms *atoms = zalloc(sizeof(*atoms));
-	if (!atoms)
-		die("No memory");
+	if (!atoms) {
+		pr_err("No memory at %s\n", __func__);
+		return -1;
+	}
 
 	atoms->thread = thread;
 	INIT_LIST_HEAD(&atoms->work_list);
-	__thread_latency_insert(&atom_root, atoms, &cmp_pid);
+	__thread_latency_insert(&sched->atom_root, atoms, &sched->cmp_pid);
+	return 0;
 }
 
-static void
-latency_fork_event(struct trace_fork_event *fork_event __used,
-		   struct event_format *event __used,
-		   int cpu __used,
-		   u64 timestamp __used,
-		   struct thread *thread __used)
+static int latency_fork_event(struct perf_sched *sched __maybe_unused,
+			      struct perf_evsel *evsel __maybe_unused,
+			      struct perf_sample *sample __maybe_unused)
 {
 	/* should insert the newcomer */
+	return 0;
 }
 
-__used
-static char sched_out_state(struct trace_switch_event *switch_event)
+static char sched_out_state(u64 prev_state)
 {
 	const char *str = TASK_STATE_TO_CHAR_STR;
 
-	return str[switch_event->prev_state];
+	return str[prev_state];
 }
 
-static void
+static int
 add_sched_out_event(struct work_atoms *atoms,
 		    char run_state,
 		    u64 timestamp)
 {
 	struct work_atom *atom = zalloc(sizeof(*atom));
-	if (!atom)
-		die("Non memory");
+	if (!atom) {
+		pr_err("Non memory at %s", __func__);
+		return -1;
+	}
 
 	atom->sched_out_time = timestamp;
 
@@ -982,10 +859,12 @@ add_sched_out_event(struct work_atoms *atoms,
 	}
 
 	list_add_tail(&atom->list, &atoms->work_list);
+	return 0;
 }
 
 static void
-add_runtime_event(struct work_atoms *atoms, u64 delta, u64 timestamp __used)
+add_runtime_event(struct work_atoms *atoms, u64 delta,
+		  u64 timestamp __maybe_unused)
 {
 	struct work_atom *atom;
 
@@ -1028,106 +907,128 @@ add_sched_in_event(struct work_atoms *atoms, u64 timestamp)
 	atoms->nb_atoms++;
 }
 
-static void
-latency_switch_event(struct trace_switch_event *switch_event,
-		     struct machine *machine,
-		     struct event_format *event __used,
-		     int cpu,
-		     u64 timestamp,
-		     struct thread *thread __used)
+static int latency_switch_event(struct perf_sched *sched,
+				struct perf_evsel *evsel,
+				struct perf_sample *sample,
+				struct machine *machine)
 {
+	const u32 prev_pid = perf_evsel__intval(evsel, sample, "prev_pid"),
+		  next_pid = perf_evsel__intval(evsel, sample, "next_pid");
+	const u64 prev_state = perf_evsel__intval(evsel, sample, "prev_state");
 	struct work_atoms *out_events, *in_events;
 	struct thread *sched_out, *sched_in;
-	u64 timestamp0;
+	u64 timestamp0, timestamp = sample->time;
+	int cpu = sample->cpu;
 	s64 delta;
 
 	BUG_ON(cpu >= MAX_CPUS || cpu < 0);
 
-	timestamp0 = cpu_last_switched[cpu];
-	cpu_last_switched[cpu] = timestamp;
+	timestamp0 = sched->cpu_last_switched[cpu];
+	sched->cpu_last_switched[cpu] = timestamp;
 	if (timestamp0)
 		delta = timestamp - timestamp0;
 	else
 		delta = 0;
 
-	if (delta < 0)
-		die("hm, delta: %" PRIu64 " < 0 ?\n", delta);
-
+	if (delta < 0) {
+		pr_err("hm, delta: %" PRIu64 " < 0 ?\n", delta);
+		return -1;
+	}
 
-	sched_out = machine__findnew_thread(machine, switch_event->prev_pid);
-	sched_in = machine__findnew_thread(machine, switch_event->next_pid);
+	sched_out = machine__findnew_thread(machine, prev_pid);
+	sched_in = machine__findnew_thread(machine, next_pid);
 
-	out_events = thread_atoms_search(&atom_root, sched_out, &cmp_pid);
+	out_events = thread_atoms_search(&sched->atom_root, sched_out, &sched->cmp_pid);
 	if (!out_events) {
-		thread_atoms_insert(sched_out);
-		out_events = thread_atoms_search(&atom_root, sched_out, &cmp_pid);
-		if (!out_events)
-			die("out-event: Internal tree error");
+		if (thread_atoms_insert(sched, sched_out))
+			return -1;
+		out_events = thread_atoms_search(&sched->atom_root, sched_out, &sched->cmp_pid);
+		if (!out_events) {
+			pr_err("out-event: Internal tree error");
+			return -1;
+		}
 	}
-	add_sched_out_event(out_events, sched_out_state(switch_event), timestamp);
+	if (add_sched_out_event(out_events, sched_out_state(prev_state), timestamp))
+		return -1;
 
-	in_events = thread_atoms_search(&atom_root, sched_in, &cmp_pid);
+	in_events = thread_atoms_search(&sched->atom_root, sched_in, &sched->cmp_pid);
 	if (!in_events) {
-		thread_atoms_insert(sched_in);
-		in_events = thread_atoms_search(&atom_root, sched_in, &cmp_pid);
-		if (!in_events)
-			die("in-event: Internal tree error");
+		if (thread_atoms_insert(sched, sched_in))
+			return -1;
+		in_events = thread_atoms_search(&sched->atom_root, sched_in, &sched->cmp_pid);
+		if (!in_events) {
+			pr_err("in-event: Internal tree error");
+			return -1;
+		}
 		/*
 		 * Take came in we have not heard about yet,
 		 * add in an initial atom in runnable state:
 		 */
-		add_sched_out_event(in_events, 'R', timestamp);
+		if (add_sched_out_event(in_events, 'R', timestamp))
+			return -1;
 	}
 	add_sched_in_event(in_events, timestamp);
+
+	return 0;
 }
 
-static void
-latency_runtime_event(struct trace_runtime_event *runtime_event,
-		     struct machine *machine,
-		     struct event_format *event __used,
-		     int cpu,
-		     u64 timestamp,
-		     struct thread *this_thread __used)
+static int latency_runtime_event(struct perf_sched *sched,
+				 struct perf_evsel *evsel,
+				 struct perf_sample *sample,
+				 struct machine *machine)
 {
-	struct thread *thread = machine__findnew_thread(machine, runtime_event->pid);
-	struct work_atoms *atoms = thread_atoms_search(&atom_root, thread, &cmp_pid);
+	const u32 pid	   = perf_evsel__intval(evsel, sample, "pid");
+	const u64 runtime  = perf_evsel__intval(evsel, sample, "runtime");
+	struct thread *thread = machine__findnew_thread(machine, pid);
+	struct work_atoms *atoms = thread_atoms_search(&sched->atom_root, thread, &sched->cmp_pid);
+	u64 timestamp = sample->time;
+	int cpu = sample->cpu;
 
 	BUG_ON(cpu >= MAX_CPUS || cpu < 0);
 	if (!atoms) {
-		thread_atoms_insert(thread);
-		atoms = thread_atoms_search(&atom_root, thread, &cmp_pid);
-		if (!atoms)
-			die("in-event: Internal tree error");
-		add_sched_out_event(atoms, 'R', timestamp);
+		if (thread_atoms_insert(sched, thread))
+			return -1;
+		atoms = thread_atoms_search(&sched->atom_root, thread, &sched->cmp_pid);
+		if (!atoms) {
+			pr_err("in-event: Internal tree error");
+			return -1;
+		}
+		if (add_sched_out_event(atoms, 'R', timestamp))
+			return -1;
 	}
 
-	add_runtime_event(atoms, runtime_event->runtime, timestamp);
+	add_runtime_event(atoms, runtime, timestamp);
+	return 0;
 }
 
-static void
-latency_wakeup_event(struct trace_wakeup_event *wakeup_event,
-		     struct machine *machine,
-		     struct event_format *__event __used,
-		     int cpu __used,
-		     u64 timestamp,
-		     struct thread *thread __used)
+static int latency_wakeup_event(struct perf_sched *sched,
+				struct perf_evsel *evsel,
+				struct perf_sample *sample,
+				struct machine *machine)
 {
+	const u32 pid	  = perf_evsel__intval(evsel, sample, "pid"),
+		  success = perf_evsel__intval(evsel, sample, "success");
 	struct work_atoms *atoms;
 	struct work_atom *atom;
 	struct thread *wakee;
+	u64 timestamp = sample->time;
 
 	/* Note for later, it may be interesting to observe the failing cases */
-	if (!wakeup_event->success)
-		return;
+	if (!success)
+		return 0;
 
-	wakee = machine__findnew_thread(machine, wakeup_event->pid);
-	atoms = thread_atoms_search(&atom_root, wakee, &cmp_pid);
+	wakee = machine__findnew_thread(machine, pid);
+	atoms = thread_atoms_search(&sched->atom_root, wakee, &sched->cmp_pid);
 	if (!atoms) {
-		thread_atoms_insert(wakee);
-		atoms = thread_atoms_search(&atom_root, wakee, &cmp_pid);
-		if (!atoms)
-			die("wakeup-event: Internal tree error");
-		add_sched_out_event(atoms, 'S', timestamp);
+		if (thread_atoms_insert(sched, wakee))
+			return -1;
+		atoms = thread_atoms_search(&sched->atom_root, wakee, &sched->cmp_pid);
+		if (!atoms) {
+			pr_err("wakeup-event: Internal tree error");
+			return -1;
+		}
+		if (add_sched_out_event(atoms, 'S', timestamp))
+			return -1;
 	}
 
 	BUG_ON(list_empty(&atoms->work_list));
@@ -1139,27 +1040,27 @@ latency_wakeup_event(struct trace_wakeup_event *wakeup_event,
 	 * one CPU, or are only looking at only one, so don't
 	 * make useless noise.
 	 */
-	if (profile_cpu == -1 && atom->state != THREAD_SLEEPING)
-		nr_state_machine_bugs++;
+	if (sched->profile_cpu == -1 && atom->state != THREAD_SLEEPING)
+		sched->nr_state_machine_bugs++;
 
-	nr_timestamps++;
+	sched->nr_timestamps++;
 	if (atom->sched_out_time > timestamp) {
-		nr_unordered_timestamps++;
-		return;
+		sched->nr_unordered_timestamps++;
+		return 0;
 	}
 
 	atom->state = THREAD_WAIT_CPU;
 	atom->wake_up_time = timestamp;
+	return 0;
 }
 
-static void
-latency_migrate_task_event(struct trace_migrate_task_event *migrate_task_event,
-		     struct machine *machine,
-		     struct event_format *__event __used,
-		     int cpu __used,
-		     u64 timestamp,
-		     struct thread *thread __used)
+static int latency_migrate_task_event(struct perf_sched *sched,
+				      struct perf_evsel *evsel,
+				      struct perf_sample *sample,
+				      struct machine *machine)
 {
+	const u32 pid = perf_evsel__intval(evsel, sample, "pid");
+	u64 timestamp = sample->time;
 	struct work_atoms *atoms;
 	struct work_atom *atom;
 	struct thread *migrant;
@@ -1167,18 +1068,22 @@ latency_migrate_task_event(struct trace_migrate_task_event *migrate_task_event,
 	/*
 	 * Only need to worry about migration when profiling one CPU.
 	 */
-	if (profile_cpu == -1)
-		return;
+	if (sched->profile_cpu == -1)
+		return 0;
 
-	migrant = machine__findnew_thread(machine, migrate_task_event->pid);
-	atoms = thread_atoms_search(&atom_root, migrant, &cmp_pid);
+	migrant = machine__findnew_thread(machine, pid);
+	atoms = thread_atoms_search(&sched->atom_root, migrant, &sched->cmp_pid);
 	if (!atoms) {
-		thread_atoms_insert(migrant);
-		register_pid(migrant->pid, migrant->comm);
-		atoms = thread_atoms_search(&atom_root, migrant, &cmp_pid);
-		if (!atoms)
-			die("migration-event: Internal tree error");
-		add_sched_out_event(atoms, 'R', timestamp);
+		if (thread_atoms_insert(sched, migrant))
+			return -1;
+		register_pid(sched, migrant->pid, migrant->comm);
+		atoms = thread_atoms_search(&sched->atom_root, migrant, &sched->cmp_pid);
+		if (!atoms) {
+			pr_err("migration-event: Internal tree error");
+			return -1;
+		}
+		if (add_sched_out_event(atoms, 'R', timestamp))
+			return -1;
 	}
 
 	BUG_ON(list_empty(&atoms->work_list));
@@ -1186,21 +1091,15 @@ latency_migrate_task_event(struct trace_migrate_task_event *migrate_task_event,
 	atom = list_entry(atoms->work_list.prev, struct work_atom, list);
 	atom->sched_in_time = atom->sched_out_time = atom->wake_up_time = timestamp;
 
-	nr_timestamps++;
+	sched->nr_timestamps++;
 
 	if (atom->sched_out_time > timestamp)
-		nr_unordered_timestamps++;
-}
+		sched->nr_unordered_timestamps++;
 
-static struct trace_sched_handler lat_ops  = {
-	.wakeup_event		= latency_wakeup_event,
-	.switch_event		= latency_switch_event,
-	.runtime_event		= latency_runtime_event,
-	.fork_event		= latency_fork_event,
-	.migrate_task_event	= latency_migrate_task_event,
-};
+	return 0;
+}
 
-static void output_lat_thread(struct work_atoms *work_list)
+static void output_lat_thread(struct perf_sched *sched, struct work_atoms *work_list)
 {
 	int i;
 	int ret;
@@ -1214,8 +1113,8 @@ static void output_lat_thread(struct work_atoms *work_list)
 	if (!strcmp(work_list->thread->comm, "swapper"))
 		return;
 
-	all_runtime += work_list->total_runtime;
-	all_count += work_list->nb_atoms;
+	sched->all_runtime += work_list->total_runtime;
+	sched->all_count   += work_list->nb_atoms;
 
 	ret = printf("  %s:%d ", work_list->thread->comm, work_list->thread->pid);
 
@@ -1241,11 +1140,6 @@ static int pid_cmp(struct work_atoms *l, struct work_atoms *r)
 	return 0;
 }
 
-static struct sort_dimension pid_sort_dimension = {
-	.name			= "pid",
-	.cmp			= pid_cmp,
-};
-
 static int avg_cmp(struct work_atoms *l, struct work_atoms *r)
 {
 	u64 avgl, avgr;
@@ -1267,11 +1161,6 @@ static int avg_cmp(struct work_atoms *l, struct work_atoms *r)
 	return 0;
 }
 
-static struct sort_dimension avg_sort_dimension = {
-	.name			= "avg",
-	.cmp			= avg_cmp,
-};
-
 static int max_cmp(struct work_atoms *l, struct work_atoms *r)
 {
 	if (l->max_lat < r->max_lat)
@@ -1282,11 +1171,6 @@ static int max_cmp(struct work_atoms *l, struct work_atoms *r)
 	return 0;
 }
 
-static struct sort_dimension max_sort_dimension = {
-	.name			= "max",
-	.cmp			= max_cmp,
-};
-
 static int switch_cmp(struct work_atoms *l, struct work_atoms *r)
 {
 	if (l->nb_atoms < r->nb_atoms)
@@ -1297,11 +1181,6 @@ static int switch_cmp(struct work_atoms *l, struct work_atoms *r)
 	return 0;
 }
 
-static struct sort_dimension switch_sort_dimension = {
-	.name			= "switch",
-	.cmp			= switch_cmp,
-};
-
 static int runtime_cmp(struct work_atoms *l, struct work_atoms *r)
 {
 	if (l->total_runtime < r->total_runtime)
@@ -1312,28 +1191,38 @@ static int runtime_cmp(struct work_atoms *l, struct work_atoms *r)
 	return 0;
 }
 
-static struct sort_dimension runtime_sort_dimension = {
-	.name			= "runtime",
-	.cmp			= runtime_cmp,
-};
-
-static struct sort_dimension *available_sorts[] = {
-	&pid_sort_dimension,
-	&avg_sort_dimension,
-	&max_sort_dimension,
-	&switch_sort_dimension,
-	&runtime_sort_dimension,
-};
-
-#define NB_AVAILABLE_SORTS	(int)(sizeof(available_sorts) / sizeof(struct sort_dimension *))
-
-static LIST_HEAD(sort_list);
-
 static int sort_dimension__add(const char *tok, struct list_head *list)
 {
-	int i;
+	size_t i;
+	static struct sort_dimension avg_sort_dimension = {
+		.name = "avg",
+		.cmp  = avg_cmp,
+	};
+	static struct sort_dimension max_sort_dimension = {
+		.name = "max",
+		.cmp  = max_cmp,
+	};
+	static struct sort_dimension pid_sort_dimension = {
+		.name = "pid",
+		.cmp  = pid_cmp,
+	};
+	static struct sort_dimension runtime_sort_dimension = {
+		.name = "runtime",
+		.cmp  = runtime_cmp,
+	};
+	static struct sort_dimension switch_sort_dimension = {
+		.name = "switch",
+		.cmp  = switch_cmp,
+	};
+	struct sort_dimension *available_sorts[] = {
+		&pid_sort_dimension,
+		&avg_sort_dimension,
+		&max_sort_dimension,
+		&switch_sort_dimension,
+		&runtime_sort_dimension,
+	};
 
-	for (i = 0; i < NB_AVAILABLE_SORTS; i++) {
+	for (i = 0; i < ARRAY_SIZE(available_sorts); i++) {
 		if (!strcmp(available_sorts[i]->name, tok)) {
 			list_add_tail(&available_sorts[i]->list, list);
 
@@ -1344,126 +1233,97 @@ static int sort_dimension__add(const char *tok, struct list_head *list)
 	return -1;
 }
 
-static void setup_sorting(void);
-
-static void sort_lat(void)
+static void perf_sched__sort_lat(struct perf_sched *sched)
 {
 	struct rb_node *node;
 
 	for (;;) {
 		struct work_atoms *data;
-		node = rb_first(&atom_root);
+		node = rb_first(&sched->atom_root);
 		if (!node)
 			break;
 
-		rb_erase(node, &atom_root);
+		rb_erase(node, &sched->atom_root);
 		data = rb_entry(node, struct work_atoms, node);
-		__thread_latency_insert(&sorted_atom_root, data, &sort_list);
+		__thread_latency_insert(&sched->sorted_atom_root, data, &sched->sort_list);
 	}
 }
 
-static struct trace_sched_handler *trace_handler;
-
-static void
-process_sched_wakeup_event(struct perf_tool *tool __used,
-			   struct event_format *event,
-			   struct perf_sample *sample,
-			   struct machine *machine,
-			   struct thread *thread)
+static int process_sched_wakeup_event(struct perf_tool *tool,
+				      struct perf_evsel *evsel,
+				      struct perf_sample *sample,
+				      struct machine *machine)
 {
-	void *data = sample->raw_data;
-	struct trace_wakeup_event wakeup_event;
-
-	FILL_COMMON_FIELDS(wakeup_event, event, data);
+	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
 
-	FILL_ARRAY(wakeup_event, comm, event, data);
-	FILL_FIELD(wakeup_event, pid, event, data);
-	FILL_FIELD(wakeup_event, prio, event, data);
-	FILL_FIELD(wakeup_event, success, event, data);
-	FILL_FIELD(wakeup_event, cpu, event, data);
+	if (sched->tp_handler->wakeup_event)
+		return sched->tp_handler->wakeup_event(sched, evsel, sample, machine);
 
-	if (trace_handler->wakeup_event)
-		trace_handler->wakeup_event(&wakeup_event, machine, event,
-					    sample->cpu, sample->time, thread);
+	return 0;
 }
 
-/*
- * Track the current task - that way we can know whether there's any
- * weird events, such as a task being switched away that is not current.
- */
-static int max_cpu;
-
-static u32 curr_pid[MAX_CPUS] = { [0 ... MAX_CPUS-1] = -1 };
-
-static struct thread *curr_thread[MAX_CPUS];
-
-static char next_shortname1 = 'A';
-static char next_shortname2 = '0';
-
-static void
-map_switch_event(struct trace_switch_event *switch_event,
-		 struct machine *machine,
-		 struct event_format *event __used,
-		 int this_cpu,
-		 u64 timestamp,
-		 struct thread *thread __used)
+static int map_switch_event(struct perf_sched *sched, struct perf_evsel *evsel,
+			    struct perf_sample *sample, struct machine *machine)
 {
-	struct thread *sched_out __used, *sched_in;
+	const u32 prev_pid = perf_evsel__intval(evsel, sample, "prev_pid"),
+		  next_pid = perf_evsel__intval(evsel, sample, "next_pid");
+	struct thread *sched_out __maybe_unused, *sched_in;
 	int new_shortname;
-	u64 timestamp0;
+	u64 timestamp0, timestamp = sample->time;
 	s64 delta;
-	int cpu;
+	int cpu, this_cpu = sample->cpu;
 
 	BUG_ON(this_cpu >= MAX_CPUS || this_cpu < 0);
 
-	if (this_cpu > max_cpu)
-		max_cpu = this_cpu;
+	if (this_cpu > sched->max_cpu)
+		sched->max_cpu = this_cpu;
 
-	timestamp0 = cpu_last_switched[this_cpu];
-	cpu_last_switched[this_cpu] = timestamp;
+	timestamp0 = sched->cpu_last_switched[this_cpu];
+	sched->cpu_last_switched[this_cpu] = timestamp;
 	if (timestamp0)
 		delta = timestamp - timestamp0;
 	else
 		delta = 0;
 
-	if (delta < 0)
-		die("hm, delta: %" PRIu64 " < 0 ?\n", delta);
-
+	if (delta < 0) {
+		pr_err("hm, delta: %" PRIu64 " < 0 ?\n", delta);
+		return -1;
+	}
 
-	sched_out = machine__findnew_thread(machine, switch_event->prev_pid);
-	sched_in = machine__findnew_thread(machine, switch_event->next_pid);
+	sched_out = machine__findnew_thread(machine, prev_pid);
+	sched_in = machine__findnew_thread(machine, next_pid);
 
-	curr_thread[this_cpu] = sched_in;
+	sched->curr_thread[this_cpu] = sched_in;
 
 	printf("  ");
 
 	new_shortname = 0;
 	if (!sched_in->shortname[0]) {
-		sched_in->shortname[0] = next_shortname1;
-		sched_in->shortname[1] = next_shortname2;
+		sched_in->shortname[0] = sched->next_shortname1;
+		sched_in->shortname[1] = sched->next_shortname2;
 
-		if (next_shortname1 < 'Z') {
-			next_shortname1++;
+		if (sched->next_shortname1 < 'Z') {
+			sched->next_shortname1++;
 		} else {
-			next_shortname1='A';
-			if (next_shortname2 < '9') {
-				next_shortname2++;
+			sched->next_shortname1='A';
+			if (sched->next_shortname2 < '9') {
+				sched->next_shortname2++;
 			} else {
-				next_shortname2='0';
+				sched->next_shortname2='0';
 			}
 		}
 		new_shortname = 1;
 	}
 
-	for (cpu = 0; cpu <= max_cpu; cpu++) {
+	for (cpu = 0; cpu <= sched->max_cpu; cpu++) {
 		if (cpu != this_cpu)
 			printf(" ");
 		else
 			printf("*");
 
-		if (curr_thread[cpu]) {
-			if (curr_thread[cpu]->pid)
-				printf("%2s ", curr_thread[cpu]->shortname);
+		if (sched->curr_thread[cpu]) {
+			if (sched->curr_thread[cpu]->pid)
+				printf("%2s ", sched->curr_thread[cpu]->shortname);
 			else
 				printf(".  ");
 		} else
@@ -1477,134 +1337,97 @@ map_switch_event(struct trace_switch_event *switch_event,
 	} else {
 		printf("\n");
 	}
+
+	return 0;
 }
 
-static void
-process_sched_switch_event(struct perf_tool *tool __used,
-			   struct event_format *event,
-			   struct perf_sample *sample,
-			   struct machine *machine,
-			   struct thread *thread)
+static int process_sched_switch_event(struct perf_tool *tool,
+				      struct perf_evsel *evsel,
+				      struct perf_sample *sample,
+				      struct machine *machine)
 {
-	int this_cpu = sample->cpu;
-	void *data = sample->raw_data;
-	struct trace_switch_event switch_event;
-
-	FILL_COMMON_FIELDS(switch_event, event, data);
-
-	FILL_ARRAY(switch_event, prev_comm, event, data);
-	FILL_FIELD(switch_event, prev_pid, event, data);
-	FILL_FIELD(switch_event, prev_prio, event, data);
-	FILL_FIELD(switch_event, prev_state, event, data);
-	FILL_ARRAY(switch_event, next_comm, event, data);
-	FILL_FIELD(switch_event, next_pid, event, data);
-	FILL_FIELD(switch_event, next_prio, event, data);
+	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
+	int this_cpu = sample->cpu, err = 0;
+	u32 prev_pid = perf_evsel__intval(evsel, sample, "prev_pid"),
+	    next_pid = perf_evsel__intval(evsel, sample, "next_pid");
 
-	if (curr_pid[this_cpu] != (u32)-1) {
+	if (sched->curr_pid[this_cpu] != (u32)-1) {
 		/*
 		 * Are we trying to switch away a PID that is
 		 * not current?
 		 */
-		if (curr_pid[this_cpu] != switch_event.prev_pid)
-			nr_context_switch_bugs++;
+		if (sched->curr_pid[this_cpu] != prev_pid)
+			sched->nr_context_switch_bugs++;
 	}
-	if (trace_handler->switch_event)
-		trace_handler->switch_event(&switch_event, machine, event,
-					    this_cpu, sample->time, thread);
 
-	curr_pid[this_cpu] = switch_event.next_pid;
+	if (sched->tp_handler->switch_event)
+		err = sched->tp_handler->switch_event(sched, evsel, sample, machine);
+
+	sched->curr_pid[this_cpu] = next_pid;
+	return err;
 }
 
-static void
-process_sched_runtime_event(struct perf_tool *tool __used,
-			    struct event_format *event,
-			    struct perf_sample *sample,
-			    struct machine *machine,
-			    struct thread *thread)
+static int process_sched_runtime_event(struct perf_tool *tool,
+				       struct perf_evsel *evsel,
+				       struct perf_sample *sample,
+				       struct machine *machine)
 {
-	void *data = sample->raw_data;
-	struct trace_runtime_event runtime_event;
+	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
 
-	FILL_ARRAY(runtime_event, comm, event, data);
-	FILL_FIELD(runtime_event, pid, event, data);
-	FILL_FIELD(runtime_event, runtime, event, data);
-	FILL_FIELD(runtime_event, vruntime, event, data);
+	if (sched->tp_handler->runtime_event)
+		return sched->tp_handler->runtime_event(sched, evsel, sample, machine);
 
-	if (trace_handler->runtime_event)
-		trace_handler->runtime_event(&runtime_event, machine, event,
-					     sample->cpu, sample->time, thread);
+	return 0;
 }
 
-static void
-process_sched_fork_event(struct perf_tool *tool __used,
-			 struct event_format *event,
-			 struct perf_sample *sample,
-			 struct machine *machine __used,
-			 struct thread *thread)
+static int process_sched_fork_event(struct perf_tool *tool,
+				    struct perf_evsel *evsel,
+				    struct perf_sample *sample,
+				    struct machine *machine __maybe_unused)
 {
-	void *data = sample->raw_data;
-	struct trace_fork_event fork_event;
-
-	FILL_COMMON_FIELDS(fork_event, event, data);
+	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
 
-	FILL_ARRAY(fork_event, parent_comm, event, data);
-	FILL_FIELD(fork_event, parent_pid, event, data);
-	FILL_ARRAY(fork_event, child_comm, event, data);
-	FILL_FIELD(fork_event, child_pid, event, data);
+	if (sched->tp_handler->fork_event)
+		return sched->tp_handler->fork_event(sched, evsel, sample);
 
-	if (trace_handler->fork_event)
-		trace_handler->fork_event(&fork_event, event,
-					  sample->cpu, sample->time, thread);
+	return 0;
 }
 
-static void
-process_sched_exit_event(struct perf_tool *tool __used,
-			 struct event_format *event,
-			 struct perf_sample *sample __used,
-			 struct machine *machine __used,
-			 struct thread *thread __used)
+static int process_sched_exit_event(struct perf_tool *tool __maybe_unused,
+				    struct perf_evsel *evsel,
+				    struct perf_sample *sample __maybe_unused,
+				    struct machine *machine __maybe_unused)
 {
-	if (verbose)
-		printf("sched_exit event %p\n", event);
+	pr_debug("sched_exit event %p\n", evsel);
+	return 0;
 }
 
-static void
-process_sched_migrate_task_event(struct perf_tool *tool __used,
-				 struct event_format *event,
-				 struct perf_sample *sample,
-				 struct machine *machine,
-				 struct thread *thread)
+static int process_sched_migrate_task_event(struct perf_tool *tool,
+					    struct perf_evsel *evsel,
+					    struct perf_sample *sample,
+					    struct machine *machine)
 {
-	void *data = sample->raw_data;
-	struct trace_migrate_task_event migrate_task_event;
-
-	FILL_COMMON_FIELDS(migrate_task_event, event, data);
+	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
 
-	FILL_ARRAY(migrate_task_event, comm, event, data);
-	FILL_FIELD(migrate_task_event, pid, event, data);
-	FILL_FIELD(migrate_task_event, prio, event, data);
-	FILL_FIELD(migrate_task_event, cpu, event, data);
+	if (sched->tp_handler->migrate_task_event)
+		return sched->tp_handler->migrate_task_event(sched, evsel, sample, machine);
 
-	if (trace_handler->migrate_task_event)
-		trace_handler->migrate_task_event(&migrate_task_event, machine,
-						  event, sample->cpu,
-						  sample->time, thread);
+	return 0;
 }
 
-typedef void (*tracepoint_handler)(struct perf_tool *tool, struct event_format *event,
-				   struct perf_sample *sample,
-				   struct machine *machine,
-				   struct thread *thread);
+typedef int (*tracepoint_handler)(struct perf_tool *tool,
+				  struct perf_evsel *evsel,
+				  struct perf_sample *sample,
+				  struct machine *machine);
 
-static int perf_sched__process_tracepoint_sample(struct perf_tool *tool,
-						 union perf_event *event __used,
+static int perf_sched__process_tracepoint_sample(struct perf_tool *tool __maybe_unused,
+						 union perf_event *event __maybe_unused,
 						 struct perf_sample *sample,
 						 struct perf_evsel *evsel,
 						 struct machine *machine)
 {
-	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
-	struct pevent *pevent = sched->session->pevent;
 	struct thread *thread = machine__findnew_thread(machine, sample->pid);
+	int err = 0;
 
 	if (thread == NULL) {
 		pr_debug("problem processing %s event, skipping it.\n",
@@ -1617,30 +1440,15 @@ static int perf_sched__process_tracepoint_sample(struct perf_tool *tool,
 
 	if (evsel->handler.func != NULL) {
 		tracepoint_handler f = evsel->handler.func;
-
-		if (evsel->handler.data == NULL)
-			evsel->handler.data = pevent_find_event(pevent,
-							  evsel->attr.config);
-
-		f(tool, evsel->handler.data, sample, machine, thread);
+		err = f(tool, evsel, sample, machine);
 	}
 
-	return 0;
+	return err;
 }
 
-static struct perf_sched perf_sched = {
-	.tool = {
-		.sample		 = perf_sched__process_tracepoint_sample,
-		.comm		 = perf_event__process_comm,
-		.lost		 = perf_event__process_lost,
-		.fork		 = perf_event__process_task,
-		.ordered_samples = true,
-	},
-};
-
-static void read_events(bool destroy, struct perf_session **psession)
+static int perf_sched__read_events(struct perf_sched *sched, bool destroy,
+				   struct perf_session **psession)
 {
-	int err = -EINVAL;
 	const struct perf_evsel_str_handler handlers[] = {
 		{ "sched:sched_switch",	      process_sched_switch_event, },
 		{ "sched:sched_stat_runtime", process_sched_runtime_event, },
@@ -1652,24 +1460,25 @@ static void read_events(bool destroy, struct perf_session **psession)
 	};
 	struct perf_session *session;
 
-	session = perf_session__new(input_name, O_RDONLY, 0, false,
-				    &perf_sched.tool);
-	if (session == NULL)
-		die("No Memory");
-
-	perf_sched.session = session;
+	session = perf_session__new(sched->input_name, O_RDONLY, 0, false, &sched->tool);
+	if (session == NULL) {
+		pr_debug("No Memory for session\n");
+		return -1;
+	}
 
-	err = perf_session__set_tracepoints_handlers(session, handlers);
-	assert(err == 0);
+	if (perf_session__set_tracepoints_handlers(session, handlers))
+		goto out_delete;
 
 	if (perf_session__has_traces(session, "record -R")) {
-		err = perf_session__process_events(session, &perf_sched.tool);
-		if (err)
-			die("Failed to process events, error %d", err);
+		int err = perf_session__process_events(session, &sched->tool);
+		if (err) {
+			pr_err("Failed to process events, error %d", err);
+			goto out_delete;
+		}
 
-		nr_events      = session->hists.stats.nr_events[0];
-		nr_lost_events = session->hists.stats.total_lost;
-		nr_lost_chunks = session->hists.stats.nr_events[PERF_RECORD_LOST];
+		sched->nr_events      = session->hists.stats.nr_events[0];
+		sched->nr_lost_events = session->hists.stats.total_lost;
+		sched->nr_lost_chunks = session->hists.stats.nr_events[PERF_RECORD_LOST];
 	}
 
 	if (destroy)
@@ -1677,208 +1486,166 @@ static void read_events(bool destroy, struct perf_session **psession)
 
 	if (psession)
 		*psession = session;
+
+	return 0;
+
+out_delete:
+	perf_session__delete(session);
+	return -1;
 }
 
-static void print_bad_events(void)
+static void print_bad_events(struct perf_sched *sched)
 {
-	if (nr_unordered_timestamps && nr_timestamps) {
+	if (sched->nr_unordered_timestamps && sched->nr_timestamps) {
 		printf("  INFO: %.3f%% unordered timestamps (%ld out of %ld)\n",
-			(double)nr_unordered_timestamps/(double)nr_timestamps*100.0,
-			nr_unordered_timestamps, nr_timestamps);
+			(double)sched->nr_unordered_timestamps/(double)sched->nr_timestamps*100.0,
+			sched->nr_unordered_timestamps, sched->nr_timestamps);
 	}
-	if (nr_lost_events && nr_events) {
+	if (sched->nr_lost_events && sched->nr_events) {
 		printf("  INFO: %.3f%% lost events (%ld out of %ld, in %ld chunks)\n",
-			(double)nr_lost_events/(double)nr_events*100.0,
-			nr_lost_events, nr_events, nr_lost_chunks);
+			(double)sched->nr_lost_events/(double)sched->nr_events * 100.0,
+			sched->nr_lost_events, sched->nr_events, sched->nr_lost_chunks);
 	}
-	if (nr_state_machine_bugs && nr_timestamps) {
+	if (sched->nr_state_machine_bugs && sched->nr_timestamps) {
 		printf("  INFO: %.3f%% state machine bugs (%ld out of %ld)",
-			(double)nr_state_machine_bugs/(double)nr_timestamps*100.0,
-			nr_state_machine_bugs, nr_timestamps);
-		if (nr_lost_events)
+			(double)sched->nr_state_machine_bugs/(double)sched->nr_timestamps*100.0,
+			sched->nr_state_machine_bugs, sched->nr_timestamps);
+		if (sched->nr_lost_events)
 			printf(" (due to lost events?)");
 		printf("\n");
 	}
-	if (nr_context_switch_bugs && nr_timestamps) {
+	if (sched->nr_context_switch_bugs && sched->nr_timestamps) {
 		printf("  INFO: %.3f%% context switch bugs (%ld out of %ld)",
-			(double)nr_context_switch_bugs/(double)nr_timestamps*100.0,
-			nr_context_switch_bugs, nr_timestamps);
-		if (nr_lost_events)
+			(double)sched->nr_context_switch_bugs/(double)sched->nr_timestamps*100.0,
+			sched->nr_context_switch_bugs, sched->nr_timestamps);
+		if (sched->nr_lost_events)
 			printf(" (due to lost events?)");
 		printf("\n");
 	}
 }
 
-static void __cmd_lat(void)
+static int perf_sched__lat(struct perf_sched *sched)
 {
 	struct rb_node *next;
 	struct perf_session *session;
 
 	setup_pager();
-	read_events(false, &session);
-	sort_lat();
+	if (perf_sched__read_events(sched, false, &session))
+		return -1;
+	perf_sched__sort_lat(sched);
 
 	printf("\n ---------------------------------------------------------------------------------------------------------------\n");
 	printf("  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at     |\n");
 	printf(" ---------------------------------------------------------------------------------------------------------------\n");
 
-	next = rb_first(&sorted_atom_root);
+	next = rb_first(&sched->sorted_atom_root);
 
 	while (next) {
 		struct work_atoms *work_list;
 
 		work_list = rb_entry(next, struct work_atoms, node);
-		output_lat_thread(work_list);
+		output_lat_thread(sched, work_list);
 		next = rb_next(next);
 	}
 
 	printf(" -----------------------------------------------------------------------------------------\n");
 	printf("  TOTAL:                |%11.3f ms |%9" PRIu64 " |\n",
-		(double)all_runtime/1e6, all_count);
+		(double)sched->all_runtime / 1e6, sched->all_count);
 
 	printf(" ---------------------------------------------------\n");
 
-	print_bad_events();
+	print_bad_events(sched);
 	printf("\n");
 
 	perf_session__delete(session);
+	return 0;
 }
 
-static struct trace_sched_handler map_ops  = {
-	.wakeup_event		= NULL,
-	.switch_event		= map_switch_event,
-	.runtime_event		= NULL,
-	.fork_event		= NULL,
-};
-
-static void __cmd_map(void)
+static int perf_sched__map(struct perf_sched *sched)
 {
-	max_cpu = sysconf(_SC_NPROCESSORS_CONF);
+	sched->max_cpu = sysconf(_SC_NPROCESSORS_CONF);
 
 	setup_pager();
-	read_events(true, NULL);
-	print_bad_events();
+	if (perf_sched__read_events(sched, true, NULL))
+		return -1;
+	print_bad_events(sched);
+	return 0;
 }
 
-static void __cmd_replay(void)
+static int perf_sched__replay(struct perf_sched *sched)
 {
 	unsigned long i;
 
-	calibrate_run_measurement_overhead();
-	calibrate_sleep_measurement_overhead();
+	calibrate_run_measurement_overhead(sched);
+	calibrate_sleep_measurement_overhead(sched);
 
-	test_calibrations();
+	test_calibrations(sched);
 
-	read_events(true, NULL);
+	if (perf_sched__read_events(sched, true, NULL))
+		return -1;
 
-	printf("nr_run_events:        %ld\n", nr_run_events);
-	printf("nr_sleep_events:      %ld\n", nr_sleep_events);
-	printf("nr_wakeup_events:     %ld\n", nr_wakeup_events);
+	printf("nr_run_events:        %ld\n", sched->nr_run_events);
+	printf("nr_sleep_events:      %ld\n", sched->nr_sleep_events);
+	printf("nr_wakeup_events:     %ld\n", sched->nr_wakeup_events);
 
-	if (targetless_wakeups)
-		printf("target-less wakeups:  %ld\n", targetless_wakeups);
-	if (multitarget_wakeups)
-		printf("multi-target wakeups: %ld\n", multitarget_wakeups);
-	if (nr_run_events_optimized)
+	if (sched->targetless_wakeups)
+		printf("target-less wakeups:  %ld\n", sched->targetless_wakeups);
+	if (sched->multitarget_wakeups)
+		printf("multi-target wakeups: %ld\n", sched->multitarget_wakeups);
+	if (sched->nr_run_events_optimized)
 		printf("run atoms optimized: %ld\n",
-			nr_run_events_optimized);
+			sched->nr_run_events_optimized);
 
-	print_task_traces();
-	add_cross_task_wakeups();
+	print_task_traces(sched);
+	add_cross_task_wakeups(sched);
 
-	create_tasks();
+	create_tasks(sched);
 	printf("------------------------------------------------------------\n");
-	for (i = 0; i < replay_repeat; i++)
-		run_one_test();
-}
-
-
-static const char * const sched_usage[] = {
-	"perf sched [<options>] {record|latency|map|replay|script}",
-	NULL
-};
-
-static const struct option sched_options[] = {
-	OPT_STRING('i', "input", &input_name, "file",
-		    "input file name"),
-	OPT_INCR('v', "verbose", &verbose,
-		    "be more verbose (show symbol address, etc)"),
-	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
-		    "dump raw trace in ASCII"),
-	OPT_END()
-};
-
-static const char * const latency_usage[] = {
-	"perf sched latency [<options>]",
-	NULL
-};
-
-static const struct option latency_options[] = {
-	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
-		   "sort by key(s): runtime, switch, avg, max"),
-	OPT_INCR('v', "verbose", &verbose,
-		    "be more verbose (show symbol address, etc)"),
-	OPT_INTEGER('C', "CPU", &profile_cpu,
-		    "CPU to profile on"),
-	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
-		    "dump raw trace in ASCII"),
-	OPT_END()
-};
-
-static const char * const replay_usage[] = {
-	"perf sched replay [<options>]",
-	NULL
-};
+	for (i = 0; i < sched->replay_repeat; i++)
+		run_one_test(sched);
 
-static const struct option replay_options[] = {
-	OPT_UINTEGER('r', "repeat", &replay_repeat,
-		     "repeat the workload replay N times (-1: infinite)"),
-	OPT_INCR('v', "verbose", &verbose,
-		    "be more verbose (show symbol address, etc)"),
-	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
-		    "dump raw trace in ASCII"),
-	OPT_END()
-};
+	return 0;
+}
 
-static void setup_sorting(void)
+static void setup_sorting(struct perf_sched *sched, const struct option *options,
+			  const char * const usage_msg[])
 {
-	char *tmp, *tok, *str = strdup(sort_order);
+	char *tmp, *tok, *str = strdup(sched->sort_order);
 
 	for (tok = strtok_r(str, ", ", &tmp);
 			tok; tok = strtok_r(NULL, ", ", &tmp)) {
-		if (sort_dimension__add(tok, &sort_list) < 0) {
+		if (sort_dimension__add(tok, &sched->sort_list) < 0) {
 			error("Unknown --sort key: `%s'", tok);
-			usage_with_options(latency_usage, latency_options);
+			usage_with_options(usage_msg, options);
 		}
 	}
 
 	free(str);
 
-	sort_dimension__add("pid", &cmp_pid);
+	sort_dimension__add("pid", &sched->cmp_pid);
 }
 
-static const char *record_args[] = {
-	"record",
-	"-a",
-	"-R",
-	"-f",
-	"-m", "1024",
-	"-c", "1",
-	"-e", "sched:sched_switch",
-	"-e", "sched:sched_stat_wait",
-	"-e", "sched:sched_stat_sleep",
-	"-e", "sched:sched_stat_iowait",
-	"-e", "sched:sched_stat_runtime",
-	"-e", "sched:sched_process_exit",
-	"-e", "sched:sched_process_fork",
-	"-e", "sched:sched_wakeup",
-	"-e", "sched:sched_migrate_task",
-};
-
 static int __cmd_record(int argc, const char **argv)
 {
 	unsigned int rec_argc, i, j;
 	const char **rec_argv;
+	const char * const record_args[] = {
+		"record",
+		"-a",
+		"-R",
+		"-f",
+		"-m", "1024",
+		"-c", "1",
+		"-e", "sched:sched_switch",
+		"-e", "sched:sched_stat_wait",
+		"-e", "sched:sched_stat_sleep",
+		"-e", "sched:sched_stat_iowait",
+		"-e", "sched:sched_stat_runtime",
+		"-e", "sched:sched_process_exit",
+		"-e", "sched:sched_process_fork",
+		"-e", "sched:sched_wakeup",
+		"-e", "sched:sched_migrate_task",
+	};
 
 	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
 	rec_argv = calloc(rec_argc + 1, sizeof(char *));
@@ -1897,8 +1664,85 @@ static int __cmd_record(int argc, const char **argv)
 	return cmd_record(i, rec_argv, NULL);
 }
 
-int cmd_sched(int argc, const char **argv, const char *prefix __used)
+int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 {
+	const char default_sort_order[] = "avg, max, switch, runtime";
+	struct perf_sched sched = {
+		.tool = {
+			.sample		 = perf_sched__process_tracepoint_sample,
+			.comm		 = perf_event__process_comm,
+			.lost		 = perf_event__process_lost,
+			.fork		 = perf_event__process_task,
+			.ordered_samples = true,
+		},
+		.cmp_pid	      = LIST_HEAD_INIT(sched.cmp_pid),
+		.sort_list	      = LIST_HEAD_INIT(sched.sort_list),
+		.start_work_mutex     = PTHREAD_MUTEX_INITIALIZER,
+		.work_done_wait_mutex = PTHREAD_MUTEX_INITIALIZER,
+		.curr_pid	      = { [0 ... MAX_CPUS - 1] = -1 },
+		.sort_order	      = default_sort_order,
+		.replay_repeat	      = 10,
+		.profile_cpu	      = -1,
+		.next_shortname1      = 'A',
+		.next_shortname2      = '0',
+	};
+	const struct option latency_options[] = {
+	OPT_STRING('s', "sort", &sched.sort_order, "key[,key2...]",
+		   "sort by key(s): runtime, switch, avg, max"),
+	OPT_INCR('v', "verbose", &verbose,
+		    "be more verbose (show symbol address, etc)"),
+	OPT_INTEGER('C', "CPU", &sched.profile_cpu,
+		    "CPU to profile on"),
+	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+		    "dump raw trace in ASCII"),
+	OPT_END()
+	};
+	const struct option replay_options[] = {
+	OPT_UINTEGER('r', "repeat", &sched.replay_repeat,
+		     "repeat the workload replay N times (-1: infinite)"),
+	OPT_INCR('v', "verbose", &verbose,
+		    "be more verbose (show symbol address, etc)"),
+	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+		    "dump raw trace in ASCII"),
+	OPT_END()
+	};
+	const struct option sched_options[] = {
+	OPT_STRING('i', "input", &sched.input_name, "file",
+		    "input file name"),
+	OPT_INCR('v', "verbose", &verbose,
+		    "be more verbose (show symbol address, etc)"),
+	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+		    "dump raw trace in ASCII"),
+	OPT_END()
+	};
+	const char * const latency_usage[] = {
+		"perf sched latency [<options>]",
+		NULL
+	};
+	const char * const replay_usage[] = {
+		"perf sched replay [<options>]",
+		NULL
+	};
+	const char * const sched_usage[] = {
+		"perf sched [<options>] {record|latency|map|replay|script}",
+		NULL
+	};
+	struct trace_sched_handler lat_ops  = {
+		.wakeup_event	    = latency_wakeup_event,
+		.switch_event	    = latency_switch_event,
+		.runtime_event	    = latency_runtime_event,
+		.fork_event	    = latency_fork_event,
+		.migrate_task_event = latency_migrate_task_event,
+	};
+	struct trace_sched_handler map_ops  = {
+		.switch_event	    = map_switch_event,
+	};
+	struct trace_sched_handler replay_ops  = {
+		.wakeup_event	    = replay_wakeup_event,
+		.switch_event	    = replay_switch_event,
+		.fork_event	    = replay_fork_event,
+	};
+
 	argc = parse_options(argc, argv, sched_options, sched_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 	if (!argc)
@@ -1914,26 +1758,26 @@ int cmd_sched(int argc, const char **argv, const char *prefix __used)
 	if (!strncmp(argv[0], "rec", 3)) {
 		return __cmd_record(argc, argv);
 	} else if (!strncmp(argv[0], "lat", 3)) {
-		trace_handler = &lat_ops;
+		sched.tp_handler = &lat_ops;
 		if (argc > 1) {
 			argc = parse_options(argc, argv, latency_options, latency_usage, 0);
 			if (argc)
 				usage_with_options(latency_usage, latency_options);
 		}
-		setup_sorting();
-		__cmd_lat();
+		setup_sorting(&sched, latency_options, latency_usage);
+		return perf_sched__lat(&sched);
 	} else if (!strcmp(argv[0], "map")) {
-		trace_handler = &map_ops;
-		setup_sorting();
-		__cmd_map();
+		sched.tp_handler = &map_ops;
+		setup_sorting(&sched, latency_options, latency_usage);
+		return perf_sched__map(&sched);
 	} else if (!strncmp(argv[0], "rep", 3)) {
-		trace_handler = &replay_ops;
+		sched.tp_handler = &replay_ops;
 		if (argc) {
 			argc = parse_options(argc, argv, replay_options, replay_usage, 0);
 			if (argc)
 				usage_with_options(replay_usage, replay_options);
 		}
-		__cmd_replay();
+		return perf_sched__replay(&sched);
 	} else {
 		usage_with_options(sched_usage, sched_options);
 	}
author	Linus Torvalds <torvalds@linux-foundation.org>	2012-10-01 10:28:49 -0700
committer	Linus Torvalds <torvalds@linux-foundation.org>	2012-10-01 10:28:49 -0700
commit	7e92daaefa68e5ef1e1732e45231e73adbb724e7 (patch)
tree	8e7f8ac9d82654df4c65939c6682f95510e22977 /tools/perf/builtin-sched.c
parent	7a68294278ae714ce2632a54f0f46916dca64f56 (diff)
parent	1d787d37c8ff6612b8151c6dff15bfa7347bcbdf (diff)