summaryrefslogtreecommitdiffstats
path: root/tools/perf
AgeCommit message (Collapse)Author
2012-06-12perf tools: Fix synthesizing tracepoint names from the perf.data headersArnaldo Carvalho de Melo
We need to use the per event info snapshoted at record time to synthesize the events name, so do it just after reading the perf.data headers, when we already processed the /sys events data, otherwise we'll end up using the local /sys that only by sheer luck will have the same tracepoint ID -> real event association. Example: # uname -a Linux felicio.ghostprotocols.net 3.4.0-rc5+ #1 SMP Sat May 19 15:27:11 BRT 2012 x86_64 x86_64 x86_64 GNU/Linux # perf record -e sched:sched_switch usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.015 MB perf.data (~648 samples) ] # cat /t/events/sched/sched_switch/id 279 # perf evlist -v sched:sched_switch: sample_freq=1, type: 2, config: 279, size: 80, sample_type: 1159, read_format: 7, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1 # So on the above machine the sched:sched_switch has tracepoint id 279, but on the machine were we'll analyse it it has a different id: $ cat /t/events/sched/sched_switch/id 56 $ perf evlist -i /tmp/perf.data kmem:mm_balancedirty_writeout $ cat /t/events/kmem/mm_balancedirty_writeout/id 279 With this fix: $ perf evlist -i /tmp/perf.data sched:sched_switch Reported-by: Dmitry Antipov <dmitry.antipov@linaro.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-auwks8fpuhmrdpiefs55o5oz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-06-11perf stat: Fix default output fileStephane Eranian
The following commit: commit 56f3bae70638b33477a6015fd362ccfe354fd3ee Author: Jim Cromie <jim.cromie@gmail.com> Date: Wed Sep 7 17:14:00 2011 -0600 perf stat: Add --log-fd <N> option to redirect stderr elsewhere introduced a bug in the way perf stat outputs the results by default, i.e., without the --log-fd or --output option. It would default to writing to file descriptor 0, i.e., stdin. Writing to stdin is allowed and is equivalent to writing to stdout. However, there is a major difference for any script that was already capturing the output of perf stat via redirection: perf stat >/tmp/log .... or perf stat 2>/tmp/log .... They would not capture anything anymore. They would have to do: perf stat 0>/tmp/log ... This breaks compatibility with existing scripts and does not look very natural. This patch fixes the problem by looking at output_fd only when it was modified by user (> 0). It also checks that the value if positive. Passing --log-fd 0 is ignored. I would also argue that defaulting to stderr for the results is not the right thing to do, though this patch does not address this specific issue. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jim Cromie <jim.cromie@gmail.com> Link: http://lkml.kernel.org/r/20120515111111.GA9870@quad Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-06-11perf tools: Fix endianity swapping for adds_features bitmaskDavid Ahern
Based on Jiri's latest attempt: https://lkml.org/lkml/2012/5/16/61 Basically, adds_features should be byte swapped assuming unsigned longs are either 8-bytes (u64) or 4-bytes (u32). Fixes 32-bit ppc dumping 64-bit x86 feature data: ======== captured on: Sun May 20 19:23:23 2012 hostname : nxos-vdc-dev3 os release : 3.4.0-rc7+ perf version : 3.4.rc4.137.g978da3 arch : x86_64 nrcpus online : 16 nrcpus avail : 16 cpudesc : Intel(R) Xeon(R) CPU E5540 @ 2.53GHz cpuid : GenuineIntel,6,26,5 total memory : 24680324 kB ... Verified 64-bit x86 can still dump feature data for 32-bit ppc. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/4FBBB539.5010805@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31perf uprobes: Remove unnecessary check before strlist__deleteSrikar Dronamraju
Since strlist__delete() itself checks, the additional check before calling strlist__delete() is redundant. No Functional change. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Suggested-by: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20120531114643.23691.38666.sendpatchset@srdronam.in.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31perf symbols: Check for valid dso before creating mapSrikar Dronamraju
dso__new() can return NULL. Hence verify dso before creating a new map. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Suggested-by: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20120531114656.23691.54223.sendpatchset@srdronam.in.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31perf evsel: Fix 32 bit values endianity swap for sample_id_all headerJiri Olsa
We swap the sample_id_all header by u64 pointers. Some members of the header happen to be 32 bit values. We need to handle them separatelly. Together with other endianity patches, this change fixies perf report discrepancies on origin and target systems as described in test 1 below, e.g. following perf report diff: ... 0.12% ps [kernel.kallsyms] [k] clear_page - 0.12% awk bash [.] alloc_word_desc + 0.12% awk bash [.] yyparse 0.11% beah-rhts-task libpython2.6.so.1.0 [.] 0x5560e 0.10% perf libc-2.12.so [.] __ctype_toupper_loc - 0.09% rhts-test-runne bash [.] maybe_make_export_env + 0.09% rhts-test-runne bash [.] 0x385a0 0.09% ps [kernel.kallsyms] [k] page_fault ... Note, running following to test perf endianity handling: test 1) - origin system: # perf record -a -- sleep 10 (any perf record will do) # perf report > report.origin # perf archive perf.data - copy the perf.data, report.origin and perf.data.tar.bz2 to a target system and run: # tar xjvf perf.data.tar.bz2 -C ~/.debug # perf report > report.target # diff -u report.origin report.target - the diff should produce no output (besides some white space stuff and possibly different date/TZ output) test 2) - origin system: # perf record -ag -fo /tmp/perf.data -- sleep 1 - mount origin system root to the target system on /mnt/origin - target system: # perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \ --kallsyms /mnt/origin/proc/kallsyms - complete perf.data header is displayed Signed-off-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1338380624-7443-4-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31perf session: Handle endianity swap on sample_id_all header dataJiri Olsa
Adding endianity swapping for event header attached via sample_id_all. Currently we dont do that and it's causing wrong data to be read when running report on architecture with different endianity than the record. The perf is currently able to process 32-bit PPC samples on 32-bit and 64-bit x86. Together with other endianity patches, this change fixies perf report discrepancies on origin and target systems as described in test 1 below, e.g. following perf report diff: ... 0.12% ps [kernel.kallsyms] [k] clear_page - 0.12% awk bash [.] alloc_word_desc + 0.12% awk bash [.] yyparse 0.11% beah-rhts-task libpython2.6.so.1.0 [.] 0x5560e 0.10% perf libc-2.12.so [.] __ctype_toupper_loc - 0.09% rhts-test-runne bash [.] maybe_make_export_env + 0.09% rhts-test-runne bash [.] 0x385a0 0.09% ps [kernel.kallsyms] [k] page_fault ... Note, running following to test perf endianity handling: test 1) - origin system: # perf record -a -- sleep 10 (any perf record will do) # perf report > report.origin # perf archive perf.data - copy the perf.data, report.origin and perf.data.tar.bz2 to a target system and run: # tar xjvf perf.data.tar.bz2 -C ~/.debug # perf report > report.target # diff -u report.origin report.target - the diff should produce no output (besides some white space stuff and possibly different date/TZ output) test 2) - origin system: # perf record -ag -fo /tmp/perf.data -- sleep 1 - mount origin system root to the target system on /mnt/origin - target system: # perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \ --kallsyms /mnt/origin/proc/kallsyms - complete perf.data header is displayed Signed-off-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1338380624-7443-3-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31perf symbols: Handle different endians properly during symbol loadJiri Olsa
Currently we dont care about the file object's endianness. It's possible we read buildid file object from different architecture than we are currentlly running on. So we need to care about properly reading such object's data - handle different endianness properly. Adding: needs_swap DSO field dso__swap_init function to initialize DSO's needs_swap DSO__SWAP to read the data with proper swaps Together with other endianity patches, this change fixies perf report discrepancies on origin and target systems as described in test 1 below, e.g. following perf report diff: ... 0.12% ps [kernel.kallsyms] [k] clear_page - 0.12% awk bash [.] alloc_word_desc + 0.12% awk bash [.] yyparse 0.11% beah-rhts-task libpython2.6.so.1.0 [.] 0x5560e 0.10% perf libc-2.12.so [.] __ctype_toupper_loc - 0.09% rhts-test-runne bash [.] maybe_make_export_env + 0.09% rhts-test-runne bash [.] 0x385a0 0.09% ps [kernel.kallsyms] [k] page_fault ... Note, running following to test perf endianity handling: test 1) - origin system: # perf record -a -- sleep 10 (any perf record will do) # perf report > report.origin # perf archive perf.data - copy the perf.data, report.origin and perf.data.tar.bz2 to a target system and run: # tar xjvf perf.data.tar.bz2 -C ~/.debug # perf report > report.target # diff -u report.origin report.target - the diff should produce no output (besides some white space stuff and possibly different date/TZ output) test 1) - origin system: # perf record -ag -fo /tmp/perf.data -- sleep 1 - mount origin system root to the target system on /mnt/origin - target system: # perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \ --kallsyms /mnt/origin/proc/kallsyms - complete perf.data header is displayed Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1338380624-7443-2-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31perf evlist: Pass third argument to ioctl explicitlyNamhyung Kim
The ioctl on perf event fd wants 3 arguments but we only passed 2. As the only user of the functions is perf record and it calls them for every event (regardless of group setting), just pass 0 for now. Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1338443506-25009-3-git-send-email-namhyung.kim@lge.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31perf tools: Update ioctl documentation for PERF_IOC_FLAG_GROUPNamhyung Kim
The ioctl interface of perf event fd receives 3 arguments to control event group behavior but it lacked documentation. Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1338443506-25009-2-git-send-email-namhyung.kim@lge.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31perf tools: Make --version show kernel version instead of pull req tagArnaldo Carvalho de Melo
Before: $ perf --version perf version perf.urgent.for.mingo.5.g37da28 After: $ perf --version perf version 3.4.8941.g37da28.dirty Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-vc9b4e6023iegz9kabr3yvyv@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31perf tools: Check if callchain is corruptedNamhyung Kim
We faced segmentation fault on perf top -G at very high sampling rate due to a corrupted callchain. While the root cause was not revealed (I failed to figure it out), this patch tries to protect us from the segfault on such cases. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Sunjin Yang <fan4326@gmail.com> Link: http://lkml.kernel.org/r/1338443007-24857-2-git-send-email-namhyung.kim@lge.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31perf callchain: Make callchain cursors TLSNamhyung Kim
perf top -G has a race on callchain cursor between main thread and display thread. Since the callchain cursors are used locally make them thread-local data would solve the problem. Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Reported-by: Sunjin Yang <fan4326@gmail.com> Suggested-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Sunjin Yang <fan4326@gmail.com> Link: http://lkml.kernel.org/r/1338443007-24857-1-git-send-email-namhyung.kim@lge.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-30perf tools: Fix pager on minimal-install embedded systemsAvik Sil
Some Distributions may lack "less" package being included by default, e.g., Linaro nano rootfs. In those cases use the portable "pager" command instead of "less". Signed-off-by: Avik Sil <avik.sil@linaro.org> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1338287725-26382-1-git-send-email-avik.sil@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-30perf tools: Fix make tarballsArnaldo Carvalho de Melo
The patch series that introduced the top level tools/ makefile and the libtraceevent broke this feature where files needed to build in a detached tarball were not included in the MANIFEST file and thus not included in the tarball. Fix it by adding the relevant files to the MANIFEST. Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-z3mjj74927xvqwhlmu18kj80@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-30perf script: Fix regression in callchain dso nameDavid Ahern
$ perf script -i /tmp/perf.data ... gcc 13623 544315.062858: context-switches: ffffffff815f65c9 __schedule ([kernel.kallsyms]) ffffffff81087cea __cond_resched ([kernel.kallsyms]) ffffffff815f6b92 _cond_resched ([kernel.kallsyms]) ffffffff815fb87a do_page_fault ([kernel.kallsyms]) ffffffff815f8465 page_fault ([kernel.kallsyms]) 2b7a71ea0303 _dl_lookup_symbol_x ([kernel.kallsyms]) 2b7a71ea1eb5 _dl_relocate_object ([kernel.kallsyms]) 2b7a71e99b2e dl_main ([kernel.kallsyms]) 2b7a71eab7f4 _dl_sysdep_start ([kernel.kallsyms]) All DSO's in a callchain are printed as [kernel.kallsyms]. git bisect chased it to: 547a92e0aedb88129e7fbd804697a11949de2e5a is the first bad commit commit 547a92e0aedb88129e7fbd804697a11949de2e5a Author: Akihiro Nagai <akihiro.nagai.hw@hitachi.com> Date: Mon Jan 30 13:42:57 2012 +0900 perf script: Unify the expressions indicating "unknown" The perf script command uses various expressions to indicate "unknown". It is unfriendly for user scripts to parse it. So, this patch unifies the expressions to "[unknown]". Looks like a copy-paste in that the other references use al.map but this one should be node->map. With this patch you get: $ perf script -i /tmp/perf.data ... gcc 13623 544315.062858: context-switches: ffffffff815f65c9 __schedule ([kernel.kallsyms]) ffffffff81087cea __cond_resched ([kernel.kallsyms]) ffffffff815f6b92 _cond_resched ([kernel.kallsyms]) ffffffff815fb87a do_page_fault ([kernel.kallsyms]) ffffffff815f8465 page_fault ([kernel.kallsyms]) 2b7a71ea0303 _dl_lookup_symbol_x (/lib64/ld-2.14.90.so) 2b7a71ea1eb5 _dl_relocate_object (/lib64/ld-2.14.90.so) 2b7a71e99b2e dl_main (/lib64/ld-2.14.90.so) 2b7a71eab7f4 _dl_sysdep_start (/lib64/ld-2.14.90.so) Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Akihiro Nagai <akihiro.nagai.hw@hitachi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1338353906-60706-1-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-30perf stat: Initialize default events wrt exclude_{guest,host}Arnaldo Carvalho de Melo
When no event is specified the tools use perf_evlist__add_default(), that will call event_attr_init to initialize the KVM exclusion bits. When the change was made to the tools so that by default guest samples would be excluded, the changes were made just to the parsing routines and to perf_evlist__add_default(), not to perf_evlist__add_attrs, that is used so far just by perf stat to add multiple events, according to the level of detail specified. Recently the tools were changed to reconstruct the event name from all the details in perf_event_attr, not just from .type and .config, but taking into account all the feature bits (.exclude_{guest,host,user,kernel,etc}, .precise_ip, etc). That is when we noticed that the default for perf stat wasn't the one for the rest of the tools, i.e. the .exclude_guest bit wasn't being set. I.e. the default, that doesn't call event_attr_init was showing the :HG modifier: $ perf stat usleep 1 Performance counter stats for 'usleep 1': 0.942119 task-clock # 0.454 CPUs utilized 1 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 K/sec 126 page-faults # 0.134 M/sec 693,193 cycles:HG # 0.736 GHz [40.11%] 407,461 stalled-cycles-frontend:HG # 58.78% frontend cycles idle [72.29%] 365,403 stalled-cycles-backend:HG # 52.71% backend cycles idle 465,982 instructions:HG # 0.67 insns per cycle # 0.87 stalled cycles per insn 89,760 branches:HG # 95.275 M/sec 6,178 branch-misses:HG # 6.88% of all branches 0.002077228 seconds time elapsed While if one explicitely specifies the same events, which will make the parsing code to be called and thus event_attr_init is called: $ perf stat -e task-clock,context-switches,migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses usleep 1 Performance counter stats for 'usleep 1': 1.040349 task-clock # 0.500 CPUs utilized 2 context-switches # 0.002 M/sec 0 CPU-migrations # 0.000 K/sec 127 page-faults # 0.122 M/sec 587,966 cycles # 0.565 GHz [13.18%] 459,167 stalled-cycles-frontend # 78.09% frontend cycles idle 390,249 stalled-cycles-backend # 66.37% backend cycles idle 504,006 instructions # 0.86 insns per cycle # 0.91 stalled cycles per insn 96,455 branches # 92.714 M/sec 6,522 branch-misses # 6.76% of all branches [96.12%] 0.002078681 seconds time elapsed Fix it by introducing a perf_evlist__add_default_attrs method that will call evlist_attr_init in all the perf_event_attr entries before adding the events. Reported-by: Ingo Molnar <mingo@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-4eysr236r0pgiyum9epwxw7s@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-30perf annotate browser: Fix help window entry for navigating to hottest lineArnaldo Carvalho de Melo
Its 'H', not 'h'. The later is for getting to the help window. Reported-by: Ingo Molnar <mingo@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-7zvwphhm815y2zczoxgstzuf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-30perf report: Use the right symbol for annotationArnaldo Carvalho de Melo
In non symbolic views, i.e. --sort without "symbol", as in: perf report --sort comm We're segfaulting in the --tui because we're testing the symbol resolved and then trying to use the symbol on the histogram entry where we're coalescing all hits for a COMM, and the first hist_entry for a comm may have a NULL symbol, i.e. the RIP didn't resolve to any symbol. In this case we're segfaulting, fix it by testing against the symbol in the histogram entry. Reported-by: William Cohen <wcohen@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-8ylwubbcmu27ucc9ffrku3yv@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-30Merge branch 'linus' into perf/urgentIngo Molnar
Merge back Linus's latest branch so that we pick up the uprobes changes. ( I tested this branch locally and while it's one from the middle of the merge window it's a good one to base further work off. ) Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-29perf ui browser: Stop using 'self'Arnaldo Carvalho de Melo
Stop using this python/OOP convention, doesn't really helps. Will do more from time to time till we get it cleaned up in all of /perf. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-5dyxyb8o0gf4yndk27kafbd1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-29perf annotate browser: Read perf config file for settingsArnaldo Carvalho de Melo
The defaults are: [annotate] hide_src_code = false use_offset = true jump_arrows = true show_nr_jumps = false Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-q4egci70rjgxh7bogbbfpcyf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-29perf config: Allow '_' in config file variable namesArnaldo Carvalho de Melo
For annotate I want to be able to have variables that are the same as the ones representing feature toggles. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-7rhhf6m0a72p2wja4tgv1itg@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-29perf annotate browser: Make feature toggles globalArnaldo Carvalho de Melo
So that when navigating to another function from a call site or when going to another annotation browser thru the main report/top browser the options (hide source code, jump arrows, jumpy lines, etc) remains the last ones selected. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-0h0tah1zj59p01581snjufne@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-29perf annotate browser: The idx_asm field should be used in asm only viewArnaldo Carvalho de Melo
When hide_src_view is true we can't use browser_disasm_line->idx, that takes into account also non asm lines, we must use browser_disasm_line->idx_asm instead, otherwise we may end up with an index after the number of entries, oops, fix it. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-o1szpyjh3z87yi0n6x0cr8uu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-29perf tools: Convert critical messages to ui__error()Namhyung Kim
There were places where use ui__warning (or even fprintf) to show critical messages. This patch converts them to ui__error so that the front-end code can implement appropriate behavior. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1338265382-6872-3-git-send-email-namhyung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-28perf ui: Make --stdio default when TUI is not supportedNamhyung Kim
The commit dc41b9b8f02db ("perf ui: Change fallback policy of setup_browser") changed default behavior of the function but missed setting the use_browser variable to 0 accidently. So perf report ends up doing nothing in such cases. Fix it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1338216802-5675-1-git-send-email-namhyung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-25perf record: Fix branch_stack type in perf_record_optsStephane Eranian
The attr.branch_sample_type field is defined as u64 by the API. As such, we need to ensure the variable holding the value of the branch stack filters is also u64 otherwise we may lose bits in the future. Note also that the bogus definition of the field in perf_record_opts caused problems on big-endian PPC systems. Thanks to Anshuman Khandual for tracking the problem on PPC. Reported-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120525211344.GA7729@quad Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-25perf tools: Reconstruct event with modifiers from perf_event_attrArnaldo Carvalho de Melo
The modifiers: k kernel space u user space h hypervisor G guest H host p, pp, ppp precision level (PEBS) that can be suffixed to an event were lost when tools used event_name() to reconstruct them from the perf_event_attr entries in a perf.data file. Fix it by following the defaults used for these modifiers in the current codebase, so: $ perf record -e instructions:u usleep 1 2> /dev/null $ perf evlist instructions:u $ perf record -e cycles:k usleep 1 2> /dev/null $ perf evlist cycles:k $ perf record -e cycles:kh usleep 1 2> /dev/null $ perf evlist cycles:kh $ perf record -e cache-misses:G usleep 1 2> /dev/null $ perf evlist cache-misses:G $ perf record -e cycles:ppk usleep 1 2> /dev/null $ perf evlist cycles:kpp $ Also works with 'top', 'report', etc. More work needed to cover tracepoints and software events while not dragging lots of baggage to the python binding, this is a minimal fix for v3.5. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-4hl5glle0hxlklw4usva1mkt@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-25perf top: Fix counter name fixup when fallbacking to cpu-clockArnaldo Carvalho de Melo
In 40491eaa "perf top: Update event name when falling back to cpu-clock" we freed counter->name but didn't reset it to NULL, then when setting it to the result of event_name(), event_name() would use the cached value, which by now was overwritten and thus we got garbage or a zero lenght string. Fix it by just freeing and setting counter->name to NULL, this way event_name() when called afterwards, will find the right counter name and cache it again. Found while trying 'cycles:pp' on a machine were :pp couldn't be honoured. Probably the best fallback here is to tell the user that that level of precision is not available on the PMU and then go removing 'p', levels of precision till we get to play 'cycles' and if even that fails, _then_ get to 'cpu-clock'. But that is the matter for another patch, this one just needs to fix the caching issue, which in the end will show 'cpu-clock' when tools ask for the event name being used, which clarifies things for the user, that will see that 'cycles:pp' or whatever not support event is not being used, some sort of fallback happened. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-w1neie2dqli89we1bzwkf4id@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-25perf tools: fix thread_map__new_by_pid_str() memory leak in error pathFranck Bui-Huu
The namelist array (including its content) was not freed if we fail to realloc a new 'threads' structure. Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com> Cc: David Ahern <dsahern@gmail.com> Link: http://lkml.kernel.org/r/1337952109-31995-1-git-send-email-fbuihuu@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-24Merge branch 'perf-uprobes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull user-space probe instrumentation from Ingo Molnar: "The uprobes code originates from SystemTap and has been used for years in Fedora and RHEL kernels. This version is much rewritten, reviews from PeterZ, Oleg and myself shaped the end result. This tree includes uprobes support in 'perf probe' - but SystemTap (and other tools) can take advantage of user probe points as well. Sample usage of uprobes via perf, for example to profile malloc() calls without modifying user-space binaries. First boot a new kernel with CONFIG_UPROBE_EVENT=y enabled. If you don't know which function you want to probe you can pick one from 'perf top' or can get a list all functions that can be probed within libc (binaries can be specified as well): $ perf probe -F -x /lib/libc.so.6 To probe libc's malloc(): $ perf probe -x /lib64/libc.so.6 malloc Added new event: probe_libc:malloc (on 0x7eac0) You can now use it in all perf tools, such as: perf record -e probe_libc:malloc -aR sleep 1 Make use of it to create a call graph (as the flat profile is going to look very boring): $ perf record -e probe_libc:malloc -gR make [ perf record: Woken up 173 times to write data ] [ perf record: Captured and wrote 44.190 MB perf.data (~1930712 $ perf report | less 32.03% git libc-2.15.so [.] malloc | --- malloc 29.49% cc1 libc-2.15.so [.] malloc | --- malloc | |--0.95%-- 0x208eb1000000000 | |--0.63%-- htab_traverse_noresize 11.04% as libc-2.15.so [.] malloc | --- malloc | 7.15% ld libc-2.15.so [.] malloc | --- malloc | 5.07% sh libc-2.15.so [.] malloc | --- malloc | 4.99% python-config libc-2.15.so [.] malloc | --- malloc | 4.54% make libc-2.15.so [.] malloc | --- malloc | |--7.34%-- glob | | | |--93.18%-- 0x41588f | | | --6.82%-- glob | 0x41588f ... Or: $ perf report -g flat | less # Overhead Command Shared Object Symbol # ........ ............. ............. .......... # 32.03% git libc-2.15.so [.] malloc 27.19% malloc 29.49% cc1 libc-2.15.so [.] malloc 24.77% malloc 11.04% as libc-2.15.so [.] malloc 11.02% malloc 7.15% ld libc-2.15.so [.] malloc 6.57% malloc ... The core uprobes design is fairly straightforward: uprobes probe points register themselves at (inode:offset) addresses of libraries/binaries, after which all existing (or new) vmas that map that address will have a software breakpoint injected at that address. vmas are COW-ed to preserve original content. The probe points are kept in an rbtree. If user-space executes the probed inode:offset instruction address then an event is generated which can be recovered from the regular perf event channels and mmap-ed ring-buffer. Multiple probes at the same address are supported, they create a dynamic callback list of event consumers. The basic model is further complicated by the XOL speedup: the original instruction that is probed is copied (in an architecture specific fashion) and executed out of line when the probe triggers. The XOL area is a single vma per process, with a fixed number of entries (which limits probe execution parallelism). The API: uprobes are installed/removed via /sys/kernel/debug/tracing/uprobe_events, the API is integrated to align with the kprobes interface as much as possible, but is separate to it. Injecting a probe point is privileged operation, which can be relaxed by setting perf_paranoid to -1. You can use multiple probes as well and mix them with kprobes and regular PMU events or tracepoints, when instrumenting a task." Fix up trivial conflicts in mm/memory.c due to previous cleanup of unmap_single_vma(). * 'perf-uprobes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) perf probe: Detect probe target when m/x options are absent perf probe: Provide perf interface for uprobes tracing: Fix kconfig warning due to a typo tracing: Provide trace events interface for uprobes tracing: Extract out common code for kprobes/uprobes trace events tracing: Modify is_delete, is_return from int to bool uprobes/core: Decrement uprobe count before the pages are unmapped uprobes/core: Make background page replacement logic account for rss_stat counters uprobes/core: Optimize probe hits with the help of a counter uprobes/core: Allocate XOL slots for uprobes use uprobes/core: Handle breakpoint and singlestep exceptions uprobes/core: Rename bkpt to swbp uprobes/core: Make order of function parameters consistent across functions uprobes/core: Make macro names consistent uprobes: Update copyright notices uprobes/core: Move insn to arch specific structure uprobes/core: Remove uprobe_opcode_sz uprobes/core: Make instruction tables volatile uprobes: Move to kernel/events/ uprobes/core: Clean up, refactor and improve the code ...
2012-05-24perf tools: Do not use _FORTIFY_SOURCE when DEBUG=1 is specifiedArnaldo Carvalho de Melo
As: make DEBUG=1 -C tools/perf disables optimizations and _FORTIFY_SOURCE in recent distros requires optimizations to be enabled, seen on a Fedora 17 system: [acme@Fedora17 linux]$ make DEBUG=1 O=/home/acme/git/build/perf/ -C tools/perf install In file included from /usr/include/sys/types.h:26:0, from /usr/include/libelf.h:53, from /usr/include/gelf.h:53, from /usr/include/elfutils/libdw.h:53, from <stdin>:2: /usr/include/features.h:314:4: error: #warning _FORTIFY_SOURCE requires compiling with optimization (-O) [-Werror=cpp Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-4ccyiebqju4uatm31ky7725b@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-23perf evlist: Explicititely initialize input_nameArnaldo Carvalho de Melo
It was a global variable, so it was initialized, implicitely, to zero by being placed in the bss. Now it is just a local variable that is then passed to the __cmd_evlist routine, so it must be explicitely set to NULL. The problem manifested on a Fedora 17 system, using: gcc version 4.7.0 20120507 (Red Hat 4.7.0-5) (GCC) But not on several other systems, by luck. Reported-by: Ingo Molnar <mingo@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-5e8wolcjs3rgd5i6yi995gfh@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf evlist: Show event attribute detailsArnaldo Carvalho de Melo
There was no easy way to see the frequency used, and with the change of default, we better provide one. [root@sandy linux]# perf evlist -F cycles: sample_freq=4000 [root@sandy linux]# perf evlist -v cycles: sample_freq=4000, size: 80, sample_type: 391, read_format: 7, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, sample_id_all: 1, exclude_guest: 1 [root@sandy linux]# Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-e1p9poez3nwrgycbmwqmhlsu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf tools: Bump default sample freq to 4 kHzArnaldo Carvalho de Melo
Quoting Ingo: "While at it I'd also suggest increasing the default sampling frequency, from 1000 Hz per CPU to at least 4Khz auto-freq or so - this should work well all across the board I think. CPUs are getting faster and command/app run times are getting shorter, 1Khz is a bit low IMO." Requested-by: Ingo Molnar <mingo@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-2jafa6mkrufyekny9ei59lpu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf buildid-list: Work better with pipe modeStephane Eranian
In order for perf buildid-list to work with pipe-mode files, it needs to process buildids and event attr structs. $ perf record -o - noploop 2 | ./perf inject -b | perf buildid-list -i - -H noploop for 2 seconds [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.084 MB - (~3678 samples) ] 0000000000000000000000000000000000000000 [kernel.kallsyms] 3a0d0629efe74a8da3eeba372cdbd74ad9b8f5d5 /usr/local/bin/noploop The reason [kernel.kallsyms] shows a 0 build-id comes from the way buildids are injected in the stream. The buildid for the kernel is provided by a BUILD_ID record. The [kernel.kallsyms] is provided by a MMAP record. There is no clean and obvious way to link the two, unfortunately. In regular mode, the kernel buildid is generated from reading the ELF image or kallsyms and perf knows to associate [kernel.kallsyms] to it. Later on, when perf processes the [kernel.kallsyms] MMAP record, it will already have a dso for it. So for now, make sure perf buildid-list shows the buildids for everything but the kernel image. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1337081295-10303-6-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf tools: Fix piped mode read codeStephane Eranian
In __perf_session__process_pipe_events(), there was a risk we would read more than what a union perf_event struct can hold. this could happen in case, perf is reading a file which contains new record types it does not know about and which are larger than anything it knows about. In general, perf is supposed to skip records it does not understand, but in pipe mode, those have to be read and ignored. The fixed size header contains the size of the record, but that size may be larger than union perf_event, yet it was used as the backing to the read in: union perf_event event; void *p; size = event->header.size; p = &event; p += sizeof(struct perf_event_header); if (size - sizeof(struct perf_event_header)) { err = readn(self->fd, p, size - sizeof(struct perf_event_header)); We fix this by allocating a buffer based on the size reported in the header. We reuse the buffer as much as we can. We realloc in case it becomes too small. In the common case, the performance impact is negligible. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1337081295-10303-3-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf inject: Fix broken perf inject -bStephane Eranian
perf inject -b was broken. It would not inject any build_id into the stream. Furthermore, it would strip samples from the stream. The reason was a missing initialization of the event attribute structure. The perf_tool.tool.attr() callback was pointing to a simple repipe. But there was no initialization of the internal data structures to keep track of events and event ids. That later caused event id lookups to fail, and sample would get removed. The patch simply adds back the call to perf_event__process_attr() to initialize the evlist structure and now build_ids are again injected. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1337081295-10303-2-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf tools: rename HEADER_TRACE_INFO to HEADER_TRACING_DATAStephane Eranian
To match the PERF_RECORD_HEADER_TRACING_DATA record type. This is the same info as the one used for pipe mode whereas the other one is for regular file output. This will help in the later patch to add meta-data infos in pipe mode. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1337081295-10303-4-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf tools: Add union u64_swap type for swapping u64 dataJiri Olsa
The following union: union { u64 val64; u32 val32[2]; } u; is used on more than one place in perf code and will be used more in upcomming patches. Adding union u64_swap to have it defined globaly so we dont need to redefine it all the time. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1337151548-2396-4-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf tools: Carry perf_event_attr bitfield throught different endiansJiri Olsa
When the perf data file is read cross architectures, the perf_event__attr_swap function takes care about endianness of all the struct fields except the bitfield flags. The bitfield flags need to be transformed as well, since the bitfield binary storage differs for both endians. ABI says: Bit-fields are allocated from right to left (least to most significant) on little-endian implementations and from left to right (most to least significant) on big-endian implementations. The above seems to be byte specific, so we need to reverse each byte of the bitfield. 'Internet' also says this might be implementation specific and we probably need proper fix and carry perf_event_attr bitfield flags in separate data file FEAT_ section. Thought this seems to work for now. Note, running following to test perf endianity handling: test 1) - origin system: # perf record -a -- sleep 10 (any perf record will do) # perf report > report.origin # perf archive perf.data - copy the perf.data, report.origin and perf.data.tar.bz2 to a target system and run: # tar xjvf perf.data.tar.bz2 -C ~/.debug # perf report > report.target # diff -u report.origin report.target - the diff should produce no output (besides some white space stuff and possibly different date/TZ output) test 2) - origin system: # perf record -ag -fo /tmp/perf.data -- sleep 1 - mount origin system root to the target system on /mnt/origin - target system: # perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \ --kallsyms /mnt/origin/proc/kallsyms - complete perf.data header is displayed Signed-off-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1337151548-2396-3-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf record: Fix documentation for branch stack samplingAnshuman Khandual
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Acked-by: Stephane Eranian <eranian@google.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/4FB60C7A.2080508@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf target: Add cpu flag to sample_type if target has cpuNamhyung Kim
Add PERF_SAMPLE_CPU flag into attr->sample_type if an user specified any of cpu target (either system-wide or cpu list). It will show correct values when cpu sort key is given for perf top and perf report. Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1337564527-9367-1-git-send-email-namhyung.kim@lge.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf tools: Always try to build libtraceeventNamhyung Kim
Although perf depends on the libtraceevent, it cannot know when it needs to be rebuilt. So just try to rebuild it always in order to make sure we use the latest version. While at it, silence annoying directory change messages. Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1337677434-4881-2-git-send-email-namhyung.kim@lge.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf tools: Rename libparsevent to libtraceevent in MakefileNamhyung Kim
Change some variable names according to new library name. Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1337677434-4881-1-git-send-email-namhyung.kim@lge.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf script: Rename struct event to struct event_format in perl engineFrederic Weisbecker
While migrating to the libtraceevent, the perl scripting engine missed this structure rename. This fixes: util/scripting-engines/trace-event-perl.c: In function "find_cache_event": util/scripting-engines/trace-event-perl.c:244: error: assignment from incompatible pointer type util/scripting-engines/trace-event-perl.c:248: error: dereferencing pointer to incomplete type util/scripting-engines/trace-event-perl.c:248: error: dereferencing pointer to incomplete type util/scripting-engines/trace-event-perl.c:250: error: dereferencing pointer to incomplete type util/scripting-engines/trace-event-perl.c: In function "perl_process_tracepoint": util/scripting-engines/trace-event-perl.c:286: error: dereferencing pointer to incomplete type util/scripting-engines/trace-event-perl.c:286: error: dereferencing pointer to incomplete type util/scripting-engines/trace-event-perl.c:307: error: dereferencing pointer to incomplete type util/scripting-engines/trace-event-perl.c: In function "perl_generate_script": util/scripting-engines/trace-event-perl.c:498: error: passing argument 1 of "trace_find_next_event" from incompatible pointer type util/scripting-engines/../trace-event.h:56: note: expected "struct event_format *" but argument is of type "struct event *" util/scripting-engines/trace-event-perl.c:498: error: assignment from incompatible pointer type util/scripting-engines/trace-event-perl.c:499: error: dereferencing pointer to incomplete type util/scripting-engines/trace-event-perl.c:499: error: dereferencing pointer to incomplete type util/scripting-engines/trace-event-perl.c:513: error: dereferencing pointer to incomplete type util/scripting-engines/trace-event-perl.c:532: error: dereferencing pointer to incomplete type util/scripting-engines/trace-event-perl.c:556: error: dereferencing pointer to incomplete type util/scripting-engines/trace-event-perl.c:569: error: dereferencing pointer to incomplete type util/scripting-engines/trace-event-perl.c:570: error: dereferencing pointer to incomplete type util/scripting-engines/trace-event-perl.c:579: error: dereferencing pointer to incomplete type util/scripting-engines/trace-event-perl.c:580: error: dereferencing pointer to incomplete type Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@redhat.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> Link: http://lkml.kernel.org/r/1337697049-30251-2-git-send-email-fweisbec@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf script: Explicitly handle known default print arg typeFrederic Weisbecker
Handle the print argument types brought by the new libparsevent in perl scripting engine. PRINT_BSTRING and PRINT_DYNAMIC_ARRAY are treated just like strings and thus don't require specific processing. But PRINT_FUNC need specific plugins which are not yet handled, lets warn if we meet this case. This fixes: util/scripting-engines/trace-event-perl.c: In function define_event_symbol: util/scripting-engines/trace-event-perl.c:188: error: enumeration value PRINT_BSTRING not handled in switch util/scripting-engines/trace-event-perl.c:188: error: enumeration value PRINT_DYNAMIC_ARRAY not handled in switch util/scripting-engines/trace-event-perl.c:188: error: enumeration value PRINT_FUNC not handled in switch Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@redhat.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> Link: http://lkml.kernel.org/r/1337697049-30251-1-git-send-email-fweisbec@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf tools: Add hardcoded name term for pmu eventsJiri Olsa
Adding a new hardcoded term 'name' allowing to specify a name for the pmu event. The term is defined along with standard pmu terms. If no 'name' term is given, the event name follows following template: "raw 0x<perf_event_attr::config>" running: perf stat -e cpu/config=1,name=krava1/u ls will produce following output: ... Performance counter stats for 'ls': 0 krava1 ... running: perf stat -e cpu/config=1/u ls will produce following output: ... Performance counter stats for 'ls': 0 raw 0x1 ... Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1337584373-2741-6-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-22perf tools: Separate 'mem:' event scanner bitsJiri Olsa
Separating 'mem:' scanner processing, so we can parse out modifier specifically and dont clash with other rules. This is just precaution for the future, so we dont need to worry about the rules clashing where we need to parse out any sub-rule of global rules. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1337584373-2741-5-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>