Age | Commit message (Collapse) | Author |
|
Anton noted that for inherited counters the counter-id as provided by
PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID
because each inherited counter gets its own id.
His suggestion was to always return the parent counter id, since that
is the primary counter id as exposed. However, these inherited
counters have a unique identifier so that events like
PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which
counter gets modified, which is important when trying to normalize the
sample streams.
This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD,
which is more useful anyway, since changing periods became a lot more
common than initially thought -- rendering PERF_EVENT_PERIOD the less
useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate
value, since it reports the value used to trigger the overflow,
whereas PERF_EVENT_PERIOD simply reports the requested period changed,
which might only take effect on the next cycle).
This still leaves us PERF_EVENT_THROTTLE to consider, but since that
_should_ be a rare occurrence, and linking it to a primary id is the
most useful bit to diagnose the problem, we introduce a
PERF_SAMPLE_STREAM_ID, for those few cases where the full
reconstruction is important.
[Does change the ABI a little, but I see no other way out]
Suggested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1248095846.15751.8781.camel@twins>
|
|
|
|
perf record uses -g for logging call graph data but perf report
uses -c to print call graph data. Be consistent and use -g
everywhere for call graph data.
Also update the help text to reflect the current default -
fractal,0.5
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090716104817.803604373@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
[acme@doppio pahole]$ perf report -ns comm,dso,symbol -d /lib64/libc-2.10.1.so -C pahole | head -17
21.94% 32101 [.] _int_malloc
20.10% 29402 [.] __GI_strcmp
16.77% 24533 [.] __tsearch
12.61% 18450 [.] malloc_consolidate
6.42% 9394 [.] _int_free
6.28% 9191 [.] __tfind
4.56% 6678 [.] __GI___libc_free
4.46% 6520 [.] _IO_vfprintf_internal
2.59% 3786 [.] __malloc
1.17% 1716 [.] __GI_memcpy
[acme@doppio pahole]$
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1247325517-12272-5-git-send-email-acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
When we filter by column content we may end up with a column
that has the same value for all the lines. So remove that
column and tell its unique value on the top, as a comment.
Example:
[acme@doppio pahole]$ perf report --sort comm,dso,symbol -d ./build/libdwarves.so.1.0.0 -C pahole | head -15
# dso: ./build/libdwarves.so.1.0.0
# comm: pahole
# Samples: 58409
#
# Overhead Symbol
# ........ ......
#
20.93% [.] tag__recode_dwarf_type
14.94% [.] namespace__recode_dwarf_types
10.38% [.] cu__table_add_tag
6.69% [.] __die__process_tag
5.05% [.] die__process_function
4.70% [.] list__for_all_tags
3.68% [.] tag__init
3.48% [.] die__create_new_parameter
[acme@doppio pahole]$
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1247325517-12272-3-git-send-email-acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Always printing the level info about if it is in the kernel,
hypervisor or userspace as that is in the hist_entry.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1247325517-12272-1-git-send-email-acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Auto-adjust column width of perf report output to the
longest occuring string length.
Example:
[acme@doppio pahole]$ perf report --sort comm,dso,symbol | head -13
12.79% pahole /usr/lib64/libdw-0.141.so [.] __libdw_find_attr
8.90% pahole /lib64/libc-2.10.1.so [.] _int_malloc
8.68% pahole /usr/lib64/libdw-0.141.so [.] __libdw_form_val_len
8.15% pahole /lib64/libc-2.10.1.so [.] __GI_strcmp
6.80% pahole /lib64/libc-2.10.1.so [.] __tsearch
5.54% pahole ./build/libdwarves.so.1.0.0 [.] tag__recode_dwarf_type
[acme@doppio pahole]$
[acme@doppio pahole]$ perf report --sort comm,dso,symbol -d /lib64/libc-2.10.1.so | head -10
21.92% pahole /lib64/libc-2.10.1.so [.] _int_malloc
20.08% pahole /lib64/libc-2.10.1.so [.] __GI_strcmp
16.75% pahole /lib64/libc-2.10.1.so [.] __tsearch
[acme@doppio pahole]$
Also add these extra options to control the new behaviour:
-w, --field-width
Force each column width to the provided list, for large terminal
readability.
-t, --field-separator:
Use a special separator character and don't pad with spaces, replacing
all occurances of this separator in symbol names (and other output) with
a '.' character, that thus it's the only non valid separator.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090711014728.GH3452@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
overhead rate
The current callchain displays the overhead rates as absolute:
relative to the total overhead.
This patch provides relative overhead percentage, in which each
branch of the callchain tree is a independant instrumentated object.
This provides a 'fractal' view of the call-chain profile: each
sub-graph looks like a profile in itself - relative to its parent.
You can produce such output by using the "fractal" mode
that you can abbreviate via f, fr, fra, frac, etc...
./perf report -s sym -c fractal
Example:
8.46% [k] copy_user_generic_string
|
|--52.01%-- generic_file_aio_read
| do_sync_read
| vfs_read
| |
| |--97.20%-- sys_pread64
| | system_call_fastpath
| | pread64
| |
| --2.81%-- sys_read
| system_call_fastpath
| __read
|
|--39.85%-- generic_file_buffered_write
| __generic_file_aio_write_nolock
| generic_file_aio_write
| do_sync_write
| reiserfs_file_write
| vfs_write
| |
| |--97.05%-- sys_pwrite64
| | system_call_fastpath
| | __pwrite64
| |
| --2.95%-- sys_write
| system_call_fastpath
| __write_nocancel
[...]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246772361-9960-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The default callchain parameters are set to use the flat mode and never
filter any overhead threshold of backtrace.
But flat mode is boring compared to graph mode.
Also the number of callchains may be very high if none is
filtered.
Let's change this to set the graph view and a minimum overhead of 0.5%
as default parameters.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246772361-9960-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
If the user doesn't provide options to tune his callchain output
(ie: if he uses -c without arguments) then the default value passed
in the OPT_CALLBACK_DEFAULT() macro is used.
But it's parsed later by strtok() which will replace comma separators
to a zero. This may segfault as we are using a read-only string.
Use a modifiable one instead, and also fix the "100%" default
minimum threshold value by turning it into a 0 (output every callchains)
as it was intended in the origin.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246772361-9960-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
perf report segfaults while trying to handle callchains from a non
callchain data file.
Instead of a segfault, print a useful message to the user.
Reported-by: Jens Axboe <jens.axboe@oracle.com>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246772361-9960-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Certain versions of GCC dont see the initialization that is done here:
builtin-report.c: In function ‘__cmd_report’:
builtin-report.c:1038: warning: ‘syms’ may be used uninitialized in this function
So annotate it with a NULL initialization.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
This adds the use of colors to signal at a glance the important
overhead thresholds in callchains hit rates.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246558475-10624-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Among perf annotate, perf report and perf top, we can find the
common colored printing of percents according to the following
rules:
High overhead = > 5%, colored in red
Mid overhead = > 0.5%, colored in green
Low overhead = < 0.5%, default color
Factorize these multiple checks in a single function named
percent_color_fprintf() and also provide a get_percent_color()
for sites which print percentages and other things at the same
time.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246558475-10624-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Callchains output may become a burden on a trace because even
rarely hit site are exposed. This can be too much information.
Let the user set a threshold as a minimum percent of hits using
the new pattern for the -c option:
-c mode,min_percent
Example:
$ perf report -s sym -c flat,4
8.25% [k] copy_user_generic_string
4.19%
copy_user_generic_string
generic_file_aio_read
do_sync_read
vfs_read
sys_pread64
system_call_fastpath
pread64
5.39% [k] search_by_key
4.63% 0x00000000009e0a
2.36% [k] memcpy_c
[...]
$ perf report -s sym -c graph,2
8.25% [k] copy_user_generic_string
|
|--4.31%-- generic_file_aio_read
| do_sync_read
| vfs_read
| |
| --4.19%-- sys_pread64
| system_call_fastpath
| pread64
|
--3.24%-- generic_file_buffered_write
__generic_file_aio_write_nolock
generic_file_aio_write
do_sync_write
reiserfs_file_write
vfs_write
|
--3.14%-- sys_pwrite64
system_call_fastpath
__pwrite64
5.39% [k] search_by_key
|
--2.23%-- reiserfs_update_sd_size
4.63% 0x00000000009e0a
2.36% [k] memcpy_c
[...]
You can also omit it and it will default to 0.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246558475-10624-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Currently, the printing of callchains is done in a single
vertical level, this is the "flat" mode:
8.25% [k] copy_user_generic_string
4.19%
copy_user_generic_string
generic_file_aio_read
do_sync_read
vfs_read
sys_pread64
system_call_fastpath
pread64
This patch introduces a new "graph" mode which provides a
hierarchical output of factorized paths recursively sorted:
8.25% [k] copy_user_generic_string
|
|--4.31%-- generic_file_aio_read
| do_sync_read
| vfs_read
| |
| |--4.19%-- sys_pread64
| | system_call_fastpath
| | pread64
| |
| --0.12%-- sys_read
| system_call_fastpath
| __read
|
|--3.24%-- generic_file_buffered_write
| __generic_file_aio_write_nolock
| generic_file_aio_write
| do_sync_write
| reiserfs_file_write
| vfs_write
| |
| |--3.14%-- sys_pwrite64
| | system_call_fastpath
| | __pwrite64
| |
| --0.10%-- sys_write
[...]
The command line has then changed.
By providing the -c option, the callchain will output in the
flat mode by default.
But you can override it:
perf report -c graph
or
perf report -c flat
You can also pass the abreviated mode:
perf report -c g
or
perf report -c gra
will both make use of the graph mode.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246550301-8954-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Add the -m/--modules option to perf report and perf annotate,
which enables live module symbol/image loading. To be used
with -k/--vmlinux.
(Also give perf annotate a -P/--full-paths option.)
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246514986.13293.48.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
infrastructure
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246514916.13293.46.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
symbols
perf_counter tools: Make symbol loading consistently return number of loaded symbols.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246514758.13293.42.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The copy we were using came from another copy I did for the dwarves
(pahole) package, that came from the kernel years ago.
The only function that is used by the perf tools and that isn't in the
kernel is list_del_range, that I'm leaving in the perf tools only for
now.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090701174608.GA5823@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The tools/perf/util/rbtree.c copy already drifted by three
csets:
4b324126e0c6c3a5080ca3ec0981e8766ed6f1ee
4c60117811171d867d4f27f17ea07d7419d45dae
16c047add3ceaf0ab882e3e094d1ec904d02312d
So remove the copy and use the lib/rbtree.c directly, sharing
the source code while still generating a separate object file,
since tools/perf uses a far more agressive -O6 switch.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090701152837.GG15682@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Enable -Wextra. This found a few real bugs plus a number
of signed/unsigned type mismatches/uncleanlinesses. It
also required a few annotations
All things considered it was still worth it so lets try with
this enabled for now.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Fix:
builtin-report.c: In function ‘hist_entry__add’:
builtin-report.c:1015: error: case label not within a switch statement
builtin-report.c:1017: error: break statement not within loop or switch
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
This patch resolves the names, when possible, of each ip
present in the callchains while using the -c option with perf
report.
Example:
5.40% [k] __d_lookup
5.37%
perf_callchain
perf_counter_overflow
intel_pmu_handle_irq
perf_counter_nmi_handler
notifier_call_chain
atomic_notifier_call_chain
notify_die
do_nmi
nmi
do_lookup
__link_path_walk
path_walk
do_path_lookup
user_path_at
sys_faccessat
sys_access
system_call_fastpath
0x7fb609846f77
0.01%
perf_callchain
perf_counter_overflow
intel_pmu_handle_irq
perf_counter_nmi_handler
notifier_call_chain
atomic_notifier_call_chain
notify_die
do_nmi
nmi
do_lookup
__link_path_walk
path_walk
do_path_lookup
user_path_at
sys_faccessat
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246419315-9968-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Add a dso for hypervisor samples. We don't get any symbol
information on the ppc64 hypervisor but this at least gives
us a high level summary of the time spent in there.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: a.p.zijlstra@chello.nl
Cc: paulus@samba.org
LKML-Reference: <20090630230141.182536873@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
PERF_EVENT_MISC_* is not a bitmask, so we have to mask and compare.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: a.p.zijlstra@chello.nl
Cc: paulus@samba.org
LKML-Reference: <20090630230141.088394681@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
So that we can filter by symbol name.
The 'pfunct' utility in the 'dwarves' package can be used to
create a file with the functions one wants.
Example:
[acme@doppio pahole]$ pfunct /usr/lib/debug/usr/lib64/libdw-0.141.so.debug | grep dwarf > /tmp/dwarf.symbols
[acme@doppio pahole]$ wc -l /tmp/dwarf.symbols
93 /tmp/dwarf.symbols
[acme@doppio pahole]$ head -3 /tmp/dwarf.symbols
dwfl_addrdwarf
dwfl_module_getdwarf
dwfl_getdwarf
[acme@doppio pahole]$ perf report --sort comm,dso,symbol --comms pahole --dsos /usr/lib64/libdw-0.141.so --symbols file:///tmp/dwarf.symbols
33.99% pahole /usr/lib64/libdw-0.141.so [.] dwarf_tag
29.07% pahole /usr/lib64/libdw-0.141.so [.] dwarf_decl_file
27.71% pahole /usr/lib64/libdw-0.141.so [.] dwarf_getsrclines
4.54% pahole /usr/lib64/libdw-0.141.so 0x00000000007400
3.93% pahole /usr/lib64/libdw-0.141.so [.] dwarf_decl_line
0.46% pahole /usr/lib64/libdw-0.141.so [.] dwarf_getlocation
0.18% pahole /usr/lib64/libdw-0.141.so [.] __libdwarf_next_prime
0.13% pahole /usr/lib64/libdw-0.141.so [.] dwarf_diecu
[acme@doppio pahole]$
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1246399282-20934-4-git-send-email-acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
So that we can filter by comm. Symbols in other comms won't be
accounted for.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1246399282-20934-3-git-send-email-acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
So that we can filter by dso. Symbols in other dsos won't be
accounted for.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1246399282-20934-2-git-send-email-acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Use the newly created callchains radix tree to gather the chains stats
from the recorded events and then print the callchains for all of them,
sorted by hits, using the "-c" parameter with perf report.
Example:
66.15% [k] atm_clip_exit
63.08%
0xffffffffffffff80
0xffffffff810196a8
0xffffffff810c14c8
0xffffffff8101a79c
0xffffffff810194f3
0xffffffff8106ab7f
0xffffffff8106abe5
0xffffffff8106acde
0xffffffff8100d94b
0xffffffff8153e7ea
[...]
1.54%
0xffffffffffffff80
0xffffffff810196a8
0xffffffff810c14c8
0xffffffff8101a79c
[...]
Symbols are not yet resolved.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1246026481-8314-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
We plan to display the callchains depending on some user-configurable
parameters.
To gather the callchains stats from the recorded stream in a fast way,
this patch introduces an ad hoc radix tree adapted for callchains and also
a rbtree to sort these callchains once we have gathered every events
from the stream.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1246026481-8314-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Provide the basic infrastructure to provide per task stats.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The PERF_EVENT_READ implementation made me realize we don't
actually need the sample_type int the output sample, since
we already have that in the perf_counter_attr information.
Therefore, remove the PERF_EVENT_MISC_OVERFLOW bit and the
event->type overloading, and imply put counter overflow
samples in a PERF_EVENT_SAMPLE type.
This also fixes the issue that event->type was only 32-bit
and sample_type had 64 usable bits.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Create a structured file format that includes the full
perf_counter_attr and all its relevant counter IDs so that
the reporting program has full information.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Martin Schwidefsky reported "perf report" symbol resolution
problems on S390.
Since we only report MMAP, not MUNMAP, we have to deal with
overlapping maps.
We used to simply throw out the old map on the assumption whole
maps got unmapped. This obviously doesn't deal with partial
unmaps. However it appears some dynamic linkers do fancy
partial unmaps (s390), so do something more elaborate and
truncate the old maps, only removing them when they've been
fully covered.
This resolves (part of) the S390 symbol resolution problems.
Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Tested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Print more symbol relocation related info under -vv.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Building perfcounter tools raises the following warnings:
builtin-record.c: In function ‘atexit_header’:
builtin-record.c:464: erreur: ignoring return value of ‘pwrite’, declared with attribute warn_unused_result
builtin-record.c: In function ‘__cmd_record’:
builtin-record.c:503: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
builtin-report.c: In function ‘__cmd_report’:
builtin-report.c:1403: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
This patch handles these IO return values.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1245456100-5477-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
On 64-bit powerpc, __u64 is defined to be unsigned long rather than
unsigned long long. This causes compiler warnings every time we
print a __u64 value with %Lx.
Rather than changing __u64, we define our own u64 to be unsigned long
long on all architectures, and similarly s64 as signed long long.
For consistency we also define u32, s32, u16, s16, u8 and s8. These
definitions are put in a new header, types.h, because these definitions
are needed in util/string.h and util/symbol.h.
The main change here is the mechanical change of __[us]{64,32,16,8}
to remove the "__". The other changes are:
* Create types.h
* Include types.h in perf.h, util/string.h and util/symbol.h
* Add types.h to the LIB_H definition in Makefile
* Added (u64) casts in process_overflow_event() and print_sym_table()
to kill two remaining warnings.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: benh@kernel.crashing.org
LKML-Reference: <19003.33494.495844.956580@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Add a data file header so we can transfer data between record and report.
LKML-Reference: <new-submission>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Update the tools to reflect the new callchain sampling format.
LKML-Reference: <new-submission>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Make it easier to use parent filtering - default to a filtered
output. Also add the parent column so that we get collapsing but
dont display it by default.
add --no-exclude-other to override this.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Make use of the new ->data_tail mechanism to tell kernel-space
about user-space draining the data stream. Emit lost events
(and display them) if they happen.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Introduce isprint() to print out raw event dumps to ASCII, etc.
(This is an extension to upstream Git's ctype.c.)
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
[ removed openssl.h inclusion from util.h - it leaked ctype.h ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Add boundary checks for call-chain events. In case of corrupted
entries we could crash otherwise.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
features
Instead of the ambigious 'call' naming use the much more
specific 'parent' naming:
- rename --call <regex> to --parent <regex>
- rename --sort call to --sort parent
- rename [unmatched] to [other] - to signal that this is not
an error but the inverse set
Also add pagefaults to the default parent-symbol pattern too,
as it's a 'syscall overhead category' in a sense.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The Git utils came with a ctype replacement that doesn't provide
isprint(). Add a replacement.
Solves a build bug on certain distros.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Implement sorting by callchain symbols, --sort <call>.
It will create a new column which will show a match to
--call $regex or "[unmatched]".
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Yong Wang reported the following compiler warning:
builtin-report.c: In function 'process_overflow_event':
builtin-report.c:984: error: cast to pointer from integer of different size
Which happens because we try to print ->ips[] out with a limited
format, losing the high 32 bits. Print it out using %016Lx instead.
Reported-by: Yong Wang <yong.y.wang@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Take advantage of call-graph percounter sampling/recording to
display a non-trivial histogram: the true, collapsed/summarized
cost measurement, on a per system call total overhead basis:
aldebaran:~/linux/linux/tools/perf> ./perf record -g -a -f ~/hackbench 10
aldebaran:~/linux/linux/tools/perf> ./perf report -s symbol --syscalls | head -10
#
# (3536 samples)
#
# Overhead Symbol
# ........ ......
#
40.75% [k] sys_write
40.21% [k] sys_read
4.44% [k] do_nmi
...
This is done by accounting each (reliable) call-chain that chains back
to a given system call to that system call function.
[ So in the above example we can see that hackbench spends about 40% of
its total time somewhere in sys_write() and 40% somewhere in
sys_read(), the rest of the time is spent in user-space. The time
is not spent in sys_write() _itself_ but in one of its many child
functions. ]
Or, a recording of a (source files are already in the page-cache) kernel build:
$ perf record -g -m 512 -f -- make -j32 kernel
$ perf report -s s --syscalls | grep '\[k\]' | grep -v nmi
4.14% [k] do_page_fault
1.20% [k] sys_write
1.10% [k] sys_open
0.63% [k] sys_exit_group
0.48% [k] smp_apic_timer_interrupt
0.37% [k] sys_read
0.37% [k] sys_execve
0.20% [k] sys_mmap
0.18% [k] sys_close
0.14% [k] sys_munmap
0.13% [k] sys_poll
0.09% [k] sys_newstat
0.07% [k] sys_clone
0.06% [k] sys_newfstat
0.05% [k] sys_access
0.05% [k] schedule
Shows the true total cost of each syscall variant that gets used
during a kernel build. This profile reveals it that pagefaults are
the costliest, followed by read()/write().
An interesting detail: timer interrupts cost 0.5% - or 0.5 seconds
per 100 seconds of kernel build-time. (this was done with HZ=1000)
The summary is done in 'perf report', i.e. in the post-processing
stage - so once we have a good call-graph recording, this type of
non-trivial high-level analysis becomes possible.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Add the first steps of call-graph profiling:
- add the -c (--call-graph) option to perf record
- parse the call-graph record and printout out under -D (--dump-trace)
The call-graph data is not put into the histogram yet, but it
can be seen that it's being processed correctly:
0x3ce0 [0x38]: event: 35
.
. ... raw event: size 56 bytes
. 0000: 23 00 00 00 05 00 38 00 d4 df 0e 81 ff ff ff ff #.....8........
. 0010: 60 0b 00 00 60 0b 00 00 03 00 00 00 01 00 02 00 `...`..........
. 0020: d4 df 0e 81 ff ff ff ff a0 61 ed 41 36 00 00 00 .........a.A6..
. 0030: 04 92 e6 41 36 00 00 00 .a.A6..
.
0x3ce0 [0x38]: PERF_EVENT (IP, 5): 2912: 0xffffffff810edfd4 period: 1
... chain: u:2, k:1, nr:3
..... 0: 0xffffffff810edfd4
..... 1: 0x3641ed61a0
..... 2: 0x3641e69204
... thread: perf:2912
...... dso: [kernel]
This shows a 3-entry call-graph: with 1 kernel-space and two user-space
entries
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|