summaryrefslogtreecommitdiffstats
path: root/kernel/trace/Makefile
AgeCommit message (Collapse)Author
2009-06-09tracing/events: convert block trace points to TRACE_EVENT()Li Zefan
TRACE_EVENT is a more generic way to define tracepoints. Doing so adds these new capabilities to this tracepoint: - zero-copy and per-cpu splice() tracing - binary tracing without printf overhead - structured logging records exposed under /debug/tracing/events - trace events embedded in function tracer output and other plugins - user-defined, per tracepoint filter expressions ... Cons: - no dev_t info for the output of plug, unplug_timer and unplug_io events. no dev_t info for getrq and sleeprq events if bio == NULL. no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL. This is mainly because we can't get the deivce from a request queue. But this may change in the future. - A packet command is converted to a string in TP_assign, not TP_print. While blktrace do the convertion just before output. Since pc requests should be rather rare, this is not a big issue. - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT has a unique format, which means we have some unused data in a trace entry. The overhead is minimized by using __dynamic_array() instead of __array(). I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing: dd dd + ioctl blktrace dd + TRACE_EVENT (splice) 1 7.36s, 42.7 MB/s 7.50s, 42.0 MB/s 7.41s, 42.5 MB/s 2 7.43s, 42.3 MB/s 7.48s, 42.1 MB/s 7.43s, 42.4 MB/s 3 7.38s, 42.6 MB/s 7.45s, 42.2 MB/s 7.41s, 42.5 MB/s So the overhead of tracing is very small, and no regression when using those trace events vs blktrace. And the binary output of TRACE_EVENT is much smaller than blktrace: # ls -l -h -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0 -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1 -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out Following are some comparisons between TRACE_EVENT and blktrace: plug: kjournald-480 [000] 303.084981: block_plug: [kjournald] kjournald-480 [000] 303.084981: 8,0 P N [kjournald] unplug_io: kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1 kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1 remap: kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384 kjournald-480 [000] 303.085043: 8,0 A W 102736992 + 8 <- (8,8) 33384 bio_backmerge: kjournald-480 [000] 303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald] kjournald-480 [000] 303.085086: 8,0 M W 102737032 + 8 [kjournald] getrq: kjournald-480 [000] 303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald] kjournald-480 [000] 303.084975: 8,0 G W 102736984 + 8 [kjournald] bash-2066 [001] 1072.953770: 8,0 G N [bash] bash-2066 [001] 1072.953773: block_getrq: 0,0 N 0 + 0 [bash] rq_complete: konsole-2065 [001] 300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0] konsole-2065 [001] 300.053191: 8,0 C W 103669040 + 16 [0] ksoftirqd/1-7 [001] 1072.953811: 8,0 C N (5a 00 08 00 00 00 00 00 24 00) [0] ksoftirqd/1-7 [001] 1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0] rq_insert: kjournald-480 [000] 303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald] kjournald-480 [000] 303.084986: 8,0 I W 102736984 + 8 [kjournald] Changelog from v2 -> v3: - use the newly introduced __dynamic_array(). Changelog from v1 -> v2: - use __string() instead of __array() to minimize the memory required to store hex dump of rq->cmd(). - support large pc requests. - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT. - some cleanups. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-07Merge branch 'tracing/hw-branch-tracing' into tracing/coreIngo Molnar
Merge reason: this topic is ready for upstream now. It passed Oleg's review and Andrew had no further mm/* objections/observations either. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-06ring-buffer: add benchmark and testerSteven Rostedt
This patch adds code that can benchmark the ring buffer as well as test it. This code can be compiled into the kernel (not recommended) or as a module. A separate ring buffer is used to not interfer with other users, like ftrace. It creates a producer and a consumer (option to disable creation of the consumer) and will run for 10 seconds, then sleep for 10 seconds and then repeat. While running, the producer will write 10 byte loads into the ring buffer with just putting in the current CPU number. The reader will continually try to read the buffer. The reader will alternate from reading the buffer via event by event, or by full pages. The output is a pr_info, thus it will fill up the syslogs. Starting ring buffer hammer End ring buffer hammer Time: 9000349 (usecs) Overruns: 12578640 Read: 5358440 (by events) Entries: 0 Total: 17937080 Missed: 0 Hit: 17937080 Entries per millisec: 1993 501 ns per entry Sleeping for 10 secs Starting ring buffer hammer End ring buffer hammer Time: 9936350 (usecs) Overruns: 0 Read: 28146644 (by pages) Entries: 74 Total: 28146718 Missed: 0 Hit: 28146718 Entries per millisec: 2832 353 ns per entry Sleeping for 10 secs Time: is the time the test ran Overruns: the number of events that were overwritten and not read Read: the number of events read (either by pages or events) Entries: the number of entries left in the buffer (the by pages will only read full pages) Total: Entries + Read + Overruns Missed: the number of entries that failed to write Hit: the number of entries that were written The above example shows that it takes ~353 nanosecs per entry when there is a reader, reading by pages (and no overruns) The event by event reader slowed the producer down to 501 nanosecs. [ Impact: see how changes to the ring buffer affect stability and performance ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-14tracing/events: move the ftrace event tracing code to coreSteven Rostedt
This patch moves the ftrace creation into include/trace/ftrace.h and simplifies the work of developers in adding new tracepoints. Just the act of creating the trace points in include/trace and including define_trace.h will create the events in the debugfs/tracing/events directory. This patch removes the need of include/trace/trace_events.h Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-14tracing: make the trace clocks available generallyIngo Molnar
Jeremy Fitzhardinge reported this build failure: LD .tmp_vmlinux1 arch/x86/kernel/built-in.o: In function `ds_take_timestamp': git/linux/arch/x86/kernel/ds.c:1380: undefined reference to `trace_clock_global' git/linux/arch/x86/kernel/ds.c:1380: undefined reference to `trace_clock_global' Which is due to !CONFIG_TRACING && CONFIG_X86_DS=y. Expose the trace clock code to CONFIG_X86_DS as well. [ Unfortunately librarizing doesnt work well - ancient architectures with no raw_local_irq_save() primitive break the build. ] Reported-by: Jeremy Fitzhardinge <jeremy@goop.org> LKML-Reference: <49E4413F.7070700@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-14tracing/infrastructure: separate event tracer from event supportTom Zanussi
Add a new config option, CONFIG_EVENT_TRACING that gets selected when CONFIG_TRACING is selected and adds everything needed by the stuff in trace_export - basically all the event tracing support needed by e.g. bprint, minus the actual events, which are only included if CONFIG_EVENT_TRACER is selected. So CONFIG_EVENT_TRACER can be used to turn on or off the generated events (what I think of as the 'event tracer'), while CONFIG_EVENT_TRACING turns on or off the base event tracing support used by both the event tracer and the other things such as bprint that can't be configured out. Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: fweisbec@gmail.com LKML-Reference: <1239178441.10295.34.camel@tropicana> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-22tracing: add per-event filteringTom Zanussi
This patch adds per-event filtering to the event tracing subsystem. It adds a 'filter' debugfs file to each event directory. This file can be written to to set filters; reading from it will display the current set of filters set for that event. Basically, any field listed in the 'format' file for an event can be filtered on (including strings, but not yet other array types) using either matching ('==') or non-matching ('!=') 'predicates'. A 'predicate' can be either a single expression: # echo pid != 0 > filter # cat filter pid != 0 or a compound expression of up to 8 sub-expressions combined using '&&' or '||': # echo comm == Xorg > filter # echo "&& sig != 29" > filter # cat filter comm == Xorg && sig != 29 Only events having field values matching an expression will be available in the trace output; non-matching events are discarded. Note that a compound expression is built up by echoing each sub-expression separately - it's not the most efficient way to do things, but it keeps the parser simple and assumes that compound expressions will be relatively uncommon. In any case, a subsequent patch introducing a way to set filters for entire subsystems should mitigate any need to do this for lots of events. Setting a filter without an '&&' or '||' clears the previous filter completely and sets the filter to the new expression: # cat filter comm == Xorg && sig != 29 # echo comm != Xorg # cat filter comm != Xorg To clear a filter, echo 0 to the filter file: # echo 0 > filter # cat filter none The limit of 8 predicates for a compound expression is arbitrary - for efficiency, it's implemented as an array of pointers to predicates, and 8 seemed more than enough for any filter... Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1237710665.7703.48.camel@charm-linux> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-20ftrace: event profile hooksPeter Zijlstra
Impact: new tracing infrastructure feature Provide infrastructure to generate software perf counter events from tracepoints. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20090319194233.557364871@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13tracing/ftrace: syscall tracing infrastructure, basicsFrederic Weisbecker
Provide basic callbacks to do syscall tracing. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <1236401580-5758-2-git-send-email-fweisbec@gmail.com> [ simplified it to a trace_printk() for now. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06tracing/core: drop the old trace_printk() implementation in favour of ↵Frederic Weisbecker
trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-06tracing: infrastructure for supporting binary recordLai Jiangshan
Impact: save on memory for tracing Current tracers are typically using a struct(like struct ftrace_entry, struct ctx_switch_entry, struct special_entr etc...)to record a binary event. These structs can only record a their own kind of events. A new kind of tracer need a new struct and a lot of code too handle it. So we need a generic binary record for events. This infrastructure is for this purpose. [fweisbec@gmail.com: rebase against latest -tip, make it safe while sched tracing as reported by Steven Rostedt] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-05tracing: add format files for ftrace default entriesSteven Rostedt
Impact: allow user apps to read binary format of basic ftrace entries Currently, only defined raw events export their formats so a binary reader can parse them. There's no reason that the default ftrace entries can't export their formats. This patch adds a subsystem called "ftrace" in the events directory that includes the ftrace entries for basic ftrace recorded items. These only have three files in the events directory: type : printf available_types : printf format : format for the event entry For example: # cat /debug/tracing/events/ftrace/wakeup/format name: wakeup ID: 3 format: field:unsigned char type; offset:0; size:1; field:unsigned char flags; offset:1; size:1; field:unsigned char preempt_count; offset:2; size:1; field:int pid; offset:4; size:4; field:int tgid; offset:8; size:4; field:unsigned int prev_pid; offset:12; size:4; field:unsigned char prev_prio; offset:16; size:1; field:unsigned char prev_state; offset:17; size:1; field:unsigned int next_pid; offset:20; size:4; field:unsigned char next_prio; offset:24; size:1; field:unsigned char next_state; offset:25; size:1; field:unsigned int next_cpu; offset:28; size:4; print fmt: "%u:%u:%u ==+ %u:%u:%u [%03u]" Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-26tracing: implement trace_clock_*() APIsIngo Molnar
Impact: implement new tracing timestamp APIs Add three trace clock variants, with differing scalability/precision tradeoffs: - local: CPU-local trace clock - medium: scalable global clock with some jitter - global: globally monotonic, serialized clock Make the ring-buffer use the local trace clock internally. Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24tracing: add schedule events to event traceSteven Rostedt
This patch changes the trace/sched.h to use the DECLARE_TRACE_FMT such that they are automatically registered with the event tracer. And it also adds the tracing sched headers to kernel/trace/events.c Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-24tracing: add event trace infrastructureSteven Rostedt
This patch creates the event tracing infrastructure of ftrace. It will create the files: /debug/tracing/available_events /debug/tracing/set_event The available_events will list the trace points that have been registered with the event tracer. set_events will allow the user to enable or disable an event hook. example: # echo sched_wakeup > /debug/tracing/set_event Will enable the sched_wakeup event (if it is registered). # echo "!sched_wakeup" >> /debug/tracing/set_event Will disable the sched_wakeup event (and only that event). # echo > /debug/tracing/set_event Will disable all events (notice the '>') # cat /debug/tracing/available_events > /debug/tracing/set_event Will enable all registered event hooks. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-09tracing/blktrace: move the tracing file to kernel/traceFrederic Weisbecker
Impact: cleanup Move blktrace.c to kernel/trace, also move its config entry. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14tracing: add a new workqueue tracerFrederic Weisbecker
Impact: new tracer The workqueue tracer provides some statistical informations about each cpu workqueue thread such as the number of the works inserted and executed since their creation. It can help to evaluate the amount of work each of them have to perform. For example it can help a developer to decide whether he should choose a per cpu workqueue instead of a singlethreaded one. It only traces statistical informations for now but it will probably later provide event tracing too. Such a tracer could help too, and be improved, to help rt priority sorted workqueue development. To have a snapshot of the workqueues state at any time, just do cat /debugfs/tracing/trace_stat/workqueues Ie: 1 125 125 reiserfs/1 1 0 0 scsi_tgtd/1 1 0 0 aio/1 1 0 0 ata/1 1 114 114 kblockd/1 1 0 0 kintegrityd/1 1 2147 2147 events/1 0 0 0 kpsmoused 0 105 105 reiserfs/0 0 0 0 scsi_tgtd/0 0 0 0 aio/0 0 0 0 ata_aux 0 0 0 ata/0 0 0 0 cqueue 0 0 0 kacpi_notify 0 0 0 kacpid 0 149 149 kblockd/0 0 0 0 kintegrityd/0 0 1000 1000 khelper 0 2270 2270 events/0 Changes in V2: _ Drop the static array based on NR_CPU and dynamically allocate the stat array with num_possible_cpus() and other cpu mask facilities.... _ Trace workqueue insertion at a bit lower level (insert_work instead of queue_work) to handle even the workqueue barriers. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-06Merge branches 'tracing/kmemtrace2' and 'tracing/ftrace' into tracing/urgentIngo Molnar
2008-12-30tracing/kmemtrace: normalize the raw tracer event to the unified tracing APIFrederic Weisbecker
Impact: new tracer plugin This patch adapts kmemtrace raw events tracing to the unified tracing API. To enable and use this tracer, just do the following: echo kmemtrace > /debugfs/tracing/current_tracer cat /debugfs/tracing/trace You will have the following output: # tracer: kmemtrace # # # ALLOC TYPE REQ GIVEN FLAGS POINTER NODE CALLER # FREE | | | | | | | | # | type_id 1 call_site 18446744071565527833 ptr 18446612134395152256 type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1 type_id 1 call_site 18446744071565585534 ptr 18446612134405955584 type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1 type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1 type_id 1 call_site 18446744071565585534 ptr 18446612134405955584 type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1 type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1 type_id 1 call_site 18446744071565585534 ptr 18446612134405955584 type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1 type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1 type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1 type_id 1 call_site 18446744071565585534 ptr 18446612134405955584 type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1 type_id 1 call_site 18446744071565585534 ptr 18446612134405955584 That was to stay backward compatible with the format output produced in inux/tracepoint.h. This is the default ouput, but note that I tried something else. If you change an option: echo kmem_minimalistic > /debugfs/trace_options and then cat /debugfs/trace, you will have the following output: # tracer: kmemtrace # # # ALLOC TYPE REQ GIVEN FLAGS POINTER NODE CALLER # FREE | | | | | | | | # | - C 0xffff88007c088780 file_free_rcu + K 4096 4096 000000d0 0xffff88007cad6000 -1 getname - C 0xffff88007cad6000 putname + K 4096 4096 000000d0 0xffff88007cad6000 -1 getname + K 240 240 000000d0 0xffff8800790dc780 -1 d_alloc - C 0xffff88007cad6000 putname + K 4096 4096 000000d0 0xffff88007cad6000 -1 getname + K 240 240 000000d0 0xffff8800790dc870 -1 d_alloc - C 0xffff88007cad6000 putname + K 4096 4096 000000d0 0xffff88007cad6000 -1 getname + K 240 240 000000d0 0xffff8800790dc960 -1 d_alloc + K 1304 1312 000000d0 0xffff8800791d7340 -1 reiserfs_alloc_inode - C 0xffff88007cad6000 putname + K 4096 4096 000000d0 0xffff88007cad6000 -1 getname - C 0xffff88007cad6000 putname + K 992 1000 000000d0 0xffff880079045b58 -1 alloc_inode + K 768 1024 000080d0 0xffff88007c096400 -1 alloc_pipe_info + K 240 240 000000d0 0xffff8800790dca50 -1 d_alloc + K 272 320 000080d0 0xffff88007c088780 -1 get_empty_filp + K 272 320 000080d0 0xffff88007c088000 -1 get_empty_filp Yeah I shall confess kmem_minimalistic should be: kmem_alternative. Whatever, I find it more readable but this a personal opinion of course. We can drop it if you want. On the ALLOC/FREE column, + means an allocation and - a free. On the type column, you have K = kmalloc, C = cache, P = page I would like the flags to be GFP_* strings but that would not be easy to not break the column with strings.... About the node...it seems to always be -1. I don't know why but that shouldn't be difficult to find. I moved linux/tracepoint.h to trace/tracepoint.h as well. I think that would be more easy to find the tracer headers if they are all in their common directory. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29tracing/ftrace: provide the base infrastructure for histogram tracingFrederic Weisbecker
Impact: extend the tracing API The goal of this patch is to normalize and make more easy the implementation of statistical (histogram) tracing. It implements a trace_stat file into the /debugfs/tracing directory where one can print a one-shot output of statistics/histogram entries. A tracer has to provide two basic iterator callbacks: stat_start() => the first entry stat_next(prev, idx) => the next one. Note that it is adapted for arrays or hash tables or lists.... since it provides a pointer to the previous entry and the current index of the iterator. These two callbacks are called to get a snapshot of the statistics at each opening of the trace_stat file because. The values are so updated between two "cat trace_stat". And the tracer is free to lock its datas during the iteration to keep consistent values. Since it is almost always interesting to sort statisticals values to address the problems by priority, this infrastructure provides a "sorting" of the stat entries too if desired. A tracer has just to provide a stat_cmp callback to compare two entries and the stat tracing infrastructure will build a sorted list of the given entries. A last callback, called stat_headers, can be implemented by a tracer to output headers on its trace. If one of these callbacks is changed on runtime, it just have to signal it to the stat tracing API by calling the init_tracer_stat() helper. Changes in V2: - Fix a memory leak if the user opens multiple times the trace_stat file without closing it. Now we always free our list before rebuilding it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29ftrace: set up trace event hash infrastructureSteven Rostedt
Impact: simplify/generalize/refactor trace.c The trace.c file is becoming more difficult to maintain due to the growing number of events. There is several formats that an event may be printed. This patch sets up the infrastructure of an event hash to allow for events to register how they should be printed. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12x86, bts, ftrace: adapt the hw-branch-tracer to the ds.c interfaceMarkus Metzger
Impact: restructure code, cleanup Remove BTS bits from the hw-branch-tracer (renamed from bts-tracer) and use the ds interface. Signed-off-by: Markus Metzger <markut.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27Merge branches 'tracing/blktrace', 'tracing/ftrace', ↵Ingo Molnar
'tracing/function-graph-tracer' and 'tracing/power-tracer' into tracing/core
2008-11-26tracing: add "power-tracer": C/P state tracer to help power optimizationArjan van de Ven
Impact: new "power-tracer" ftrace plugin This patch adds a C/P-state ftrace plugin that will generate detailed statistics about the C/P-states that are being used, so that we can look at detailed decisions that the C/P-state code is making, rather than the too high level "average" that we have today. An example way of using this is: mount -t debugfs none /sys/kernel/debug echo cstate > /sys/kernel/debug/tracing/current_tracer echo 1 > /sys/kernel/debug/tracing/tracing_enabled sleep 1 echo 0 > /sys/kernel/debug/tracing/tracing_enabled cat /sys/kernel/debug/tracing/trace | perl scripts/trace/cstate.pl > out.svg Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26tracing/function-return-tracer: change the name into function-graph-tracerFrederic Weisbecker
Impact: cleanup This patch changes the name of the "return function tracer" into function-graph-tracer which is a more suitable name for a tracing which makes one able to retrieve the ordered call stack during the code flow. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25x86, bts, ftrace: a BTS ftrace plug-in prototypeMarkus Metzger
Impact: add new ftrace plugin A prototype for a BTS ftrace plug-in. The tracer collects branch trace in a cyclic buffer for each cpu. The tracer is not configurable and the trace for each snapshot is appended when doing cat /debug/tracing/trace. This is a proof of concept that will be extended with future patches to become a (hopefully) useful tool. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12ftrace: rename trace_unlikely.c fileSteven Rostedt
Impact: File name change of trace_unlikely.c The "unlikely" name for the tracer is quite ugly. We renamed all the parts of it to "branch" and now it is time to rename the file too. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12trace: rename unlikely profiler to branch profilerSteven Rostedt
Impact: name change of unlikely tracer and profiler Ingo Molnar suggested changing the config from UNLIKELY_PROFILE to BRANCH_PROFILING. I never did like the "unlikely" name so I went one step farther, and renamed all the unlikely configurations to a "BRANCH" variant. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12tracing: likely/unlikely branch annotation tracerSteven Rostedt
Impact: new likely/unlikely branch tracer This patch adds a way to record the instances of the likely() and unlikely() branch condition annotations. When "unlikely" is set in /debugfs/tracing/iter_ctrl the unlikely conditions will be added to any of the ftrace tracers. The change takes effect when a new tracer is passed into the current_tracer file. For example: bash-3471 [003] 357.014755: [INCORRECT] sched_info_dequeued:sched_stats.h:177 bash-3471 [003] 357.014756: [correct] update_curr:sched_fair.c:489 bash-3471 [003] 357.014758: [correct] calc_delta_fair:sched_fair.c:411 bash-3471 [003] 357.014759: [correct] account_group_exec_runtime:sched_stats.h:356 bash-3471 [003] 357.014761: [correct] update_curr:sched_fair.c:489 bash-3471 [003] 357.014763: [INCORRECT] calc_delta_fair:sched_fair.c:411 bash-3471 [003] 357.014765: [correct] calc_delta_mine:sched.c:1279 Which shows the normal tracer heading, as well as whether the condition was correct "[correct]" or was mistaken "[INCORRECT]", followed by the function, file name and line number. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12tracing: profile likely and unlikely annotationsSteven Rostedt
Impact: new unlikely/likely profiler Andrew Morton recently suggested having an in-kernel way to profile likely and unlikely macros. This patch achieves that goal. When configured, every(*) likely and unlikely macro gets a counter attached to it. When the condition is hit, the hit and misses of that condition are recorded. These numbers can later be retrieved by: /debugfs/tracing/profile_likely - All likely markers /debugfs/tracing/profile_unlikely - All unlikely markers. # cat /debug/tracing/profile_unlikely | head correct incorrect % Function File Line ------- --------- - -------- ---- ---- 2167 0 0 do_arch_prctl process_64.c 832 0 0 0 do_arch_prctl process_64.c 804 2670 0 0 IS_ERR err.h 34 71230 5693 7 __switch_to process_64.c 673 76919 0 0 __switch_to process_64.c 639 43184 33743 43 __switch_to process_64.c 624 12740 64181 83 __switch_to process_64.c 594 12740 64174 83 __switch_to process_64.c 590 # cat /debug/tracing/profile_unlikely | \ awk '{ if ($3 > 25) print $0; }' |head -20 44963 35259 43 __switch_to process_64.c 624 12762 67454 84 __switch_to process_64.c 594 12762 67447 84 __switch_to process_64.c 590 1478 595 28 syscall_get_error syscall.h 51 0 2821 100 syscall_trace_leave ptrace.c 1567 0 1 100 native_smp_prepare_cpus smpboot.c 1237 86338 265881 75 calc_delta_fair sched_fair.c 408 210410 108540 34 calc_delta_mine sched.c 1267 0 54550 100 sched_info_queued sched_stats.h 222 51899 66435 56 pick_next_task_fair sched_fair.c 1422 6 10 62 yield_task_fair sched_fair.c 982 7325 2692 26 rt_policy sched.c 144 0 1270 100 pre_schedule_rt sched_rt.c 1261 1268 48073 97 pick_next_task_rt sched_rt.c 884 0 45181 100 sched_info_dequeued sched_stats.h 177 0 15 100 sched_move_task sched.c 8700 0 15 100 sched_move_task sched.c 8690 53167 33217 38 schedule sched.c 4457 0 80208 100 sched_info_switch sched_stats.h 270 30585 49631 61 context_switch sched.c 2619 # cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }' 39900 36577 47 pick_next_task sched.c 4397 20824 15233 42 switch_mm mmu_context_64.h 18 0 7 100 __cancel_work_timer workqueue.c 560 617 66484 99 clocksource_adjust timekeeping.c 456 0 346340 100 audit_syscall_exit auditsc.c 1570 38 347350 99 audit_get_context auditsc.c 732 0 345244 100 audit_syscall_entry auditsc.c 1541 38 1017 96 audit_free auditsc.c 1446 0 1090 100 audit_alloc auditsc.c 862 2618 1090 29 audit_alloc auditsc.c 858 0 6 100 move_masked_irq migration.c 9 1 198 99 probe_sched_wakeup trace_sched_switch.c 58 2 2 50 probe_wakeup trace_sched_wakeup.c 227 0 2 100 probe_wakeup_sched_switch trace_sched_wakeup.c 144 4514 2090 31 __grab_cache_page filemap.c 2149 12882 228786 94 mapping_unevictable pagemap.h 50 4 11 73 __flush_cpu_slab slub.c 1466 627757 330451 34 slab_free slub.c 1731 2959 61245 95 dentry_lru_del_init dcache.c 153 946 1217 56 load_elf_binary binfmt_elf.c 904 102 82 44 disk_put_part genhd.h 206 1 1 50 dst_gc_task dst.c 82 0 19 100 tcp_mss_split_point tcp_output.c 1126 As you can see by the above, there's a bit of work to do in rethinking the use of some unlikelys and likelys. Note: the unlikely case had 71 hits that were more than 25%. Note: After submitting my first version of this patch, Andrew Morton showed me a version written by Daniel Walker, where I picked up the following ideas from: 1) Using __builtin_constant_p to avoid profiling fixed values. 2) Using __FILE__ instead of instruction pointers. 3) Using the preprocessor to stop all profiling of likely annotations from vsyscall_64.c. Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar for their feed back on this patch. (*) Not ever unlikely is recorded, those that are used by vsyscalls (a few of them) had to have profiling disabled. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Theodore Tso <tytso@mit.edu> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11tracing: add a tracer to catch execution time of kernel functionsFrederic Weisbecker
Impact: add new tracing plugin which can trace full (entry+exit) function calls This tracer uses the low level function return ftrace plugin to measure the execution time of the kernel functions. The first field is the caller of the function, the second is the measured function, and the last one is the execution time in nanoseconds. - v3: - HAVE_FUNCTION_RET_TRACER have been added. Each arch that support ftrace return should enable it. - ftrace_return_stub becomes ftrace_stub. - CONFIG_FUNCTION_RET_TRACER depends now on CONFIG_FUNCTION_TRACER - Return traces printing can be used for other tracers on trace.c - Adapt to the new tracing API (no more ctrl_update callback) - Correct the check of "disabled" during insertion. - Minor changes... Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-20ftrace: rename FTRACE to FUNCTION_TRACERSteven Rostedt
Due to confusion between the ftrace infrastructure and the gcc profiling tracer "ftrace", this patch renames the config options from FTRACE to FUNCTION_TRACER. The other two names that are offspring from FTRACE DYNAMIC_FTRACE and FTRACE_MCOUNT_RECORD will stay the same. This patch was generated mostly by script, and partially by hand. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14tracing: unified trace bufferSteven Rostedt
This is a unified tracing buffer that implements a ring buffer that hopefully everyone will eventually be able to use. The events recorded into the buffer have the following structure: struct ring_buffer_event { u32 type:2, len:3, time_delta:27; u32 array[]; }; The minimum size of an event is 8 bytes. All events are 4 byte aligned inside the buffer. There are 4 types (all internal use for the ring buffer, only the data type is exported to the interface users). RINGBUF_TYPE_PADDING: this type is used to note extra space at the end of a buffer page. RINGBUF_TYPE_TIME_EXTENT: This type is used when the time between events is greater than the 27 bit delta can hold. We add another 32 bits, and record that in its own event (8 byte size). RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to help keep the buffer timestamps in sync. RINGBUF_TYPE_DATA: The event actually holds user data. The "len" field is only three bits. Since the data must be 4 byte aligned, this field is shifted left by 2, giving a max length of 28 bytes. If the data load is greater than 28 bytes, the first array field holds the full length of the data load and the len field is set to zero. Example, data size of 7 bytes: type = RINGBUF_TYPE_DATA len = 2 time_delta: <time-stamp> - <prev_event-time-stamp> array[0..1]: <7 bytes of data> <1 byte empty> This event is saved in 12 bytes of the buffer. An event with 82 bytes of data: type = RINGBUF_TYPE_DATA len = 0 time_delta: <time-stamp> - <prev_event-time-stamp> array[0]: 84 (Note the alignment) array[1..14]: <82 bytes of data> <2 bytes empty> The above event is saved in 92 bytes (if my math is correct). 82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length. Do not reference the above event struct directly. Use the following functions to gain access to the event table, since the ring_buffer_event structure may change in the future. ring_buffer_event_length(event): get the length of the event. This is the size of the memory used to record this event, and not the size of the data pay load. ring_buffer_time_delta(event): get the time delta of the event This returns the delta time stamp since the last event. Note: Even though this is in the header, there should be no reason to access this directly, accept for debugging. ring_buffer_event_data(event): get the data from the event This is the function to use to get the actual data from the event. Note, it is only a pointer to the data inside the buffer. This data must be copied to another location otherwise you risk it being written over in the buffer. ring_buffer_lock: A way to lock the entire buffer. ring_buffer_unlock: unlock the buffer. ring_buffer_alloc: create a new ring buffer. Can choose between overwrite or consumer/producer mode. Overwrite will overwrite old data, where as consumer producer will throw away new data if the consumer catches up with the producer. The consumer/producer is the default. ring_buffer_free: free the ring buffer. ring_buffer_resize: resize the buffer. Changes the size of each cpu buffer. Note, it is up to the caller to provide that the buffer is not being used while this is happening. This requirement may go away but do not count on it. ring_buffer_lock_reserve: locks the ring buffer and allocates an entry on the buffer to write to. ring_buffer_unlock_commit: unlocks the ring buffer and commits it to the buffer. ring_buffer_write: writes some data into the ring buffer. ring_buffer_peek: Look at a next item in the cpu buffer. ring_buffer_consume: get the next item in the cpu buffer and consume it. That is, this function increments the head pointer. ring_buffer_read_start: Start an iterator of a cpu buffer. For now, this disables the cpu buffer, until you issue a finish. This is just because we do not want the iterator to be overwritten. This restriction may change in the future. But note, this is used for static reading of a buffer which is usually done "after" a trace. Live readings would want to use the ring_buffer_consume above, which will not disable the ring buffer. ring_buffer_read_finish: Finishes the read iterator and reenables the ring buffer. ring_buffer_iter_peek: Look at the next item in the cpu iterator. ring_buffer_read: Read the iterator and increment it. ring_buffer_iter_reset: Reset the iterator to point to the beginning of the cpu buffer. ring_buffer_iter_empty: Returns true if the iterator is at the end of the cpu buffer. ring_buffer_size: returns the size in bytes of each cpu buffer. Note, the real size is this times the number of CPUs. ring_buffer_reset_cpu: Sets the cpu buffer to empty ring_buffer_reset: sets all cpu buffers to empty ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a cpu buffer of another buffer. This is handy when you want to take a snap shot of a running trace on just one cpu. Having a backup buffer, to swap with facilitates this. Ftrace max latencies use this. ring_buffer_empty: Returns true if the ring buffer is empty. ring_buffer_empty_cpu: Returns true if the cpu buffer is empty. ring_buffer_record_disable: disable all cpu buffers (read only) ring_buffer_record_disable_cpu: disable a single cpu buffer (read only) ring_buffer_record_enable: enable all cpu buffers. ring_buffer_record_enabl_cpu: enable a single cpu buffer. ring_buffer_entries: The number of entries in a ring buffer. ring_buffer_overruns: The number of entries removed due to writing wrap. ring_buffer_time_stamp: Get the time stamp used by the ring buffer ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp into nanosecs. I still need to implement the GTOD feature. But we need support from the cpu frequency infrastructure. But this can be done at a later time without affecting the ring buffer interface. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14tracing/ftrace: give an entry on the config for boot tracerFrédéric Weisbecker
Bring the entry to choose the boot tracer on the kernel config. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14ftrace: add nop tracerSteven Noonan
A no-op tracer which can serve two purposes: 1. A template for development of a new tracer. 2. A convenient way to see ftrace_printk() calls without an irrelevant trace making the output messy. [ mingo@elte.hu: resolved conflicts ] Signed-off-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14ftrace: add stack tracerSteven Rostedt
This is another tracer using the ftrace infrastructure, that examines at each function call the size of the stack. If the stack use is greater than the previous max it is recorded. You can always see (and set) the max stack size seen. By setting it to zero will start the recording again. The backtrace is also available. For example: # cat /debug/tracing/stack_max_size 1856 # cat /debug/tracing/stack_trace [<c027764d>] stack_trace_call+0x8f/0x101 [<c021b966>] ftrace_call+0x5/0x8 [<c02553cc>] clocksource_get_next+0x12/0x48 [<c02542a5>] update_wall_time+0x538/0x6d1 [<c0245913>] do_timer+0x23/0xb0 [<c0257657>] tick_do_update_jiffies64+0xd9/0xf1 [<c02576b9>] tick_sched_timer+0x4a/0xad [<c0250fe6>] __run_hrtimer+0x3e/0x75 [<c02518ed>] hrtimer_interrupt+0xf1/0x154 [<c022c870>] smp_apic_timer_interrupt+0x71/0x84 [<c021b7e9>] apic_timer_interrupt+0x2d/0x34 [<c0238597>] finish_task_switch+0x29/0xa0 [<c05abd13>] schedule+0x765/0x7be [<c05abfca>] schedule_timeout+0x1b/0x90 [<c05ab4d4>] wait_for_common+0xab/0x101 [<c05ab5ac>] wait_for_completion+0x12/0x14 [<c033cfc3>] blk_execute_rq+0x84/0x99 [<c0402470>] scsi_execute+0xc2/0x105 [<c040250a>] scsi_execute_req+0x57/0x7f [<c043afe0>] sr_test_unit_ready+0x3e/0x97 [<c043bbd6>] sr_media_change+0x43/0x205 [<c046b59f>] media_changed+0x48/0x77 [<c046b5ff>] cdrom_media_changed+0x31/0x37 [<c043b091>] sr_block_media_changed+0x16/0x18 [<c02b9e69>] check_disk_change+0x1b/0x63 [<c046f4c3>] cdrom_open+0x7a1/0x806 [<c043b148>] sr_block_open+0x78/0x8d [<c02ba4c0>] do_open+0x90/0x257 [<c02ba869>] blkdev_open+0x2d/0x56 [<c0296a1f>] __dentry_open+0x14d/0x23c [<c0296b32>] nameidata_to_filp+0x24/0x38 [<c02a1c68>] do_filp_open+0x347/0x626 [<c02967ef>] do_sys_open+0x47/0xbc [<c02968b0>] sys_open+0x23/0x2b [<c021aadd>] sysenter_do_call+0x12/0x26 I've tested this on both x86_64 and i386. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-10Merge branch 'tracing/sysprof' into auto-ftrace-nextIngo Molnar
2008-05-24ftrace: add mmiotrace pluginPekka Paalanen
On Sat, 22 Mar 2008 13:07:47 +0100 Ingo Molnar <mingo@elte.hu> wrote: > > > i'd suggest the following: pull x86.git and sched-devel.git into a > > > single tree [the two will combine without rejects]. Then try to add a > > > kernel/tracing/trace_mmiotrace.c ftrace plugin. The trace_sysprof.c > > > plugin might be a good example. > > > > I did this and now I have mmiotrace enabled/disabled via the tracing > > framework (what do we call this, since ftrace is one of the tracers?). > > cool! could you send the patches for that? (even if they are not fully > functional yet) Patch attached in the end. Nice to see how much code disappeared. I tried to mark all the features I had to break with XXX-comments. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23ftrace: add sysprof pluginIngo Molnar
very first baby version. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23ftrace: fix dynamic ftrace selftestSteven Rostedt
With the adding of the configuration changes in the Makefile to prevent tracing of functions in the ftrace code, all tracing of all the ftrace code has been removed. Unfortunately, one of the selftests, relied on a function to be traced. With the new change, the function was no longer traced and the test failed. This patch separates out the test function into its own file so that we can add the "-pg" flag to the compilation of that function and the adding of the mcount call to that function. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23ftrace: disable -pg for the tracer itselfIngo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23ftrace: trace preempt off critical timingsSteven Rostedt
Add preempt off timings. A lot of kernel core code is taken from the RT patch latency trace that was written by Ingo Molnar. This adds "preemptoff" and "preemptirqsoff" to /debugfs/tracing/available_tracers Now instead of just tracing irqs off, preemption off can be selected to be recorded. When this is selected, it shares the same files as irqs off timings. One can either trace preemption off, irqs off, or one or the other off. By echoing "preemptoff" into /debugfs/tracing/current_tracer, recording of preempt off only is performed. "irqsoff" will only record the time irqs are disabled, but "preemptirqsoff" will take the total time irqs or preemption are disabled. Runtime switching of these options is now supported by simpling echoing in the appropriate trace name into /debugfs/tracing/current_tracer. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23ftrace: trace irq disabled critical timingsSteven Rostedt
This patch adds latency tracing for critical timings (how long interrupts are disabled for). "irqsoff" is added to /debugfs/tracing/available_tracers Note: tracing_max_latency also holds the max latency for irqsoff (in usecs). (default to large number so one must start latency tracing) tracing_thresh threshold (in usecs) to always print out if irqs off is detected to be longer than stated here. If irq_thresh is non-zero, then max_irq_latency is ignored. Here's an example of a trace with ftrace_enabled = 0 ======= preemption latency trace v1.1.5 on 2.6.24-rc7 Signed-off-by: Ingo Molnar <mingo@elte.hu> -------------------------------------------------------------------- latency: 100 us, #3/3, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2) ----------------- | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0) ----------------- => started at: _spin_lock_irqsave+0x2a/0xb7 => ended at: _spin_unlock_irqrestore+0x32/0x5f _------=> CPU# / _-----=> irqs-off | / _----=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth |||| / ||||| delay cmd pid ||||| time | caller \ / ||||| \ | / swapper-0 1d.s3 0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000]) swapper-0 1d.s3 100us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000]) swapper-0 1d.s3 100us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f) vim:ft=help ======= And this is a trace with ftrace_enabled == 1 ======= preemption latency trace v1.1.5 on 2.6.24-rc7 -------------------------------------------------------------------- latency: 102 us, #12/12, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2) ----------------- | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0) ----------------- => started at: _spin_lock_irqsave+0x2a/0xb7 => ended at: _spin_unlock_irqrestore+0x32/0x5f _------=> CPU# / _-----=> irqs-off | / _----=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth |||| / ||||| delay cmd pid ||||| time | caller \ / ||||| \ | / swapper-0 1dNs3 0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000]) swapper-0 1dNs3 46us : e1000_read_phy_reg+0x16/0x225 [e1000] (e1000_update_stats+0x5e2/0x64c [e1000]) swapper-0 1dNs3 46us : e1000_swfw_sync_acquire+0x10/0x99 [e1000] (e1000_read_phy_reg+0x49/0x225 [e1000]) swapper-0 1dNs3 46us : e1000_get_hw_eeprom_semaphore+0x12/0xa6 [e1000] (e1000_swfw_sync_acquire+0x36/0x99 [e1000]) swapper-0 1dNs3 47us : __const_udelay+0x9/0x47 (e1000_read_phy_reg+0x116/0x225 [e1000]) swapper-0 1dNs3 47us+: __delay+0x9/0x50 (__const_udelay+0x45/0x47) swapper-0 1dNs3 97us : preempt_schedule+0xc/0x84 (__delay+0x4e/0x50) swapper-0 1dNs3 98us : e1000_swfw_sync_release+0xc/0x55 [e1000] (e1000_read_phy_reg+0x211/0x225 [e1000]) swapper-0 1dNs3 99us+: e1000_put_hw_eeprom_semaphore+0x9/0x35 [e1000] (e1000_swfw_sync_release+0x50/0x55 [e1000]) swapper-0 1dNs3 101us : _spin_unlock_irqrestore+0xe/0x5f (e1000_update_stats+0x641/0x64c [e1000]) swapper-0 1dNs3 102us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000]) swapper-0 1dNs3 102us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f) vim:ft=help ======= Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23ftrace: tracer for scheduler wakeup latencySteven Rostedt
This patch adds the tracer that tracks the wakeup latency of the highest priority waking task. "wakeup" is added to /debugfs/tracing/available_tracers Also added to /debugfs/tracing tracing_max_latency holds the current max latency for the wakeup wakeup_thresh if set to other than zero, a log will be recorded for every wakeup that takes longer than the number entered in here (usecs for all counters) (deletes previous trace) Examples: (with ftrace_enabled = 0) ============ preemption latency trace v1.1.5 on 2.6.24-rc8 Signed-off-by: Ingo Molnar <mingo@elte.hu> -------------------------------------------------------------------- latency: 26 us, #2/2, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2) ----------------- | task: migration/0-3 (uid:0 nice:-5 policy:1 rt_prio:99) ----------------- _------=> CPU# / _-----=> irqs-off | / _----=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth |||| / ||||| delay cmd pid ||||| time | caller \ / ||||| \ | / quilt-8551 0d..3 0us+: wake_up_process+0x15/0x17 <ffffffff80233e80> (sched_exec+0xc9/0x100 <ffffffff80235343>) quilt-8551 0d..4 26us : sched_switch_callback+0x73/0x81 <ffffffff80338d2f> (schedule+0x483/0x6d5 <ffffffff8048b3ee>) vim:ft=help ============ (with ftrace_enabled = 1) ============ preemption latency trace v1.1.5 on 2.6.24-rc8 -------------------------------------------------------------------- latency: 36 us, #45/45, CPU#0 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2) ----------------- | task: migration/1-5 (uid:0 nice:-5 policy:1 rt_prio:99) ----------------- _------=> CPU# / _-----=> irqs-off | / _----=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth |||| / ||||| delay cmd pid ||||| time | caller \ / ||||| \ | / bash-10653 1d..3 0us : wake_up_process+0x15/0x17 <ffffffff80233e80> (sched_exec+0xc9/0x100 <ffffffff80235343>) bash-10653 1d..3 1us : try_to_wake_up+0x271/0x2e7 <ffffffff80233dcf> (sub_preempt_count+0xc/0x7a <ffffffff8023309e>) bash-10653 1d..2 2us : try_to_wake_up+0x296/0x2e7 <ffffffff80233df4> (update_rq_clock+0x9/0x20 <ffffffff802303f3>) bash-10653 1d..2 2us : update_rq_clock+0x1e/0x20 <ffffffff80230408> (__update_rq_clock+0xc/0x90 <ffffffff80230366>) bash-10653 1d..2 3us : __update_rq_clock+0x1b/0x90 <ffffffff80230375> (sched_clock+0x9/0x29 <ffffffff80214529>) bash-10653 1d..2 4us : try_to_wake_up+0x2a6/0x2e7 <ffffffff80233e04> (activate_task+0xc/0x3f <ffffffff8022ffca>) bash-10653 1d..2 4us : activate_task+0x2d/0x3f <ffffffff8022ffeb> (enqueue_task+0xe/0x66 <ffffffff8022ff66>) bash-10653 1d..2 5us : enqueue_task+0x5b/0x66 <ffffffff8022ffb3> (enqueue_task_rt+0x9/0x3c <ffffffff80233351>) bash-10653 1d..2 6us : try_to_wake_up+0x2ba/0x2e7 <ffffffff80233e18> (check_preempt_wakeup+0x12/0x99 <ffffffff80234f84>) [...] bash-10653 1d..5 33us : tracing_record_cmdline+0xcf/0xd4 <ffffffff80338aad> (_spin_unlock+0x9/0x33 <ffffffff8048d3ec>) bash-10653 1d..5 34us : _spin_unlock+0x19/0x33 <ffffffff8048d3fc> (sub_preempt_count+0xc/0x7a <ffffffff8023309e>) bash-10653 1d..4 35us : wakeup_sched_switch+0x65/0x2ff <ffffffff80339f66> (_spin_lock_irqsave+0xc/0xa9 <ffffffff8048d08b>) bash-10653 1d..4 35us : _spin_lock_irqsave+0x19/0xa9 <ffffffff8048d098> (add_preempt_count+0xe/0x77 <ffffffff8023311a>) bash-10653 1d..4 36us : sched_switch_callback+0x73/0x81 <ffffffff80338d2f> (schedule+0x483/0x6d5 <ffffffff8048b3ee>) vim:ft=help ============ The [...] was added here to not waste your email box space. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23ftrace: add tracing of context switchesSteven Rostedt
This patch adds context switch tracing, of the format of: _------=> CPU# / _-----=> irqs-off | / _----=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth |||| / ||||| delay cmd pid ||||| time | pid:prio:state \ / ||||| \ | / swapper-0 1d..3 137us+: 0:140:R --> 2912:120 sshd-2912 1d..3 216us+: 2912:120:S --> 0:140 swapper-0 1d..3 261us+: 0:140:R --> 2912:120 bash-2920 0d..3 267us+: 2920:120:S --> 0:140 sshd-2912 1d..3 330us!: 2912:120:S --> 0:140 swapper-0 1d..3 2389us+: 0:140:R --> 2847:120 yum-upda-2847 1d..3 2411us!: 2847:120:S --> 0:140 swapper-0 0d..3 11089us+: 0:140:R --> 3139:120 gdm-bina-3139 0d..3 11113us!: 3139:120:S --> 0:140 swapper-0 1d..3 102328us+: 0:140:R --> 2847:120 yum-upda-2847 1d..3 102348us!: 2847:120:S --> 0:140 "sched_switch" is added to /debugfs/tracing/available_tracers [ Eugene Teo <eugeneteo@kernel.sg: remove unused tracing_sched_switch_enabled ] Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23ftrace: function tracerSteven Rostedt
This is a simple trace that uses the ftrace infrastructure. It is designed to be fast and small, and easy to use. It is useful to record things that happen over a very short period of time, and not to analyze the system in general. Updates: available_tracers "function" is added to this file. current_tracer To enable the function tracer: echo function > /debugfs/tracing/current_tracer To disable the tracer: echo disable > /debugfs/tracing/current_tracer The output of the function_trace file is as follows "echo noverbose > /debugfs/tracing/iter_ctrl" preemption latency trace v1.1.5 on 2.6.24-rc7-tst Signed-off-by: Ingo Molnar <mingo@elte.hu> -------------------------------------------------------------------- latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) ----------------- | task: -0 (uid:0 nice:0 policy:0 rt_prio:0) ----------------- _------=> CPU# / _-----=> irqs-off | / _----=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth |||| / ||||| delay cmd pid ||||| time | caller \ / ||||| \ | / swapper-0 0d.h. 1595128us+: set_normalized_timespec+0x8/0x2d <c043841d> (ktime_get_ts+0x4a/0x4e <c04499d4>) swapper-0 0d.h. 1595131us+: _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>) Or with verbose turned on: "echo verbose > /debugfs/tracing/iter_ctrl" preemption latency trace v1.1.5 on 2.6.24-rc7-tst -------------------------------------------------------------------- latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) ----------------- | task: -0 (uid:0 nice:0 policy:0 rt_prio:0) ----------------- swapper 0 0 9 00000000 00000000 [f3675f41] 1595.128ms (+0.003ms): set_normalized_timespec+0x8/0x2d <c043841d> (ktime_get_ts+0x4a/0x4e <c04499d4>) swapper 0 0 9 00000000 00000001 [f3675f45] 1595.131ms (+0.003ms): _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>) swapper 0 0 9 00000000 00000002 [f3675f48] 1595.135ms (+0.003ms): _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>) The "trace" file is not affected by the verbose mode, but is by the symonly. echo "nosymonly" > /debugfs/tracing/iter_ctrl tracer: [ 81.479967] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <ffffffff80337a4d> <-- _spin_unlock_irqrestore+0xe/0x5a <ffffffff8048cc8f> [ 81.479967] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <ffffffff8048ccbf> <-- sub_preempt_count+0xc/0x7a <ffffffff80233d7b> [ 81.479968] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <ffffffff80233d9f> <-- in_lock_functions+0x9/0x24 <ffffffff8025a75d> [ 81.479968] CPU 0: bash:3154 vfs_write+0x11d/0x155 <ffffffff8029a043> <-- dnotify_parent+0x12/0x78 <ffffffff802d54fb> [ 81.479968] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <ffffffff802d5516> <-- _spin_lock+0xe/0x70 <ffffffff8048c910> [ 81.479969] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <ffffffff8048c91d> <-- add_preempt_count+0xe/0x77 <ffffffff80233df7> [ 81.479969] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <ffffffff80233e27> <-- in_lock_functions+0x9/0x24 <ffffffff8025a75d> echo "symonly" > /debugfs/tracing/iter_ctrl tracer: [ 81.479913] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <-- _spin_unlock_irqrestore+0xe/0x5a [ 81.479913] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <-- sub_preempt_count+0xc/0x7a [ 81.479913] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <-- in_lock_functions+0x9/0x24 [ 81.479914] CPU 0: bash:3154 vfs_write+0x11d/0x155 <-- dnotify_parent+0x12/0x78 [ 81.479914] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <-- _spin_lock+0xe/0x70 [ 81.479914] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <-- add_preempt_count+0xe/0x77 [ 81.479914] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <-- in_lock_functions+0x9/0x24 Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23ftrace: latency tracer infrastructureSteven Rostedt
This patch adds the latency tracer infrastructure. This patch does not add anything that will select and turn it on, but will be used by later patches. If it were to be compiled, it would add the following files to the debugfs: The root tracing directory: /debugfs/tracing/ This patch also adds the following files: available_tracers list of available tracers. Currently no tracers are available. Looking into this file only shows "none" which is used to unregister all tracers. current_tracer The trace that is currently active. Empty on start up. To switch to a tracer simply echo one of the tracers that are listed in available_tracers: example: (used with later patches) echo function > /debugfs/tracing/current_tracer To disable the tracer: echo disable > /debugfs/tracing/current_tracer tracing_enabled echoing "1" into this file starts the ftrace function tracing (if sysctl kernel.ftrace_enabled=1) echoing "0" turns it off. latency_trace This file is readonly and holds the result of the trace. trace This file outputs a easier to read version of the trace. iter_ctrl Controls the way the output of traces look. So far there's two controls: echoing in "symonly" will only show the kallsyms variables without the addresses (if kallsyms was configured) echoing in "verbose" will change the output to show a lot more data, but not very easy to understand by humans. echoing in "nosymonly" turns off symonly. echoing in "noverbose" turns off verbose. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23ftrace: add basic support for gcc profiler instrumentationArnaldo Carvalho de Melo
If CONFIG_FTRACE is selected and /proc/sys/kernel/ftrace_enabled is set to a non-zero value the ftrace routine will be called everytime we enter a kernel function that is not marked with the "notrace" attribute. The ftrace routine will then call a registered function if a function happens to be registered. [ This code has been highly hacked by Steven Rostedt and Ingo Molnar, so don't blame Arnaldo for all of this ;-) ] Update: It is now possible to register more than one ftrace function. If only one ftrace function is registered, that will be the function that ftrace calls directly. If more than one function is registered, then ftrace will call a function that will loop through the functions to call. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>