summaryrefslogtreecommitdiffstats
path: root/kernel/trace
AgeCommit message (Collapse)Author
2009-06-26ftrace: Fix the output of profileLi Zefan
The first entry of the ftrace profile was always skipped when reading trace_stat/functionX. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A443D59.4080307@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-25ring-buffer: Make it generally availablePaul Mundt
In hunting down the cause for the hwlat_detector ring buffer spew in my failed -next builds it became obvious that folks are now treating ring_buffer as something that is generic independent of tracing and thus, suitable for public driver consumption. Given that there are only a few minor areas in ring_buffer that have any reliance on CONFIG_TRACING or CONFIG_FUNCTION_TRACER, provide stubs for those and make it generally available. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Cc: Jon Masters <jcm@jonmasters.org> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20090625053012.GB19944@linux-sh.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-25ftrace: Remove duplicate newlineLi Zefan
Before: # echo 'sys_open:traceon:' > set_ftrace_filter # echo 'sys_close:traceoff:5' > set_ftrace_filter # cat set_ftrace_filter #### all functions enabled #### sys_open:traceon:unlimited sys_close:traceoff:count=0 After: # cat set_ftrace_filter #### all functions enabled #### sys_open:traceon:unlimited sys_close:traceoff:count=0 Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A4313A7.7030105@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24tracing: Fix trace_buf_size boot optionLi Zefan
We should be able to specify [KMG] when setting trace_buf_size boot option, as documented in kernel-parameters.txt Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A41F2DB.4020102@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24ftrace: Fix t_hash_start()Li Zefan
When the output of set_ftrace_filter is larger than PAGE_SIZE, t_hash_start() will be called the 2nd time, and then we start from the head of a hlist, which is wrong and causes some entries to be outputed twice. The worse is, if the hlist is large enough, reading set_ftrace_filter won't stop but in a dead loop. Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A41876E.2060407@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24ftrace: Don't manipulate @pos in t_start()Li Zefan
It's rather confusing that in t_start(), in some cases @pos is incremented, and in some cases it's decremented and then incremented. This patch rewrites t_start() in a much more general way. Thus we fix a bug that if ftrace_filtered == 1, functions have tracer hooks won't be printed, because the branch is always unreachable: static void *t_start(...) { ... if (!p) return t_hash_start(m, pos); return p; } Before: # echo 'sys_open' > /mnt/tracing/set_ftrace_filter # echo 'sys_write:traceon:4' >> /mnt/tracing/set_ftrace_filter sys_open After: # echo 'sys_open' > /mnt/tracing/set_ftrace_filter # echo 'sys_write:traceon:4' >> /mnt/tracing/set_ftrace_filter sys_open sys_write:traceon:count=4 Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A41874B.4090507@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24ftrace: Don't increment @pos in g_start()Li Zefan
It's wrong to increment @pos in g_start(). It causes some entries lost when reading set_graph_function, if the output of the file is larger than PAGE_SIZE. Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A418738.7090401@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24tracing: Reset iterator in t_start()Li Zefan
The iterator is m->private, but it's not reset to trace_types in t_start(). If the output is larger than PAGE_SIZE and t_start() is called the 2nd time, things will go wrong. Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A418728.5020506@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24trace_stat: Don't increment @pos in seq start()Li Zefan
It's wrong to increment @pos in stat_seq_start(). It causes some stat entries lost when reading stat file, if the output of the file is larger than PAGE_SIZE. Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A418716.90209@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24tracing_bprintk: Don't increment @pos in t_start()Li Zefan
It's wrong to increment @pos in t_start(), otherwise we'll lose some entries when reading printk_formats, if the output is larger than PAGE_SIZE. Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com> Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A4186FA.1020106@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24tracing/events: Don't increment @pos in s_start()Li Zefan
While testing syscall tracepoints posted by Jason, I found 3 entries were missing when reading available_events. The output size of available_events is < 4 pages, which means we lost 1 entry per page. The cause is, it's wrong to increment @pos in s_start(). Actually there's another bug here -- reading avaiable_events/set_events can race with module unload: # cat available_events | s_start() | s_stop() | | # rmmod foo.ko s_start() | call = list_entry(m->private) | @call might be freed and accessing it will lead to crash. Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A4186DD.6090405@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-20Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits) tracing/urgent: warn in case of ftrace_start_up inbalance tracing/urgent: fix unbalanced ftrace_start_up function-graph: add stack frame test function-graph: disable when both x86_32 and optimize for size are configured ring-buffer: have benchmark test print to trace buffer ring-buffer: do not grab locks in nmi ring-buffer: add locks around rb_per_cpu_empty ring-buffer: check for less than two in size allocation ring-buffer: remove useless compile check for buffer_page size ring-buffer: remove useless warn on check ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index tracing: update sample event documentation tracing/filters: fix race between filter setting and module unload tracing/filters: free filter_string in destroy_preds() ring-buffer: use commit counters for commit pointer accounting ring-buffer: remove unused variable ring-buffer: have benchmark test handle discarded events ring-buffer: prevent adding write in discarded area tracing/filters: strloc should be unsigned short tracing/filters: operand can be negative ... Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually
2009-06-20Merge branch 'tip/tracing/urgent-1' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent
2009-06-20Merge branch 'tip/tracing/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent
2009-06-20tracing/urgent: warn in case of ftrace_start_up inbalanceFrederic Weisbecker
Prevent from further ftrace_start_up inbalances so that we avoid future nop patching omissions with dynamic ftrace. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2009-06-20tracing/urgent: fix unbalanced ftrace_start_upFrederic Weisbecker
Perfcounter reports the following stats for a wide system profiling: # # (2364 samples) # # Overhead Symbol # ........ ...... # 15.40% [k] mwait_idle_with_hints 8.29% [k] read_hpet 5.75% [k] ftrace_caller 3.60% [k] ftrace_call [...] This snapshot has been taken while neither the function tracer nor the function graph tracer was running. With dynamic ftrace, such results show a wrong ftrace behaviour because all calls to ftrace_caller or ftrace_graph_caller (the patched calls to mcount) are supposed to be patched into nop if none of those tracers are running. The problem occurs after the first run of the function tracer. Once we launch it a second time, the callsites will never be nopped back, unless you set custom filters. For example it happens during the self tests at boot time. The function tracer selftest runs, and then the dynamic tracing is tested too. After that, the callsites are left un-nopped. This is because the reset callback of the function tracer tries to unregister two ftrace callbacks in once: the common function tracer and the function tracer with stack backtrace, regardless of which one is currently in use. It then creates an unbalance on ftrace_start_up value which is expected to be zero when the last ftrace callback is unregistered. When it reaches zero, the FTRACE_DISABLE_CALLS is set on the next ftrace command, triggering the patching into nop. But since it becomes unbalanced, ie becomes lower than zero, if the kernel functions are patched again (as in every further function tracer runs), they won't ever be nopped back. Note that ftrace_call and ftrace_graph_call are still patched back to ftrace_stub in the off case, but not the callers of ftrace_call and ftrace_graph_caller. It means that the tracing is well deactivated but we waste a useless call into every kernel function. This patch just unregisters the right ftrace_ops for the function tracer on its reset callback and ignores the other one which is not registered, fixing the unbalance. The problem also happens is .30 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable@kernel.org
2009-06-18function-graph: add stack frame testSteven Rostedt
In case gcc does something funny with the stack frames, or the return from function code, we would like to detect that. An arch may implement passing of a variable that is unique to the function and can be saved on entering a function and can be tested when exiting the function. Usually the frame pointer can be used for this purpose. This patch also implements this for x86. Where it passes in the stack frame of the parent function, and will test that frame on exit. There was a case in x86_32 with optimize for size (-Os) where, for a few functions, gcc would align the stack frame and place a copy of the return address into it. The function graph tracer modified the copy and not the actual return address. On return from the funtion, it did not go to the tracer hook, but returned to the parent. This broke the function graph tracer, because the return of the parent (where gcc did not do this funky manipulation) returned to the location that the child function was suppose to. This caused strange kernel crashes. This test detected the problem and pointed out where the issue was. This modifies the parameters of one of the functions that the arch specific code calls, so it includes changes to arch code to accommodate the new prototype. Note, I notice that the parsic arch implements its own push_return_trace. This is now a generic function and the ftrace_push_return_trace should be used instead. This patch does not touch that code. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-18function-graph: disable when both x86_32 and optimize for size are configuredSteven Rostedt
On x86_32, when optimize for size is set, gcc may align the frame pointer and make a copy of the the return address inside the stack frame. The return address that is located in the stack frame may not be the one used to return to the calling function. This will break the function graph tracer. The function graph tracer replaces the return address with a jump to a hook function that can trace the exit of the function. If it only replaces a copy, then the hook will not be called when the function returns. Worse yet, when the parent function returns, the function graph tracer will return back to the location of the child function which will easily crash the kernel with weird results. To see the problem, when i386 is compiled with -Os we get: c106be03: 57 push %edi c106be04: 8d 7c 24 08 lea 0x8(%esp),%edi c106be08: 83 e4 e0 and $0xffffffe0,%esp c106be0b: ff 77 fc pushl 0xfffffffc(%edi) c106be0e: 55 push %ebp c106be0f: 89 e5 mov %esp,%ebp c106be11: 57 push %edi c106be12: 56 push %esi c106be13: 53 push %ebx c106be14: 81 ec 8c 00 00 00 sub $0x8c,%esp c106be1a: e8 f5 57 fb ff call c1021614 <mcount> When it is compiled with -O2 instead we get: c10896f0: 55 push %ebp c10896f1: 89 e5 mov %esp,%ebp c10896f3: 83 ec 28 sub $0x28,%esp c10896f6: 89 5d f4 mov %ebx,0xfffffff4(%ebp) c10896f9: 89 75 f8 mov %esi,0xfffffff8(%ebp) c10896fc: 89 7d fc mov %edi,0xfffffffc(%ebp) c10896ff: e8 d0 08 fa ff call c1029fd4 <mcount> The compile with -Os will align the stack pointer then set up the frame pointer (%ebp), and it copies the return address back into the stack frame. The change to the return address in mcount is done to the copy and not the real place holder of the return address. Then compile with -O2 sets up the frame pointer first, this makes the change to the return address by mcount affect where the function will jump on exit. Reported-by: Jake Edge <jake@lwn.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-17ring-buffer: have benchmark test print to trace bufferSteven Rostedt
Currently the output of the ring buffer benchmark/test prints to the console. This test runs for ten seconds every ten seconds and ouputs the result after every iteration. This needlessly fills up the logs. This patch makes the ring buffer benchmark/test print to the ftrace buffer using trace_printk. To view the test results, you must examine the debug/tracing/trace file. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-17ring-buffer: do not grab locks in nmiSteven Rostedt
If ftrace_dump_on_oops is set, and an NMI detects a lockup, then it will need to read from the ring buffer. But the read side of the ring buffer still takes locks. This patch adds a check on the read side that if it is in an NMI, then it will disable the ring buffer and not take any locks. Reads can still happen on a disabled ring buffer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-17ring-buffer: add locks around rb_per_cpu_emptySteven Rostedt
The checking of whether the buffer is empty or not needs to be serialized among the readers. Add the reader spin lock around it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-17ring-buffer: check for less than two in size allocationSteven Rostedt
The ring buffer must have at least two pages allocated for the reader page swap to work. The page count check will miss the case of a zero size passed in. Even though a zero size ring buffer would probably fail an allocation, making the min size check for less than two instead of equal to one makes the code a bit more robust. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-17ring-buffer: remove useless compile check for buffer_page sizeSteven Rostedt
The original version of the ring buffer had a hack to map the page struct that held the pages of the buffer to also be the structure that the ring buffer would keep the pages in a link list. This overlap of the page struct was very dangerous and that hack was removed a while ago. But there was a check to make sure the buffer_page never became bigger than the page struct, and would fail the compile if it did. The check was only meaningful when we had the hack. Now that we have separate allocated descriptors for the buffer pages, we can remove this check. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-16ring-buffer: remove useless warn on checkSteven Rostedt
A check if "write > BUF_PAGE_SIZE" is done right after a if (write > BUF_PAGE_SIZE) return ...; Thus the check is actually testing the compiler and not the kernel. This is useless, remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-16ring-buffer: use BUF_PAGE_HDR_SIZE in calculating indexSteven Rostedt
The index of the event is found by masking PAGE_MASK to it and subtracting the header size. Currently the header size is calculate by PAGE_SIZE - BUF_PAGE_SIZE, when we already have a macro BUF_PAGE_HDR_SIZE to define it. If we want to change BUF_PAGE_SIZE to something less than filling the rest of the page (this is done for debugging), then we break the algorithm to find the index. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-16tracing/filters: fix race between filter setting and module unloadLi Zefan
Module unload is protected by event_mutex, while setting filter is protected by filter_mutex. This leads to the race: echo 'bar == 0 || bar == 10' \ | > sample/filter | | insmod sample.ko add_pred("bar == 0") | -> n_preds == 1 | add_pred("bar == 100") | -> n_preds == 2 | | rmmod sample.ko | insmod sample.ko add_pred("&&") | -> n_preds == 1 (should be 3) | Now event->filter->preds is corrupted. An then when filter_match_preds() is called, the WARN_ON() in it will be triggered. To avoid the race, we remove filter_mutex, and replace it with event_mutex. [ Impact: prevent corruption of filters by module removing and loading ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A375A4D.6000205@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-16tracing/filters: free filter_string in destroy_preds()Li Zefan
filter->filter_string is not freed when unloading a module: # insmod trace-events-sample.ko # echo "bar < 100" > /mnt/tracing/events/sample/foo_bar/filter # rmmod trace-events-sample.ko [ Impact: fix memory leak when unloading module ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A375A30.9060802@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-16ring-buffer: use commit counters for commit pointer accountingSteven Rostedt
The ring buffer is made up of three sets of pointers. The head page pointer, which points to the next page for the reader to get. The commit pointer and commit index, which points to the page and index of the last committed write respectively. The tail pointer and tail index, which points to the page and the index of the last reserved data respectively (non committed). The commit pointer is only moved forward by the outer most writer. If a nested writer comes in, it will not move the pointer forward. The current implementation has a flaw. It assumes that the outer most writer successfully reserved data. There's a small race window where the outer most writer could find the tail pointer, but a nested writer could come in (via interrupt) and move the tail forward, and even the commit forward. The outer writer would not realized the commit moved forward and the accounting will break. This patch changes the design to use counters in the per cpu buffers to keep track of commits. The counters are incremented at the start of the commit, and decremented at the end. If the end commit counter is 1, then it moves the commit pointers. A loop is made to check for races between checking and moving the commit pointers. Only the outer commit should move the pointers anyway. The test of knowing if a reserve is equal to the last commit update is still needed to know for time keeping. The time code is much less racey than the commit updates. This change not only solves the mentioned race, but also makes the code simpler. [ Impact: fix commit race and simplify code ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-16ring-buffer: remove unused variableSteven Rostedt
Fix the compiler error: kernel/trace/ring_buffer.c: In function 'rb_move_tail': kernel/trace/ring_buffer.c:1236: warning: unused variable 'event' Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-16Merge branch 'for-linus2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: (39 commits) signal: fix __send_signal() false positive kmemcheck warning fs: fix do_mount_root() false positive kmemcheck warning fs: introduce __getname_gfp() trace: annotate bitfields in struct ring_buffer_event net: annotate struct sock bitfield c2port: annotate bitfield for kmemcheck net: annotate inet_timewait_sock bitfields ieee1394/csr1212: fix false positive kmemcheck report ieee1394: annotate bitfield net: annotate bitfields in struct inet_sock net: use kmemcheck bitfields API for skbuff kmemcheck: introduce bitfield API kmemcheck: add opcode self-testing at boot x86: unify pte_hidden x86: make _PAGE_HIDDEN conditional kmemcheck: make kconfig accessible for other architectures kmemcheck: enable in the x86 Kconfig kmemcheck: add hooks for the page allocator kmemcheck: add hooks for page- and sg-dma-mappings kmemcheck: don't track page tables ...
2009-06-16ring-buffer: have benchmark test handle discarded eventsSteven Rostedt
With the addition of commit: c7b0930857e2278f2e7714db6294e94c57f623b0 ring-buffer: prevent adding write in discarded area The ring buffer may now add discarded events when a write passes the end of a buffer page. Before, a discarded event was only added when the tracer deliberately created one. The ring buffer benchmark test does not handle discarded events when it reads the buffer and fails when it encounters one. Also fix the increment for large data entries (luckily, the test did not add any yet). [ Impact: fix false failure of ring buffer self test ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15debugfs: Fix terminology inconsistency of dir name to mount debugfs filesystem.GeunSik Lim
Many developers use "/debug/" or "/debugfs/" or "/sys/kernel/debug/" directory name to mount debugfs filesystem for ftrace according to ./Documentation/tracers/ftrace.txt file. And, three directory names(ex:/debug/, /debugfs/, /sys/kernel/debug/) is existed in kernel source like ftrace, DRM, Wireless, Documentation, Network[sky2]files to mount debugfs filesystem. debugfs means debug filesystem for debugging easy to use by greg kroah hartman. "/sys/kernel/debug/" name is suitable as directory name of debugfs filesystem. - debugfs related reference: http://lwn.net/Articles/334546/ Fix inconsistency of directory name to mount debugfs filesystem. * From Steven Rostedt - find_debugfs() and tracing_files() in this patch. Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com> Acked-by : Inaky Perez-Gonzalez <inaky@linux.intel.com> Reviewed-by : Steven Rostedt <rostedt@goodmis.org> Reviewed-by : James Smart <james.smart@emulex.com> CC: Jiri Kosina <trivial@kernel.org> CC: David Airlie <airlied@linux.ie> CC: Peter Osterlund <petero2@telia.com> CC: Ananth N Mavinakayanahalli <ananth@in.ibm.com> CC: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> CC: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-15Merge branch 'timers-for-linus-migration' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers: Logic to move non pinned timers timers: /proc/sys sysctl hook to enable timer migration timers: Identifying the existing pinned timers timers: Framework for identifying pinned timers timers: allow deferrable timers for intervals tv2-tv5 to be deferred Fix up conflicts in kernel/sched.c and kernel/timer.c manually
2009-06-15ring-buffer: prevent adding write in discarded areaSteven Rostedt
This a very tight race where an interrupt could come in and not have enough data to put into the end of a buffer page, and that it would fail to write and need to go to the next page. But if this happened when another writer was about to reserver their data, and that writer has smaller data to reserve, then it could succeed even though the interrupt moved the tail page. To pervent that, if we fail to store data, and by subtracting the amount we reserved we still have room for smaller data, we need to fill that space with "discarded" data. [ Impact: prevent race were buffer data may be lost ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15tracing/filters: strloc should be unsigned shortLi Zefan
I forgot to update filter code accordingly in "tracing/events: change the type of __str_loc_item to unsigned short" (commt b0aae68cc5508f3c2fbf728988c954db4c8b8a53) It can cause system crash: # echo 1 > tracing/events/irq/irq_handler_entry/enable # echo 'name == eth0' > tracing/events/irq/irq_handler_entry/filter [ Impact: fix crash while filtering on __string() field ] Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A35B905.3090500@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15tracing/filters: operand can be negativeLi Zefan
This should be a bug: # cat format name: foo_bar ID: 71 format: ... field:int bar; offset:24; size:4; # echo 'bar < 0' > filter # echo 'bar < -1' > filter bash: echo: write error: Invalid argument [ Impact: fix to allow negative operand in filer expr ] Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A35B8DF.60400@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15tracing: replace a GFP_ATOMIC with GFP_KERNEL allocationLi Zefan
Atomic allocation is not needed here. [ Impact: clean up of memory alloction type ] Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A35B898.2050607@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15tracing: fix a typo in tracing_cpumask_write()Li Zefan
It's tracing_cpumask_new that should be kfree()ed. This causes tracing_cpumask to be freed due to the typo: # echo z > tracing_cpumask bash: echo: write error: Invalid argument And subsequent reads/writes to tracing_cpuamsk will access this already-freed tracing_cpumask, thus may lead to crash. [ Impact: fix leak and crash when writing invalid val to tracing_cpumask ] Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A35B86A.7070608@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15cpumask: use new operators in kernel/traceRusty Russell
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <200906122115.30787.rusty@rustcorp.com.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-15trace: annotate bitfields in struct ring_buffer_eventVegard Nossum
This gets rid of a heap of false-positive warnings from the tracer code due to the use of bitfields. [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-11Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits) block: add request clone interface (v2) floppy: fix hibernation ramdisk: remove long-deprecated "ramdisk=" boot-time parameter fs/bio.c: add missing __user annotation block: prevent possible io_context->refcount overflow Add serial number support for virtio_blk, V4a block: Add missing bounce_pfn stacking and fix comments Revert "block: Fix bounce limit setting in DM" cciss: decode unit attention in SCSI error handling code cciss: Remove no longer needed sendcmd reject processing code cciss: change SCSI error handling routines to work with interrupts enabled. cciss: separate error processing and command retrying code in sendcmd_withirq_core() cciss: factor out fix target status processing code from sendcmd functions cciss: simplify interface of sendcmd() and sendcmd_withirq() cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code cciss: Use schedule_timeout_uninterruptible in SCSI error handling code block: needs to set the residual length of a bidi request Revert "block: implement blkdev_readpages" block: Fix bounce limit setting in DM Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt ... Manually fix conflicts with tracing updates in: block/blk-sysfs.c drivers/ide/ide-atapi.c drivers/ide/ide-cd.c drivers/ide/ide-floppy.c drivers/ide/ide-tape.c include/trace/events/block.h kernel/trace/blktrace.c
2009-06-10Merge branch 'tracing-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: function-graph: always initialize task ret_stack function-graph: move initialization of new tasks up in fork function-graph: add memory barriers for accessing task's ret_stack function-graph: enable the stack after initialization of other variables function-graph: only allocate init tasks if it was not already done Manually fix trivial conflict in kernel/trace/ftrace.c
2009-06-10Merge branch 'tracing-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits) Revert "x86, bts: reenable ptrace branch trace support" tracing: do not translate event helper macros in print format ftrace/documentation: fix typo in function grapher name tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK tracing: add protection around module events unload tracing: add trace_seq_vprint interface tracing: fix the block trace points print size tracing/events: convert block trace points to TRACE_EVENT() ring-buffer: fix ret in rb_add_time_stamp ring-buffer: pass in lockdep class key for reader_lock tracing: add annotation to what type of stack trace is recorded tracing: fix multiple use of __print_flags and __print_symbolic tracing/events: fix output format of user stack tracing/events: fix output format of kernel stack tracing/trace_stack: fix the number of entries in the header ring-buffer: discard timestamps that are at the start of the buffer ring-buffer: try to discard unneeded timestamps ring-buffer: fix bug in ring_buffer_discard_commit ftrace: do not profile functions when disabled tracing: make trace pipe recognize latency format flag ...
2009-06-09tracing: add protection around module events unloadSteven Rostedt
When reading the trace buffer, there is a race that when a module is unloaded it removes events that is stilled referenced in the buffers. This patch adds the protection around the unloading of the events from modules and the reading of the trace buffers. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-09tracing: add trace_seq_vprint interfaceSteven Rostedt
The code to update the print formats for events requires a vprintf format in the trace_seq. This patch adds that interface. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-09tracing/events: convert block trace points to TRACE_EVENT()Li Zefan
TRACE_EVENT is a more generic way to define tracepoints. Doing so adds these new capabilities to this tracepoint: - zero-copy and per-cpu splice() tracing - binary tracing without printf overhead - structured logging records exposed under /debug/tracing/events - trace events embedded in function tracer output and other plugins - user-defined, per tracepoint filter expressions ... Cons: - no dev_t info for the output of plug, unplug_timer and unplug_io events. no dev_t info for getrq and sleeprq events if bio == NULL. no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL. This is mainly because we can't get the deivce from a request queue. But this may change in the future. - A packet command is converted to a string in TP_assign, not TP_print. While blktrace do the convertion just before output. Since pc requests should be rather rare, this is not a big issue. - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT has a unique format, which means we have some unused data in a trace entry. The overhead is minimized by using __dynamic_array() instead of __array(). I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing: dd dd + ioctl blktrace dd + TRACE_EVENT (splice) 1 7.36s, 42.7 MB/s 7.50s, 42.0 MB/s 7.41s, 42.5 MB/s 2 7.43s, 42.3 MB/s 7.48s, 42.1 MB/s 7.43s, 42.4 MB/s 3 7.38s, 42.6 MB/s 7.45s, 42.2 MB/s 7.41s, 42.5 MB/s So the overhead of tracing is very small, and no regression when using those trace events vs blktrace. And the binary output of TRACE_EVENT is much smaller than blktrace: # ls -l -h -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0 -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1 -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out Following are some comparisons between TRACE_EVENT and blktrace: plug: kjournald-480 [000] 303.084981: block_plug: [kjournald] kjournald-480 [000] 303.084981: 8,0 P N [kjournald] unplug_io: kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1 kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1 remap: kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384 kjournald-480 [000] 303.085043: 8,0 A W 102736992 + 8 <- (8,8) 33384 bio_backmerge: kjournald-480 [000] 303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald] kjournald-480 [000] 303.085086: 8,0 M W 102737032 + 8 [kjournald] getrq: kjournald-480 [000] 303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald] kjournald-480 [000] 303.084975: 8,0 G W 102736984 + 8 [kjournald] bash-2066 [001] 1072.953770: 8,0 G N [bash] bash-2066 [001] 1072.953773: block_getrq: 0,0 N 0 + 0 [bash] rq_complete: konsole-2065 [001] 300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0] konsole-2065 [001] 300.053191: 8,0 C W 103669040 + 16 [0] ksoftirqd/1-7 [001] 1072.953811: 8,0 C N (5a 00 08 00 00 00 00 00 24 00) [0] ksoftirqd/1-7 [001] 1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0] rq_insert: kjournald-480 [000] 303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald] kjournald-480 [000] 303.084986: 8,0 I W 102736984 + 8 [kjournald] Changelog from v2 -> v3: - use the newly introduced __dynamic_array(). Changelog from v1 -> v2: - use __string() instead of __array() to minimize the memory required to store hex dump of rq->cmd(). - support large pc requests. - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT. - some cleanups. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-09ring-buffer: fix ret in rb_add_time_stampSteven Rostedt
The update of ret got mistakenly added to the if statement of rb_try_to_discard. The variable ret should be 1 on commit and zero otherwise. [ Impact: fix compiler warning and real bug ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-08ring-buffer: pass in lockdep class key for reader_lockPeter Zijlstra
On Sun, 7 Jun 2009, Ingo Molnar wrote: > Testing tracer sched_switch: <6>Starting ring buffer hammer > PASSED > Testing tracer sysprof: PASSED > Testing tracer function: PASSED > Testing tracer irqsoff: > ============================================= > PASSED > Testing tracer preemptoff: PASSED > Testing tracer preemptirqsoff: [ INFO: possible recursive locking detected ] > PASSED > Testing tracer branch: 2.6.30-rc8-tip-01972-ge5b9078-dirty #5760 > --------------------------------------------- > rb_consumer/431 is trying to acquire lock: > (&cpu_buffer->reader_lock){......}, at: [<c109eef7>] ring_buffer_reset_cpu+0x37/0x70 > > but task is already holding lock: > (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0 > > other info that might help us debug this: > 1 lock held by rb_consumer/431: > #0: (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0 The ring buffer is a generic structure, and can be used outside of ftrace. If ftrace traces within the use of the ring buffer, it can produce false positives with lockdep. This patch passes in a static lock key into the allocation of the ring buffer, so that different ring buffers will have their own lock class. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1244477919.13761.9042.camel@twins> [ store key in ring buffer descriptor ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-05Merge branch 'tip/tracing/ftrace-4' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace
2009-06-04Merge branch 'tracing/ftrace' into tracing/coreIngo Molnar
Merge reason: this mini-topic had outstanding problems that delayed its merge, so it does not fast-forward. Signed-off-by: Ingo Molnar <mingo@elte.hu>