summaryrefslogtreecommitdiffstats
path: root/include/trace/events/timer.h
AgeCommit message (Collapse)Author
2013-05-05Merge branch 'timers-nohz-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull 'full dynticks' support from Ingo Molnar: "This tree from Frederic Weisbecker adds a new, (exciting! :-) core kernel feature to the timer and scheduler subsystems: 'full dynticks', or CONFIG_NO_HZ_FULL=y. This feature extends the nohz variable-size timer tick feature from idle to busy CPUs (running at most one task) as well, potentially reducing the number of timer interrupts significantly. This feature got motivated by real-time folks and the -rt tree, but the general utility and motivation of full-dynticks runs wider than that: - HPC workloads get faster: CPUs running a single task should be able to utilize a maximum amount of CPU power. A periodic timer tick at HZ=1000 can cause a constant overhead of up to 1.0%. This feature removes that overhead - and speeds up the system by 0.5%-1.0% on typical distro configs even on modern systems. - Real-time workload latency reduction: CPUs running critical tasks should experience as little jitter as possible. The last remaining source of kernel-related jitter was the periodic timer tick. - A single task executing on a CPU is a pretty common situation, especially with an increasing number of cores/CPUs, so this feature helps desktop and mobile workloads as well. The cost of the feature is mainly related to increased timer reprogramming overhead when a CPU switches its tick period, and thus slightly longer to-idle and from-idle latency. Configuration-wise a third mode of operation is added to the existing two NOHZ kconfig modes: - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named as a config option. This is the traditional Linux periodic tick design: there's a HZ tick going on all the time, regardless of whether a CPU is idle or not. - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the periodic tick when a CPU enters idle mode. - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the tick when a CPU is idle, also slows the tick down to 1 Hz (one timer interrupt per second) when only a single task is running on a CPU. The .config behavior is compatible: existing !CONFIG_NO_HZ and CONFIG_NO_HZ=y settings get translated to the new values, without the user having to configure anything. CONFIG_NO_HZ_FULL is turned off by default. This feature is based on a lot of infrastructure work that has been steadily going upstream in the last 2-3 cycles: related RCU support and non-periodic cputime support in particular is upstream already. This tree adds the final pieces and activates the feature. The pull request is marked RFC because: - it's marked 64-bit only at the moment - the 32-bit support patch is small but did not get ready in time. - it has a number of fresh commits that came in after the merge window. The overwhelming majority of commits are from before the merge window, but still some aspects of the tree are fresh and so I marked it RFC. - it's a pretty wide-reaching feature with lots of effects - and while the components have been in testing for some time, the full combination is still not very widely used. That it's default-off should reduce its regression abilities and obviously there are no known regressions with CONFIG_NO_HZ_FULL=y enabled either. - the feature is not completely idempotent: there is no 100% equivalent replacement for a periodic scheduler/timer tick. In particular there's ongoing work to map out and reduce its effects on scheduler load-balancing and statistics. This should not impact correctness though, there are no known regressions related to this feature at this point. - it's a pretty ambitious feature that with time will likely be enabled by most Linux distros, and we'd like you to make input on its design/implementation, if you dislike some aspect we missed. Without flaming us to crisp! :-) Future plans: - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off the periodic tick altogether when there's a single busy task on a CPU. We'd first like 1 Hz to be exposed more widely before we go for the 0 Hz target though. - once we reach 0 Hz we can remove the periodic tick assumption from nr_running>=2 as well, by essentially interrupting busy tasks only as frequently as the sched_latency constraints require us to do - once every 4-40 msecs, depending on nr_running. I am personally leaning towards biting the bullet and doing this in v3.10, like the -rt tree this effort has been going on for too long - but the final word is up to you as usual. More technical details can be found in Documentation/timers/NO_HZ.txt" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) sched: Keep at least 1 tick per second for active dynticks tasks rcu: Fix full dynticks' dependency on wide RCU nocb mode nohz: Protect smp_processor_id() in tick_nohz_task_switch() nohz_full: Add documentation. cputime_nsecs: use math64.h for nsec resolution conversion helpers nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config nohz: Reduce overhead under high-freq idling patterns nohz: Remove full dynticks' superfluous dependency on RCU tree nohz: Fix unavailable tick_stop tracepoint in dynticks idle nohz: Add basic tracing nohz: Select wide RCU nocb for full dynticks nohz: Disable the tick when irq resume in full dynticks CPU nohz: Re-evaluate the tick for the new task after a context switch nohz: Prepare to stop the tick on irq exit nohz: Implement full dynticks kick nohz: Re-evaluate the tick from the scheduler IPI sched: New helper to prevent from stopping the tick in full dynticks sched: Kick full dynticks CPU that have more than one task enqueued. perf: New helper to prevent full dynticks CPUs from stopping tick perf: Kick full dynticks CPU if events rotation is needed ...
2013-04-24nohz: Fix unavailable tick_stop tracepoint in dynticks idleFrederic Weisbecker
The trace_tick_stop() tracepoint is only available in full dynticks. But it's also used by dynticks-idle so let's build it for the latter config as well. This fixes: kernel/time/tick-sched.c: In function tick_nohz_stop_sched_tick: kernel/time/tick-sched.c:644: error: implicit declaration of function trace_tick_stop make[2]: *** [kernel/time/tick-sched.o] Erreur 1 Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-04-22nohz: Add basic tracingFrederic Weisbecker
It's not obvious to find out why the full dynticks subsystem doesn't always stop the tick: whether this is due to kthreads, posix timers, perf events, etc... These new tracepoints are here to help the user diagnose the failures and test this feature. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-03-18treewide: Fix typos in printk and commentMasanari Iida
Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-08-19tracing: Fix timer tracingArjan van de Ven
PowerTOP would like to be able to trace timers. Unfortunately, the current timer tracing is not very useful: the actual timer function is not recorded in the trace at the start of timer execution. Although this is recorded for timer "start" time (when it gets armed), this is not useful; most timers get started early, and a tracer like PowerTOP will never see this event, but will only see the actual running of the timer. This patch just adds the function to the timer tracing; I've verified with PowerTOP that now it can get useful information about timers. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: xiaoguangrong@cn.fujitsu.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> # .35.x, .34.x, .33.x LKML-Reference: <4C6C5FA9.3000405@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-28tracing: Convert some timer events to DEFINE_EVENTLi Zefan
Use DECLARE_EVENT_CLASS, and save ~2.3K: text data bss dec hex filename 7018823 2031888 7251132 16301843 f8bf13 vmlinux.o.orig 7016727 2031696 7251132 16299555 f8b623 vmlinux.o 5 events are converted: timer_class: timer_init, timer_expire_exit, timer_cancel hrtimer_class: hrtimer_init, hrtimer_cancel No change in functionality. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4BFA3773.3060200@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-10itimer: Fix the itimer trace print formatThomas Gleixner
Compiling powerpc64 results in: include/trace/events/timer.h:279: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'cputime_t' .... cputime_t on power is u64, which triggers the above warning. Cast the cputime_t to unsigned long long and fix the print format string. That works on both 32 and 64 bit architectures. While at it change the print format for long variables from %lu to %ld. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
2009-10-15events: Harmonize event field names and print output namesIngo Molnar
Now that we can filter based on fields via perf record, people will start using filter expressions and will expect them to be obvious. The primary way to see which fields are available is by looking at the trace output, such as: gcc-18676 [000] 343.011728: irq_handler_entry: irq=0 handler=timer cc1-18677 [000] 343.012727: irq_handler_entry: irq=0 handler=timer cc1-18677 [000] 343.032692: irq_handler_entry: irq=0 handler=timer cc1-18677 [000] 343.033690: irq_handler_entry: irq=0 handler=timer cc1-18677 [000] 343.034687: irq_handler_entry: irq=0 handler=timer cc1-18677 [000] 343.035686: irq_handler_entry: irq=0 handler=timer cc1-18677 [000] 343.036684: irq_handler_entry: irq=0 handler=timer While 'irq==0' filters work, the 'handler==<x>' filter expression does not work: $ perf record -R -f -a -e irq:irq_handler_entry --filter handler=timer sleep 1 Error: failed to set filter with 22 (Invalid argument) The problem is that while an 'irq' field exists and is recognized as a filter field - 'handler' does not exist - its name is 'name' in the output. To solve this, we need to synchronize the printout and the field names, wherever possible. In cases where the printout prints a non-field, we enclose that information in square brackets, such as: perf-1380 [013] 724.903505: softirq_exit: vec=9 [action=RCU] perf-1380 [013] 724.904482: softirq_exit: vec=1 [action=TIMER] This way users can use filter expressions more intuitively: all fields that show up as 'primary' (non-bracketed) information is filterable. This patch harmonizes the field names for all irq, bkl, power, sched and timer events. We might in fact think about dropping the print format bit of generic tracepoints altogether, and just print the fields that are being recorded. Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-29itimers: Add tracepoints for itimerXiao Guangrong
Add tracepoints for all itimer variants: ITIMER_REAL, ITIMER_VIRTUAL and ITIMER_PROF. [ tglx: Fixed comments and made the output more readable, parseable and consistent. Replaced pid_vnr by pid_nr because the hrtimer callback can happen in any namespace ] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Anton Blanchard <anton@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Zhaolei <zhaolei@cn.fujitsu.com> LKML-Reference: <4A7F8B6E.2010109@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-29hrtimer: Add tracepoint for hrtimersXiao Guangrong
Add tracepoints which cover the life cycle of a hrtimer. The tracepoints are integrated with the already existing debug_object debug points as far as possible. [ tglx: Fixed comments, made output conistent, easier to read and parse. Fixed output for 32bit archs which do not use the scalar representation of ktime_t. Hand current time to trace_hrtimer_expiry_entry instead of calling get_time() inside of the trace assignment. ] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Anton Blanchard <anton@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Zhaolei <zhaolei@cn.fujitsu.com> LKML-Reference: <4A7F8B2B.5020908@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-29timers: Add tracepoints for timer_list timersXiao Guangrong
Add tracepoints which cover the timer life cycle. The tracepoints are integrated with the already existing debug_object debug points as far as possible. Based on patches from Mathieu: http://marc.info/?l=linux-kernel&m=123791201816247&w=2 and Anton: http://marc.info/?l=linux-kernel&m=124331396919301&w=2 [ tglx: Fixed timeout value in timer_start tracepoint, massaged comments and made the printk's more readable ] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Anton Blanchard <anton@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Peter Zijlstra <peterz@infradead.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Zhaolei <zhaolei@cn.fujitsu.com> LKML-Reference: <4A7F8A9B.3040201@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>