summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)Author
2013-05-16clocksource: Add module refcountThomas Gleixner
Add a module refcount, so the current clocksource cannot be removed unconditionally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130425143435.762417789@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-16clocksource: Let timekeeping_notify return success/errorThomas Gleixner
timekeeping_notify() can fail due cs->enable() failure. Though the caller does not notice and happily keeps the wrong clocksource as the current one. Let the caller know about failure, so the current clocksource will be shown correctly in sysfs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130425143435.696321912@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-16clocksource: Always verify highres capabilityThomas Gleixner
If a clocksource has a (wrong) high rating, but can't be used as a timebase for oneshot tick mode, it is unconditionally selected even when the system is already in oneshot tick mode. This causes full system failure. Verify the clocksource selection against the oneshot mode. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130425143435.635040849@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-15workqueue: don't perform NUMA-aware allocations on offline nodes in ↵Tejun Heo
wq_numa_init() wq_numa_init() builds per-node cpumasks which are later used to make unbound workqueues NUMA-aware. The cpumasks are allocated using alloc_cpumask_var_node() for all possible nodes. Unfortunately, on machines with off-line nodes, this leads to NUMA-aware allocations on existing bug offline nodes, which in turn triggers BUG in the memory allocation code. Fix it by using NUMA_NO_NODE for cpumask allocations for offline nodes. kernel BUG at include/linux/gfp.h:323! invalid opcode: 0000 [#1] SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0+ #1 Hardware name: ProLiant BL465c G7, BIOS A19 12/10/2011 task: ffff880234608000 ti: ffff880234602000 task.ti: ffff880234602000 RIP: 0010:[<ffffffff8117495d>] [<ffffffff8117495d>] new_slab+0x2ad/0x340 RSP: 0000:ffff880234603bf8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880237404b40 RCX: 00000000000000d0 RDX: 0000000000000001 RSI: 0000000000000003 RDI: 00000000002052d0 RBP: ffff880234603c28 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000001 R11: ffffffff812e3aa8 R12: 0000000000000001 R13: ffff8802378161c0 R14: 0000000000030027 R15: 00000000000040d0 FS: 0000000000000000(0000) GS:ffff880237800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff88043fdff000 CR3: 00000000018d5000 CR4: 00000000000007f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff880234603c28 0000000000000001 00000000000000d0 ffff8802378161c0 ffff880237404b40 ffff880237404b40 ffff880234603d28 ffffffff815edba1 ffff880237816140 0000000000000000 ffff88023740e1c0 Call Trace: [<ffffffff815edba1>] __slab_alloc+0x330/0x4f2 [<ffffffff81174b25>] kmem_cache_alloc_node_trace+0xa5/0x200 [<ffffffff812e3aa8>] alloc_cpumask_var_node+0x28/0x90 [<ffffffff81a0bdb3>] wq_numa_init+0x10d/0x1be [<ffffffff81a0bec8>] init_workqueues+0x64/0x341 [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0 [<ffffffff819f1f31>] kernel_init_freeable+0xb7/0x1ec [<ffffffff815d50de>] kernel_init+0xe/0xf0 [<ffffffff815ff89c>] ret_from_fork+0x7c/0xb0 Code: 45 84 ac 00 00 00 f0 41 80 4d 00 40 e9 f6 fe ff ff 66 0f 1f 84 00 00 00 00 00 e8 eb 4b ff ff 49 89 c5 e9 05 fe ff ff <0f> 0b 4c 8b 73 38 44 89 ff 81 cf 00 00 20 00 4c 89 f6 48 c1 ee Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-Tested-by: Lingzhu Xiang <lxiang@redhat.com>
2013-05-15Merge tag 'trace-fixes-v3.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "This includes a fix to a memory leak when adding filters to traces. Also, Masami Hiramatsu fixed up some minor bugs that were discovered by sparse." * tag 'trace-fixes-v3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/kprobes: Make print_*probe_event static tracing/kprobes: Fix a sparse warning for incorrect type in assignment tracing/kprobes: Use rcu_dereference_raw for tp->files tracing: Fix leaks of filter preds
2013-05-15Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: - Fix for a task exit cleanup race caused by a missing a preempt disable - Cleanup of the event notification functions with a massive reduction of duplicated code * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Factor out auxiliary events notification perf: Fix EXIT event notification
2013-05-15Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: - Cure for not using zalloc in the first place, which leads to random crashes with CPUMASK_OFF_STACK. - Revert a user space visible change which broke udev - Add a missing cpu_online early return introduced by the new full dyntick conversions - Plug a long standing race in the timer wheel cpu hotplug code. Sigh... - Cleanup NOHZ per cpu data on cpu down to prevent stale data on cpu up. * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline tick: Cleanup NOHZ per cpu data on cpu down tick: Use zalloc_cpumask_var for allocating offstack cpumasks
2013-05-15Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Thomas Gleixner: - Two fixlets for the fallout of the generic idle task conversion - Documentation update * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu/idle: Wrap cpu-idle poll mode within rcu_idle_enter/exit idle: Fix hlt/nohlt command-line handling in new generic idle kthread: Document ways of reducing OS jitter due to per-CPU kthreads
2013-05-15tracing/kprobes: Make print_*probe_event staticMasami Hiramatsu
According to sparse warning, print_*probe_event static because those functions are not directly called from outside. Link: http://lkml.kernel.org/r/20130513115839.6545.83067.stgit@mhiramat-M0-7522 Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-15tracing/kprobes: Fix a sparse warning for incorrect type in assignmentMasami Hiramatsu
Fix a sparse warning about the rcu operated pointer is defined without __rcu address space. Link: http://lkml.kernel.org/r/20130513115837.6545.23322.stgit@mhiramat-M0-7522 Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-15tracing/kprobes: Use rcu_dereference_raw for tp->filesMasami Hiramatsu
Use rcu_dereference_raw() for accessing tp->files. Because the write-side uses rcu_assign_pointer() for memory barrier, the read-side also has to use rcu_dereference_raw() with read memory barrier. Link: http://lkml.kernel.org/r/20130513115834.6545.17022.stgit@mhiramat-M0-7522 Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-15tracing: Fix leaks of filter predsSteven Rostedt (Red Hat)
Special preds are created when folding a series of preds that can be done in serial. These are allocated in an ops field of the pred structure. But they were never freed, causing memory leaks. This was discovered using the kmemleak checker: unreferenced object 0xffff8800797fd5e0 (size 32): comm "swapper/0", pid 1, jiffies 4294690605 (age 104.608s) hex dump (first 32 bytes): 00 00 01 00 03 00 05 00 07 00 09 00 0b 00 0d 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff814b52af>] kmemleak_alloc+0x73/0x98 [<ffffffff8111ff84>] kmemleak_alloc_recursive.constprop.42+0x16/0x18 [<ffffffff81120e68>] __kmalloc+0xd7/0x125 [<ffffffff810d47eb>] kcalloc.constprop.24+0x2d/0x2f [<ffffffff810d4896>] fold_pred_tree_cb+0xa9/0xf4 [<ffffffff810d3781>] walk_pred_tree+0x47/0xcc [<ffffffff810d5030>] replace_preds.isra.20+0x6f8/0x72f [<ffffffff810d50b5>] create_filter+0x4e/0x8b [<ffffffff81b1c30d>] ftrace_test_event_filter+0x5a/0x155 [<ffffffff8100028d>] do_one_initcall+0xa0/0x137 [<ffffffff81afbedf>] kernel_init_freeable+0x14d/0x1dc [<ffffffff814b24b7>] kernel_init+0xe/0xdb [<ffffffff814d539c>] ret_from_fork+0x7c/0xb0 [<ffffffffffffffff>] 0xffffffffffffffff Cc: Tom Zanussi <tzanussi@gmail.com> Cc: stable@vger.kernel.org # 2.6.39+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-15rcu: Don't allocate bootmem from rcu_init()Sasha Levin
When rcu_init() is called we already have slab working, allocating bootmem at that point results in warnings and an allocation from slab. This commit therefore changes alloc_bootmem_cpumask_var() to alloc_cpumask_var() in rcu_bootup_announce_oddness(), which is called from rcu_init(). Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Tested-by: Robin Holt <holt@sgi.com> [paulmck: convert to zalloc_cpumask_var(), as suggested by Yinghai Lu.]
2013-05-15Add wait_on_atomic_t() and wake_up_atomic_t()David Howells
Add wait_on_atomic_t() and wake_up_atomic_t() to indicate became-zero events on atomic_t types. This uses the bit-wake waitqueue table. The key is set to a value outside of the number of bits in a long so that wait_on_bit() won't be woken up accidentally. What I'm using this for is: in a following patch I add a counter to struct fscache_cookie to count the number of outstanding operations that need access to netfs data. The way this works is: (1) When a cookie is allocated, the counter is initialised to 1. (2) When an operation wants to access netfs data, it calls atomic_inc_unless() to increment the counter before it does so. If it was 0, then the counter isn't incremented, the operation isn't permitted to access the netfs data (which might by this point no longer exist) and the operation aborts in some appropriate manner. (3) When an operation finishes with the netfs data, it decrements the counter and if it reaches 0, calls wake_up_atomic_t() on it - the assumption being that it was the last blocker. (4) When a cookie is released, the counter is decremented and the releaser uses wait_on_atomic_t() to wait for the counter to become 0 - which should indicate no one is using the netfs data any longer. The netfs data can then be destroyed. There are some alternatives that I have thought of and that have been suggested by Tejun Heo: (A) Using wait_on_bit() to wait on a bit in the counter. This doesn't work because if that bit happens to be 0 then the wait won't happen - even if the counter is non-zero. (B) Using wait_on_bit() to wait on a flag elsewhere which is cleared when the counter reaches 0. Such a flag would be redundant and would add complexity. (C) Adding a waitqueue to fscache_cookie - this would expand that struct by several words for an event that happens just once in each cookie's lifetime. Further, cookies are generally per-file so there are likely to be a lot of them. (D) Similar to (C), but add a pointer to a waitqueue in the cookie instead of a waitqueue. This would add single word per cookie and so would be less of an expansion - but still an expansion. (E) Adding a static waitqueue to the fscache module. Generally this would be fine, but under certain circumstances many cookies will all get added at the same time (eg. NFS umount, cache withdrawal) thereby presenting scaling issues. Note that the wait may be significant as disk I/O may be in progress. So, I think reusing the wait_on_bit() waitqueue set is reasonable. I don't make much use of the waitqueue I need on a per-cookie basis, but sometimes I have a huge flood of the cookies to deal with. I also don't want to add a whole new set of global waitqueue tables specifically for the dec-to-0 event if I can reuse the bit tables. Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Milosz Tanski <milosz@adfin.com> Acked-by: Jeff Layton <jlayton@redhat.com>
2013-05-14time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitonsJohn Stultz
Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config, which enables some minor compile time optimization to avoid uncessary code in mostly the suspend/resume path could cause problems for userland. In particular, the dependency for RTC_HCTOSYS on !ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time twice and simplifies suspend/resume, has the side effect of causing the /sys/class/rtc/rtcN/hctosys flag to always be zero, and this flag is commonly used by udev to setup the /dev/rtc symlink to /dev/rtcN, which can cause pain for older applications. While the udev rules could use some work to be less fragile, breaking userland should strongly be avoided. Additionally the compile time optimizations are fairly minor, and the code being optimized is likely to be reworked in the future, so lets revert this change. Reported-by: Kay Sievers <kay@vrfy.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: stable <stable@vger.kernel.org> #3.9 Cc: Feng Tang <feng.tang@intel.com> Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Link: http://lkml.kernel.org/r/1366828376-18124-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-14workqueue: Make schedule_work() available again to non GPL modulesMarc Dionne
Commit 8425e3d5bdbe ("workqueue: inline trivial wrappers") changed schedule_work() and schedule_delayed_work() to inline wrappers, but these rely on some symbols that are EXPORT_SYMBOL_GPL, while the original functions were EXPORT_SYMBOL. This has the effect of changing the licensing requirement for these functions and making them unavailable to non GPL modules. Make them available again by removing the restriction on the required symbols. Signed-off-by: Marc Dionne <marc.dionne@your-file-system.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-05-14workqueue: correct handling of the pool spin_lockJoonsoo Kim
When we fail to mutex_trylock(), we release the pool spin_lock and do mutex_lock(). After that, we should regrab the pool spin_lock, but, regrabbing is missed in current code. So correct it. Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-05-14cgroup: implement task_cgroup_path_from_hierarchy()Tejun Heo
kdbus folks want a sane way to determine the cgroup path that a given task belongs to on a given hierarchy, which is a reasonble thing to expect from cgroup core. Implement task_cgroup_path_from_hierarchy(). v2: Dropped unnecessary NULL check on the return value of task_cgroup_from_root() as suggested by Li Zefan. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <greg@kroah.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Lennart Poettering <lennart@poettering.net> Cc: Daniel Mack <daniel@zonque.org>
2013-05-14cgroup: make hierarchy_id use cyclic idrTejun Heo
We want to be able to lookup a hierarchy from its id and cyclic allocation is a whole lot simpler with idr. Convert to idr and use idr_alloc_cyclc(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-05-14cgroup: drop hierarchy_id_lockTejun Heo
Now that hierarchy_id alloc / free are protected by the cgroup mutexes, there's no need for this separate lock. Drop it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-05-14cgroup: refactor hierarchy_id handlingTejun Heo
We're planning to converting hierarchy_ida to an idr and use it to look up hierarchy from its id. As we want the mapping to happen atomically with cgroupfs_root registration, this patch refactors hierarchy_id init / exit so that ida operations happen inside cgroup_[root_]mutex. * s/init_root_id()/cgroup_init_root_id()/ and make it return 0 or -errno like a normal function. * Move hierarchy_id initialization from cgroup_root_from_opts() into cgroup_mount() block where the root is confirmed to be used and being registered while holding both mutexes. * Split cgroup_drop_id() into cgroup_exit_root_id() and cgroup_free_root(), so that ID release can happen before dropping the mutexes in cgroup_kill_sb(). The latter expects hierarchy_id to be exited before being invoked. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-05-14rcu: Fix comparison sense in rcu_needs_cpu()Paul E. McKenney
Commit c0f4dfd4f (rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks) introduced a bug that can result in excessively long grace periods. This bug reverse the senes of the "if" statement checking for lazy callbacks, so that RCU takes a lazy approach when there are in fact non-lazy callbacks. This can result in excessive boot, suspend, and resume times. This commit therefore fixes the sense of this "if" statement. Reported-by: Borislav Petkov <bp@alien8.de> Reported-by: Bjørn Mork <bjorn@mork.no> Reported-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Bjørn Mork <bjorn@mork.no> Tested-by: Joerg Roedel <joro@8bytes.org>
2013-05-14workqueue: Add system wide power_efficient workqueuesViresh Kumar
This patch adds system wide workqueues aligned towards power saving. This is done by allocating them with WQ_UNBOUND flag if 'wq_power_efficient' is set to 'true'. tj: updated comments a bit. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-05-14workqueues: Introduce new flag WQ_POWER_EFFICIENT for power oriented workqueuesViresh Kumar
Workqueues can be performance or power-oriented. Currently, most workqueues are bound to the CPU they were created on. This gives good performance (due to cache effects) at the cost of potentially waking up otherwise idle cores (Idle from scheduler's perspective. Which may or may not be physically idle) just to process some work. To save power, we can allow the work to be rescheduled on a core that is already awake. Workqueues created with the WQ_UNBOUND flag will allow some power savings. However, we don't change the default behaviour of the system. To enable power-saving behaviour, a new config option CONFIG_WQ_POWER_EFFICIENT needs to be turned on. This option can also be overridden by the workqueue.power_efficient boot parameter. tj: Updated config description and comments. Renamed CONFIG_WQ_POWER_EFFICIENT to CONFIG_WQ_POWER_EFFICIENT_DEFAULT. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-05-14Merge branch 'for-3.10-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "A fix for a workqueue_congested() regression that broke fscache" * 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: workqueue_congested() shouldn't translate WORK_CPU_UNBOUND into node number
2013-05-14timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARETirupathi Reddy
An inactive timer's base can refer to a offline cpu's base. In the current code, cpu_base's lock is blindly reinitialized each time a CPU is brought up. If a CPU is brought online during the period that another thread is trying to modify an inactive timer on that CPU with holding its timer base lock, then the lock will be reinitialized under its feet. This leads to following SPIN_BUG(). <0> BUG: spinlock already unlocked on CPU#3, kworker/u:3/1466 <0> lock: 0xe3ebe000, .magic: dead4ead, .owner: kworker/u:3/1466, .owner_cpu: 1 <4> [<c0013dc4>] (unwind_backtrace+0x0/0x11c) from [<c026e794>] (do_raw_spin_unlock+0x40/0xcc) <4> [<c026e794>] (do_raw_spin_unlock+0x40/0xcc) from [<c076c160>] (_raw_spin_unlock+0x8/0x30) <4> [<c076c160>] (_raw_spin_unlock+0x8/0x30) from [<c009b858>] (mod_timer+0x294/0x310) <4> [<c009b858>] (mod_timer+0x294/0x310) from [<c00a5e04>] (queue_delayed_work_on+0x104/0x120) <4> [<c00a5e04>] (queue_delayed_work_on+0x104/0x120) from [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c) <4> [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c) from [<c04d8780>] (sdhci_disable+0x40/0x48) <4> [<c04d8780>] (sdhci_disable+0x40/0x48) from [<c04bf300>] (mmc_release_host+0x4c/0xb0) <4> [<c04bf300>] (mmc_release_host+0x4c/0xb0) from [<c04c7aac>] (mmc_sd_detect+0x90/0xfc) <4> [<c04c7aac>] (mmc_sd_detect+0x90/0xfc) from [<c04c2504>] (mmc_rescan+0x7c/0x2c4) <4> [<c04c2504>] (mmc_rescan+0x7c/0x2c4) from [<c00a6a7c>] (process_one_work+0x27c/0x484) <4> [<c00a6a7c>] (process_one_work+0x27c/0x484) from [<c00a6e94>] (worker_thread+0x210/0x3b0) <4> [<c00a6e94>] (worker_thread+0x210/0x3b0) from [<c00aad9c>] (kthread+0x80/0x8c) <4> [<c00aad9c>] (kthread+0x80/0x8c) from [<c000ea80>] (kernel_thread_exit+0x0/0x8) As an example, this particular crash occurred when CPU #3 is executing mod_timer() on an inactive timer whose base is refered to offlined CPU #2. The code locked the timer_base corresponding to CPU #2. Before it could proceed, CPU #2 came online and reinitialized the spinlock corresponding to its base. Thus now CPU #3 held a lock which was reinitialized. When CPU #3 finally ended up unlocking the old cpu_base corresponding to CPU #2, we hit the above SPIN_BUG(). CPU #0 CPU #3 CPU #2 ------ ------- ------- ..... ...... <Offline> mod_timer() lock_timer_base spin_lock_irqsave(&base->lock) cpu_up(2) ..... ...... init_timers_cpu() .... ..... spin_lock_init(&base->lock) ..... spin_unlock_irqrestore(&base->lock) ...... <spin_bug> Allocation of per_cpu timer vector bases is done only once under "tvec_base_done[]" check. In the current code, spinlock_initialization of base->lock isn't under this check. When a CPU is up each time the base lock is reinitialized. Move base spinlock initialization under the check. Signed-off-by: Tirupathi Reddy <tirupath@codeaurora.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1368520142-4136-1-git-send-email-tirupath@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-14rcu/idle: Wrap cpu-idle poll mode within rcu_idle_enter/exitSrivatsa S. Bhat
Bjørn Mork reported the following warning when running powertop. [ 49.289034] ------------[ cut here ]------------ [ 49.289055] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common.isra.48+0x3d/0x125() [ 49.289244] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-bisect-rcu-warn+ #107 [ 49.289251] ffffffff8157d8c8 ffffffff81801e28 ffffffff8137e4e3 ffffffff81801e68 [ 49.289260] ffffffff8103094f ffffffff81801e68 0000000000000000 ffff88023afcd9b0 [ 49.289268] 0000000000000000 0140000000000000 ffff88023bee7700 ffffffff81801e78 [ 49.289276] Call Trace: [ 49.289285] [<ffffffff8137e4e3>] dump_stack+0x19/0x1b [ 49.289293] [<ffffffff8103094f>] warn_slowpath_common+0x62/0x7b [ 49.289300] [<ffffffff8103097d>] warn_slowpath_null+0x15/0x17 [ 49.289306] [<ffffffff810a9006>] rcu_eqs_exit_common.isra.48+0x3d/0x125 [ 49.289314] [<ffffffff81079b49>] ? trace_hardirqs_off_caller+0x37/0xa6 [ 49.289320] [<ffffffff810a9692>] rcu_idle_exit+0x85/0xa8 [ 49.289327] [<ffffffff8107076e>] trace_cpu_idle_rcuidle+0xae/0xff [ 49.289334] [<ffffffff810708b1>] cpu_startup_entry+0x72/0x115 [ 49.289341] [<ffffffff813689e5>] rest_init+0x149/0x150 [ 49.289347] [<ffffffff8136889c>] ? csum_partial_copy_generic+0x16c/0x16c [ 49.289355] [<ffffffff81a82d34>] start_kernel+0x3f0/0x3fd [ 49.289362] [<ffffffff81a8274c>] ? repair_env_string+0x5a/0x5a [ 49.289368] [<ffffffff81a82481>] x86_64_start_reservations+0x2a/0x2c [ 49.289375] [<ffffffff81a82550>] x86_64_start_kernel+0xcd/0xd1 [ 49.289379] ---[ end trace 07a1cc95e29e9036 ]--- The warning is that 'rdtp->dynticks' has an unexpected value, which roughly translates to - the calls to rcu_idle_enter() and rcu_idle_exit() were not made in the correct order, or otherwise messed up. And Bjørn's painstaking debugging indicated that this happens when the idle loop enters the poll mode. Looking at the poll function cpu_idle_poll(), and the implementation of trace_cpu_idle_rcuidle(), the problem becomes very clear: cpu_idle_poll() lacks calls to rcu_idle_enter/exit(), and trace_cpu_idle_rcuidle() calls them in the reverse order - first rcu_idle_exit(), and then rcu_idle_enter(). Hence the even/odd alternative sequencing of rdtp->dynticks goes for a toss. And powertop readily triggers this because powertop uses the idle-tracing infrastructure extensively. So, to fix this, wrap the code in cpu_idle_poll() within rcu_idle_enter/exit(), so that it blends properly with the calls inside trace_cpu_idle_rcuidle() and thus get the function ordering right. Reported-and-tested-by: Bjørn Mork <bjorn@mork.no> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/519169BF.4080208@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-14tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offlineThomas Gleixner
commit 5b39939a4 (nohz: Move ts->idle_calls incrementation into strict idle logic) moved code out of tick_nohz_stop_sched_tick() and missed to bail out when the cpu is offline. That's causing subsequent failures as an offline CPU is supposed to die and not to fiddle with nohz magic. Return false in can_stop_idle_tick() if the cpu is offline. Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz> Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305132138160.2863@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-14cgroup: initialize xattr before calling d_instantiate()Li Zefan
cgroup_create_file() calls d_instantiate(), which may decide to look at the xattrs on the file. Smack always does this and SELinux can be configured to do so. But cgroup_add_file() didn't initialize xattrs before calling cgroup_create_file(), which finally leads to dereferencing NULL dentry->d_fsdata. This bug has been there since cgroup xattr was introduced. Cc: <stable@vger.kernel.org> # 3.8.x Reported-by: Ivan Bulatovic <combuster@archlinux.us> Reported-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-05-12sigtimedwait: use freezable blocking callColin Cross
Avoid waking up every thread sleeping in a sigtimedwait call during suspend and resume by calling a freezable blocking call. Previous patches modified the freezer to avoid sending wakeups to threads that are blocked in freezable blocking calls. This call was selected to be converted to a freezable call because it doesn't hold any locks or release any resources when interrupted that might be needed by another freezing task or a kernel driver during suspend, and is a common site where idle userspace tasks are blocked. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-12nanosleep: use freezable blocking callColin Cross
Avoid waking up every thread sleeping in a nanosleep call during suspend and resume by calling a freezable blocking call. Previous patches modified the freezer to avoid sending wakeups to threads that are blocked in freezable blocking calls. This call was selected to be converted to a freezable call because it doesn't hold any locks or release any resources when interrupted that might be needed by another freezing task or a kernel driver during suspend, and is a common site where idle userspace tasks are blocked. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-12futex: use freezable blocking callColin Cross
Avoid waking up every thread sleeping in a futex_wait call during suspend and resume by calling a freezable blocking call. Previous patches modified the freezer to avoid sending wakeups to threads that are blocked in freezable blocking calls. This call was selected to be converted to a freezable call because it doesn't hold any locks or release any resources when interrupted that might be needed by another freezing task or a kernel driver during suspend, and is a common site where idle userspace tasks are blocked. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-12freezer: skip waking up tasks with PF_FREEZER_SKIP setColin Cross
Android goes through suspend/resume very often (every few seconds when on a busy wifi network with the screen off), and a significant portion of the energy used to go in and out of suspend is spent in the freezer. If a task has called freezer_do_not_count(), don't bother waking it up. If it happens to wake up later it will call freezer_count() and immediately enter the refrigerator. Combined with patches to convert freezable helpers to use freezer_do_not_count() and convert common sites where idle userspace tasks are blocked to use the freezable helpers, this reduces the time and energy required to suspend and resume. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-12freezer: shorten freezer sleep time using exponential backoffColin Cross
All tasks can easily be frozen in under 10 ms, switch to using an initial 1 ms sleep followed by exponential backoff until 8 ms. Also convert the printed time to ms instead of centiseconds. Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-12lockdep: remove task argument from debug_check_no_locks_heldColin Cross
The only existing caller to debug_check_no_locks_held calls it with 'current' as the task, and the freezer needs to call debug_check_no_locks_held but doesn't already have a current task pointer, so remove the argument. It is already assuming that the current task is relevant by dumping the current stack trace as part of the warning. This was originally part of 6aa9707099c (lockdep: check that no locks held at freeze time) which was reverted in dbf520a9d7d4. Original-author: Mandeep Singh Baines <msb@chromium.org> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-12tick: Cleanup NOHZ per cpu data on cpu downThomas Gleixner
Prarit reported a crash on CPU offline/online. The reason is that on CPU down the NOHZ related per cpu data of the dead cpu is not cleaned up. If at cpu online an interrupt happens before the per cpu tick device is registered the irq_enter() check potentially sees stale data and dereferences a NULL pointer. Cleanup the data after the cpu is dead. Reported-by: Prarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Cc: Mike Galbraith <bitbucket@online.de> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-11Merge tag 'trace-fixes-v3.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing/kprobes update from Steven Rostedt: "The majority of these changes are from Masami Hiramatsu bringing kprobes up to par with the latest changes to ftrace (multi buffering and the new function probes). He also discovered and fixed some bugs in doing so. When pulling in his patches, I also found a few minor bugs as well and fixed them. This also includes a compile fix for some archs that select the ring buffer but not tracing. I based this off of the last patch you took from me that fixed the merge conflict error, as that was the commit that had all the changes I needed for this set of changes." * tag 'trace-fixes-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/kprobes: Support soft-mode disabling tracing/kprobes: Support ftrace_event_file base multibuffer tracing/kprobes: Pass trace_probe directly from dispatcher tracing/kprobes: Increment probe hit-count even if it is used by perf tracing/kprobes: Use bool for retprobe checker ftrace: Fix function probe when more than one probe is added ftrace: Fix the output of enabled_functions debug file ftrace: Fix locking in register_ftrace_function_probe() tracing: Add helper function trace_create_new_event() to remove duplicate code tracing: Modify soft-mode only if there's no other referrer tracing: Indicate enabled soft-mode in enable file tracing/kprobes: Fix to increment return event probe hit-count ftrace: Cleanup regex_lock and ftrace_lock around hash updating ftrace, kprobes: Fix a deadlock on ftrace_regex_lock ftrace: Have ftrace_regex_write() return either read or error tracing: Return error if register_ftrace_function_probe() fails for event_enable_func() tracing: Don't succeed if event_enable_func did not register anything ring-buffer: Select IRQ_WORK
2013-05-11Merge git://git.infradead.org/users/eparis/auditLinus Torvalds
Pull audit changes from Eric Paris: "Al used to send pull requests every couple of years but he told me to just start pushing them to you directly. Our touching outside of core audit code is pretty straight forward. A couple of interface changes which hit net/. A simple argument bug calling audit functions in namei.c and the removal of some assembly branch prediction code on ppc" * git://git.infradead.org/users/eparis/audit: (31 commits) audit: fix message spacing printing auid Revert "audit: move kaudit thread start from auditd registration to kaudit init" audit: vfs: fix audit_inode call in O_CREAT case of do_last audit: Make testing for a valid loginuid explicit. audit: fix event coverage of AUDIT_ANOM_LINK audit: use spin_lock in audit_receive_msg to process tty logging audit: do not needlessly take a lock in tty_audit_exit audit: do not needlessly take a spinlock in copy_signal audit: add an option to control logging of passwords with pam_tty_audit audit: use spin_lock_irqsave/restore in audit tty code helper for some session id stuff audit: use a consistent audit helper to log lsm information audit: push loginuid and sessionid processing down audit: stop pushing loginid, uid, sessionid as arguments audit: remove the old depricated kernel interface audit: make validity checking generic audit: allow checking the type of audit message in the user filter audit: fix build break when AUDIT_DEBUG == 2 audit: remove duplicate export of audit_enabled Audit: do not print error when LSMs disabled ...
2013-05-10workqueue: workqueue_congested() shouldn't translate WORK_CPU_UNBOUND into ↵Tejun Heo
node number df2d5ae499 ("workqueue: map an unbound workqueues to multiple per-node pool_workqueues") made unbound workqueues to map to multiple per-node pool_workqueues and accordingly updated workqueue_contested() so that, for unbound workqueues, it maps the specified @cpu to the NUMA node number to obtain the matching pool_workqueue to query the congested state. Before this change, workqueue_congested() ignored @cpu for unbound workqueues as there was only one pool_workqueue and some users (fscache) called it with WORK_CPU_UNBOUND. After the commit, this causes the following oops as WORK_CPU_UNBOUND gets translated to garbage by cpu_to_node(). BUG: unable to handle kernel paging request at ffff8803598d98b8 IP: [<ffffffff81043b7e>] unbound_pwq_by_node+0xa1/0xfa PGD 2421067 PUD 0 Oops: 0000 [#1] SMP CPU: 1 PID: 2689 Comm: cat Tainted: GF 3.9.0-fsdevel+ #4 task: ffff88003d801040 ti: ffff880025806000 task.ti: ffff880025806000 RIP: 0010:[<ffffffff81043b7e>] [<ffffffff81043b7e>] unbound_pwq_by_node+0xa1/0xfa RSP: 0018:ffff880025807ad8 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff8800388a2400 RCX: 0000000000000003 RDX: ffff880025807fd8 RSI: ffffffff81a31420 RDI: ffff88003d8016e0 RBP: ffff880025807ae8 R08: ffff88003d801730 R09: ffffffffa00b4898 R10: ffffffff81044217 R11: ffff88003d801040 R12: 0000000064206e97 R13: ffff880036059d98 R14: ffff880038cc8080 R15: ffff880038cc82d0 FS: 00007f21afd9c740(0000) GS:ffff88003d100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff8803598d98b8 CR3: 000000003df49000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff8800388a2400 0000000000000002 ffff880025807b18 ffffffff810442ce ffffffff81044217 ffff880000000002 ffff8800371b4080 ffff88003d112ec0 ffff880025807b38 ffffffffa00810b0 ffff880036059d88 ffff880036059be8 Call Trace: [<ffffffff810442ce>] workqueue_congested+0xb7/0x12c [<ffffffffa00810b0>] fscache_enqueue_object+0xb2/0xe8 [fscache] [<ffffffffa007facd>] __fscache_acquire_cookie+0x3b9/0x56c [fscache] [<ffffffffa00ad8fe>] nfs_fscache_set_inode_cookie+0xee/0x132 [nfs] [<ffffffffa009e112>] do_open+0x9/0xd [nfs] [<ffffffff810e804a>] do_dentry_open+0x175/0x24b [<ffffffff810e8298>] finish_open+0x41/0x51 Fix it by using smp_processor_id() if @cpu is WORK_CPU_UNBOUND. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: David Howells <dhowells@redhat.com> Tested-and-Acked-by: David Howells <dhowells@redhat.com>
2013-05-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull stray syscall bits from Al Viro: "Several syscall-related commits that were missing from the original" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE unicore32: just use mmap_pgoff()... unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)
2013-05-10Merge tag 'for-linus-20130509' of git://git.infradead.org/~dwmw2/random-2.6Linus Torvalds
Pull misc fixes from David Woodhouse: "This is some miscellaneous cleanups that don't really belong anywhere else (or were ignored), that have been sitting in linux-next for some time. Two of them are fixes resulting from my audit of krealloc() usage that don't seem to have elicited any response when I posted them, and the other three are patches from Artem removing dead code." * tag 'for-linus-20130509' of git://git.infradead.org/~dwmw2/random-2.6: pcmcia: remove RPX board stuff m68k: remove rpxlite stuff pcmcia: remove Motorola MBX860 support params: Fix potential memory leak in add_sysfs_param() dell-laptop: Fix krealloc() misuse in parse_da_table()
2013-05-10sched: Use this_rq() helperNathan Zimmer
It is a few instructions more efficent to and slightly more readable to use this_rq()-> instead of cpu_rq(smp_processor_id())-> . Size comparison of kernel/sched/fair.o: text data bss dec hex filename 27972 122 26 28120 6dd8 fair.o.before 27956 122 26 28104 6dc8 fair.o.after Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1368116643-87971-1-git-send-email-nzimmer@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-09tracing/kprobes: Support soft-mode disablingMasami Hiramatsu
Support soft-mode disabling on kprobe-based dynamic events. Soft-disabling is just ignoring recording if the soft disabled flag is set. Link: http://lkml.kernel.org/r/20130509054454.30398.7237.stgit@mhiramat-M0-7522 Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09tracing/kprobes: Support ftrace_event_file base multibufferMasami Hiramatsu
Support multi-buffer on kprobe-based dynamic events by using ftrace_event_file. Link: http://lkml.kernel.org/r/20130509054449.30398.88343.stgit@mhiramat-M0-7522 Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09tracing/kprobes: Pass trace_probe directly from dispatcherMasami Hiramatsu
Pass the pointer of struct trace_probe directly from probe dispatcher to handlers. This removes redundant container_of macro uses. Same thing has already done in trace_uprobe. Link: http://lkml.kernel.org/r/20130509054441.30398.69112.stgit@mhiramat-M0-7522 Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09tracing/kprobes: Increment probe hit-count even if it is used by perfMasami Hiramatsu
Increment probe hit-count for profiling even if it is used by perf tool. Same thing has already done in trace_uprobe. Link: http://lkml.kernel.org/r/20130509054436.30398.21133.stgit@mhiramat-M0-7522 Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09tracing/kprobes: Use bool for retprobe checkerMasami Hiramatsu
Use bool instead of int for kretprobe checker. Link: http://lkml.kernel.org/r/20130509054431.30398.38561.stgit@mhiramat-M0-7522 Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09ftrace: Fix function probe when more than one probe is addedSteven Rostedt (Red Hat)
When the first function probe is added and the function tracer is updated the functions are modified to call the probe. But when a second function is added, it updates the function records to have the second function also update, but it fails to update the actual function itself. This prevents the second (or third or forth and so on) probes from having their functions called. # echo vfs_symlink:enable_event:sched:sched_switch > set_ftrace_filter # echo vfs_unlink:enable_event:sched:sched_switch > set_ftrace_filter # cat trace # tracer: nop # # entries-in-buffer/entries-written: 0/0 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | # touch /tmp/a # rm /tmp/a # cat trace # tracer: nop # # entries-in-buffer/entries-written: 0/0 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | # ln -s /tmp/a # cat trace # tracer: nop # # entries-in-buffer/entries-written: 414/414 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | <idle>-0 [000] d..3 2847.923031: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=2786 next_prio=120 <...>-3114 [001] d..4 2847.923035: sched_switch: prev_comm=ln prev_pid=3114 prev_prio=120 prev_state=x ==> next_comm=swapper/1 next_pid=0 next_prio=120 bash-2786 [000] d..3 2847.923535: sched_switch: prev_comm=bash prev_pid=2786 prev_prio=120 prev_state=S ==> next_comm=kworker/0:1 next_pid=34 next_prio=120 kworker/0:1-34 [000] d..3 2847.923552: sched_switch: prev_comm=kworker/0:1 prev_pid=34 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 <idle>-0 [002] d..3 2847.923554: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sshd next_pid=2783 next_prio=120 sshd-2783 [002] d..3 2847.923660: sched_switch: prev_comm=sshd prev_pid=2783 prev_prio=120 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120 Still need to update the functions even though the probe itself does not need to be registered again when added a new probe. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09ftrace: Fix the output of enabled_functions debug fileSteven Rostedt (Red Hat)
The enabled_functions debugfs file was created to be able to see what functions have been modified from nops to calling a tracer. The current method uses the counter in the function record. As when a ftrace_ops is registered to a function, its count increases. But that doesn't mean that the function is actively being traced. /proc/sys/kernel/ftrace_enabled can be set to zero which would disable it, as well as something can go wrong and we can think its enabled when only the counter is set. The record's FTRACE_FL_ENABLED flag is set or cleared when its function is modified. That is a much more accurate way of knowing what function is enabled or not. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09ftrace: Fix locking in register_ftrace_function_probe()Steven Rostedt (Red Hat)
The iteration of the ftrace function list and the call to ftrace_match_record() need to be protected by the ftrace_lock. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>