path: root/kernel
Age | Commit message | Author
2013-12-05 | cpuset: convert away from cftype->read() | Tejun Heo
In preparation of conversion to kernfs, cgroup file handling is being consolidated so that it can be easily mapped to the seq_file based interface of kernfs. All users of cftype->read() can be easily served, usually better, by seq_file and other methods. Rename cpuset_common_file_read() to cpuset_common_read_seq_string() and convert it to use read_seq_string() interface instead. This not only simplifies the code but also makes it more versatile. Before, the file couldn't output if the result is longer than PAGE_SIZE. After the conversion, seq_file automatically grows the buffer until the output can fit. This patch doesn't make any visible behavior changes except for being able to handle output larger than PAGE_SIZE. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
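The buffer-growing behaviour described in this message can be illustrated in user space. The following is only a sketch of the "retry with a larger buffer until the output fits" idea, not the seq_file implementation; `render_all()`, `show_fn` and `show_numbers()` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: formats into buf and returns the length the full
 * output needs (snprintf semantics, excluding the terminating NUL). */
typedef int (*show_fn)(char *buf, size_t size, const void *data);

/* Retry with a doubled buffer until the whole output fits.
 * Returns a malloc()ed string the caller must free(), or NULL on failure. */
static char *render_all(show_fn show, const void *data, size_t initial)
{
	size_t size = initial ? initial : 1;
	char *buf = malloc(size);

	assert(show != NULL);
	while (buf) {
		int need = show(buf, size, data);

		if (need < 0) {
			free(buf);
			return NULL;
		}
		if ((size_t)need < size)
			return buf;	/* everything fit, including the NUL */

		/* Output was truncated: grow the buffer and try again. */
		free(buf);
		size *= 2;
		buf = malloc(size);
	}
	return NULL;
}

/* Example callback: prints `count` integers, each followed by a space. */
static int show_numbers(char *buf, size_t size, const void *data)
{
	int count = *(const int *)data;
	size_t used = 0;

	if (size)
		buf[0] = '\0';
	for (int i = 0; i < count; i++) {
		/* Once the buffer is full, keep counting without writing. */
		int n = snprintf(used < size ? buf + used : NULL,
				 used < size ? size - used : 0, "%d ", i);
		if (n < 0)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

With a fixed-size buffer the second pass would never happen and long output would be lost; here the caller can start with a deliberately small `initial` value and still receive the complete string.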
2013-12-05 | cgroup, sched: convert away from cftype->read_map() | Tejun Heo
In preparation of conversion to kernfs, cgroup file handling is being consolidated so that it can be easily mapped to the seq_file based interface of kernfs. cftype->read_map() doesn't add any value and is being replaced with ->read_seq_string(). Update cpu_stats_show() and cpuacct_stats_show() accordingly. This patch doesn't make any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org>
2013-12-05 | padata: Fix wrong usage of rcu_dereference() | Mathias Krause
A kernel with enabled lockdep complains about the wrong usage of rcu_dereference() under a rcu_read_lock_bh() protected region.

  ===============================
  [ INFO: suspicious RCU usage. ]
  3.13.0-rc1+ #126 Not tainted
  -------------------------------
  linux/kernel/padata.c:115 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 1
  1 lock held by cryptomgr_test/153:
   #0: (rcu_read_lock_bh){.+....}, at: [<ffffffff8115c235>] padata_do_parallel+0x5/0x270

Fix that by using rcu_dereference_bh() instead.

Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-12-05 | sched/numa: Drop idx field of task_numa_env struct | Wanpeng Li
Drop unused idx field of task_numa_env struct. Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1386241817-5051-2-git-send-email-liwanp@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-04 | Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull timer fixes from Thomas Gleixner:
- timekeeping: Cure a subtle drift issue on GENERIC_TIME_VSYSCALL_OLD
- nohz: Make CONFIG_NO_HZ=n and nohz=off command line option behave the same way. Fixes a long standing load accounting wreckage.
- clocksource/ARM: Kconfig update to avoid ARM=n wreckage
- clocksource/ARM: Fixlets for the AT91 and SH clocksource/clockevents
- Trivial documentation update and kzalloc conversion from akpms pile

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  nohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=off
  time: Fix 1ns/tick drift w/ GENERIC_TIME_VSYSCALL_OLD
  clocksource: arm_arch_timer: Hide eventstream Kconfig on non-ARM
  clocksource: sh_tmu: Add clk_prepare/unprepare support
  clocksource: sh_tmu: Release clock when sh_tmu_register() fails
  clocksource: sh_mtu2: Add clk_prepare/unprepare support
  clocksource: sh_mtu2: Release clock when sh_mtu2_register() fails
  ARM: at91: rm9200: switch back to clockevents_config_and_register
  tick: Document tick_do_timer_cpu
  timer: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)
  NOHZ: Check for nohz active instead of nohz enabled
2013-12-04 | Merge branch 'timers/core-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core | Ingo Molnar
Pull dynticks updates from Frederic Weisbecker:
* Fix a bug where posix cpu timers requeued due to interval got ignored on full dynticks CPUs (not a regression though as it only impacts full dynticks and the bug is there since we merged full dynticks).
* Optimizations and cleanups on the use of per CPU APIs to improve code readability, performance and debuggability in the nohz subsystem;
* Optimize posix cpu timer by sparing stub workqueue queue with full dynticks off case
* Rename some functions to extend with *_this_cpu() suffix for clarity
* Refine the naming of some context tracking subsystem state accessors
* Trivial spelling fix by Paul Gortmaker

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-04 | params: improve standard definitions | Felipe Contreras
We are repeating the functionality of kstrtol in param_set_long, and the same for kstrtoint. We can get rid of the extra code by using the right functions. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
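The idea of routing a parameter setter through one shared strict parser can be shown with a user-space analogue. This is a sketch only: `strict_strtol()` and `set_long_param()` are hypothetical stand-ins built on the C library's `strtol()`, and they only approximate kernel helper semantics (for instance, `strtol()` skips leading whitespace).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical strict parser: the whole string must be one number, with at
 * most a single trailing newline. Returns 0 on success or a negative errno. */
static int strict_strtol(const char *s, int base, long *res)
{
	char *end;
	long val;

	assert(s != NULL && res != NULL);
	errno = 0;
	val = strtol(s, &end, base);
	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (errno == ERANGE)
		return -ERANGE;		/* value does not fit in a long */
	if (*end == '\n')
		end++;			/* tolerate one trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */
	*res = val;
	return 0;
}

/* Hypothetical setter: no parsing logic of its own, it just delegates.
 * Base 0 lets strtol() auto-detect decimal, octal and hexadecimal. */
static int set_long_param(const char *val, long *target)
{
	return strict_strtol(val, 0, target);
}
```

Because the setter is a one-line wrapper, every parameter type built this way inherits the same validation rules instead of carrying a private copy of them.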
2013-12-03 | rcu: Let the world know when RCU adjusts its geometry | Paul E. McKenney
Some RCU bugs have been specific to the layout of the rcu_node tree, but RCU will silently adjust the tree at boot time if appropriate. This obscures valuable debugging information, so print a message when this happens. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-12-03 | rcu: Fix srcu_barrier() docbook header | Paul E. McKenney
The srcu_barrier() docbook header left out the "sp" argument, so this commit adds that argument's docbook text. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-12-03 | rcu: Allow task-level idle entry/exit nesting | Paul E. McKenney
The current task-level idle entry/exit code forces an entry/exit on each call, regardless of the nesting level. This commit therefore properly accounts for nesting. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
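A generic way to account for nesting is a depth counter in which only the outermost entry and the matching outermost exit perform the real transition. The sketch below illustrates that pattern with hypothetical names (`struct idle_state`, `idle_enter()`, `idle_exit()`); it is not the RCU code and makes no claim about how RCU encodes its nesting level.

```c
#include <assert.h>

struct idle_state {
	int nesting;		/* outstanding idle-entry calls */
	int transitions;	/* real idle <-> non-idle transitions performed */
};

/* Only the outermost entry actually switches to the idle state. */
static void idle_enter(struct idle_state *st)
{
	if (st->nesting++ == 0)
		st->transitions++;
}

/* Only the exit that balances the outermost entry switches back. */
static void idle_exit(struct idle_state *st)
{
	assert(st->nesting > 0);	/* unbalanced exit is a caller bug */
	if (--st->nesting == 0)
		st->transitions++;
}
```

Two nested enter/exit pairs therefore cost two transitions in total, where code that forces a transition on every call would perform four.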
2013-12-03 | rcu: Break call_rcu() deadlock involving scheduler and perf | Paul E. McKenney
Dave Jones got the following lockdep splat:

> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.12.0-rc3+ #92 Not tainted
> -------------------------------------------------------
> trinity-child2/15191 is trying to acquire lock:
>  (&rdp->nocb_wq){......}, at: [<ffffffff8108ff43>] __wake_up+0x23/0x50
>
> but task is already holding lock:
>  (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&ctx->lock){-.-...}:
>        [<ffffffff810cc243>] lock_acquire+0x93/0x200
>        [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
>        [<ffffffff811500ff>] __perf_event_task_sched_out+0x2df/0x5e0
>        [<ffffffff81091b83>] perf_event_task_sched_out+0x93/0xa0
>        [<ffffffff81732052>] __schedule+0x1d2/0xa20
>        [<ffffffff81732f30>] preempt_schedule_irq+0x50/0xb0
>        [<ffffffff817352b6>] retint_kernel+0x26/0x30
>        [<ffffffff813eed04>] tty_flip_buffer_push+0x34/0x50
>        [<ffffffff813f0504>] pty_write+0x54/0x60
>        [<ffffffff813e900d>] n_tty_write+0x32d/0x4e0
>        [<ffffffff813e5838>] tty_write+0x158/0x2d0
>        [<ffffffff811c4850>] vfs_write+0xc0/0x1f0
>        [<ffffffff811c52cc>] SyS_write+0x4c/0xa0
>        [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
>
> -> #2 (&rq->lock){-.-.-.}:
>        [<ffffffff810cc243>] lock_acquire+0x93/0x200
>        [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
>        [<ffffffff810980b2>] wake_up_new_task+0xc2/0x2e0
>        [<ffffffff81054336>] do_fork+0x126/0x460
>        [<ffffffff81054696>] kernel_thread+0x26/0x30
>        [<ffffffff8171ff93>] rest_init+0x23/0x140
>        [<ffffffff81ee1e4b>] start_kernel+0x3f6/0x403
>        [<ffffffff81ee1571>] x86_64_start_reservations+0x2a/0x2c
>        [<ffffffff81ee1664>] x86_64_start_kernel+0xf1/0xf4
>
> -> #1 (&p->pi_lock){-.-.-.}:
>        [<ffffffff810cc243>] lock_acquire+0x93/0x200
>        [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
>        [<ffffffff810979d1>] try_to_wake_up+0x31/0x350
>        [<ffffffff81097d62>] default_wake_function+0x12/0x20
>        [<ffffffff81084af8>] autoremove_wake_function+0x18/0x40
>        [<ffffffff8108ea38>] __wake_up_common+0x58/0x90
>        [<ffffffff8108ff59>] __wake_up+0x39/0x50
>        [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
>        [<ffffffff81111450>] __call_rcu+0x140/0x820
>        [<ffffffff81111b8d>] call_rcu+0x1d/0x20
>        [<ffffffff81093697>] cpu_attach_domain+0x287/0x360
>        [<ffffffff81099d7e>] build_sched_domains+0xe5e/0x10a0
>        [<ffffffff81efa7fc>] sched_init_smp+0x3b7/0x47a
>        [<ffffffff81ee1f4e>] kernel_init_freeable+0xf6/0x202
>        [<ffffffff817200be>] kernel_init+0xe/0x190
>        [<ffffffff8173d22c>] ret_from_fork+0x7c/0xb0
>
> -> #0 (&rdp->nocb_wq){......}:
>        [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
>        [<ffffffff810cc243>] lock_acquire+0x93/0x200
>        [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
>        [<ffffffff8108ff43>] __wake_up+0x23/0x50
>        [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
>        [<ffffffff81111450>] __call_rcu+0x140/0x820
>        [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
>        [<ffffffff81149abf>] put_ctx+0x4f/0x70
>        [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
>        [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
>        [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
>        [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
>        [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
>
> other info that might help us debug this:
>
> Chain exists of:
>   &rdp->nocb_wq --> &rq->lock --> &ctx->lock
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&ctx->lock);
>                                lock(&rq->lock);
>                                lock(&ctx->lock);
>   lock(&rdp->nocb_wq);
>
>  *** DEADLOCK ***
>
> 1 lock held by trinity-child2/15191:
>  #0: (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
>
> stack backtrace:
> CPU: 2 PID: 15191 Comm: trinity-child2 Not tainted 3.12.0-rc3+ #92
>  ffffffff82565b70 ffff880070c2dbf8 ffffffff8172a363 ffffffff824edf40
>  ffff880070c2dc38 ffffffff81726741 ffff880070c2dc90 ffff88022383b1c0
>  ffff88022383aac0 0000000000000000 ffff88022383b188 ffff88022383b1c0
> Call Trace:
>  [<ffffffff8172a363>] dump_stack+0x4e/0x82
>  [<ffffffff81726741>] print_circular_bug+0x200/0x20f
>  [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
>  [<ffffffff810c6439>] ? get_lock_stats+0x19/0x60
>  [<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
>  [<ffffffff810cc243>] lock_acquire+0x93/0x200
>  [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
>  [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
>  [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
>  [<ffffffff8108ff43>] __wake_up+0x23/0x50
>  [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
>  [<ffffffff81111450>] __call_rcu+0x140/0x820
>  [<ffffffff8109bc8f>] ? local_clock+0x3f/0x50
>  [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
>  [<ffffffff81149abf>] put_ctx+0x4f/0x70
>  [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
>  [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
>  [<ffffffff810c9af5>] ? trace_hardirqs_on_caller+0x115/0x1e0
>  [<ffffffff810c9bcd>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
>  [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
>  [<ffffffff8173d4e4>] tracesys+0xdd/0xe2

The underlying problem is that perf is invoking call_rcu() with the scheduler locks held, but in NOCB mode, call_rcu() will with high probability invoke the scheduler -- which just might want to use its locks. The reason that call_rcu() needs to invoke the scheduler is to wake up the corresponding rcuo callback-offload kthread, which does the job of starting up a grace period and invoking the callbacks afterwards.

One solution (championed on a related problem by Lai Jiangshan) is to simply defer the wakeup to some point where scheduler locks are no longer held. Since we don't want to unnecessarily incur the cost of such deferral, the task before us is threefold:

1. Determine when it is likely that a relevant scheduler lock is held.
2. Defer the wakeup in such cases.
3. Ensure that all deferred wakeups eventually happen, preferably sooner rather than later.

We use irqs_disabled_flags() as a proxy for relevant scheduler locks being held. This works because the relevant locks are always acquired with interrupts disabled. We may defer more often than needed, but that is at least safe. The wakeup deferral is tracked via a new field in the per-CPU and per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup. This flag is checked by the RCU core processing. The __rcu_pending() function now checks this flag, which causes rcu_check_callbacks() to initiate RCU core processing at each scheduling-clock interrupt where this flag is set. Of course this is not sufficient because scheduling-clock interrupts are often turned off (the things we used to be able to count on!). So the flags are also checked on entry to any state that RCU considers to be idle, which includes both NO_HZ_IDLE idle state and NO_HZ_FULL user-mode-execution state. This approach should allow call_rcu() to be invoked regardless of what locks you might be holding, the key word being "should". Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org>
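The three-part plan in this message (detect the risky context, defer the wakeup, make sure it eventually happens) can be modelled in a few lines of user-space C. This is a simplified sketch with hypothetical names (`struct offload_state`, `queue_callback()`, `do_deferred_wakeup()`); the `irqs_off` argument stands in for the interrupts-disabled proxy, and the "wakeup" is just a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct offload_state {
	bool defer_wakeup;	/* a wakeup is owed to the offload thread */
	int wakeups;		/* wakeups actually performed */
};

/* Stand-in for waking the callback-offload thread. */
static void wake_offload_thread(struct offload_state *st)
{
	st->wakeups++;
}

/* Queue a callback. If interrupts are off, a scheduler lock might be held,
 * so record that a wakeup is owed instead of performing it now. */
static void queue_callback(struct offload_state *st, bool irqs_off)
{
	assert(st != NULL);
	if (irqs_off) {
		st->defer_wakeup = true;
		return;
	}
	wake_offload_thread(st);
}

/* Called from known-safe points (in the message: the scheduling-clock
 * interrupt path and entry to idle). Performs any owed wakeup once. */
static void do_deferred_wakeup(struct offload_state *st)
{
	assert(st != NULL);
	if (!st->defer_wakeup)
		return;
	st->defer_wakeup = false;
	wake_offload_thread(st);
}
```

The flag makes the deferral idempotent: several callbacks queued with interrupts off result in a single owed wakeup, and checking the flag at more than one safe point is harmless.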
2013-12-03 | rcu: Fix and comment ordering around wait_event() | Paul E. McKenney
It is all too easy to forget that wait_event() does not necessarily imply a full memory barrier. The case where it does not is where the condition transitions to true just as wait_event() starts execution. This is actually a feature: The standard use of wait_event() involves locking, in which case the locks provide the needed ordering (you hold a lock across the wake_up() and acquire that same lock after wait_event() returns). Given that I did forget that wait_event() does not necessarily imply a full memory barrier in one case, this commit fixes that case. This commit also adds comments calling out the placement of existing memory barriers relied on by wait_event() calls. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-12-03 | rcu: Kick CPU halfway to RCU CPU stall warning | Paul E. McKenney
When an RCU CPU stall warning occurs, the CPU invokes resched_cpu() on itself. This can help move the grace period forward in some situations, but it would be even better to do this -before- the RCU CPU stall warning. This commit therefore causes resched_cpu() to be called every five jiffies once the system is halfway to an RCU CPU stall warning. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
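The timing rule stated here (start kicking at the halfway point, then repeat every five jiffies) can be written as a small predicate. The sketch below is an illustration under assumed names: `struct gp_state`, `should_kick()` and the field layout are hypothetical, and only the "halfway" and "five jiffies" values come from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KICK_INTERVAL 5UL	/* jiffies between kicks, per the commit text */

struct gp_state {
	unsigned long gp_start;		/* jiffy at which the grace period began */
	unsigned long stall_after;	/* jiffies from gp_start to the stall warning */
	unsigned long last_kick;	/* jiffy of the most recent kick */
	bool kicked;			/* whether any kick has happened yet */
};

/* Returns true when the caller should kick the CPU now. Unsigned
 * subtraction keeps the comparisons correct across counter wraparound. */
static bool should_kick(struct gp_state *gp, unsigned long now)
{
	assert(gp != NULL);
	if (now - gp->gp_start < gp->stall_after / 2)
		return false;		/* not yet halfway to the warning */
	if (gp->kicked && now - gp->last_kick < KICK_INTERVAL)
		return false;		/* kicked too recently */
	gp->kicked = true;
	gp->last_kick = now;
	return true;
}
```

For a grace period that starts at jiffy 100 with a 200-jiffy stall timeout, this predicate first fires at jiffy 200 and then at most once every five jiffies after that.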
2013-12-02 | posix-timers: Fix full dynticks CPUs kick on timer rescheduling | Frederic Weisbecker
A posix CPU timer can be rearmed while it is firing or after it is notified with a signal. This can happen for example with timers that were set with a non zero interval in timer_settime(). This rearming can happen in two places:

1) On timer firing time, which happens on the target's tick. If the timer can't trigger a signal because it is ignored, it reschedules itself to honour the timer interval.
2) On signal handling from the timer's notification target. This one can be a different task than the timer's target itself. Once the signal is notified, the notification target rearms the timer, again to honour the timer interval.

When a timer is rearmed, we need to notify the full dynticks CPUs such that they restart their tick in case they are running tasks that may have a share in elapsing this timer. Now the 1st case above handles full dynticks CPUs with a call to posix_cpu_timer_kick_nohz() from the posix cpu timer firing code. But the second case ignores the fact that some CPUs may run non-idle tasks with their tick off. As a result, when a timer is rescheduled after its signal notification, the full dynticks CPUs may completely ignore it and not tick on the timer as expected.

This patch fixes this bug by handling both cases in one. All we need is to move the kick to the rearming common code in posix_cpu_timer_schedule().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Olivier Langlois <olivier@olivierlanglois.net>
2013-12-02 | posix-timers: Spare workqueue if there is no full dynticks CPU to kick | Frederic Weisbecker
After a posix cpu timer is set, a workqueue is scheduled in order to kick the full dynticks CPUs and let them restart their tick if necessary in case the task they are running is concerned by the new timer. This kick is implemented by way of IPIs, which require interrupts to be enabled, hence the need for a workqueue to raise them because the posix cpu timer set path has interrupts disabled. Now if there is no full dynticks CPU on the system, the workqueue is still scheduled but it simply won't send any IPI and return immediately. So let's spare that workqueue when it is not needed. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-12-02 | context_tracking: Wrap static key check into more intuitive function name | Frederic Weisbecker
Use a function with a meaningful name to check the global context tracking state. static_key_false() is a bit confusing for reviewers. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-12-02 | nohz: Convert a few places to use local per cpu accesses | Frederic Weisbecker
A few functions use remote per CPU access APIs when they deal with local values. Just do the right conversion to improve performance, code readability and debug checks. While at it, let's extend some of these function names with *_this_cpu() suffix in order to display their purpose more clearly. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-12-02 | Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull irq fixes from Thomas Gleixner:
- Correction of fuzzy and fragile IRQ_RETVAL macro
- IRQ related resume fix affecting only XEN
- ARM/GIC fix for chained GIC controllers

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: Gic: fix boot for chained gics
  irq: Enable all irqs unconditionally in irq_resume
  genirq: Correct fuzzy and fragile IRQ_RETVAL() definition
2013-12-02 | Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull scheduler fixes from Ingo Molnar: "Various smaller fixlets, all over the place"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/doc: Fix generation of device-drivers
  sched: Expose preempt_schedule_irq()
  sched: Fix a trivial typo in comments
  sched: Remove unused variable in 'struct sched_domain'
  sched: Avoid NULL dereference on sd_busy
  sched: Check sched_domain before computing group power
  MAINTAINERS: Update file patterns in the lockdep and scheduler entries
2013-12-02 | Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull perf fixes from Ingo Molnar: "Misc kernel and tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tools lib traceevent: Fix conversion of pointer to integer of different size
  perf/trace: Properly use u64 to hold event_id
  perf: Remove fragile swevent hlist optimization
  ftrace, perf: Avoid infinite event generation loop
  tools lib traceevent: Fix use of multiple options in processing field
  perf header: Fix possible memory leaks in process_group_desc()
  perf header: Fix bogus group name
  perf tools: Tag thread comm as overriden
2013-11-30 | PM / hibernate: export hibernation_set_ops | Leonardo Potenza
To allow PM hibernation code to be implemented as modules, the hibernation_set_ops function needs to be exported. A similar solution is already available for suspend_set_ops (please refer to commit a5e4fd8783a2bec861ecf1138cdc042269ff59aa). Signed-off-by: Leonardo Potenza <leonardo.potenza@intel.com> Signed-off-by: Edwin Verplanke <edwin.verplanke@intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-11-29 | Merge branch 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq | Linus Torvalds
Pull workqueue fixes from Tejun Heo: "This contains one important fix. The NUMA support added a while back broke ordering guarantees on ordered workqueues. It was enforced by having single frontend interface with @max_active == 1 but the NUMA support puts multiple interfaces on unbound workqueues on NUMA machines thus breaking the ordered guarantee. This is fixed by disabling NUMA support on ordered workqueues. The above and a couple other patches were sitting in for-3.12-fixes but I forgot to push that out, so they ended up waiting a bit too long. My apologies. Other fixes are minor"

* 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in init_workqueues
  workqueue: fix comment typo for __queue_work()
  workqueue: fix ordered workqueues in NUMA setups
  workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY
2013-11-29 | Merge branch 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup | Linus Torvalds
Pull cgroup fixes from Tejun Heo: "Fixes for three issues.
- cgroup destruction path could swamp system_wq possibly leading to deadlock. This actually seems to happen in the wild with memcg because memcg destruction path adds nested dependency on system_wq. Resolved by isolating cgroup destruction work items on its dedicated workqueue.
- Possible locking context deadlock through seqcount reported by lockdep
- Memory leak under certain conditions"

* 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix cgroup_subsys_state leak for seq_files
  cpuset: Fix memory allocator deadlock
  cgroup: use a dedicated workqueue for cgroup destruction
2013-11-29 | cgroup: don't guarantee cgroup.procs is sorted if sane_behavior | Tejun Heo
For some reason, tasks and cgroup.procs guarantee that the result is sorted. This is the only reason this whole pidlist logic is necessary instead of just iterating through sorted member tasks. We can't do anything about the existing interface but at least ensure that such expectation doesn't exist for the new interface so that pidlist logic may be removed in the distant future. This patch scrambles the sort order if sane_behavior so that the output is usually not sorted in the new interface. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
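One cheap way to make output "usually not sorted" while still using an ordinary sort is to compare on a reversible scramble of each value rather than on the value itself. The sketch below illustrates that idea only; it is an assumption-laden example rather than a copy of what the patch does, and `scramble_key()`, `cmp_scrambled()` and `sort_scrambled()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Swap every pair of adjacent bits. Applying the function twice yields the
 * original value, so distinct pids always map to distinct keys. */
static uint32_t scramble_key(uint32_t pid)
{
	uint32_t even = pid & 0x55555555u;	/* bits 0, 2, 4, ... */
	uint32_t odd = pid & 0xAAAAAAAAu;	/* bits 1, 3, 5, ... */

	return (even << 1) | (odd >> 1);
}

/* qsort() comparator that orders pids by their scrambled keys. */
static int cmp_scrambled(const void *a, const void *b)
{
	uint32_t ka = scramble_key(*(const uint32_t *)a);
	uint32_t kb = scramble_key(*(const uint32_t *)b);

	return (ka > kb) - (ka < kb);
}

/* Sort in place by scrambled key; the result is deterministic but is
 * generally not in ascending pid order. */
static void sort_scrambled(uint32_t *pids, size_t n)
{
	assert(pids != NULL || n == 0);
	if (n)
		qsort(pids, n, sizeof(*pids), cmp_scrambled);
}
```

Keeping a total order on the keys means any deduplication or binary search that relies on sortedness can keep working on keys, while consumers reading the list can no longer depend on ascending pids.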
2013-11-29 | cgroup: remove cgroup_pidlist->use_count | Tejun Heo
After the recent changes, pidlist ref is held only between cgroup_pidlist_start() and cgroup_pidlist_stop() during which cgroup->pidlist_mutex is also held. IOW, the reference count is redundant now. While in use, it's always one and pidlist_mutex is held - holding the mutex has exactly the same protection. This patch collapses destroy_dwork queueing into cgroup_pidlist_stop() so that pidlist_mutex is not released in between and drops pidlist->use_count. This patch shouldn't introduce any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-11-29 | cgroup: load and release pidlists from seq_file start and stop respectively | Tejun Heo
Currently, pidlists are reference counted from file open and release methods. This means that holding onto an open file may waste memory and reads may return data which is very stale. Both aren't critical because pidlists are keyed and shared per namespace and, well, the user isn't supposed to have large delay between open and reads. cgroup is planned to be converted to use kernfs and it'd be best if we can stick to just the seq_file operations - start, next, stop and show. This can be achieved by loading pidlist on demand from start and release with time delay from stop, so that consecutive reads don't end up reloading the pidlist on each iteration. This would remove the need for hooking into open and release while also avoiding issues with holding onto pidlist for too long. The previous patches implemented delayed release and restructured pidlist handling so that pidlists can be loaded and released from seq_file start / stop. This patch actually moves pidlist load to start and release to stop. This means that pidlist is pinned only between start and stop and may go away between two consecutive read calls if the two calls are apart by more than CGROUP_PIDLIST_DESTROY_DELAY. cgroup_pidlist_start() thus can't re-use the stored cgroup_pid_list_open_file->pidlist directly. During start, it's only used as a hint indicating whether this is the first start after open or not and pidlist is always looked up or created. pidlist_mutex locking and reference counting are moved out of pidlist_array_load() so that pidlist_array_load() can perform lookup and creation atomically. While this enlarges the area covered by pidlist_mutex, given how the lock is used, it's highly unlikely to be noticeable. v2: Refreshed on top of the updated "cgroup: introduce struct cgroup_pidlist_open_file". Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
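The lifecycle described above (load on demand at start, pin until stop, then release after a delay so back-to-back reads reuse the data) can be modelled with a simulated clock. This is a user-space sketch with hypothetical names and an arbitrary delay value; in particular `DEMO_DESTROY_DELAY` is not the value of CGROUP_PIDLIST_DESTROY_DELAY, and `demo_expire()` merely stands in for a delayed work item firing.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_DESTROY_DELAY 10UL	/* arbitrary delay in simulated ticks */

struct demo_pidlist {
	bool loaded;			/* data currently cached */
	bool pinned;			/* between start and stop */
	unsigned long destroy_at;	/* tick at which release becomes due */
	int loads;			/* number of (expensive) loads so far */
};

/* Models the delayed release: drop the cache once the delay has passed,
 * but never while a reader has it pinned. */
static void demo_expire(struct demo_pidlist *pl, unsigned long now)
{
	if (pl->loaded && !pl->pinned && (long)(now - pl->destroy_at) >= 0)
		pl->loaded = false;
}

/* start: look up the cached list or load it, then pin it. */
static void demo_start(struct demo_pidlist *pl, unsigned long now)
{
	assert(!pl->pinned);
	demo_expire(pl, now);
	if (!pl->loaded) {
		pl->loaded = true;
		pl->loads++;
	}
	pl->pinned = true;
}

/* stop: unpin and schedule the release for later instead of freeing now. */
static void demo_stop(struct demo_pidlist *pl, unsigned long now)
{
	assert(pl->pinned);
	pl->pinned = false;
	pl->destroy_at = now + DEMO_DESTROY_DELAY;
}
```

Two reads a few ticks apart share one load, whereas a read that arrives after the delay has elapsed triggers a fresh load, which is why start cannot assume that a previously seen list still exists.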
2013-11-29 | cgroup: remove cgroup_pidlist->rwsem | Tejun Heo
cgroup_pidlist locking is needlessly complicated. It has outer cgroup->pidlist_mutex to protect the list of pidlists associated with a cgroup and then each pidlist has rwsem to synchronize updates and reads. Given that the only read access is from seq_file operations which are always invoked back-to-back, the rwsem is giant overkill. All it does is add unnecessary complexity. This patch removes cgroup_pidlist->rwsem and protects all accesses to pidlists belonging to a cgroup with cgroup->pidlist_mutex. pidlist->rwsem locking is removed if it's nested inside cgroup->pidlist_mutex; otherwise, it's replaced with cgroup->pidlist_mutex locking. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-11-29 | cgroup: refactor cgroup_pidlist_find() | Tejun Heo
Rename cgroup_pidlist_find() to cgroup_pidlist_find_create() and separate out finding proper to cgroup_pidlist_find(). Also, move locking to the caller. This patch is preparation for pidlist restructure and doesn't introduce any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-11-29 | cgroup: introduce struct cgroup_pidlist_open_file | Tejun Heo
For pidlist files, seq_file->private pointed to the loaded cgroup_pidlist; however, pidlist loading is planned to be moved to cgroup_pidlist_start() for kernfs conversion and seq_file->private needs to carry more information from open to allow that. This patch introduces struct cgroup_pidlist_open_file which contains type, cgrp and pidlist and updates pidlist seq_file->private to point to it using seq_open_private() and seq_release_private(). Note that this eventually will be replaced by kernfs_open_file. While this patch makes more information available to seq_file operations, they don't use it yet and this patch doesn't introduce any behavior changes except for allocation of the extra private struct. v2: use __seq_open_private() instead of seq_open_private() for brevity as suggested by Li. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-11-29 | cgroup: implement delayed destruction for cgroup_pidlist | Tejun Heo
Currently, pidlists are reference counted from file open and release methods. This means that holding onto an open file may waste memory and reads may return data which is very stale. Both aren't critical because pidlists are keyed and shared per namespace and, well, the user isn't supposed to have large delay between open and reads. cgroup is planned to be converted to use kernfs and it'd be best if we can stick to just the seq_file operations - start, next, stop and show. This can be achieved by loading pidlist on demand from start and release with time delay from stop, so that consecutive reads don't end up reloading the pidlist on each iteration. This would remove the need for hooking into open and release while also avoiding issues with holding onto pidlist for too long. This patch implements delayed release of pidlist. As pidlists could be lingering on cgroup removal waiting for the timer to expire, cgroup free path needs to queue the destruction work item immediately and flush. As those work items are self-destroying, each work item can't be flushed directly. A new workqueue - cgroup_pidlist_destroy_wq - is added to serve as flush domain. Note that this patch just adds delayed release on top of the current implementation and doesn't change where pidlist is loaded and released. Following patches will make those changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-11-29 | cgroup: remove cftype->release() | Tejun Heo
Now that pidlist files don't use cftype->release(), it doesn't have any user left. Remove it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-11-29 | cgroup: don't skip seq_open on write only opens on pidlist files | Tejun Heo
Currently, cgroup_pidlist_open() skips seq_open() and pidlist loading if the file is opened write-only, which is a sensible optimization as pidlist loading can be costly and there often are occasions where tasks or cgroup.procs is opened write-only. However, pidlist init and release are planned to be moved to cgroup_pidlist_start/stop() respectively which would make this optimization unnecessary. This patch removes the optimization and always fully initializes pidlist files regardless of open mode. This will help moving pidlist handling to start/stop by unifying rw paths and removes the need for specifying cftype->release() in addition to .release in cgroup_pidlist_operations as file->f_op is now always overridden. As pidlist files were the only user of cftype->release(), the next patch will remove the method. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-11-29 | nohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=off | Thomas Gleixner
If CONFIG_NO_HZ=n tick_nohz_get_sleep_length() returns NSEC_PER_SEC/HZ. If CONFIG_NO_HZ=y and the nohz functionality is disabled via the command line option "nohz=off" or not enabled due to missing hardware support, then tick_nohz_get_sleep_length() returns 0. That happens because ts->sleep_length is never set in that case. Set it to NSEC_PER_SEC/HZ when the NOHZ mode is inactive. Reported-by: Michal Hocko <mhocko@suse.cz> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
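The fallback value is simply the length of one tick in nanoseconds. A small sketch makes the arithmetic concrete; `sleep_length_ns()` and its parameters are hypothetical, and the HZ values mentioned afterwards are example configurations rather than anything stated in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* When the NOHZ mode is inactive, report one tick period so that both
 * configurations agree; otherwise report the predicted sleep length. */
static int64_t sleep_length_ns(bool nohz_active, int64_t predicted_ns, int hz)
{
	assert(hz > 0);
	if (!nohz_active)
		return NSEC_PER_SEC / hz;
	return predicted_ns;
}
```

With HZ set to 250 the inactive case yields 4,000,000 ns (4 ms), and with HZ set to 1000 it yields 1,000,000 ns (1 ms), instead of the 0 that an unset field would produce.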
2013-11-28 | kernel/extable: fix address-checks for core_kernel and init areas | Helge Deller
The init_kernel_text() and core_kernel_text() functions should not include the labels _einittext and _etext when checking if an address is inside the .text or .init sections. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
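The fix amounts to treating a section as a half-open interval: the start label is inside, while the end label is the first address past the section. A minimal sketch, assuming a hypothetical `in_section()` helper and made-up addresses:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr lies in [start, end). The end label itself is excluded
 * because it marks the first byte after the section. */
static bool in_section(uintptr_t addr, uintptr_t start, uintptr_t end)
{
	assert(start <= end);
	return addr >= start && addr < end;
}
```

Using `<=` for the upper bound would wrongly accept the end label, which may already belong to whatever is placed immediately after the section.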
2013-11-27 | cgroup: Merge branch 'for-3.13-fixes' into for-3.14 | Tejun Heo
Pull to receive e605b36575e8 ("cgroup: fix cgroup_subsys_state leak for seq_files") as for-3.14 is scheduled to have a lot of changes which depend on it. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-11-27cgroup: fix cgroup_subsys_state leak for seq_filesTejun Heo
If a cgroup file implements either read_map() or read_seq_string(), such file is served using seq_file by overriding file->f_op to cgroup_seqfile_operations, which also overrides the release method to single_release() from cgroup_file_release(). Because cgroup_file_open() didn't use to acquire any resources, this used to be fine, but since f7d58818ba42 ("cgroup: pin cgroup_subsys_state when opening a cgroupfs file"), cgroup_file_open() pins the css (cgroup_subsys_state) which is put by cgroup_file_release(). The patch forgot to update the release path for seq_files and each open/release cycle leaks a css reference. Fix it by updating cgroup_file_release() to also handle seq_files and using it for seq_file release path too. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v3.12
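The leak pattern can be modelled with a plain counter. This sketch is not the cgroup API: struct pinned and the model_* functions are hypothetical names that mirror an open path which takes a reference and two possible release paths.

```c
#include <assert.h>

/* Hypothetical pinned object; stands in for a css with a reference count. */
struct pinned {
    int refs;
};

/* Open path: pins the object, as cgroup_file_open() is described as doing. */
static void model_open(struct pinned *p)
{
    p->refs++;
}

/* Correct release path: drops the pin taken at open time. */
static void model_release(struct pinned *p)
{
    assert(p->refs > 0);
    p->refs--;
}

/* Overridden release path that was never taught about the pin. */
static void model_release_leaky(struct pinned *p)
{
    (void)p; /* nothing is put, so each open/release cycle leaks one reference */
}
```

Three open/release cycles through the leaky path leave refs at 3, while the correct path returns it to 0, which is the per-cycle leak the commit message describes.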
2013-11-27cpuset: Fix memory allocator deadlockPeter Zijlstra
Juri hit the below lockdep report: [ 4.303391] ====================================================== [ 4.303392] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ] [ 4.303394] 3.12.0-dl-peterz+ #144 Not tainted [ 4.303395] ------------------------------------------------------ [ 4.303397] kworker/u4:3/689 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 4.303399] (&p->mems_allowed_seq){+.+...}, at: [<ffffffff8114e63c>] new_slab+0x6c/0x290 [ 4.303417] [ 4.303417] and this task is already holding: [ 4.303418] (&(&q->__queue_lock)->rlock){..-...}, at: [<ffffffff812d2dfb>] blk_execute_rq_nowait+0x5b/0x100 [ 4.303431] which would create a new lock dependency: [ 4.303432] (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...} [ 4.303436] [ 4.303898] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock: [ 4.303918] -> (&p->mems_allowed_seq){+.+...} ops: 2762 { [ 4.303922] HARDIRQ-ON-W at: [ 4.303923] [<ffffffff8108ab9a>] __lock_acquire+0x65a/0x1ff0 [ 4.303926] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140 [ 4.303929] [<ffffffff81063dd6>] kthreadd+0x86/0x180 [ 4.303931] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0 [ 4.303933] SOFTIRQ-ON-W at: [ 4.303933] [<ffffffff8108abcc>] __lock_acquire+0x68c/0x1ff0 [ 4.303935] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140 [ 4.303940] [<ffffffff81063dd6>] kthreadd+0x86/0x180 [ 4.303955] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0 [ 4.303959] INITIAL USE at: [ 4.303960] [<ffffffff8108a884>] __lock_acquire+0x344/0x1ff0 [ 4.303963] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140 [ 4.303966] [<ffffffff81063dd6>] kthreadd+0x86/0x180 [ 4.303969] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0 [ 4.303972] } Which reports that we take mems_allowed_seq with interrupts enabled. A little digging found that this can only be from cpuset_change_task_nodemask(). 
This is an actual deadlock because an interrupt doing an allocation will hit get_mems_allowed()->...->__read_seqcount_begin(), which will spin forever waiting for the write side to complete. Cc: John Stultz <john.stultz@linaro.org> Cc: Mel Gorman <mgorman@suse.de> Reported-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Tested-by: Juri Lelli <juri.lelli@gmail.com> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
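Why the reader spins forever can be seen in a simplified model of a sequence counter. The names below are hypothetical and the model omits memory barriers; it only assumes the usual convention that an odd sequence value means a write is in progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified sequence counter: odd means a writer is inside its section. */
struct model_seqcount {
    unsigned sequence;
};

static void model_write_begin(struct model_seqcount *s)
{
    s->sequence++; /* becomes odd */
}

static void model_write_end(struct model_seqcount *s)
{
    assert(s->sequence & 1); /* must pair with model_write_begin() */
    s->sequence++;           /* becomes even again */
}

/* A reader may only start while no write is in progress. */
static bool model_read_may_proceed(const struct model_seqcount *s)
{
    return (s->sequence & 1) == 0;
}
```

If an interrupt arrives on the same CPU between model_write_begin() and model_write_end() and its handler waits for model_read_may_proceed() to become true, the interrupted writer can never finish, so the wait never ends.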
2013-11-27sched: Add sched_class->task_dead() methodDario Faggioli
Add a new function to the scheduling class interface. It is called at the end of a context switch, if the prev task is in TASK_DEAD state. It will be useful for the scheduling classes that want to be notified when one of their tasks dies, e.g. to perform some cleanup actions, such as SCHED_DEADLINE. Signed-off-by: Dario Faggioli <raistlin@linux.it> Reviewed-by: Paul Turner <pjt@google.com> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Cc: bruce.ashfield@windriver.com Cc: claudio@evidence.eu.com Cc: darren@dvhart.com Cc: dhaval.giani@gmail.com Cc: fchecconi@gmail.com Cc: fweisbec@gmail.com Cc: harald.gustafsson@ericsson.com Cc: hgu1972@gmail.com Cc: insop.song@gmail.com Cc: jkacur@redhat.com Cc: johan.eker@ericsson.com Cc: liming.wang@windriver.com Cc: luca.abeni@unitn.it Cc: michael@amarulasolutions.com Cc: nicola.manica@disi.unitn.it Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: p.faure@akatech.ch Cc: rostedt@goodmis.org Cc: tommaso.cucinotta@sssup.it Cc: vincent.guittot@linaro.org Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-2-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
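An optional per-class hook of this kind might look like the following userspace sketch. The model_* types and the boolean dead flag are assumptions for illustration; the kernel's own structures and state checks differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_task;

/* Hypothetical scheduling class with an optional death notification. */
struct model_sched_class {
    void (*task_dead)(struct model_task *p); /* may be NULL */
};

struct model_task {
    const struct model_sched_class *sched_class;
    bool dead;    /* stands in for the TASK_DEAD state */
    int cleanups; /* counts how often the hook ran */
};

/* Called at the end of a context switch for the previous task. */
static void model_finish_switch(struct model_task *prev)
{
    assert(prev != NULL && prev->sched_class != NULL);
    if (prev->dead && prev->sched_class->task_dead)
        prev->sched_class->task_dead(prev);
}

/* Example hook: a class that wants to clean up after its dead tasks. */
static void model_cleanup(struct model_task *p)
{
    p->cleanups++;
}
```

Classes that do not care about task death leave the pointer NULL and pay only for the check.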
2013-11-27sched/fair: Clean up update_sg_lb_stats() a bitKamalesh Babulal
Add rq->nr_running to sgs->sum_nr_running directly instead of assigning it through an intermediate variable nr_running. Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1384508212-25032-1-git-send-email-kamalesh@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-27sched/fair: Move load idx selection in find_idlest_groupVincent Guittot
load_idx is used in find_idlest_group but initialized in select_task_rq_fair even when not used. The load_idx initialisation is moved into find_idlest_group, and sd_flag replaces it in the function's arguments. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: len.brown@intel.com Cc: amit.kucheria@linaro.org Cc: pjt@google.com Cc: l.majewski@samsung.com Cc: Morten.Rasmussen@arm.com Cc: cmetcalf@tilera.com Cc: tony.luck@intel.com Cc: alex.shi@intel.com Cc: preeti@linux.vnet.ibm.com Cc: linaro-kernel@lists.linaro.org Cc: rjw@sisk.pl Cc: paulmck@linux.vnet.ibm.com Cc: corbet@lwn.net Cc: arjan@linux.intel.com Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1382097147-30088-8-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-27sched: Check TASK_DEAD rather than EXIT_DEAD in schedule_debug()Oleg Nesterov
schedule_debug() ignores in_atomic() if prev->exit_state != 0. This is not what we want, ->exit_state is set by exit_notify() but we should complain until the task does the last schedule() in TASK_DEAD. See also 7407251a0e2e "PF_DEAD cleanup", I think this ancient commit explains why schedule() had to rely on ->exit_state, until that commit exit_notify() disabled preemption and set PF_DEAD which was used to detect the exiting task. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20131113154538.GB15810@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-27tasks/fork: Remove unnecessary child->exit_stateOleg Nesterov
A zombie task obviously can't fork(), remove the unnecessary initialization of child->exit_state. It is zero anyway after dup_task_struct(). Note: copy_process() is huge and it has a lot of chaotic initializations, probably it makes sense to move them into the new helper called by dup_task_struct(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131113143612.GA10540@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-27lockdep: Be nice about building from userspaceSasha Levin
Lockdep is an awesome piece of code which detects locking issues which are relevant both to userspace and kernelspace. We can easily make lockdep work in userspace since there is really no kernel-specific magic going on in the code. All we need is to wrap two functions which are used by lockdep and are very kernel specific. Doing that will allow tools located in tools/ to easily utilize lockdep's code for their own use. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: penberg@kernel.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/1352753446-24109-1-git-send-email-sasha.levin@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-27perf: Add active_entry list head to struct perf_eventStephane Eranian
This patch adds a new field to the struct perf_event. It is intended to be used to chain events which are active (enabled). It helps in the hardware layer for PMUs which do not have actual counter restrictions, i.e., free running read-only counters. Active events are chained as opposed to being tracked via the counter they use. To save space we use a union with hlist_entry as both are mutually exclusive (suggested by Jiri Olsa). Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1384275531-10892-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-27lockdep: Simplify a bit hardirq <-> softirq transitionsFrederic Weisbecker
Instead of saving the hardirq state in a per-CPU variable, which requires an explicit call before the softirq handling and some complication, just save and restore the hardirq tracing state through function return values and parameters. This simplifies a bit the black magic that works around the fact that softirqs can be called from hardirqs while hardirqs can nest on softirqs, but those two cases have very different semantics and only the latter case assumes both states. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1384906054-30676-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-27Merge branch 'core/urgent' into core/lockingIngo Molnar
Prepare for dependent patch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-27sched: Expose preempt_schedule_irq()Thomas Gleixner
Tony reported that aa0d53260596 ("ia64: Use preempt_schedule_irq") broke PREEMPT=n builds on ia64. Ok, wrapped my brain around it. I tripped over the magic asm foo which has a single need_resched check and schedule point for both sys call return and interrupt return. So you need the preempt_schedule_irq() for kernel preemption from interrupt return while on a normal syscall preemption a schedule would be sufficient. But using preempt_schedule_irq() is not harmful here in any way. It just sets the preempt_active bit also in cases where it would not be required. Even on preempt=n kernels adding the preempt_active bit is completely harmless. So instead of having an extra function, moving the existing one out of the ifdef PREEMPT looks like the sanest thing to do. It would also allow getting rid of various other sti/schedule/cli asm magic in other archs. Reported-and-Tested-by: Tony Luck <tony.luck@gmail.com> Fixes: aa0d53260596 ("ia64: Use preempt_schedule_irq") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [slightly edited Changelog] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311211230030.30673@ionos.tec.linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-26fork: Allow CLONE_PARENT after setns(CLONE_NEWPID)Eric W. Biederman
Serge Hallyn <serge.hallyn@ubuntu.com> writes: > Hi Oleg, > > commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e : > "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks" > breaks lxc-attach in 3.12. That code forks a child which does > setns() and then does a clone(CLONE_PARENT). That way the > grandchild can be in the right namespaces (which the child was > not) and be a child of the original task, which is the monitor. > > lxc-attach in 3.11 was working fine with no side effects that I > could see. Is there a real danger in allowing CLONE_PARENT > when current->nsproxy->pidns_for_children is not our pidns, > or was this done out of an "over-abundance of caution"? Can we > safely revert that new extra check? The two fundamental things I know we cannot allow are: - A shared signal queue aka CLONE_THREAD. Because we compute the pid and uid of the signal when we place it in the queue. - Changing the pid and by extension pid_namespace of an existing process. From a parent's perspective there is nothing special about the pid namespace, to deny CLONE_PARENT, because the parent simply won't know or care. From the child's perspective all that is special really are shared signal queues. User mode threading with CLONE_PARENT|CLONE_VM|CLONE_SIGHAND and tasks in different pid namespaces is almost certainly going to break because it is complicated. But shared signal handlers can look at per thread information to know which pid namespace a process is in, so I don't know of any reason not to support CLONE_PARENT|CLONE_VM|CLONE_SIGHAND threads at the kernel level. It would be absolutely stupid to implement but that is a different thing. So hmm. Because it can do no harm, and because it is a regression let's remove the CLONE_PARENT check and send it stable. Cc: stable@vger.kernel.org Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-11-26Merge tag 'trace-fixes-v3.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "This includes two fixes. 1) is a bug fix that happens when root does the following: echo function_graph > current_tracer modprobe foo echo nop > current_tracer This causes the ftrace internal accounting to get screwed up and crashes ftrace, preventing the user from using the function tracer after that. 2) if a TRACE_EVENT has a string field, and NULL is given for it. The internal trace event code does a strlen() and strcpy() on the source of field. If it is NULL it causes the system to oops. This bug has been there since 2.6.31, but no TRACE_EVENT ever passed in a NULL to the string field, until now" * tag 'trace-fixes-v3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix function graph with loading of modules tracing: Allow events to have NULL strings
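The second fix concerns calling strlen() and strcpy() on a NULL source. One way to guard such a field is sketched below; the helper names and the "(null)" placeholder are assumptions for illustration, not necessarily what the tracing code does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Substitute a printable placeholder when no string was supplied. */
static const char *safe_src(const char *src)
{
    return src ? src : "(null)";
}

/* Bytes needed to store the field, including the terminating NUL. */
static size_t field_len(const char *src)
{
    return strlen(safe_src(src)) + 1;
}

/* Copy the field into dst; the caller sizes dst with field_len(). */
static void field_copy(char *dst, size_t dst_size, const char *src)
{
    const char *s = safe_src(src);

    assert(dst_size >= strlen(s) + 1);
    strcpy(dst, s);
}
```

Both the length computation and the copy go through the same helper, so the reserved size always matches what is written.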
2013-11-26sched: Fix a trivial syntax misuseShigeru Yoshida
Use if statement instead of while loop. Signed-off-by: Shigeru Yoshida <shigeru.yoshida@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <trivial@kernel.org> Link: http://lkml.kernel.org/r/20131123.183801.769652906919404319.shigeru.yoshida@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>