summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)Author
2014-03-19cgroup: use cgroup_setup_root() to initialize cgroup_dummy_rootTejun Heo
cgroup_dummy_root is used to host controllers which aren't attached to any other hierarchy. The root is minimally set up during kernfs bootstrap and didn't go through full hierarchy initialization. We're planning to use cgroup_dummy_root for the default unified hierarchy and thus want it to be fully functional. Replace the special initialization, which was collected into cgroup_init() by the previous patch, with an invocation of cgroup_setup_root(). This simplifies the init path and makes cgroup_dummy_root a full hierarchy with its own kernfs_root and all. As this puts the dummy hierarchy on the cgroup_roots list, rename for_each_active_root() to for_each_root() and update its users to skip the dummy root for now. This patch doesn't cause any userland visible behavior changes at this point. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19cgroup: reorganize cgroup bootstrappingTejun Heo
* Fields of init_css_set and css_set_count are now set using initializer instead of programmatically from cgroup_init_early(). * init_cgroup_root() now also takes @opts and performs the optional part of initialization too. The leftover part of cgroup_root_from_opts() is collapsed into its only caller - cgroup_mount(). * Initialization of cgroup_root_count and linking of init_css_set are moved from cgroup_init_early() to to cgroup_init(). None of the early_init users depends on init_css_set being linked. * Subsystem initializations are moved after dummy hierarchy init and init_css_set linking. These changes reorganize the bootstrap logic so that the dummy hierarchy can share the usual hierarchy init path and be made more normal. These changes don't make noticeable behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19cgroup: relocate setting of CGRP_DEADTejun Heo
In cgroup_destroy_locked(), move setting of CGRP_DEAD above invocations of kill_css(). This doesn't make any visible behavior difference now but will be used to inhibit manipulating controller enable states of a dying cgroup on the unified hierarchy. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2014-03-19genirq: procfs: Make smp_affinity values go+rChema Gonzalez
Includes: - /proc/irq/default_smp_affinity - /proc/irq/*/affinity_hint - /proc/irq/*/smp_affinity - /proc/irq/*/smp_affinity_list Users can distill the same information by reading /proc/interrupts. Signed-off-by: Chema Gonzalez <chema@google.com> Cc: Eric Dumazet <edumazet@google.com> Link: http://lkml.kernel.org/r/1394765455-1217-1-git-send-email-chema@google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-19softirq: Add linux/irq.h to make it compile againThomas Gleixner
On Sparc and S390 the removal of irq.h from kernel_stat.h causes: kernel/softirq.c:774:9: error: 'NR_IRQS_LEGACY' undeclared Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-18cgroup: fix a failure path in create_css()Li Zefan
If online_css() fails, we should remove cgroup files belonging to css->ss. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-03-18uprobes: allow ignoring of probe hitsDavid A. Long
Allow arches to decided to ignore a probe hit. ARM will use this to only call handlers if the conditions to execute a conditionally executed instruction are satisfied. Signed-off-by: David A. Long <dave.long@linaro.org> Acked-by: Oleg Nesterov <oleg@redhat.com>
2014-03-18uprobes: Kconfig dependency fixDavid A. Long
Suggested change from Oleg Nesterov. Fixes incomplete dependencies for uprobes feature. Signed-off-by: David A. Long <dave.long@linaro.org> Acked-by: Oleg Nesterov <oleg@redhat.com>
2014-03-16Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Three small fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/clock: Prevent tracing recursion in sched_clock_cpu() stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus() sched/deadline: Deny unprivileged users to set/change SCHED_DEADLINE policy
2014-03-14genirq: Add a new IRQCHIP_EOI_THREADED flagThomas Gleixner
The flag is necessary for interrupt chips which require an ACK/EOI after the handler has run. In case of threaded handlers this needs to happen after the threaded handler has completed before the unmask of the interrupt. The flag is only unseful in combination with the handle_fasteoi_irq flow control handler. It can be combined with the flag IRQCHIP_EOI_IF_HANDLED, so the EOI is not issued when the interrupt is disabled or in progress. Tested-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-sunxi@googlegroups.com Cc: Maxime Ripard <maxime.ripard@free-electrons.com> Link: http://lkml.kernel.org/r/1394733834-26839-2-git-send-email-hdegoede@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-13block: remove old blk_iopoll_enabled variableJens Axboe
This was a debugging measure to toggle enabled/disabled when testing. But for real production setups, it's not safe to toggle this setting without either reloading drivers of quiescing IO first. Neither of which the toggle enforces. Additionally, it makes drivers deal with the conditional state. Remove it completely. It's up to the driver whether iopoll is enabled or not. Signed-off-by: Jens Axboe <axboe@fb.com>
2014-03-13sched: Remove needless round trip nsecs <-> tick conversion of steal timeFrederic Weisbecker
When update_rq_clock_task() accounts the pending steal time for a task, it converts the steal delta from nsecs to tick then from tick to nsecs. There is no apparent good reason for doing that though because both the task clock and the prev steal delta are u64 and store values in nsecs. So lets remove the needless conversion. Cc: Ingo Molnar <mingo@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-03-13cputime: Fix jiffies based cputime assumption on steal accountingFrederic Weisbecker
The steal guest time accounting code assumes that cputime_t is based on jiffies. So when CONFIG_NO_HZ_FULL=y, which implies that cputime_t is based on nsecs, steal_account_process_tick() passes the delta in jiffies to account_steal_time() which then accounts it as if it's a value in nsecs. As a result, accounting 1 second of steal time (with HZ=100 that would be 100 jiffies) is spuriously accounted as 100 nsecs. As such /proc/stat may report 0 values of steal time even when two guests have run concurrently for a few seconds on the same host and same CPU. In order to fix this, lets convert the nsecs based steal delta to cputime instead of jiffies by using the right conversion API. Given that the steal time is stored in cputime_t and this type can have a smaller granularity than nsecs, we only account the rounded converted value and leave the remaining nsecs for the next deltas. Reported-by: Huiqingding <huding@redhat.com> Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-03-13Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULEMathieu Desnoyers
Users have reported being unable to trace non-signed modules loaded within a kernel supporting module signature. This is caused by tracepoint.c:tracepoint_module_coming() refusing to take into account tracepoints sitting within force-loaded modules (TAINT_FORCED_MODULE). The reason for this check, in the first place, is that a force-loaded module may have a struct module incompatible with the layout expected by the kernel, and can thus cause a kernel crash upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y. Tracepoints, however, specifically accept TAINT_OOT_MODULE and TAINT_CRAP, since those modules do not lead to the "very likely system crash" issue cited above for force-loaded modules. With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed module is tainted re-using the TAINT_FORCED_MODULE taint flag. Unfortunately, this means that Tracepoints treat that module as a force-loaded module, and thus silently refuse to consider any tracepoint within this module. Since an unsigned module does not fit within the "very likely system crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag to specifically address this taint behavior, and accept those modules within Tracepoints. We use the letter 'X' as a taint flag character for a module being loaded that doesn't know how to sign its name (proposed by Steven Rostedt). Also add the missing 'O' entry to trace event show_module_flags() list for the sake of completeness. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> NAKed-by: Ingo Molnar <mingo@redhat.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: David Howells <dhowells@redhat.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13module: use pr_contJiri Slaby
When dumping loaded modules, we print them one by one in separate printks. Let's use pr_cont as they are continuation prints. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-12Merge branch 'irq/for-gpio' into irq/coreThomas Gleixner
Merge the request/release callbacks which are in a separate branch for consumption by the gpio folks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-12genirq: Provide irq_request/release_resources chip callbacksThomas Gleixner
For certain irq types, e.g. gpios, it's necessary to request resources before starting up the irq. This might fail so we cannot use the irq_startup() callback because we might call the irq_set_type() callback before that which does not make sense when the resource is not available. Calling irq_startup() before irq_set_type() can lead to spurious interrupts which is not desired either. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jean-Jacques Hiblot <jjhiblot@traphandler.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1403080857160.18573@ionos.tec.linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-12locking/mutex: Fix debug checksPeter Zijlstra
OK, so commit: 1d8fe7dc8078 ("locking/mutexes: Unlock the mutex without the wait_lock") generates this boot warning when CONFIG_DEBUG_MUTEXES=y: WARNING: CPU: 0 PID: 139 at /usr/src/linux-2.6/kernel/locking/mutex-debug.c:82 debug_mutex_unlock+0x155/0x180() DEBUG_LOCKS_WARN_ON(lock->owner != current) And that makes sense, because as soon as we release the lock a new owner can come in... One would think that !__mutex_slowpath_needs_to_unlock() implementations suffer the same, but for DEBUG we fall back to mutex-null.h which has an unconditional 1 for that. The mutex debug code requires the mutex to be unlocked after doing the debug checks, otherwise it can find inconsistent state. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: jason.low2@hp.com Link: http://lkml.kernel.org/r/20140312122442.GB27965@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-12sched: Clean up the task_hot() functionAlex Shi
task_hot() doesn't need the 'sched_domain' parameter, so remove it. Signed-off-by: Alex Shi <alex.shi@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394607111-1904-1-git-send-email-alex.shi@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-12sched: Remove double calculation in fix_small_imbalance()Vincent Guittot
The tmp value has been already calculated in: scaled_busy_load_per_task = (busiest->load_per_task * SCHED_POWER_SCALE) / busiest->group_power; Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394555166-22894-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-12sched: Fix broken setscheduler()Steven Rostedt
I decided to run my tests on linux-next, and my wakeup_rt tracer was broken. After running a bisect, I found that the problem commit was: linux-next commit c365c292d059 "sched: Consider pi boosting in setscheduler()" And the reason the wake_rt tracer test was failing, was because it had no RT task to trace. I first noticed this when running with sched_switch event and saw that my RT task still had normal SCHED_OTHER priority. Looking at the problem commit, I found: - p->normal_prio = normal_prio(p); - p->prio = rt_mutex_getprio(p); With no + p->normal_prio = normal_prio(p); + p->prio = rt_mutex_getprio(p); Reading what the commit is suppose to do, I realize that the p->prio can't be set if the task is boosted with a higher prio, but the p->normal_prio still needs to be set regardless, otherwise, when the task is deboosted, it wont get the new priority. The p->prio has to be set before "check_class_changed()" is called, otherwise the class wont be changed. Also added fix to newprio to include a check for deadline policy that was missing. This change was suggested by Juri Lelli. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: SebastianAndrzej Siewior <bigeasy@linutronix.de> Cc: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140306120438.638bfe94@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11ftrace: Constify ftrace_text_reservedSasha Levin
Link: http://lkml.kernel.org/r/1357772960-4436-5-git-send-email-sasha.levin@oracle.com Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-11tracepoints: API doc update to tracepoint_probe_register() return valueMathieu Desnoyers
Describe the return values of tracepoint_probe_register(), including -ENODEV added by commit: Author: Steven Rostedt <rostedt@goodmis.org> tracing: Warn if a tracepoint is not set via debugfs Link: http://lkml.kernel.org/r/1394499898-1537-2-git-send-email-mathieu.desnoyers@efficios.com CC: Ingo Molnar <mingo@kernel.org> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-11tracepoints: API doc update to data argumentMathieu Desnoyers
Describe the @data argument (probe private data). Link: http://lkml.kernel.org/r/1394587948-27878-1-git-send-email-mathieu.desnoyers@efficios.com Fixes: 38516ab59fbc "tracing: Let tracepoints have data passed to tracepoint callbacks" CC: Ingo Molnar <mingo@kernel.org> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-12PM: Add missing "freeze" stateGeert Uytterhoeven
Fix descriptions of /sys/power/state in the documentation and in a code comment. Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Pavel Machek <pavel@ucw.cz> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-12PM / Hibernate: Spelling s/anonymouns/anonymous/Geert Uytterhoeven
Spelling fix. Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-11ftrace: Fix compilation warning about control_ops_freeJiri Slaby
With CONFIG_DYNAMIC_FTRACE=n, I see a warning: kernel/trace/ftrace.c:240:13: warning: 'control_ops_free' defined but not used static void control_ops_free(struct ftrace_ops *ops) ^ Move that function around to an already existing #ifdef CONFIG_DYNAMIC_FTRACE block as the function is used solely from the dynamic function tracing functions. Link: http://lkml.kernel.org/r/1394484131-5107-1-git-send-email-jslaby@suse.cz Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull audit namespace fixes from Eric Biederman: "Starting with 3.14-rc1 the audit code is faulty (think oopses and races) with respect to how it computes the network namespace of which socket to reply to, and I happened to notice by chance when reading through the code. My testing and the automated build bots don't find any problems with these fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: audit: Update kdoc for audit_send_reply and audit_list_rules_send audit: Send replies in the proper network namespace. audit: Use struct net not pid_t to remember the network namespce to reply in
2014-03-11locking/mutexes: Add extra reschedule pointPeter Zijlstra
Add in an extra reschedule in an attempt to avoid getting reschedule the moment we've acquired the lock. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-zah5eyn9gu7qlgwh9r6n2anc@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11locking/mutexes: Introduce cancelable MCS lock for adaptive spinningPeter Zijlstra
Since we want a task waiting for a mutex_lock() to go to sleep and reschedule on need_resched() we must be able to abort the mcs_spin_lock() around the adaptive spin. Therefore implement a cancelable mcs lock. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: chegu_vinod@hp.com Cc: paulmck@linux.vnet.ibm.com Cc: Waiman.Long@hp.com Cc: torvalds@linux-foundation.org Cc: tglx@linutronix.de Cc: riel@redhat.com Cc: akpm@linux-foundation.org Cc: davidlohr@hp.com Cc: hpa@zytor.com Cc: andi@firstfloor.org Cc: aswin@hp.com Cc: scott.norton@hp.com Cc: Jason Low <jason.low2@hp.com> Link: http://lkml.kernel.org/n/tip-62hcl5wxydmjzd182zhvk89m@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11locking/mutexes: Unlock the mutex without the wait_lockJason Low
When running workloads that have high contention in mutexes on an 8 socket machine, mutex spinners would often spin for a long time with no lock owner. The main reason why this is occuring is in __mutex_unlock_common_slowpath(), if __mutex_slowpath_needs_to_unlock(), then the owner needs to acquire the mutex->wait_lock before releasing the mutex (setting lock->count to 1). When the wait_lock is contended, this delays the mutex from being released. We should be able to release the mutex without holding the wait_lock. Signed-off-by: Jason Low <jason.low2@hp.com> Cc: chegu_vinod@hp.com Cc: paulmck@linux.vnet.ibm.com Cc: Waiman.Long@hp.com Cc: torvalds@linux-foundation.org Cc: tglx@linutronix.de Cc: riel@redhat.com Cc: akpm@linux-foundation.org Cc: davidlohr@hp.com Cc: hpa@zytor.com Cc: andi@firstfloor.org Cc: aswin@hp.com Cc: scott.norton@hp.com Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1390936396-3962-4-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11locking/mutexes: Modify the way optimistic spinners are queuedJason Low
The mutex->spin_mlock was introduced in order to ensure that only 1 thread spins for lock acquisition at a time to reduce cache line contention. When lock->owner is NULL and the lock->count is still not 1, the spinner(s) will continually release and obtain the lock->spin_mlock. This can generate quite a bit of overhead/contention, and also might just delay the spinner from getting the lock. This patch modifies the way optimistic spinners are queued by queuing before entering the optimistic spinning loop as oppose to acquiring before every call to mutex_spin_on_owner(). So in situations where the spinner requires a few extra spins before obtaining the lock, then there will only be 1 spinner trying to get the lock and it will avoid the overhead from unnecessarily unlocking and locking the spin_mlock. Signed-off-by: Jason Low <jason.low2@hp.com> Cc: tglx@linutronix.de Cc: riel@redhat.com Cc: akpm@linux-foundation.org Cc: davidlohr@hp.com Cc: hpa@zytor.com Cc: andi@firstfloor.org Cc: aswin@hp.com Cc: scott.norton@hp.com Cc: chegu_vinod@hp.com Cc: Waiman.Long@hp.com Cc: paulmck@linux.vnet.ibm.com Cc: torvalds@linux-foundation.org Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1390936396-3962-3-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11locking/mutexes: Return false if task need_resched() in ↵Jason Low
mutex_can_spin_on_owner() The mutex_can_spin_on_owner() function should also return false if the task needs to be rescheduled to avoid entering the MCS queue when it needs to reschedule. Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Waiman.Long@hp.com Cc: torvalds@linux-foundation.org Cc: tglx@linutronix.de Cc: riel@redhat.com Cc: akpm@linux-foundation.org Cc: davidlohr@hp.com Cc: hpa@zytor.com Cc: andi@firstfloor.org Cc: aswin@hp.com Cc: scott.norton@hp.com Cc: chegu_vinod@hp.com Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1390936396-3962-2-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11locking: Move mcs_spinlock.h into kernel/locking/Peter Zijlstra
The mcs_spinlock code is not meant (or suitable) as a generic locking primitive, therefore take it away from the normal includes and place it in kernel/locking/. This way the locking primitives implemented there can use it as part of their implementation but we do not risk it getting used inapropriately. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-byirmpamgr7h25m5kyavwpzx@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11sched/numa: Move task_numa_free() to __put_task_struct()Mike Galbraith
Bad idea on -rt: [ 908.026136] [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0 [ 908.026145] [<ffffffff8108f701>] task_numa_free+0x31/0x130 [ 908.026151] [<ffffffff8108121e>] finish_task_switch+0xce/0x100 [ 908.026156] [<ffffffff81509c0a>] thread_return+0x48/0x4ae [ 908.026160] [<ffffffff8150a095>] schedule+0x25/0xa0 [ 908.026163] [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0 [ 908.026170] [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680 [ 908.026175] [<ffffffff8100242d>] do_signal+0x3d/0x5b0 [ 908.026179] [<ffffffff81002a30>] do_notify_resume+0x90/0xe0 [ 908.026186] [<ffffffff81513176>] int_signal+0x12/0x17 [ 908.026193] [<00007ff2a388b1d0>] 0x7ff2a388b1cf and since upstream does not mind where we do this, be a bit nicer ... Signed-off-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11sched/fair: Fix endless loop in idle_balance()Kirill Tkhai
Check for fair tasks number to decide, that we've pulled a task. rq's nr_running may contain throttled RT tasks. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394118975.19290.104.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11sched/core: Fix endless loop in pick_next_task()Kirill Tkhai
1) Single cpu machine case. When rq has only RT tasks, but no one of them can be picked because of throttling, we enter in endless loop. pick_next_task_{dl,rt} return NULL. In pick_next_task_fair() we permanently go to retry if (rq->nr_running != rq->cfs.h_nr_running) return RETRY_TASK; (rq->nr_running is not being decremented when rt_rq becomes throttled). No chances to unthrottle any rt_rq or to wake fair here, because of rq is locked permanently and interrupts are disabled. 2) In case of SMP this can cause a hang too. Although we unlock rq in idle_balance(), interrupts are still disabled. The solution is to check for available tasks in DL and RT classes instead of checking for sum. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394098321.19290.11.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11sched/fair: Push down check for high priority class task into idle_balance()Kirill Tkhai
We close idle_exit_fair() bracket in case of we've pulled something or we've received task of high priority class. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: http://lkml.kernel.org/r/1394098315.19290.10.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11sched/rt: Fix picking RT and DL tasks from empty queueKirill Tkhai
The problems: 1) We check for rt_nr_running before call of put_prev_task(). If previous task is RT, its rt_rq may become throttled and dequeued after this call. In case of p is from rt->rq this just causes picking a task from throttled queue, but in case of its rt_rq is child we are guaranteed catch BUG_ON. 2) The same with deadline class. The only difference we operate on only dl_rq. This patch fixes all the above problems and it adds a small skip in the DL update like we've already done for RT class: if (unlikely((s64)delta_exec <= 0)) return; This will optimize sequential update_curr_dl() calls a little. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Link: http://lkml.kernel.org/r/1393946746.3643.3.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11perf: Disallow user-space stack dumps for function trace eventsJiri Olsa
Recent issues with user space callchains processing within page fault handler tracing showed as Peter said 'there's just too much fail surface'. The user space stack dump is just another source of the this issue. Related list discussions: http://marc.info/?t=139302086500001&r=1&w=2 http://marc.info/?t=139301437300003&r=1&w=2 Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Mackerras <paulus@samba.org> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1393775800-13524-3-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11perf: Disallow user-space callchains for function trace eventsJiri Olsa
Recent issues with user space callchains processing within page fault handler tracing showed as Peter said 'there's just too much fail surface'. Related list discussions: http://marc.info/?t=139302086500001&r=1&w=2 http://marc.info/?t=139301437300003&r=1&w=2 Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1393775800-13524-2-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11Merge branch 'perf/urgent' into perf/coreIngo Molnar
Merge the latest fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11sched/idle: Add more comments to the codeDaniel Lezcano
The idle main function is a complex and a critical function. Added more comments to the code. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: tglx@linutronix.de Cc: rjw@rjwysocki.net Cc: preeti@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1393832934-11625-5-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11sched/idle: Move idle conditions in cpuidle_idle main functionDaniel Lezcano
This patch moves the condition before entering idle into the cpuidle main function located in idle.c. That simplify the idle mainloop functions and increase the readibility of the conditions to enter truly idle. This patch is code reorganization and does not change the behavior of the function. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: tglx@linutronix.de Cc: rjw@rjwysocki.net Cc: nicolas.pitre@linaro.org Cc: preeti@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1393832934-11625-4-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11sched/idle: Reorganize the idle loopDaniel Lezcano
Now that we have the main cpuidle function in idle.c, move some code from the idle mainloop to this function for the sake of clarity. That removes if then else indentation difficult to follow when looking at the code. This patch does not change the current behavior. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: tglx@linutronix.de Cc: rjw@rjwysocki.net Cc: preeti@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1393832934-11625-3-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11cpuidle/idle: Move the cpuidle_idle_call function to idle.cDaniel Lezcano
The cpuidle_idle_call does nothing more than calling the three individuals function and is no longer used by any arch specific code but only in the cpuidle framework code. We can move this function into the idle task code to ensure better proximity to the scheduler code. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: rjw@rjwysocki.net Cc: preeti@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1393832934-11625-2-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11Merge branch 'sched/urgent' into sched/coreIngo Molnar
Pick up fixes before queueing up new changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11sched/clock: Prevent tracing recursion in sched_clock_cpu()Fernando Luis Vazquez Cao
Prevent tracing of preempt_disable/enable() in sched_clock_cpu(). When CONFIG_DEBUG_PREEMPT is enabled, preempt_disable/enable() are traced and this causes trace_clock() users (and probably others) to go into an infinite recursion. Systems with a stable sched_clock() are not affected. This problem is similar to that fixed by upstream commit 95ef1e52922 ("KVM guest: prevent tracing recursion with kvmclock"). Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1394083528.4524.3.camel@nexus Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()Peter Zijlstra
We must use smp_call_function_single(.wait=1) for the irq_cpu_stop_queue_work() to ensure the queueing is actually done under stop_cpus_lock. Without this we could have dropped the lock by the time we do the queueing and get the race we tried to fix. Fixes: 7053ea1a34fa ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()") Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20140228123905.GK3104@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11sched/deadline: Deny unprivileged users to set/change SCHED_DEADLINE policyJuri Lelli
Deny the use of SCHED_DEADLINE policy to unprivileged users. Even if root users can set the policy for normal users, we don't want the latter to be able to change their parameters (safest behavior). Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1393844961-18097-1-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>