Age | Commit message (Collapse) | Author |
|
There is no function named cgroup_enable_task_cg_links().
Instead, the correct function name in this comment should
be cgroup_enabled_task_cg_lists().
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
only task running on a cpu
This function will help an async task processing batched jobs from
workqueue decide if it wants to keep processing on more chunks of batched
work that can be delayed, or to accumulate more work for more efficient
batched processing later.
If no other tasks are running on the cpu, the batching process can take
advantgae of the available cpu cycles to a make decision to continue
processing the existing accumulated work to minimize delay,
otherwise it will yield.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"A kprobes and a perf compat ioctl fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Handle compat ioctl
kprobes: Skip kretprobe hit in NMI context to avoid deadlock
|
|
Conflicts:
arch/x86/kernel/cpu/perf_event_intel_uncore*.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Adding new perf event state to indicate that the monitored task has
exited. In this case the event stays alive until the owner task exits
or close the event fd while providing the last data through the read
syscall and ring buffer.
Instead it needs to propagate the error info (monitored task has died)
via poll and read syscalls by returning POLLHUP and 0 respectively.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140811120102.GY9918@twins.programming.kicks-ass.net
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-t5y3w8jjx6tfo5w8y6oajsjq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Currently perf_poll returns POLL_HUP in case of error, which is wrong,
because poll syscall expects POLLHUP. The POLL_HUP is meant to be used
for SIGIO state.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140811120102.GY9918@twins.programming.kicks-ass.net
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-0ywfthh4lh65swe15f6w2x2q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In __ftrace_replace_code(), when converting the call to a nop in a function
it needs to compare against the "curr" (current) value of the ftrace ops, and
not the "new" one. It currently does not affect x86 which is the only arch
to do the trampolines with function graph tracer, but when other archs that do
depend on this code implement the function graph trampoline, it can crash.
Here's an example when ARM uses the trampolines (in the future):
------------[ cut here ]------------
WARNING: CPU: 0 PID: 9 at kernel/trace/ftrace.c:1716 ftrace_bug+0x17c/0x1f4()
Modules linked in: omap_rng rng_core ipv6
CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.16.0-test-10959-gf0094b28f303-dirty #52
[<c02188f4>] (unwind_backtrace) from [<c021343c>] (show_stack+0x20/0x24)
[<c021343c>] (show_stack) from [<c095a674>] (dump_stack+0x78/0x94)
[<c095a674>] (dump_stack) from [<c02532a0>] (warn_slowpath_common+0x7c/0x9c)
[<c02532a0>] (warn_slowpath_common) from [<c02532ec>] (warn_slowpath_null+0x2c/0x34)
[<c02532ec>] (warn_slowpath_null) from [<c02cbac4>] (ftrace_bug+0x17c/0x1f4)
[<c02cbac4>] (ftrace_bug) from [<c02cc44c>] (ftrace_replace_code+0x80/0x9c)
[<c02cc44c>] (ftrace_replace_code) from [<c02cc658>] (ftrace_modify_all_code+0xb8/0x164)
[<c02cc658>] (ftrace_modify_all_code) from [<c02cc718>] (__ftrace_modify_code+0x14/0x1c)
[<c02cc718>] (__ftrace_modify_code) from [<c02c7244>] (multi_cpu_stop+0xf4/0x134)
[<c02c7244>] (multi_cpu_stop) from [<c02c6e90>] (cpu_stopper_thread+0x54/0x130)
[<c02c6e90>] (cpu_stopper_thread) from [<c0271cd4>] (smpboot_thread_fn+0x1ac/0x1bc)
[<c0271cd4>] (smpboot_thread_fn) from [<c026ddf0>] (kthread+0xe0/0xfc)
[<c026ddf0>] (kthread) from [<c020f318>] (ret_from_fork+0x14/0x20)
---[ end trace dc9ce72c5b617d8f ]---
[ 65.047264] ftrace failed to modify [<c0208580>] asm_do_IRQ+0x10/0x1c
[ 65.054070] actual: 85:1b:00:eb
Fixes: 7413af1fb70e7 "ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The latest rewrite of ftrace removed the separate ftrace_ops of
the function tracer and the function graph tracer and had them
share the same ftrace_ops. This simplified the accounting by removing
the multiple layers of functions called, where the global_ops func
would call a special list that would iterate over the other ops that
were registered within it (like function and function graph), which
itself was registered to the ftrace ops list of all functions
currently active. If that sounds confusing, the code that implemented
it was also confusing and its removal is a good thing.
The problem with this change was that it assumed that the function
and function graph tracer can never be used at the same time.
This is mostly true, but there is an exception. That is when the
function profiler uses the function graph tracer to profile.
The function profiler can be activated the same time as the function
tracer, and this breaks the assumption and the result is that ftrace
will crash (it detects the error and shuts itself down, it does not
cause a kernel oops).
To solve this issue, a previous change allowed the hash tables
for the functions traced by a ftrace_ops to be a pointer and let
multiple ftrace_ops share the same hash. This allows the function
and function_graph tracer to have separate ftrace_ops, but still
share the hash, which is what is done.
Now the function and function graph tracers have separate ftrace_ops
again, and the function tracer can be run while the function_profile
is active.
Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Now that a ftrace_hash can be shared by multiple ftrace_ops, they can dec
the rec->flags by more than once (one per those that share the ftrace_hash).
This means that the tramp_hash may not have a hash item when it was added.
For example, if two ftrace_ops share a hash for a ftrace record, and the
first ops has a trampoline, when it adds itself it will set the rec->flags
TRAMP flag and increments its nr_trampolines counter. When the second ops
is added, it must clear that tramp flag but also decrement the other ops
that shares its hash. As the update to the function callbacks has not yet
been performed, the other ops will not have the tramp hash set yet and it
can not be used to know to decrement its nr_trampolines.
Luckily, the tramp_hash does not need to be used. As the ftrace_mutex is
held, a ops with a trampoline to a record during an update of another ops
that shares the record will have its func_hash pointing to it. Since a
trampoline can only be set for a record if only one ops is attached to it,
we can just check if the record has a trampoline (the FTRACE_FL_TRAMP flag
is set) and then find the ops that has this record in its hashes.
Also added some output to help debug when things go wrong.
Cc: stable@vger.kernel.org # 3.16+ (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
When updating what an ftrace_ops traces, if it is registered (that is,
actively tracing), and that ftrace_ops uses the shared global_ops
local_hash, then we need to update all tracers that are active and
also share the global_ops' ftrace_hash_ops.
Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Kernel command line parameter cgroup__DEVEL__legacy_files_on_dfl forces
legacy cgroup files to show up on default hierarhcy if susbsystem does
not have any files defined for default hierarchy.
But this seems to be working only if legacy files are defined in
ss->legacy_cftypes. If one adds some cftypes later using
cgroup_add_legacy_cftypes(), these files don't show up on default
hierarchy. Update the function accordingly so that the dynamically
added legacy files also show up in the default hierarchy if the target
subsystem is also using the base legacy files for the default
hierarchy.
tj: Patch description and comment updates.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Currently the top level debug file system function tracer shares its
ftrace_ops with the function graph tracer. This was thought to be fine
because the tracers are not used together, as one can only enable
function or function_graph tracer in the current_tracer file.
But that assumption proved to be incorrect. The function profiler
can use the function graph tracer when function tracing is enabled.
Since all function graph users uses the function tracing ftrace_ops
this causes a conflict and when a user enables both function profiling
as well as the function tracer it will crash ftrace and disable it.
The quick solution so far is to move them as separate ftrace_ops like
it was earlier. The problem though is to synchronize the functions that
are traced because both function and function_graph tracer are limited
by the selections made in the set_ftrace_filter and set_ftrace_notrace
files.
To handle this, a new structure is made called ftrace_ops_hash. This
structure will now hold the filter_hash and notrace_hash, and the
ftrace_ops will point to this structure. That will allow two ftrace_ops
to share the same hashes.
Since most ftrace_ops do not share the hashes, and to keep allocation
simple, the ftrace_ops structure will include both a pointer to the
ftrace_ops_hash called func_hash, as well as the structure itself,
called local_hash. When the ops are registered, the func_hash pointer
will be initialized to point to the local_hash within the ftrace_ops
structure. Some of the ftrace internal ftrace_ops will be initialized
statically. This will allow for the function and function_graph tracer
to have separate ops but still share the same hash tables that determine
what functions they trace.
Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
In highres mode, the tick reschedules itself unconditionally to the
next jiffies.
However while this clock reprogramming is relevant when the tick is
in periodic mode, it's not that interesting when we run in dynticks mode
because irq exit is likely going to overwrite the next tick to some
randomly deferred future.
So lets just get rid of this tick self rescheduling in dynticks mode.
This way we can avoid some clockevents double write in favourable
scenarios like when we stop the tick completely in idle while no other
hrtimer is pending.
Suggested-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
When we reach the end of the tick handler, we unconditionally reschedule
the next tick to the next jiffy. Then on irq exit, the nohz code
overrides that setting if needed and defers the next tick as far away in
the future as possible.
Now in the best dynticks case, when we actually don't need any tick in
the future (ie: expires == KTIME_MAX), low-res and high-res behave
differently. What we want in this case is to cancel the next tick
programmed by the previous one. That's what we do in high-res mode. OTOH
we lack a low-res mode equivalent of hrtimer_cancel() so we simply don't
do anything in this case and the next tick remains scheduled to jiffies + 1.
As a result, in low-res mode, when the dynticks code determines that no
tick is needed in the future, we can recursively get a spurious tick
every jiffy because then the next tick is always reprogrammed from the
tick handler and is never cancelled. And this can happen indefinetly
until some subsystem actually needs a precise tick in the future and only
then we eventually overwrite the previous tick handler setting to defer
the next tick.
We are fixing this by introducing the ONESHOT_STOPPED mode which will
let us pause a clockevent when no further interrupt is needed. Meanwhile
we can't expect all drivers to support this new mode.
So lets reduce much of the symptoms by skipping the nohz-blind tick
rescheduling from the tick-handler when the CPU is in dynticks mode.
That tick rescheduling wrongly assumed periodicity and the low-res
dynticks code can't cancel such decision. This breaks the recursive (and
thus the worst) part of the problem. In the worst case now, we'll get
only one extra tick due to uncancelled tick scheduled before we entered
dynticks mode.
This also removes a needless clockevent write on idle ticks. Since those
clock write are usually considered to be slow, it's a general win.
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
|
Avoid double_rq_lock() and use TASK_ON_RQ_MIGRATING for
load_balance(). The advantage is (obviously) not holding two
rq->lock's at the same time and thereby increasing parallelism.
Further note that if there was no task to migrate we will not
have acquired the second rq->lock at all.
The important point to note is that because we acquire dst->lock
immediately after releasing src->lock the potential wait time of
task_rq_lock() callers on TASK_ON_RQ_MIGRATING is not longer
than it would have been in the double rq lock scenario.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1408528109.23412.94.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Avoid double_rq_lock() and use the TASK_ON_RQ_MIGRATING state for
active_load_balance_cpu_stop(). The advantage is (obviously) not
holding two 'rq->lock's at the same time and thereby increasing
parallelism.
Further note that if there was no task to migrate we will not
have acquired the second rq->lock at all.
The important point to note is that because we acquire dst->lock
immediately after releasing src->lock the potential wait time of
task_rq_lock() callers on TASK_ON_RQ_MIGRATING is not longer
than it would have been in the double rq lock scenario.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1408528081.23412.92.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Avoid double_rq_lock() and use TASK_ON_RQ_MIGRATING for
__migrate_task(). The advantage is (obviously) not holding two
rq->lock's at the same time and thereby increasing parallelism.
The important point to note is that because we acquire dst->lock
immediately after releasing src->lock the potential wait time of
task_rq_lock() callers on TASK_ON_RQ_MIGRATING is not longer
than it would have been in the double rq lock scenario.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1408528070.23412.89.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This is a new p->on_rq state which will be used to indicate that a task
is in a process of migrating between two RQs. It allows to get
rid of double_rq_lock(), which we used to use to change a rq of
a queued task before.
Let's consider an example. To move a task between src_rq and
dst_rq we will do the following:
raw_spin_lock(&src_rq->lock);
/* p is a task which is queued on src_rq */
p = ...;
dequeue_task(src_rq, p, 0);
p->on_rq = TASK_ON_RQ_MIGRATING;
set_task_cpu(p, dst_cpu);
raw_spin_unlock(&src_rq->lock);
/*
* Both RQs are unlocked here.
* Task p is dequeued from src_rq
* but its on_rq value is not zero.
*/
raw_spin_lock(&dst_rq->lock);
p->on_rq = TASK_ON_RQ_QUEUED;
enqueue_task(dst_rq, p, 0);
raw_spin_unlock(&dst_rq->lock);
While p->on_rq is TASK_ON_RQ_MIGRATING, task is considered as
"migrating", and other parallel scheduler actions with it are
not available to parallel callers. The parallel caller is
spining till migration is completed.
The unavailable actions are changing of cpu affinity, changing
of priority etc, in other words all the functionality which used
to require task_rq(p)->lock before (and related to the task).
To implement TASK_ON_RQ_MIGRATING support we primarily are using
the following fact. Most of scheduler users (from which we are
protecting a migrating task) use task_rq_lock() and
__task_rq_lock() to get the lock of task_rq(p). These primitives
know that task's cpu may change, and they are spining while the
lock of the right RQ is not held. We add one more condition into
them, so they will be also spinning until the migration is
finished.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1408528062.23412.88.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Implement task_on_rq_queued() and use it everywhere instead of
on_rq check. No functional changes.
The only exception is we do not use the wrapper in
check_for_tasks(), because it requires to export
task_on_rq_queued() in global header files. Next patch in series
would return it back, so we do not twist it from here to there.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1408528052.23412.87.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
(sched_entity::on_rq == 1) does not guarantee the task is pickable;
changes on throttled cfs_rq must not lead to reschedule.
Check for task_struct::on_rq instead.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1407312361.8424.35.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Match the declaration of runqueues with the definition.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1407950893-32731-1-git-send-email-bobby.prani@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Change autogroup_move_group() to use for_each_thread() instead of
buggy while_each_thread().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Frank Mayhar <fmayhar@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sanjay Rao <srao@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140813192003.GA19334@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Change thread_group_cputime() to use for_each_thread() instead of
buggy while_each_thread(). This also makes the pid_alive() check
unnecessary.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Frank Mayhar <fmayhar@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sanjay Rao <srao@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140813192000.GA19327@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Change kernel/sched/debug.c to use for_each_process_thread().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Frank Mayhar <fmayhar@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sanjay Rao <srao@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140813191956.GA19324@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Change kernel/sched/core.c to use for_each_process_thread().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Frank Mayhar <fmayhar@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sanjay Rao <srao@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140813191953.GA19315@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When running a 32-bit userspace on a 64-bit kernel (eg. i386
application on x86_64 kernel or 32-bit arm userspace on arm64
kernel) some of the perf ioctls must be treated with special
care, as they have a pointer size encoded in the command.
For example, PERF_EVENT_IOC_ID in 32-bit world will be encoded
as 0x80042407, but 64-bit kernel will expect 0x80082407. In
result the ioctl will fail returning -ENOTTY.
This patch solves the problem by adding code fixing up the
size as compat_ioctl file operation.
Reported-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1402671812-9078-1-git-send-email-pawel.moll@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
/proc/<pid>/cgroup contains one cgroup path on each line. If cgroup names are
allowed to contain "\n", applications cannot parse /proc/<pid>/cgroup safely.
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
|
|
This patch avoids printing the message 'enabled on all CPUs,
...' multiple times. For example, the issue can occur in the
following scenario:
1) watchdog_nmi_enable() fails to enable PMU counters and sets
cpu0_err.
2) 'echo [0|1] > /proc/sys/kernel/nmi_watchdog' is executed to
disable and re-enable the watchdog mechanism 'on the fly'.
3) If watchdog_nmi_enable() succeeds to enable PMU counters,
each CPU will print the message because step1 left behind a
non-zero cpu0_err.
if (!IS_ERR(event)) {
if (cpu == 0 || cpu0_err)
pr_info("enabled on all CPUs, ...")
The patch avoids this by clearing cpu0_err in watchdog_nmi_disable().
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: pbonzini@redhat.com
Link: http://lkml.kernel.org/r/1407768567-171794-4-git-send-email-dzickus@redhat.com
[ Applied small cleanups. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: chai wen <chaiw.fnst@cn.fujitsu.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: pbonzini@redhat.com
Link: http://lkml.kernel.org/r/1407768567-171794-2-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Many embedded systems will not need these syscalls, and omitting them
saves space. Add a new EXPERT config option CONFIG_ADVISE_SYSCALLS
(default y) to support compiling them out.
bloat-o-meter:
add/remove: 0/3 grow/shrink: 0/0 up/down: 0/-2250 (-2250)
function old new delta
sys_fadvise64 57 - -57
sys_fadvise64_64 691 - -691
sys_madvise 1502 - -1502
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
|
|
The commit
4982223e51e8 module: set nx before marking module MODULE_STATE_COMING.
introduced a regression: if a module fails to parse its arguments or
if mod_sysfs_setup fails, then the module's memory will be freed
while still read-only. Anything that reuses that memory will crash
as soon as it tries to write to it.
Cc: stable@vger.kernel.org # v3.16
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI and power management updates from Rafael Wysocki:
"These are a couple of regression fixes, cpuidle menu governor
optimizations, fixes for ACPI proccessor and battery drivers,
hibernation fix to avoid problems related to the e820 memory map,
fixes for a few cpufreq drivers and a new version of the suspend
profiling tool analyze_suspend.py.
Specifics:
- Fix for an ACPI-based device hotplug regression introduced in 3.14
that causes a kernel panic to trigger when memory hot-remove is
attempted with CONFIG_ACPI_HOTPLUG_MEMORY unset from Tang Chen
- Fix for a cpufreq regression introduced in 3.16 that triggers a
"sleeping function called from invalid context" bug in
dev_pm_opp_init_cpufreq_table() from Stephen Boyd
- ACPI battery driver fix for a warning message added in 3.16 that
prints silly stuff sometimes from Mariusz Ceier
- Hibernation fix for safer handling of mismatches in the 820 memory
map between the configurations during image creation and during the
subsequent restore from Chun-Yi Lee
- ACPI processor driver fix to handle CPU hotplug notifications
correctly during system suspend/resume from Lan Tianyu
- Series of four cpuidle menu governor cleanups that also should
speed it up a bit from Mel Gorman
- Fixes for the speedstep-smi, integrator, cpu0 and arm_big_little
cpufreq drivers from Hans Wennborg, Himangi Saraogi, Markus
Pargmann and Uwe Kleine-König
- Version 3.0 of the analyze_suspend.py suspend profiling tool from
Todd E Brandt"
* tag 'pm+acpi-3.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / battery: Fix warning message in acpi_battery_get_state()
PM / tools: analyze_suspend.py: update to v3.0
cpufreq: arm_big_little: fix module license spec
cpufreq: speedstep-smi: fix decimal printf specifiers
ACPI / hotplug: Check scan handlers in acpi_scan_hot_remove()
cpufreq: OPP: Avoid sleeping while atomic
cpufreq: cpu0: Do not print error message when deferring
cpufreq: integrator: Use set_cpus_allowed_ptr
PM / hibernate: avoid unsafe pages in e820 reserved regions
ACPI / processor: Make acpi_cpu_soft_notify() process CPU FROZEN events
cpuidle: menu: Lookup CPU runqueues less
cpuidle: menu: Call nr_iowait_cpu less times
cpuidle: menu: Use ktime_to_us instead of reinventing the wheel
cpuidle: menu: Use shifts when calculating averages where possible
|
|
Benjamin Herrenschmidt pointed out that I further missed modifying
update_vsyscall after the wall_to_mono value was changed to a
timespec64. This causes issues on powerpc32, which expects a 32bit
timespec.
This patch fixes the problem by properly converting from a timespec64 to
a timespec before passing the value on to the arch-specific vsyscall
logic.
[ Thomas is currently on vacation, but reviewed it and wanted me to send
this fix on to you directly. ]
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull more powerpc updates from Ben Herrenschmidt:
"Here are some more powerpc bits for 3.17, essentially fixes.
The biggest series, also aimed at -stable, is from Aneesh and is the
result of weeks and weeks of debugging to find out why the heck or THP
implementation was occasionally triggering multi-hit errors in our
level 1 TLB. It ended up being a combination of issues including
subtleties as to how we should invalidate those special 'MPSS' pages
we use to allow the use of 16M pages inside 4K/64K "base page size"
segments (you really have to love our MMU !)
Another interesting one in the "OMG" category is the series from
Michael adding memory barriers to spin_is_locked(). That's also the
result of many days of debugging to figure out why the semaphore code
would occasionally crash in ways that made no sense. It ended up
being some creative lock stacking that was defeated by the fact that
our locks allow a load inside the locked section to be re-ordered with
the load of the lock value itself (I'm still of two mind about whether
to kill that once and for all by putting a heavier barrier back into
our lock implementation...). The fixes come with a long explanation
in the cset comments, feel free to read it if you feel like having a
headache today"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (25 commits)
powerpc/thp: Add tracepoints to track hugepage invalidate
powerpc/mm: Use read barrier when creating real_pte
powerpc/thp: Use ACCESS_ONCE when loading pmdp
powerpc/thp: Invalidate with vpn in loop
powerpc/thp: Handle combo pages in invalidate
powerpc/thp: Invalidate old 64K based hash page mapping before insert of 4k pte
powerpc/thp: Don't recompute vsid and ssize in loop on invalidate
powerpc/thp: Add write barrier after updating the valid bit
powerpc: reorder per-cpu NUMA information's initialization
powerpc/perf/hv-24x7: Use kmem_cache_free
powerpc/pseries/hvcserver: Fix endian issue in hvcs_get_partner_info
powerpc: Hard disable interrupts in xmon
powerpc: remove duplicate definition of TEXASR_FS
powerpc/pseries: Avoid deadlock on removing ddw
powerpc/pseries: Failure on removing device node
powerpc/boot: Use correct zlib types for comparison
powerpc/powernv: Interface to register/unregister opal dump region
printk: Add function to return log buffer address and size
powerpc: Add POWER8 features to CPU_FTRS_POSSIBLE/ALWAYS
powerpc/ppc476: Disable BTAC
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull seccomp fix from James Morris.
BUG(!spin_is_locked()) really doesn't work very well in UP
configurations without any actual spinlock state. Which is very much
why we have that "assert_spin_lock()" function for this.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
seccomp: Replace BUG(!spin_is_locked()) with assert_spin_lock
|
|
Unlike the original unfair rwlock implementation, queued rwlock
will grant lock according to the chronological sequence of the lock
requests except when the lock requester is in the interrupt context.
Consequently, recursive read_lock calls will now hang the process if
there is a write_lock call somewhere in between the read_lock calls.
This patch updates the lockdep implementation to look for recursive
read_lock calls. A new read state (3) is used to mark those read_lock
call that cannot be recursively called except in the interrupt
context. The new read state does exhaust the 2 bits available in
held_lock:read bit field. The addition of any new read state in the
future may require a redesign of how all those bits are squeezed
together in the held_lock structure.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1407345722-61615-2-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Specifically:
Documentation/locking/lockdep-design.txt
Documentation/locking/lockstat.txt
Documentation/locking/mutex-design.txt
Documentation/locking/rt-mutex-design.txt
Documentation/locking/rt-mutex.txt
Documentation/locking/spinlocks.txt
Documentation/locking/ww-mutex-design.txt
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: jason.low2@hp.com
Cc: aswin@hp.com
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Mason <clm@fb.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lubomir Rintel <lkundrak@v3.sk>
Cc: Masanari Iida <standby24x7@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: fengguang.wu@intel.com
Link: http://lkml.kernel.org/r/1406752916-3341-6-git-send-email-davidlohr@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
4badad35 ("locking/mutex: Disable optimistic spinning on some
architectures") added a ARCH_SUPPORTS_ATOMIC_RMW flag to
disable the mutex optimistic feature on specific archs.
Because CONFIG_MUTEX_SPIN_ON_OWNER only depended on DEBUG and
SMP, it was ok to have the ->owner field conditional a bit
flexible. However by adding a new variable to the matter,
we can waste space with the unused field, ie: CONFIG_SMP &&
(!CONFIG_MUTEX_SPIN_ON_OWNER && !CONFIG_DEBUG_MUTEX).
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: aswin@hp.com
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1406752916-3341-5-git-send-email-davidlohr@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When we fail to acquire the mutex in the fastpath, we end up calling
__mutex_lock_common(). A *lot* goes on in this function. Move out the
optimistic spinning code into mutex_optimistic_spin() and simplify
the former a bit. Furthermore, this is similar to what we have in
rwsems. No logical changes.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: aswin@hp.com
Cc: mingo@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1406752916-3341-4-git-send-email-davidlohr@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
... as we clearly inline mcs_spin_lock() now.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: aswin@hp.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1406752916-3341-3-git-send-email-davidlohr@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When unlocking, we always want to reach the slowpath with the lock's counter
indicating it is unlocked. -- as returned by the asm fastpath call or by
explicitly setting it. While doing so, at least in theory, we can optimize
and allow faster lock stealing.
When unlocking, we always want to reach the slowpath with the lock's counter
indicating it is unlocked. -- as returned by the asm fastpath call or by
explicitly setting it. While doing so, at least in theory, we can optimize
and allow faster lock stealing.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: jason.low2@hp.com
Cc: aswin@hp.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1406752916-3341-2-git-send-email-davidlohr@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Just how the locking-end behaves, when unlocking, go ahead and
obtain the proper data structure immediately after the previous
(asm-end) call exits and there are (probably) pending waiters.
This simplifies a bit some of the layering.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: jason.low2@hp.com
Cc: aswin@hp.com
Cc: mingo@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1406752916-3341-1-git-send-email-davidlohr@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Its been a while and there are no in-tree users left, so remove the
deprecated barriers.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chen, Gong <gong.chen@linux.intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
One should first enqueue to the waitqueue and then check for the
condition. If the condition gets true after mutex_unlock() but before
poll_wait() then we lose it and would have wait for another wakeup.
This has been like this since v2.6.31-rc1 commit c7138f37f9 ("perf_counter:
fix perf_poll()"). Before that it was slightly worse. I guess we get enough
wakeups so if we miss here one it doesn't really matter. It is still a
bad example.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1407159068-1478-1-git-send-email-bigeasy@linutronix.de
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In cases when the owner task exits before the workload and the
workload made some forks, all the events stay in until the last
workload process exits. Thats' because each child event holds
parent reference.
We want to release all children events once the parent is gone,
because at that time there's no process to read them anyway, so
they're just eating resources.
This removal races with process exit, which removes all events
and fork, which clone events. To be clear of those two, adding
work queue to remove orphaned child for context in case such
event is detected.
Using delayed work queue (with delay == 1), because we queue this
work under perf scheduler callbacks. Normal work queue tries to wake
up the queue process, which deadlocks on rq->lock in this place.
Also preventing clones from abandoned parent event.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1406896382-18404-4-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Adding fake EVENT_OWNER_KERNEL owner pointer value for kernel perf
events, so we could distinguish it from user events, which needs
special care in following patch.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1406896382-18404-3-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Platforms like IBM Power Systems supports service processor
assisted dump. It provides interface to add memory region to
be captured when system is crashed.
During initialization/running we can add kernel memory region
to be collected.
Presently we don't have a way to get the log buffer base address
and size. This patch adds support to return log buffer address
and size.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit c61037e9 fixes the phenomenon of 'fantom' cores due to
N*frac(smt_power) >= 1 by limiting the capacity to the actual
number of cores in the load balancing code.
This patch applies the same correction to the NUMA balancing
code.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: vincent.guittot@linaro.org
Cc: Morten.Rasmussen@arm.com
Cc: nicolas.pitre@linaro.org
Cc: efault@gmx.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1407173008-9334-3-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit a43455a1d572daf7b730fe12eb747d1e17411365 ensures that
task_numa_migrate will call task_numa_compare on the preferred
node all the time, even when the preferred node has no free capacity.
This could lead to a performance regression if nr_running == capacity
on both the source and the destination node. This can be avoided by
also checking for nr_running == capacity on the source node, which is
one stricter than checking .has_free_capacity.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: vincent.guittot@linaro.org
Cc: Morten.Rasmussen@arm.com
Cc: nicolas.pitre@linaro.org
Cc: efault@gmx.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1407173008-9334-2-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The child variable in build_overlap_sched_groups() actually refers to the
peer or sibling domain of the given CPU. Rename it to sibling to be consistent
with the naming in build_group_mask().
Signed-off-by: Zhihui Zhang <zzhsuny@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1406942283-18249-1-git-send-email-zzhsuny@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|