Age | Commit message (Collapse) | Author |
|
move_task_off_dead_cpu()->select_fallback_rq() reads/updates ->cpus_allowed
lockless. We can race with set_cpus_allowed() running in parallel.
Change it to take rq->lock around select_fallback_rq(). Note that it is not
trivial to move this spin_lock() into select_fallback_rq(), we must recheck
the task was not migrated after we take the lock and other callers do not
need this lock.
To avoid the races with other callers of select_fallback_rq() which rely on
TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise.
The owner of TASK_WAKING must update ->cpus_allowed and choose the correct
CPU anyway, and the subsequent __migrate_task() is just meaningless because
p->se.on_rq must be false.
Alternatively, we could change select_task_rq() to take rq->lock right
after it calls sched_class->select_task_rq(), but this looks a bit ugly.
Also, change it to not assume irqs are disabled and absorb __migrate_task_irq().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091010.GA9131@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
cpuset_lock/cpuset_cpus_allowed_locked code
This patch just states the fact the cpusets/cpuhotplug interaction is
broken and removes the deadlockable code which only pretends to work.
- cpuset_lock() doesn't really work. It is needed for
cpuset_cpus_allowed_locked() but we can't take this lock in
try_to_wake_up()->select_fallback_rq() path.
- cpuset_lock() is deadlockable. Suppose that a task T bound to CPU takes
callback_mutex. If cpu_down(CPU) happens before T drops callback_mutex
stop_machine() preempts T, then migration_call(CPU_DEAD) tries to take
cpuset_lock() and hangs forever because CPU is already dead and thus
T can't be scheduled.
- cpuset_cpus_allowed_locked() is deadlockable too. It takes task_lock()
which is not irq-safe, but try_to_wake_up() can be called from irq.
Kill them, and change select_fallback_rq() to use cpu_possible_mask, like
we currently do without CONFIG_CPUSETS.
Also, with or without this patch, with or without CONFIG_CPUSETS, the
callers of select_fallback_rq() can race with each other or with
set_cpus_allowed() pathes.
The subsequent patches try to to fix these problems.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091003.GA9123@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
This is left over from commit 7c9414385e ("sched: Remove USER_SCHED"")
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Dhaval Giani <dhaval.giani@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Howells <dhowells@redhat.com>
LKML-Reference: <4BA9A05F.7010407@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Trivial typo fix. rq->migration_thread can be NULL after
task_rq_unlock(), this is why we have "mt" which should be
used instead.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100330165829.GA18284@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Latencytop clearing sum_exec_runtime via proc_sched_set_task() breaks
task_times(). Other places in kernel use nvcsw and nivcsw, which are
being cleared as well, Clear task statistics only.
Reported-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1269940193.19286.14.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Merge reason: update to latest upstream
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Conflicts:
arch/x86/kernel/cpu/perf_event.c
Merge reason: Resolve the conflict, pick up fixes
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
perf sched record can deadlock a box should the holder of
handle->data->lock take an interrupt, and then attempt to
acquire an rq lock held by a CPU trying to acquire the
same lock. Disable interrupts.
CPU0 CPU1
sched event with rq->lock held
grab handle->data->lock
spin on handle->data->lock
interrupt
try to grab rq->lock
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1269598293.6174.8.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent
|
|
Scheduler's task migration events don't work because they always
pass NULL regs perf_sw_event(). The event hence gets filtered
in perf_swevent_add().
Scheduler's context switches events use task_pt_regs() to get
the context when the event occured which is a wrong thing to
do as this won't give us the place in the kernel where we went
to sleep but the place where we left userspace. The result is
even more wrong if we switch from a kernel thread.
Use the hot regs snapshot for both events as they belong to the
non-interrupt/exception based events family. Unlike page faults
or so that provide the regs matching the exact origin of the event,
we need to save the current context.
This makes the task migration event working and fix the context
switch callchains and origin ip.
Example: perf record -a -e cs
Before:
10.91% ksoftirqd/0 0 [k] 0000000000000000
|
--- (nil)
perf_callchain
perf_prepare_sample
__perf_event_overflow
perf_swevent_overflow
perf_swevent_add
perf_swevent_ctx_event
do_perf_sw_event
__perf_sw_event
perf_event_task_sched_out
schedule
run_ksoftirqd
kthread
kernel_thread_helper
After:
23.77% hald-addon-stor [kernel.kallsyms] [k] schedule
|
--- schedule
|
|--60.00%-- schedule_timeout
| wait_for_common
| wait_for_completion
| blk_execute_rq
| scsi_execute
| scsi_execute_req
| sr_test_unit_ready
| |
| |--66.67%-- sr_media_change
| | media_changed
| | cdrom_media_changed
| | sr_block_media_changed
| | check_disk_change
| | cdrom_open
v2: Always build perf_arch_fetch_caller_regs() now that software
events need that too. They don't need it from modules, unlike trace
events, so we keep the EXPORT_SYMBOL in trace_event_perf.c
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
|
|
The trace event buffer used by perf to record raw sample events
is typed as an array of char and may then not be aligned to 8
by alloc_percpu().
But we need it to be aligned to 8 in sparc64 because we cast
this buffer into a random structure type built by the TRACE_EVENT()
macro to store the traces. So if a random 64 bits field is accessed
inside, it may be not under an expected good alignment.
Use an array of long instead to force the appropriate alignment, and
perform a compile time check to ensure the size in byte of the buffer
is a multiple of sizeof(long) so that its actual size doesn't get
shrinked under us.
This fixes unaligned accesses reported while using perf lock
in sparc 64.
Suggested-by: David Miller <davem@davemloft.net>
Suggested-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
|
|
Currently, binary readers of the ring buffer only know where events were
lost, but not how many events were lost at that location.
This information is available, but it would require adding another
field to the sub buffer header to include it.
But when a event can not fit at the end of a sub buffer, it is written
to the next sub buffer. This means there is a good chance that the
buffer may have room to hold this counter. If it does, write
the counter at the end of the sub buffer and set another flag
in the data size field that states that this information exists.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Now that the ring buffer can keep track of where events are lost.
Use this information to the output of trace_pipe:
hackbench-3588 [001] 1326.701660: lock_acquire: ffffffff816591e0 read rcu_read_lock
hackbench-3588 [001] 1326.701661: lock_acquire: ffff88003f4091f0 &(&dentry->d_lock)->rlock
hackbench-3588 [001] 1326.701664: lock_release: ffff88003f4091f0 &(&dentry->d_lock)->rlock
CPU:1 [LOST 673 EVENTS]
hackbench-3588 [001] 1326.702711: kmem_cache_free: call_site=ffffffff81102b85 ptr=ffff880026d96738
hackbench-3588 [001] 1326.702712: lock_release: ffff88003e1480a8 &mm->mmap_sem
hackbench-3588 [001] 1326.702713: lock_acquire: ffff88003e1480a8 &mm->mmap_sem
Even works with the function graph tracer:
2) ! 170.098 us | }
2) 4.036 us | rcu_irq_exit();
2) 3.657 us | idle_cpu();
2) ! 190.301 us | }
CPU:2 [LOST 2196 EVENTS]
2) 0.853 us | } /* cancel_dirty_page */
2) | remove_from_page_cache() {
2) 1.578 us | _raw_spin_lock_irq();
2) | __remove_from_page_cache() {
Note, it does not work with the iterator "trace" file, since it requires
the use of consuming the page from the ring buffer to determine how many
events were lost, which the iterator does not do.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Currently, when the ring buffer drops events, it does not record
the fact that it did so. It does inform the writer that the event
was dropped by returning a NULL event, but it does not put in any
place holder where the event was dropped.
This is not a trivial thing to add because the ring buffer mostly
runs in overwrite (flight recorder) mode. That is, when the ring
buffer is full, new data will overwrite old data.
In a produce/consumer mode, where new data is simply dropped when
the ring buffer is full, it is trivial to add the placeholder
for dropped events. When there's more room to write new data, then
a special event can be added to notify the reader about the dropped
events.
But in overwrite mode, any new write can overwrite events. A place
holder can not be inserted into the ring buffer since there never
may be room. A reader could also come in at anytime and miss the
placeholder.
Luckily, the way the ring buffer works, the read side can find out
if events were lost or not, and how many events. Everytime a write
takes place, if it overwrites the header page (the next read) it
updates a "overrun" variable that keeps track of the number of
lost events. When a reader swaps out a page from the ring buffer,
it can record this number, perfom the swap, and then check to
see if the number changed, and take the diff if it has, which would be
the number of events dropped. This can be stored by the reader
and returned to callers of the reader.
Since the reader page swap will fail if the writer moved the head
page since the time the reader page set up the swap, this gives room
to record the overruns without worrying about races. If the reader
sets up the pages, records the overrun, than performs the swap,
if the swap succeeds, then the overrun variable has not been
updated since the setup before the swap.
For binary readers of the ring buffer, a flag is set in the header
of each sub page (sub buffer) of the ring buffer. This flag is embedded
in the size field of the data on the sub buffer, in the 31st bit (the size
can be 32 or 64 bits depending on the architecture), but only 27
bits needs to be used for the actual size (less actually).
We could add a new field in the sub buffer header to also record the
number of events dropped since the last read, but this will change the
format of the binary ring buffer a bit too much. Perhaps this change can
be made if the information on the number of events dropped is considered
important enough.
Note, the notification of dropped events is only used by consuming reads
or peeking at the ring buffer. Iterating over the ring buffer does not
keep this information because the necessary data is only available when
a page swap is made, and the iterator does not swap out pages.
Cc: Robert Richter <robert.richter@amd.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
If modules are configured in the build but unloading of modules is not,
then the refcnt is not defined. Place the get/put module tracepoints
under CONFIG_MODULE_UNLOAD since it references this field in the module
structure.
As a side-effect, this patch also reduces the code when MODULE_UNLOAD
is not set, because these unused tracepoints are not created.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Remove the @refcnt argument, because it has side-effects, and arguments with
side-effects are not skipped by the jump over disabled instrumentation and are
executed even when the tracepoint is disabled.
This was also causing a GPF as found by Randy Dunlap:
Subject: 2.6.33 GP fault only when built with tracing
LKML-Reference: <4BA2B69D.3000309@oracle.com>
Note, the current 2.6.34-rc has a fix for the actual cause of the GPF,
but this fixes one of its triggers.
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4BA97FA7.6040406@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Network folks reported that directing all MSI-X vectors of their multi
queue NICs to a single core can cause interrupt stack overflows when
enough interrupts fire at the same time.
This is caused by the fact that we run interrupt handlers by default
with interrupts enabled unless the driver reuqests the interrupt with
the IRQF_DISABLED set. The NIC handlers do not set this flag, so
simultaneous interrupts can nest unlimited and cause the stack
overflow.
The only safe counter measure is to run the interrupt handlers with
interrupts disabled. We can't switch to this mode in general right
now, but it is safe to do so for MSI interrupts.
Force IRQF_DISABLED for MSI interrupt handlers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@osdl.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: stable@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
CRED: Fix memory leak in error handling
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Do not free zero sized per cpu areas
x86: Make sure free_init_pages() frees pages on page boundary
x86: Make smp_locks end with page alignment
|
|
implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
|
Fix a memory leak on an OOM condition in prepare_usermodehelper_creds().
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
|
|
In some error handling cases the lock is not unlocked. The return is
converted to a goto, to share the unlock at the end of the function.
A simplified version of the semantic patch that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@r exists@
expression E1;
identifier f;
@@
f (...) { <+...
* spin_lock_irq (E1,...);
... when != E1
* return ...;
...+> }
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
LKML-Reference: <Pine.LNX.4.64.1003291736440.21896@ask.diku.dk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
# echo 1 > events/enable
# echo global > trace_clock
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:3162 check_flags+0xb2/0x190()
...
---[ end trace 3f86734a89416623 ]---
possible reason: unannotated irqs-on.
...
There's no reason to use the raw_local_irq_save() in trace_clock_global.
The local_irq_save() version is fine, and does not cause the bug in lockdep.
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4BA97FA1.7030606@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
This avoids an infinite loop in free_early_partial().
Add a warning to free_early_partial() to catch future problems.
-v5: put back start > end back into WARN_ONCE()
-v6: use one line for warning, suggested by Linus
-v7: more tests
-v8: remove the function name as suggested by Johannes
WARN_ONCE() will print out that function name.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Joel Becker <joel.becker@oracle.com>
Tested-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1269830604-26214-4-git-send-email-yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
CONFIG_SLOW_WORK_PROC was changed to CONFIG_SLOW_WORK_DEBUG, but not in all
instances. Change the remaining instances. This makes the debugfs file
display the time mark and the owner's description again.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Otherwise we can get an oops if the user has no get_ref/put_ref
requirement.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
lockdep has custom code to check whether a pointer belongs to static
percpu area which is somewhat broken. Implement proper
is_kernel/module_percpu_address() and replace the custom code.
On UP, percpu variables are regular static variables and can't be
distinguished from them. Always return %false on UP.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
|
|
Better encapsulate module static percpu area handling so that code
outsidef of CONFIG_SMP ifdef doesn't deal with mod->percpu directly
and add mod->percpu_size and record percpu_size in it. Both percpu
fields are compiled out on UP. While at it, mark mod->percpu w/
__percpu.
This is to prepare for is_module_percpu_address().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
This patch removes the __cupinit from padata_cpu_callback(),
which is refered by the exportet function padata_alloc().
This could lead to problems if CONFIG_HOTPLUG_CPU is disabled,
which should happen very often.
WARNING: kernel/built-in.o(.text+0x7ffcb): Section mismatch in reference from the function padata_alloc() to the function .cpuinit.text:padata_cpu_callback()
The function padata_alloc() references
the function __cpuinit padata_cpu_callback().
This is often because padata_alloc lacks a __cpuinit
annotation or the annotation of padata_cpu_callback is wrong.
Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86/PCI: truncate _CRS windows with _LEN > _MAX - _MIN + 1
x86/PCI: for host bridge address space collisions, show conflicting resource
frv/PCI: remove redundant warnings
x86/PCI: remove redundant warnings
PCI: don't say we claimed a resource if we failed
PCI quirk: Disable MSI on VIA K8T890 systems
PCI quirk: RS780/RS880: work around missing MSI initialization
PCI quirk: only apply CX700 PCI bus parking quirk if external VT6212L is present
PCI: complain about devices that seem to be broken
PCI: print resources consistently with %pR
PCI: make disabled window printk style match the enabled ones
PCI: break out primary/secondary/subordinate for readability
PCI: for address space collisions, show conflicting resource
resources: add interfaces that return conflict information
PCI: cleanup error return for pcix get and set mmrbc functions
PCI: fix access of PCI_X_CMD by pcix get and set mmrbc functions
PCI: kill off pci_register_set_vga_state() symbol export.
PCI: fix return value from pcix_get_max_mmrbc()
|
|
When the cgroup freezer is used to freeze tasks we do not want to thaw
those tasks during resume. Currently we test the cgroup freezer
state of the resuming tasks to see if the cgroup is FROZEN. If so
then we don't thaw the task. However, the FREEZING state also indicates
that the task should remain frozen.
This also avoids a problem pointed out by Oren Ladaan: the freezer state
transition from FREEZING to FROZEN is updated lazily when userspace reads
or writes the freezer.state file in the cgroup filesystem. This means that
resume will thaw tasks in cgroups which should be in the FROZEN state if
there is no read/write of the freezer.state file to trigger this
transition before suspend.
NOTE: Another "simple" solution would be to always update the cgroup
freezer state during resume. However it's a bad choice for several reasons:
Updating the cgroup freezer state is somewhat expensive because it requires
walking all the tasks in the cgroup and checking if they are each frozen.
Worse, this could easily make resume run in N^2 time where N is the number
of tasks in the cgroup. Finally, updating the freezer state from this code
path requires trickier locking because of the way locks must be ordered.
Instead of updating the freezer state we rely on the fact that lazy
updates only manage the transition from FREEZING to FROZEN. We know that
a cgroup with the FREEZING state may actually be FROZEN so test for that
state too. This makes sense in the resume path even for partially-frozen
cgroups -- those that really are FREEZING but not FROZEN.
Reported-by: Oren Ladaan <orenl@cs.columbia.edu>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
show_state will dump all tasks state, so if freezer failed to freeze
any task, kernel will dump all tasks state and flood the dmesg log.
This patch makes freezer only show state of tasks refusing to freeze.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
time: Fix accumulation bug triggered by long delay.
posix-cpu-timers: Reset expire cache when no timer is running
timer stats: Fix del_timer_sync() and try_to_del_timer_sync()
clockevents: Sanitize min_delta_ns adjustment and prevent overflows
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ring-buffer: Do 8 byte alignment for 64 bit that can not handle 4 byte align
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Use proper type in sched_getaffinity()
kernel/sched.c: Suppress unused var warning
sched: sched_getaffinity(): Allow less than NR_CPUS length
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq: Move two IRQ functions from .init.text to .text
genirq: Protect access to irq_desc->action in can_request_irq()
genirq: Prevent oneshot irq thread race
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Remove excessive early_res debug output
softlockup: Stop spurious softlockup messages due to overflow
rcu: Fix local_irq_disable() CONFIG_PROVE_RCU=y false positives
rcu: Fix tracepoints & lockdep false positive
rcu: Make rcu_read_lock_bh_held() allow for disabled BH
|
|
Support for the PMU's BTS features has been upstreamed in
v2.6.32, but we still have the old and disabled ptrace-BTS,
as Linus noticed it not so long ago.
It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
regard for other uses (perf) and doesn't provide the flexibility
needed for perf either.
Its users are ptrace-block-step and ptrace-bts, since ptrace-bts
was never used and ptrace-block-step can be implemented using a
much simpler approach.
So axe all 3000 lines of it. That includes the *locked_memory*()
APIs in mm/mlock.c as well.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Markus Metzger <markus.t.metzger@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20100325135413.938004390@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
cpuset_mem_spread_node() returns an offline node, and causes an oops.
This patch fixes it by initializing task->mems_allowed to
node_states[N_HIGH_MEMORY], and updating task->mems_allowed when doing
memory hotplug.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Reported-by: Nick Piggin <npiggin@suse.de>
Tested-by: Nick Piggin <npiggin@suse.de>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
commit e6a1105b ("cgroups: subsystem module loading interface") and commit
c50cc752 ("sched, cgroups: Fix module export") result in duplicate
including of module.h
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Both functions should not be marked as __init, since they be called
from modules after the init section is freed.
Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jkosina@suse.cz>
LKML-Reference: <1269431961-5731-1-git-send-email-henne@nachtwindheim.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
can_request_irq() accesses and dereferences irq_desc->action w/o
holding irq_desc->lock. So action can be freed on another CPU before
it's dereferenced. Unlikely, but ...
Protect it with desc->lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Expose irq_desc->node as /proc/irq/*/node.
This file provides device hardware locality information for apps
desiring to include hardware locality in irq mapping decisions.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
request_resource() and insert_resource() only return success or failure,
which no information about what existing resource conflicted with the
proposed new reservation. This patch adds request_resource_conflict()
and insert_resource_conflict(), which return the conflicting resource.
Callers may use this for better error messages or to adjust the new
resource and retry the request.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
|
Now that no arches are accessing time_adjust directly,
make it static.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1268968769-19209-1-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The logarithmic accumulation done in the timekeeping has some overflow
protection that limits the max shift value. That means it will take
more then shift loops to accumulate all of the cycles. This causes
the shift decrement to underflow, which causes the loop to never exit.
The simplest fix would be simply to do a:
if (shift)
shift--;
However that is not optimal, as we know the cycle offset is larger
then the interval << shift, the above would make shift drop to zero,
then we would be spinning for quite awhile accumulating at interval
chunks at a time.
Instead, this patch only decreases shift if the offset is smaller
then cycle_interval << shift. This makes sure we accumulate using
the largest chunks possible without overflowing tick_length, and limits
the number of iterations through the loop.
This issue was found and reported by Sonic Zhang, who also tested the fix.
Many thanks your explanation and testing!
Reported-by: Sonic Zhang <sonic.adi@gmail.com>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Tested-by: Sonic Zhang <sonic.adi@gmail.com>
LKML-Reference: <1268948850-5225-1-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Merge reason: Pick up latest perf fixes from upstream.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Ensure additions on touch_ts do not overflow. This can occur
when the top 32 bits of the TSC reach 0xffffffff causing
additions to touch_ts to overflow and this in turn generates
spurious softlockup warnings.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: <stable@kernel.org>
LKML-Reference: <1268994482.1798.6.camel@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The ring buffer uses 4 byte alignment while recording events into the
buffer, even on 64bit machines. This saves space when there are lots
of events being recorded at 4 byte boundaries.
The ring buffer has a zero copy method to write into the buffer, with
the reserving of space and then committing it. This may cause problems
when writing an 8 byte word into a 4 byte alignment (not 8). For x86 and
PPC this is not an issue, but on some architectures this would cause an
out-of-alignment exception.
This patch uses CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to determine
if it is OK to use 4 byte alignments on 64 bit machines. If it is not,
it forces the ring buffer event header to be 8 bytes and not 4,
and will align the length of the data to be 8 byte aligned.
This keeps the data payload at 8 byte alignments and will allow these
machines to run without issue.
The trick to this is that the header can be either 4 bytes or 8 bytes
depending on the length of the data payload. The 4 byte header
has a length field that supports up to 112 bytes. If the length of
the data is more than 112, the length field is set to zero, and the actual
length is stored in the next 4 bytes after the header.
When CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set, the code forces
zero in the 4 byte header forcing the length to be stored in the 4 byte
array, even with a small data load. It also forces the length of the
data load to be 8 byte aligned. The combination of these two guarantee
that the data is always at 8 byte alignment.
Tested-by: Frederic Weisbecker <fweisbec@gmail.com>
(on sparc64)
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|