summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)Author
2011-12-06perf: Avoid a useless pmu_disable() in the perf-tickPeter Zijlstra
Gleb writes: > Currently pmu is disabled and re-enabled on each timer interrupt even > when no rotation or frequency adjustment is needed. On Intel CPU this > results in two writes into PERF_GLOBAL_CTRL MSR per tick. On bare metal > it does not cause significant slowdown, but when running perf in a virtual > machine it leads to 20% slowdown on my machine. Cure this by keeping a perf_event_context::nr_freq counter that counts the number of active events that require frequency adjustments and use this in a similar fashion to the already existing nr_events != nr_active test in perf_rotate_context(). By being able to exclude both rotation and frequency adjustments a-priory for the common case we can avoid the otherwise superfluous PMU disable. Suggested-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-515yhoatehd3gza7we9fapaa@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06Merge branch 'perf/urgent' into perf/coreIngo Molnar
Merge reason: Add these cherry-picked commits so that future changes on perf/core don't conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-05ftrace: Fix hash record accounting bugSteven Rostedt
If the set_ftrace_filter is cleared by writing just whitespace to it, then the filter hash refcounts will be decremented but not updated. This causes two bugs: 1) No functions will be enabled for tracing when they all should be 2) If the users clears the set_ftrace_filter twice, it will crash ftrace: ------------[ cut here ]------------ WARNING: at /home/rostedt/work/git/linux-trace.git/kernel/trace/ftrace.c:1384 __ftrace_hash_rec_update.part.27+0x157/0x1a7() Modules linked in: Pid: 2330, comm: bash Not tainted 3.1.0-test+ #32 Call Trace: [<ffffffff81051828>] warn_slowpath_common+0x83/0x9b [<ffffffff8105185a>] warn_slowpath_null+0x1a/0x1c [<ffffffff810ba362>] __ftrace_hash_rec_update.part.27+0x157/0x1a7 [<ffffffff810ba6e8>] ? ftrace_regex_release+0xa7/0x10f [<ffffffff8111bdfe>] ? kfree+0xe5/0x115 [<ffffffff810ba51e>] ftrace_hash_move+0x2e/0x151 [<ffffffff810ba6fb>] ftrace_regex_release+0xba/0x10f [<ffffffff8112e49a>] fput+0xfd/0x1c2 [<ffffffff8112b54c>] filp_close+0x6d/0x78 [<ffffffff8113a92d>] sys_dup3+0x197/0x1c1 [<ffffffff8113a9a6>] sys_dup2+0x4f/0x54 [<ffffffff8150cac2>] system_call_fastpath+0x16/0x1b ---[ end trace 77a3a7ee73794a02 ]--- Link: http://lkml.kernel.org/r/20111101141420.GA4918@debian Reported-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-05jump_label: jump_label_inc may return before the code is patchedGleb Natapov
If cpu A calls jump_label_inc() just after atomic_add_return() is called by cpu B, atomic_inc_not_zero() will return value greater then zero and jump_label_inc() will return to a caller before jump_label_update() finishes its job on cpu B. Link: http://lkml.kernel.org/r/20111018175551.GH17571@redhat.com Cc: stable@vger.kernel.org Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-05ftrace: Remove force undef config value left for testingSteven Rostedt
A forced undef of a config value was used for testing and was accidently left in during the final commit. This causes x86 to run slower than needed while running function tracing as well as causes the function graph selftest to fail when DYNMAIC_FTRACE is not set. This is because the code in MCOUNT expects the ftrace code to be processed with the config value set that happened to be forced not set. The forced config option was left in by: commit 6331c28c962561aee59e5a493b7556a4bb585957 ftrace: Fix dynamic selftest failure on some archs Link: http://lkml.kernel.org/r/20111102150255.GA6973@debian Cc: stable@vger.kernel.org Reported-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-05tracing: Restore system filter behaviorLi Zefan
Though not all events have field 'prev_pid', it was allowed to do this: # echo 'prev_pid == 100' > events/sched/filter but commit 75b8e98263fdb0bfbdeba60d4db463259f1fe8a2 (tracing/filter: Swap entire filter of events) broke it without any reason. Link: http://lkml.kernel.org/r/4EAF46CF.8040408@cn.fujitsu.com Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-05tracing: fix event_subsystem ref countingIlya Dryomov
Fix a bug introduced by e9dbfae5, which prevents event_subsystem from ever being released. Ref_count was added to keep track of subsystem users, not for counting events. Subsystem is created with ref_count = 1, so there is no need to increment it for every event, we have nr_events for that. Fix this by touching ref_count only when we actually have a new user - subsystem_open(). Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Link: http://lkml.kernel.org/r/1320052062-7846-1-git-send-email-idryomov@gmail.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-05Merge branch 'tip/perf/urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent
2011-12-05perf: Fix loss of notification with multi-eventPeter Zijlstra
When you do: $ perf record -e cycles,cycles,cycles noploop 10 You expect about 10,000 samples for each event, i.e., 10s at 1000samples/sec. However, this is not what's happening. You get much fewer samples, maybe 3700 samples/event: $ perf report -D | tail -15 Aggregated stats: TOTAL events: 10998 MMAP events: 66 COMM events: 2 SAMPLE events: 10930 cycles stats: TOTAL events: 3644 SAMPLE events: 3644 cycles stats: TOTAL events: 3642 SAMPLE events: 3642 cycles stats: TOTAL events: 3644 SAMPLE events: 3644 On a Intel Nehalem or even AMD64, there are 4 counters capable of measuring cycles, so there is plenty of space to measure those events without multiplexing (even with the NMI watchdog active). And even with multiplexing, we'd expect roughly the same number of samples per event. The root of the problem was that when the event that caused the buffer to become full was not the first event passed on the cmdline, the user notification would get lost. The notification was sent to the file descriptor of the overflowed event but the perf tool was not polling on it. The perf tool aggregates all samples into a single buffer, i.e., the buffer of the first event. Consequently, it assumes notifications for any event will come via that descriptor. The seemingly straight forward solution of moving the waitq into the ringbuffer object doesn't work because of life-time issues. One could perf_event_set_output() on a fd that you're also blocking on and cause the old rb object to be freed while its waitq would still be referenced by the blocked thread -> FAIL. Therefore link all events to the ringbuffer and broadcast the wakeup from the ringbuffer object to all possible events that could be waited upon. This is rather ugly, and we're open to better solutions but it works for now. Reported-by: Stephane Eranian <eranian@google.com> Finished-by: Stephane Eranian <eranian@google.com> Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-01trace_events_filter: Use rcu_assign_pointer() when setting ↵Tejun Heo
ftrace_event_call->filter ftrace_event_call->filter is sched RCU protected but didn't use rcu_assign_pointer(). Use it. TODO: Add proper __rcu annotation to call->filter and all its users. -v2: Use RCU_INIT_POINTER() for %NULL clearing as suggested by Eric. Link: http://lkml.kernel.org/r/20111123164949.GA29639@google.com Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@kernel.org # (2.6.39+) Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-15Merge branch 'core' of git://amd64.org/linux/rric into perf/coreIngo Molnar
2011-11-14events: Don't divide events if it has field periodAndrew Vagin
This patch solves the following problem: Now some samples may be lost due to throttling. The number of samples is restricted by sysctl_perf_event_sample_rate/HZ. A trace event is divided on some samples according to event's period. I don't sure, that we should generate more than one sample on each trace event. I think the better way to use SAMPLE_PERIOD. E.g.: I want to trace when a process sleeps. I created a process, which sleeps for 1ms and for 4ms. perf got 100 events in both cases. swapper 0 [000] 1141.371830: sched_stat_sleep: comm=foo pid=1801 delay=1386750 [ns] swapper 0 [000] 1141.369444: sched_stat_sleep: comm=foo pid=1801 delay=4499585 [ns] In the first case a kernel want to send 4499585 events and in the second case it wants to send 1386750 events. perf-reports shows that process sleeps in both places equal time. It's bug. With this patch kernel generates one event on each "sleep" and the time slice is saved in the field "period". Perf knows how handle it. Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1320670457-2633428-3-git-send-email-avagin@openvz.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14perf: Carve out callchain functionalityBorislav Petkov
Split the callchain code from the perf events core into a new kernel/events/callchain.c file. This simplifies a bit the big core.c Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> [keep ctx recursion handling inline and use internal headers] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1318778104-17152-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14perf: Do not set task_ctx pointer in cpuctx if there are no events in the ↵Gleb Natapov
context Do not set task_ctx pointer during sched_in if there are no events associated with the context. Otherwise if during task execution total number of events in the system will become zero perf_event_context_sched_out() will not be called and cpuctx->task_ctx will be left with a stale value. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111023171033.GI17571@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-11Merge branch 'tip/perf/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
2011-11-08Merge branch 'perf/core' into oprofile/masterRobert Richter
Merge reason: Resolve conflicts with Don's NMI rework: commit 9c48f1c629ecfa114850c03f875c6691003214de Author: Don Zickus <dzickus@redhat.com> Date: Fri Sep 30 15:06:21 2011 -0400 x86, nmi: Wire up NMI handlers to new routines Conflicts: arch/x86/oprofile/nmi_timer_int.c Signed-off-by: Robert Richter <robert.richter@amd.com>
2011-11-07PM / QoS: Set cpu_dma_pm_qos->nameDominik Brodowski
Since commit 4a31a334, the name of this misc device is not initialized, which leads to a funny device named /dev/(null) being created and /proc/misc containing an entry with just a number but no name. The latter leads to complaints by cryptsetup, which caused me to investigate this matter. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-07tracing/latency: Fix header output for latency tracersJiri Olsa
In case the the graph tracer (CONFIG_FUNCTION_GRAPH_TRACER) or even the function tracer (CONFIG_FUNCTION_TRACER) are not set, the latency tracers do not display proper latency header. The involved/fixed latency tracers are: wakeup_rt wakeup preemptirqsoff preemptoff irqsoff The patch adds proper handling of tracer configuration options for latency tracers, and displaying correct header info accordingly. * The current output (for wakeup tracer) with both graph and function tracers disabled is: # tracer: wakeup # <idle>-0 0d.h5 1us+: 0:120:R + [000] 7: 0:R watchdog/0 <idle>-0 0d.h5 3us+: ttwu_do_activate.clone.1 <-try_to_wake_up ... * The fixed output is: # tracer: wakeup # # wakeup latency trace v1.1.5 on 3.1.0-tip+ # -------------------------------------------------------------------- # latency: 55 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) # ----------------- # | task: migration/0-6 (uid:0 nice:0 policy:1 rt_prio:99) # ----------------- # # _------=> CPU# # / _-----=> irqs-off # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # |||| / delay # cmd pid ||||| time | caller # \ / ||||| \ | / cat-1129 0d..4 1us : 1129:120:R + [000] 6: 0:R migration/0 cat-1129 0d..4 2us+: ttwu_do_activate.clone.1 <-try_to_wake_up * The current output (for wakeup tracer) with only function tracer enabled is: # tracer: wakeup # cat-1140 0d..4 1us+: 1140:120:R + [000] 6: 0:R migration/0 cat-1140 0d..4 2us : ttwu_do_activate.clone.1 <-try_to_wake_up * The fixed output is: # tracer: wakeup # # wakeup latency trace v1.1.5 on 3.1.0-tip+ # -------------------------------------------------------------------- # latency: 207 us, #109/109, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) # ----------------- # | task: watchdog/1-12 (uid:0 nice:0 policy:1 rt_prio:99) # ----------------- # # _------=> CPU# # / _-----=> irqs-off # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # |||| / delay # cmd pid ||||| time | caller # \ / ||||| \ | / <idle>-0 1d.h5 1us+: 0:120:R + [001] 12: 0:R watchdog/1 <idle>-0 1d.h5 3us : ttwu_do_activate.clone.1 <-try_to_wake_up Link: http://lkml.kernel.org/r/20111107150849.GE1807@m.brq.redhat.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-07ftrace: Fix hash record accounting bugSteven Rostedt
If the set_ftrace_filter is cleared by writing just whitespace to it, then the filter hash refcounts will be decremented but not updated. This causes two bugs: 1) No functions will be enabled for tracing when they all should be 2) If the users clears the set_ftrace_filter twice, it will crash ftrace: ------------[ cut here ]------------ WARNING: at /home/rostedt/work/git/linux-trace.git/kernel/trace/ftrace.c:1384 __ftrace_hash_rec_update.part.27+0x157/0x1a7() Modules linked in: Pid: 2330, comm: bash Not tainted 3.1.0-test+ #32 Call Trace: [<ffffffff81051828>] warn_slowpath_common+0x83/0x9b [<ffffffff8105185a>] warn_slowpath_null+0x1a/0x1c [<ffffffff810ba362>] __ftrace_hash_rec_update.part.27+0x157/0x1a7 [<ffffffff810ba6e8>] ? ftrace_regex_release+0xa7/0x10f [<ffffffff8111bdfe>] ? kfree+0xe5/0x115 [<ffffffff810ba51e>] ftrace_hash_move+0x2e/0x151 [<ffffffff810ba6fb>] ftrace_regex_release+0xba/0x10f [<ffffffff8112e49a>] fput+0xfd/0x1c2 [<ffffffff8112b54c>] filp_close+0x6d/0x78 [<ffffffff8113a92d>] sys_dup3+0x197/0x1c1 [<ffffffff8113a9a6>] sys_dup2+0x4f/0x54 [<ffffffff8150cac2>] system_call_fastpath+0x16/0x1b ---[ end trace 77a3a7ee73794a02 ]--- Link: http://lkml.kernel.org/r/20111101141420.GA4918@debian Reported-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-07jump_label: jump_label_inc may return before the code is patchedGleb Natapov
If cpu A calls jump_label_inc() just after atomic_add_return() is called by cpu B, atomic_inc_not_zero() will return value greater then zero and jump_label_inc() will return to a caller before jump_label_update() finishes its job on cpu B. Link: http://lkml.kernel.org/r/20111018175551.GH17571@redhat.com Cc: stable@vger.kernel.org Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-07ftrace: Remove force undef config value left for testingSteven Rostedt
A forced undef of a config value was used for testing and was accidently left in during the final commit. This causes x86 to run slower than needed while running function tracing as well as causes the function graph selftest to fail when DYNMAIC_FTRACE is not set. This is because the code in MCOUNT expects the ftrace code to be processed with the config value set that happened to be forced not set. The forced config option was left in by: commit 6331c28c962561aee59e5a493b7556a4bb585957 ftrace: Fix dynamic selftest failure on some archs Link: http://lkml.kernel.org/r/20111102150255.GA6973@debian Cc: stable@vger.kernel.org Reported-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-07lockdep: Show subclass in pretty print of lockdep outputSteven Rostedt
The pretty print of the lockdep debug splat uses just the lock name to show how the locking scenario happens. But when it comes to nesting locks, the output becomes confusing which takes away the point of the pretty printing of the lock scenario. Without displaying the subclass info, we get the following output: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(slock-AF_INET); lock(slock-AF_INET); lock(slock-AF_INET); lock(slock-AF_INET); *** DEADLOCK *** The above looks more of a A->A locking bug than a A->B B->A. By adding the subclass to the output, we can see what really happened: other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(slock-AF_INET); lock(slock-AF_INET/1); lock(slock-AF_INET); lock(slock-AF_INET/1); *** DEADLOCK *** This bug was discovered while tracking down a real bug caught by lockdep. Link: http://lkml.kernel.org/r/20111025202049.GB25043@hostway.ca Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-06Merge branch 'upstream/jump-label-noearly' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: jump-label: initialize jump-label subsystem much earlier x86/jump_label: add arch_jump_label_transform_static() s390/jump-label: add arch_jump_label_transform_static() jump_label: add arch_jump_label_transform_static() to optimise non-live code updates sparc/jump_label: drop arch_jump_label_text_poke_early() x86/jump_label: drop arch_jump_label_text_poke_early() jump_label: if a key has already been initialized, don't nop it out stop_machine: make stop_machine safe and efficient to call early jump_label: use proper atomic_t initializer Conflicts: - arch/x86/kernel/jump_label.c Added __init_or_module to arch_jump_label_text_poke_early vs removal of that function entirely - kernel/stop_machine.c same patch ("stop_machine: make stop_machine safe and efficient to call early") merged twice, with whitespace fix in one version
2011-11-06Merge branch 'modsplit-Oct31_2011' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h
2011-11-06Merge branch 'writeback-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Add a 'reason' to wb_writeback_work writeback: send work item to queue_io, move_expired_inodes writeback: trace event balance_dirty_pages writeback: trace event bdi_dirty_ratelimit writeback: fix ppc compile warnings on do_div(long long, unsigned long) writeback: per-bdi background threshold writeback: dirty position control - bdi reserve area writeback: control dirty pause time writeback: limit max dirty pause time writeback: IO-less balance_dirty_pages() writeback: per task dirty rate limit writeback: stabilize bdi->dirty_ratelimit writeback: dirty rate control writeback: add bg_threshold parameter to __bdi_update_bandwidth() writeback: dirty position control writeback: account per-bdi accumulated dirtied pages
2011-11-06Merge git://github.com/rustyrussell/linuxLinus Torvalds
* git://github.com/rustyrussell/linux: module,bug: Add TAINT_OOT_MODULE flag for modules not built in-tree module: Enable dynamic debugging regardless of taint
2011-11-06Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits) powerpc/p3060qds: Add support for P3060QDS board powerpc/83xx: Add shutdown request support to MCU handling on MPC8349 MITX powerpc/85xx: Make kexec to interate over online cpus powerpc/fsl_booke: Fix comment in head_fsl_booke.S powerpc/85xx: issue 15 EOI after core reset for FSL CoreNet devices powerpc/8xxx: Fix interrupt handling in MPC8xxx GPIO driver powerpc/85xx: Add 'fsl,pq3-gpio' compatiable for GPIO driver powerpc/86xx: Correct Gianfar support for GE boards powerpc/cpm: Clear muram before it is in use. drivers/virt: add ioctl for 32-bit compat on 64-bit to fsl-hv-manager powerpc/fsl_msi: add support for "msi-address-64" property powerpc/85xx: Setup secondary cores PIR with hard SMP id powerpc/fsl-booke: Fix settlbcam for 64-bit powerpc/85xx: Adding DCSR node to dtsi device trees powerpc/85xx: clean up FPGA device tree nodes for Freecsale QorIQ boards powerpc/85xx: fix PHYS_64BIT selection for P1022DS powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map powerpc: respect mem= setting for early memory limit setup powerpc: Update corenet64_smp_defconfig powerpc: Update mpc85xx/corenet 32-bit defconfigs ... Fix up trivial conflicts in: - arch/powerpc/configs/40x/hcu4_defconfig removed stale file, edited elsewhere - arch/powerpc/include/asm/udbg.h, arch/powerpc/kernel/udbg.c: added opal and gelic drivers vs added ePAPR driver - drivers/tty/serial/8250.c moved UPIO_TSI to powerpc vs removed UPIO_DWAPB support
2011-11-07module,bug: Add TAINT_OOT_MODULE flag for modules not built in-treeBen Hutchings
Use of the GPL or a compatible licence doesn't necessarily make the code any good. We already consider staging modules to be suspect, and this should also be true for out-of-tree modules which may receive very little review. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Reviewed-by: Dave Jones <davej@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (patched oops-tracing.txt)
2011-11-07module: Enable dynamic debugging regardless of taintBen Hutchings
Dynamic debugging is currently disabled for tainted modules, except for TAINT_CRAP. This prevents use of dynamic debugging for out-of-tree modules once the next patch is applied. This condition was apparently intended to avoid a crash if a force- loaded module has an incompatible definition of dynamic debug structures. However, a administrator that forces us to load a module is claiming that it *is* compatible even though it fails our version checks. If they are mistaken, there are any number of ways the module could crash the system. As a side-effect, proprietary and other tainted modules can now use dynamic_debug. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-04tracing: Add boiler plate for subsystem filterSteven Rostedt
The system filter can be used to set multiple event filters that exist within the system. But currently it displays the last filter written that does not necessarily correspond to the filters within the system. The system filter itself is not used to filter any events. The system filter is just a means to set filters of the events within it. Because this causes an ambiguous state when the system filter reads a filter string but the events within the system have different strings it is best to just show a boiler plate: ### global filter ### # Use this to set filters for multiple events. # Only events with the given fields will be affected. # If no events are modified, an error message will be displayed here. If an error occurs while writing to the system filter, the system filter will replace the boiler plate with the error message as it currently does. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-04PM / Freezer: Revert 27920651fe "PM / Freezer: Make fake_signal_wake_up() ↵Tejun Heo
wake TASK_KILLABLE tasks too" Commit 27920651fe "PM / Freezer: Make fake_signal_wake_up() wake TASK_KILLABLE tasks too" updated fake_signal_wake_up() used by freezer to wake up KILLABLE tasks. Sending unsolicited wakeups to tasks in killable sleep is dangerous as there are code paths which depend on tasks not waking up spuriously from KILLABLE sleep. For example. sys_read() or page can sleep in TASK_KILLABLE assuming that wait/down/whatever _killable can only fail if we can not return to the usermode. TASK_TRACED is another obvious example. The previous patch updated wait_event_freezekillable() such that it doesn't depend on the spurious wakeup. This patch reverts the offending commit. Note that the spurious KILLABLE wakeup had other implicit effects in KILLABLE sleeps in nfs and cifs and those will need further updates to regain freezekillable behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-04PM / QoS: Remove redundant checkGuennadi Liakhovetski
Remove an "if" check, that repeats an equivalent one 6 lines above. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-04PM / Sleep: Fix race between CPU hotplug and freezerSrivatsa S. Bhat
The CPU hotplug notifications sent out by the _cpu_up() and _cpu_down() functions depend on the value of the 'tasks_frozen' argument passed to them (which indicates whether tasks have been frozen or not). (Examples for such CPU hotplug notifications: CPU_ONLINE, CPU_ONLINE_FROZEN, CPU_DEAD, CPU_DEAD_FROZEN). Thus, it is essential that while the callbacks for those notifications are running, the state of the system with respect to the tasks being frozen or not remains unchanged, *throughout that duration*. Hence there is a need for synchronizing the CPU hotplug code with the freezer subsystem. Since the freezer is involved only in the Suspend/Hibernate call paths, this patch hooks the CPU hotplug code to the suspend/hibernate notifiers PM_[SUSPEND|HIBERNATE]_PREPARE and PM_POST_[SUSPEND|HIBERNATE] to prevent the race between CPU hotplug and freezer, thus ensuring that CPU hotplug notifications will always be run with the state of the system really being what the notifications indicate, _throughout_ their execution time. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-04oprofile, x86: Reimplement nmi timer mode using perf eventRobert Richter
The legacy x86 nmi watchdog code was removed with the implementation of the perf based nmi watchdog. This broke Oprofile's nmi timer mode. To run nmi timer mode we relied on a continuous ticking nmi source which the nmi watchdog provided. The nmi tick was no longer available and current watchdog can not be used anymore since it runs with very long periods in the range of seconds. This patch reimplements the nmi timer mode using a perf counter nmi source. V2: * removing pr_info() * fix undefined reference to `__udivdi3' for 32 bit build * fix section mismatch of .cpuinit.data:nmi_timer_cpu_nb * removed nmi timer setup in arch/x86 * implemented function stubs for op_nmi_init/exit() * made code more readable in oprofile_init() V3: * fix architectural initialization in oprofile_init() * fix CONFIG_OPROFILE_NMI_TIMER dependencies Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Robert Richter <robert.richter@amd.com>
2011-11-03Revert "perf: Add PM notifiers to fix CPU hotplug races"Linus Torvalds
This reverts commit 144060fee07e9c22e179d00819c83c86fbcbf82c. It causes a resume regression for Andi on his Acer Aspire 1830T post 3.1. The screen just stays black after wakeup. Also, it really looks like the wrong way to suspend and resume perf events: I think they should be done as part of the CPU suspend and resume, rather than as a notifier that does smp_call_function(). Reported-by: Andi Kleen <andi@firstfloor.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02memcg: replace ss->id_lock with a rwlockAndrew Bresticker
While back-porting Johannes Weiner's patch "mm: memcg-aware global reclaim" for an internal effort, we noticed a significant performance regression during page-reclaim heavy workloads due to high contention of the ss->id_lock. This lock protects idr map, and serializes calls to idr_get_next() in css_get_next() (which is used during the memcg hierarchy walk). Since idr_get_next() is just doing a look up, we need only serialize it with respect to idr_remove()/idr_get_new(). By making the ss->id_lock a rwlock, contention is greatly reduced and performance improves. Tested: cat a 256m file from a ramdisk in a 128m container 50 times on each core (one file + container per core) in parallel on a NUMA machine. Result is the time for the test to complete in 1 of the containers. Both kernels included Johannes' memcg-aware global reclaim patches. Before rwlock patch: 1710.778s After rwlock patch: 152.227s Signed-off-by: Andrew Bresticker <abrestic@google.com> Cc: Paul Menage <menage@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02sysctl: add support for poll()Lucas De Marchi
Adding support for poll() in sysctl fs allows userspace to receive notifications of changes in sysctl entries. This adds a infrastructure to allow files in sysctl fs to be pollable and implements it for hostname and domainname. [akpm@linux-foundation.org: s/declare/define/ for definitions] Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Greg KH <gregkh@suse.de> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02cpusets: avoid looping when storing to mems_allowed if one node remains setDavid Rientjes
{get,put}_mems_allowed() exist so that general kernel code may locklessly access a task's set of allowable nodes without having the chance that a concurrent write will cause the nodemask to be empty on configurations where MAX_NUMNODES > BITS_PER_LONG. This could incur a significant delay, however, especially in low memory conditions because the page allocator is blocking and reclaim requires get_mems_allowed() itself. It is not atypical to see writes to cpuset.mems take over 2 seconds to complete, for example. In low memory conditions, this is problematic because it's one of the most imporant times to change cpuset.mems in the first place! The only way a task's set of allowable nodes may change is through cpusets by writing to cpuset.mems and when attaching a task to a generic code is not reading the nodemask with get_mems_allowed() at the same time, and then clearing all the old nodes. This prevents the possibility that a reader will see an empty nodemask at the same time the writer is storing a new nodemask. If at least one node remains unchanged, though, it's possible to simply set all new nodes and then clear all the old nodes. Changing a task's nodemask is protected by cgroup_mutex so it's guaranteed that two threads are not changing the same task's nodemask at the same time, so the nodemask is guaranteed to be stored before another thread changes it and determines whether a node remains set or not. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Paul Menage <paul@paulmenage.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02cgroups: don't attach task to subsystem if migration failedBen Blum
If a task has exited to the point it has called cgroup_exit() already, then we can't migrate it to another cgroup anymore. This can happen when we are attaching a task to a new cgroup between the call to ->can_attach_task() on subsystems and the migration that is eventually tried in cgroup_task_migrate(). In this case cgroup_task_migrate() returns -ESRCH and we don't want to attach the task to the subsystems because the attachment to the new cgroup itself failed. Fix this by only calling ->attach_task() on the subsystems if the cgroup migration succeeded. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ben Blum <bblum@andrew.cmu.edu> Acked-by: Paul Menage <paul@paulmenage.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02cgroups: more safe tasklist locking in cgroup_attach_procBen Blum
Fix unstable tasklist locking in cgroup_attach_proc. According to this thread - https://lkml.org/lkml/2011/7/27/243 - RCU is not sufficient to guarantee the tasklist is stable w.r.t. de_thread and exit. Taking tasklist_lock for reading, instead of rcu_read_lock, ensures proper exclusion. Signed-off-by: Ben Blum <bblum@andrew.cmu.edu> Acked-by: Paul Menage <paul@paulmenage.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02tracing: Restore system filter behaviorLi Zefan
Though not all events have field 'prev_pid', it was allowed to do this: # echo 'prev_pid == 100' > events/sched/filter but commit 75b8e98263fdb0bfbdeba60d4db463259f1fe8a2 (tracing/filter: Swap entire filter of events) broke it without any reason. Link: http://lkml.kernel.org/r/4EAF46CF.8040408@cn.fujitsu.com Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-01Merge branch 'next/dt' of git://git.linaro.org/people/arnd/arm-socLinus Torvalds
* 'next/dt' of git://git.linaro.org/people/arnd/arm-soc: ARM: gic: use module.h instead of export.h ARM: gic: fix irq_alloc_descs handling for sparse irq ARM: gic: add OF based initialization ARM: gic: add irq_domain support irq: support domains with non-zero hwirq base of/irq: introduce of_irq_init ARM: at91: add at91sam9g20 and Calao USB A9G20 DT support ARM: at91: dt: at91sam9g45 family and board device tree files arm/mx5: add device tree support for imx51 babbage arm/mx5: add device tree support for imx53 boards ARM: msm: Add devicetree support for msm8660-surf msm_serial: Add devicetree support msm_serial: Use relative resources for iomem Fix up conflicts in arch/arm/mach-at91/{at91sam9260.c,at91sam9g45.c}
2011-10-31Merge branch 'akpm' (Andrew's incoming)Linus Torvalds
Quoth Andrew: - Most of MM. Still waiting for the poweroc guys to get off their butts and review some threaded hugepages patches. - alpha - vfs bits - drivers/misc - a few core kerenl tweaks - printk() features - MAINTAINERS updates - backlight merge - leds merge - various lib/ updates - checkpatch updates * akpm: (127 commits) epoll: fix spurious lockdep warnings checkpatch: add a --strict check for utf-8 in commit logs kernel.h/checkpatch: mark strict_strto<foo> and simple_strto<foo> as obsolete llist-return-whether-list-is-empty-before-adding-in-llist_add-fix wireless: at76c50x: follow rename pack_hex_byte to hex_byte_pack fat: follow rename pack_hex_byte() to hex_byte_pack() security: follow rename pack_hex_byte() to hex_byte_pack() kgdb: follow rename pack_hex_byte() to hex_byte_pack() lib: rename pack_hex_byte() to hex_byte_pack() lib/string.c: fix strim() semantics for strings that have only blanks lib/idr.c: fix comment for ida_get_new_above() lib/percpu_counter.c: enclose hotplug only variables in hotplug ifdef lib/bitmap.c: quiet sparse noise about address space lib/spinlock_debug.c: print owner on spinlock lockup lib/kstrtox: common code between kstrto*() and simple_strto*() functions drivers/leds/leds-lp5521.c: check if reset is successful leds: turn the blink_timer off before starting to blink leds: save the delay values after a successful call to blink_set() drivers/leds/leds-gpio.c: use gpio_get_value_cansleep() when initializing drivers/leds/leds-lm3530.c: add __devexit_p where needed ...
2011-10-31kgdb: follow rename pack_hex_byte() to hex_byte_pack()Andy Shevchenko
There is no functional change. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31printk: remove bounds checking for log_prefixWilliam Douglas
Currently log_prefix is testing that the first character of the log level and facility is less than '0' and greater than '9' (which is always false). Since the code being updated works because strtoul bombs out (endp isn't updated) and 0 is returned anyway just remove the check and don't change the behavior of the function. Signed-off-by: William Douglas <william.douglas@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31printk: fix bounds checking for log_prefixWilliam Douglas
Currently log_prefix is testing that the first character of the log level and facility is less than '0' and greater than '9' (which is always false). It should be testing to see if the character less than '0' or greater than '9' instead. This patch makes that change. The code being changed worked because strtoul bombs out (endp isn't updated) and 0 is returned anyway. Signed-off-by: William Douglas <william.douglas@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31printk: add console_suspend module parameterYanmin Zhang
We are enabling some power features on medfield. To test suspend-2-RAM conveniently, we need turn on/off console_suspend_enabled frequently. Add a module parameter, so users could change it by: /sys/module/printk/parameters/console_suspend Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31printk: add module parameter ignore_loglevel to control ignore_loglevelYanmin Zhang
We are enabling some power features on medfield. To test suspend-2-RAM conveniently, we need turn on/off ignore_loglevel frequently without rebooting. Add a module parameter, so users can change it by: /sys/module/printk/parameters/ignore_loglevel Signed-off-by: Yanmin Zhang <yanmin.zhang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31kernel/sysctl.c: add cap_last_cap to /proc/sys/kernelDan Ballard
Userspace needs to know the highest valid capability of the running kernel, which right now cannot reliably be retrieved from the header files only. The fact that this value cannot be determined properly right now creates various problems for libraries compiled on newer header files which are run on older kernels. They assume capabilities are available which actually aren't. libcap-ng is one example. And we ran into the same problem with systemd too. Now the capability is exported in /proc/sys/kernel/cap_last_cap. [akpm@linux-foundation.org: make cap_last_cap const, per Ulrich] Signed-off-by: Dan Ballard <dan@mindstab.net> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Lennart Poettering <lennart@poettering.net> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Ulrich Drepper <drepper@akkadia.org> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31watchdog: move watchdog_*_all_cpus under CONFIG_SYSCTLVasily Averin
Fix compilation warnings for CONFIG_SYSCTL=n: fixed compilation warnings in case of disabled CONFIG_SYSCTL kernel/watchdog.c:483:13: warning: `watchdog_enable_all_cpus' defined but not used kernel/watchdog.c:500:13: warning: `watchdog_disable_all_cpus' defined but not used these functions are static and are used only in sysctl handler, so move them inside #ifdef CONFIG_SYSCTL too Signed-off-by: Vasily Averin <vvs@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>